Releases: ekscrypto/SwiftEmailValidator
1.7.0 — SwiftEmailValidatorIDNA + UTS #46 §4 V1-V7
New opt-in companion target layering UTS #46 IDNA Compatibility Processing on the host portion of the address. Mirrors the SwiftEmailValidatorUTS39 architecture: imported separately (import SwiftEmailValidatorIDNA) so the ~385 KB UCD-derived data does not bundle into callers that don't need it.
Highlights
- Full UTS #46 §4 pipeline — Map / NFC / Break into labels / Validate / ToASCII.
- All V1-V7 validity criteria enforced by default: V1 NFC, V2 hyphen rules (with xn-- carve-out), V3 leading combining mark rejection, V4/V5 per-scalar status + STD3 LDH gate, V6 CheckBidi, V7 CheckJoiners.
- RFC 3492 Punycode encoder/decoder with overflow guards on every multiply/add.
- RFC 5893 §2 Bidi rule (UTS #46 V6). Domain-wide trigger per §1.4 — pure-LTR siblings of RTL labels are still gated.
- RFC 5892 §A.1 / §A.2 CONTEXTJ rules (UTS #46 V7). ZWJ/ZWNJ allowed only in legitimate joining contexts; misuse — a known homograph attack vector — is rejected.
- RFC 5892 §A.3-§A.9 CONTEXTO layered on top of UTS #46 as a default-on security extension. Catches Catalan middle dot, Greek keraia, Hebrew geresh/gershayim, Katakana middle dot, and mixed Arabic-Indic / Extended Arabic-Indic digit homographs. Opt out with IDNA.Options.checkContextO: false for strict UTS #46-only conformance.
- VerifyDnsLength + UseSTD3ASCIIRules enforced at the validator layer. STD3 is specifically required because the modern preprocessed mapping table marks non-LDH ASCII as valid (with NV8).
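To illustrate what "overflow guards on every multiply/add" can look like, here is a minimal sketch of the RFC 3492 §6.3 encoding procedure for a single label. It is not the package's implementation: the function and constant names (punycodeEncode, pcAdapt, and so on) are hypothetical, and the guards rely on Swift's standard multipliedReportingOverflow / addingReportingOverflow on Int.

```swift
// Sketch of the RFC 3492 section 6.3 encoder for one label.
// All names are illustrative assumptions, not the package's API.
let pcBase = 36, pcTMin = 1, pcTMax = 26, pcSkew = 38, pcDamp = 700

// Maps a digit value 0..<36 to "a"..."z" then "0"..."9".
func pcDigit(_ value: Int) -> Unicode.Scalar {
    return Unicode.Scalar(UInt8(value < 26 ? value + 97 : value - 26 + 48))
}

// Bias adaptation, RFC 3492 section 6.1.
func pcAdapt(_ delta: Int, _ numPoints: Int, _ firstTime: Bool) -> Int {
    var d = firstTime ? delta / pcDamp : delta / 2
    d += d / numPoints
    var k = 0
    while d > ((pcBase - pcTMin) * pcTMax) / 2 {
        d /= (pcBase - pcTMin)
        k += pcBase
    }
    return k + ((pcBase - pcTMin + 1) * d) / (d + pcSkew)
}

func punycodeEncode(_ label: String) -> String? {
    let input = label.unicodeScalars.map { Int($0.value) }
    var output = String.UnicodeScalarView()
    // Copy the basic (ASCII) code points first, then the delimiter.
    for scalar in label.unicodeScalars where scalar.isASCII { output.append(scalar) }
    let basicCount = output.count
    if basicCount > 0 { output.append("-") }
    var n = 128, delta = 0, bias = 72, handled = basicCount
    while handled < input.count {
        // Smallest code point that has not been handled yet.
        guard let m = input.filter({ $0 >= n }).min() else { return nil }
        // delta += (m - n) * (handled + 1), failing on overflow.
        let (product, overflowMul) = (m - n).multipliedReportingOverflow(by: handled + 1)
        let (sum, overflowAdd) = delta.addingReportingOverflow(product)
        if overflowMul || overflowAdd { return nil }
        delta = sum
        n = m
        for c in input {
            if c < n {
                let (next, overflow) = delta.addingReportingOverflow(1)
                if overflow { return nil }
                delta = next
            } else if c == n {
                // Emit delta as a generalized variable-length integer.
                var q = delta
                var k = pcBase
                while true {
                    let t = k <= bias ? pcTMin : (k >= bias + pcTMax ? pcTMax : k - bias)
                    if q < t { break }
                    output.append(pcDigit(t + (q - t) % (pcBase - t)))
                    q = (q - t) / (pcBase - t)
                    k += pcBase
                }
                output.append(pcDigit(q))
                bias = pcAdapt(delta, handled + 1, handled == basicCount)
                delta = 0
                handled += 1
            }
        }
        delta += 1
        n += 1
    }
    return String(output)
}
```

The caller would still prepend xn-- and run the UTS #46 validity checks; this sketch only covers the encoding step.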
Conformance
IdnaTestV2DriverTests runs the official Unicode IdnaTestV2.txt (v17.0.0) end-to-end across toUnicode, toAsciiN (Nontransitional), and toAsciiT (Transitional). All status code families (Pn, Vn, An, Bn, Cn, Xn, U1) are in scope: any row carrying any of them must be rejected. 0 failures across >1000 vectors × 3 operations.
API surface
import SwiftEmailValidatorIDNA
// Direct UTS #46 use:
IDNA.toAscii("münchen.de") // "xn--mnchen-3ya.de"
IDNA.toUnicode("xn--mnchen-3ya.de") // "münchen.de"
// Plug into EmailSyntaxValidator:
EmailSyntaxValidator.correctlyFormatted(
    "user@münchen.de",
    idna: IDNA.Options()) // defaults: nontransitional, all checks on
Compatibility
No breaking changes. 1.6.x remains source-compatible — the new IDNA target is opt-in via a separate import.
Full changelog
See CHANGELOG.md for the complete entry, including the tooling and benchmark scope-grading changes.
1.6.1 — Default_Ignorable hardening + RFC compliance fixes
Security/correctness release. All findings reduce permissiveness; no API surface changed. Users on 1.6.0 should upgrade.
Security
Default_Ignorable spoofing closure (RFC 5892 §2.6)
CharacterSet.letters on Darwin admits a number of Default_Ignorable scalars that produce no glyph and are DISALLOWED in IDNA2008. Several slipped through both the local-part and domain-label gates in 1.6.0:
- Domain label path now rejects U+3164 HANGUL FILLER, U+FE0F VS-16, U+E0100 VS-17 (SSP), U+180B–U+180F MONGOLIAN FREE VARIATION SELECTORS, U+115F / U+1160 HANGUL CHOSEONG/JUNGSEONG FILLERS, U+17B4–U+17B5 KHMER VOWEL INHERENT, and U+FFA0 HALFWIDTH HANGUL FILLER. PVALID combining marks (e.g. U+05B0 HEBREW POINT SHEVA) remain accepted; a canary test pins this.
- Local-part path now rejects U+034F COMBINING GRAPHEME JOINER, the SMP U+1BCA0–U+1BCA3 SHORTHAND FORMAT controls, the U+1D173–U+1D17A MUSICAL SYMBOL formatting controls, and the reserved U+FFF0–U+FFF8 block.
- Leading combining marks rejected. A label starting with an Mn/Mc/Me scalar (e.g. lone U+0301) is now rejected per-label. Mid-label combining marks remain accepted so legitimate diacritics still validate.
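The leading-combining-mark rule can be sketched with the standard library's Unicode scalar properties. This is an illustration only: startsWithCombiningMark is a hypothetical helper, not a function exported by the package, and it looks only at the first scalar of a label.

```swift
// Returns true when the label's first scalar has General_Category Mn, Mc, or Me.
// Hypothetical helper for illustration; the package's own check may differ.
func startsWithCombiningMark(_ label: String) -> Bool {
    guard let first = label.unicodeScalars.first else { return false }
    switch first.properties.generalCategory {
    case .nonspacingMark, .spacingMark, .enclosingMark:
        return true
    default:
        return false
    }
}
```

A label such as "\u{0301}abc" would be flagged, while "e\u{0301}" would not, because the mark there follows a base letter.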
Validator behaviour fixes
- Empty quoted local part rejected for parity with the dot-atom path (RFC 5321 §3.3).
- .arpa rejected by TLDDomainValidator per RFC 3172 (DNS infrastructure only).
- TLDDomainValidator.isPubliclyDeliverable(_:) two-layer split. The public form now trims surrounding whitespace and folds U+3002 / U+FF0E / U+FF61 to ASCII "." before dispatching to a raw _isPubliclyDeliverable(_:) worker. Hardens the API in isolation; the end-to-end pipeline through EmailSyntaxValidator was already protected.
- IPv6 regex case + leading-zero gaps. _matchIPv6 accepted embedded-IPv4 octets with leading zeros (e.g. 192.168.001.001) and rejected [IPv6:::FFFF:1.2.3.4] because the IPv4-mapped prefix was hardcoded lowercase. Both fixed (RFC 4291 §2.2 / RFC 3986 §3.2.2).
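The two IPv6 fixes amount to a strict dotted-decimal check and a case-insensitive prefix comparison. The sketch below shows both ideas without a regex; isStrictIPv4 and hasIPv4MappedPrefix are hypothetical names, and the package itself is described as regex-based, so treat this as an equivalent illustration rather than its code.

```swift
// Strict dotted-decimal check: four octets, 0...255, no leading zeros.
// Hypothetical helper; the package uses a regex for this.
func isStrictIPv4(_ text: String) -> Bool {
    let parts = text.split(separator: ".", omittingEmptySubsequences: false)
    guard parts.count == 4 else { return false }
    for part in parts {
        // ASCII digits only, one to three of them.
        guard !part.isEmpty, part.utf8.count <= 3,
              part.utf8.allSatisfy({ $0 >= 48 && $0 <= 57 }) else { return false }
        // "001" and "01" are rejected; a lone "0" is fine.
        if part.utf8.count > 1 && part.first == "0" { return false }
        guard let value = Int(part), value <= 255 else { return false }
    }
    return true
}

// Case-insensitive test for the IPv4-mapped prefix.
func hasIPv4MappedPrefix(_ text: String) -> Bool {
    return text.lowercased().hasPrefix("::ffff:")
}
```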
RFC 2047 encoder/decoder hardening
- 75-octet cap enforced on encoder output. The decoder rejected over-length encoded-words but the encoder did not, so long Unicode inputs that were auto-encoded and then re-decoded silently failed to round-trip. encode() now returns nil if the assembled =?utf-8?b?<base64>?= exceeds 75 chars.
- Base64 residue==1 rejected explicitly instead of relying on Foundation to reject the malformed === padding.
- Encoded-text grammar tightened to 1*<text> per RFC 2047 §2: rejects empty payloads (=?utf-8?b??=) and literal ? in encoded-text.
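A rough sketch of the encoder-side cap, assuming a B-encoded UTF-8 word: the function name encodeWord is hypothetical, and rejecting empty input in the encoder is an assumption made here to mirror the tightened grammar. The wrapper =?utf-8?b? ... ?= costs 12 characters, which leaves 63 for Base64, so at most 45 input octets fit.

```swift
import Foundation

// Builds a single RFC 2047 B-encoded word, or nil when it would exceed 75 characters.
// Hypothetical sketch; the package's encode() may differ in details.
func encodeWord(_ text: String) -> String? {
    guard !text.isEmpty else { return nil }  // assumption: empty payloads are refused
    let payload = Data(text.utf8).base64EncodedString()
    let word = "=?utf-8?b?\(payload)?="
    return word.utf8.count <= 75 ? word : nil
}
```

Returning nil at the cap lets a caller fall back to another strategy instead of emitting a word the decoder would refuse.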
UTS #39 hardening
- §5.2 Moderately Restrictive: second-script pool restricted to UAX #31 Recommended scripts. Closes a lenient-mode bypass where Latin + Phoenician/Limbu/etc. passed when rejectRestrictedIdentifiers=false.
- §5.1 Augmented_Script_Set applied in Single Script analysis. Pure-Japanese (Han+Hira+Kana, no Latin), pure-Korean (Han+Hang), pure-Chinese-with-Bopomofo (Han+Bopo) and Hira+Kana strings were misclassified as multi-script at .singleScript.
Documentation
- .asciiWithUnicodeExtension documented as a project convention (whole-address RFC 2047 wrap), not standards-conformant SMTPUTF8.
- domainLabelCharacterSet documented as a coarse Letter+digit gate, not RFC 5891 §4.2.3.2 PVALID.
- UTS #39 docstrings corrected (RestrictionLevel.highlyRestrictive combos now match §5.2.2 Table 1; non-existent §5.2.1/§5.2.2/§5.2.3 anchors switched to §5.2's named-bullet form).
- UTS #39 out-of-scope sections documented (§5.6.1 Whole-Script Confusables, §5.7.1 Mixed-Numbers, Identifier_Type=Not_NFKC).
- Domain-length comment now cites RFC 5321 §4.5.3.1.2 as the headline (255-octet wire cap) and explains how 253 is the derived presentation-form ceiling.
Tooling
- Tools/generate_tlds.py switched to PyPI idna (IDNA2008 + RFC 3492 Punycode) from stdlib encodings.idna (deprecation-flagged). Generated output is byte-identical today; no consumer impact.
Tests
- Test count grew from 272 to 299 (all passing).
- Multiple weak assertions tightened across the suite.
- DemoApp test corpus extended with 34 Default_Ignorable spoofing cases.
See CHANGELOG.md for the full entry.
1.6.0 — drop SwiftPublicSuffixList for bundled IANA TLD validator
Removed
- SwiftPublicSuffixList dependency. The package no longer pulls any third-party Swift dependency. The Public Suffix List was the wrong primitive for email validation: it was designed for cookie scoping and its multi-level / PRIVATE-section entries are policy artifacts of specific registries, with weekly churn driven by non-email concerns.
Added
- TLDDomainValidator (new public type). Default domain validator used by EmailSyntaxValidator. Confirms the rightmost DNS label is a currently-delegated IANA TLD (ACE xn--… and Unicode U-label forms both accepted) and rejects names reserved by the IETF Special-Use Domain Names registry:
  - .test (RFC 6761 §6.2)
  - .example, example.com, example.net, example.org (RFC 6761 §6.5)
  - .invalid (RFC 6761 §6.4)
  - .localhost (RFC 6761 §6.3)
  - .local (RFC 6762, mDNS)
  - .onion (RFC 7686, Tor)
  - .alt (RFC 9476)
  - home.arpa (RFC 8375)

  Subdomains under any of these are also rejected.
- Sources/SwiftEmailValidator/Generated/IANATLDs.swift: bundled IANA TLD set (~1,400 ACE + ~150 U-label entries). Auto-generated; do not edit by hand.
- Tools/generate_tlds.py: generator that fetches https://data.iana.org/TLD/tlds-alpha-by-domain.txt, expands ACE TLDs to U-labels, and writes the Swift source. Records source URL, fetch timestamp, and SHA-256.
- .github/workflows/update-tlds.yml: nightly workflow that refreshes the bundled TLD list and opens a PR if it changed.
- TLDDomainValidatorTests: new test class covering real TLDs, fake TLDs, special-use rejection, IDN handling, case insensitivity, trailing root dot, and wiring as the validator default.
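The special-use rejection, including the "subdomains are also rejected" rule, can be pictured as a label-suffix match. The following is a simplified sketch under stated assumptions: reservedSuffixes and isSpecialUse are hypothetical names, the list is hand-copied from the entries above, and the real validator additionally checks the IANA TLD set.

```swift
// Reserved names from the list above, stored as label sequences.
// Hypothetical data structure for illustration only.
let reservedSuffixes: [[String]] = [
    ["test"], ["example"], ["invalid"], ["localhost"],
    ["local"], ["onion"], ["alt"],
    ["example", "com"], ["example", "net"], ["example", "org"],
    ["home", "arpa"],
]

// True when the domain equals a reserved name or is a subdomain of one.
func isSpecialUse(_ domain: String) -> Bool {
    var labels = domain.lowercased()
        .split(separator: ".", omittingEmptySubsequences: false)
        .map(String.init)
    // Tolerate a trailing root dot ("example.com.").
    if labels.last == "" { labels.removeLast() }
    return reservedSuffixes.contains { suffix in
        labels.count >= suffix.count && Array(labels.suffix(suffix.count)) == suffix
    }
}
```

Matching on whole labels rather than on string suffixes keeps a name like notexample.com from being caught by the example.com entry.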
Changed
- Default domainValidator closure on EmailSyntaxValidator.correctlyFormatted and mailbox(from:) switched from { PublicSuffixList.isUnrestricted(PublicSuffixList.ace($0)) } to { TLDDomainValidator.isPubliclyDeliverable($0) }.
- UTS39.domainValidator(_:base:) default base closure likewise switched from PSL to TLDDomainValidator.
- EmailSyntaxValidator.correctlyFormatted(_:uts39:) and mailbox(from:uts39:) convenience overloads likewise switched.
- README & benchmark output rewritten to describe the new default and the rationale for moving off the PSL.
Migration notes
- Drop the dependency: remove SwiftPublicSuffixList from your Package.swift. SwiftEmailValidator no longer requires it.
- @example.com / @example.net / @example.org now fail the default validator (RFC 6761 §6.5). If your tests or sample addresses used these, switch to a real public domain (@iana.org is stable) or pass a permissive domainValidator: { _ in true }.
- @localhost, @host.local, intranet domains also fail the default. Pass a custom domainValidator closure if your application accepts these; see "Domain validation" in the README.
- PSL-based custom rules: if you were calling PublicSuffixList.isUnrestricted($0, rules: customRules), replace with your own closure (the test suite has examples of a simple TLD-allowlist closure in LocalPartValidatorHookTests).
- Newly-delegated TLDs: the bundled list ships frozen at the release SHA. The nightly GitHub workflow keeps the canonical copy current; downstream consumers waiting for a tagged release can override domainValidator with their own check or run python3 Tools/generate_tlds.py and ship the regenerated file.
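For callers who need their own rule, a TLD-allowlist closure might look like the sketch below. It is written standalone so it runs without the package; allowedTLDs and intranetFriendlyValidator are hypothetical names, and the (String) -> Bool shape is inferred from the { _ in true } and { TLDDomainValidator.isPubliclyDeliverable($0) } forms shown in these notes.

```swift
// Hypothetical allowlist: accept a public TLD plus two intranet-style names.
let allowedTLDs: Set<String> = ["org", "localhost", "local"]

// A closure with the (String) -> Bool shape used for domainValidator.
let intranetFriendlyValidator: (String) -> Bool = { domain in
    // Compare only the rightmost label, case-insensitively.
    guard let last = domain.lowercased().split(separator: ".").last else { return false }
    return allowedTLDs.contains(String(last))
}

// With the package imported, it would presumably be passed as:
// EmailSyntaxValidator.correctlyFormatted("user@host.local",
//                                         domainValidator: intranetFriendlyValidator)
```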
See CHANGELOG.md for the full entry.
1.5.0 — UTS #39 Unicode Security Mechanisms
SwiftEmailValidatorUTS39 — opt-in companion target
New second library product layering UTS #39 on top of the core validator. Import only what you need:
import SwiftEmailValidator // unchanged, zero new cost
import SwiftEmailValidatorUTS39 // adds ~280 KB of UCD data + checks
What the addon enforces
- Identifier_Status filter — rejects Restricted scripts (Linear B, Runic, Deseret, etc.)
- Mixed-script detection — Single Script / Highly Restrictive / Moderately Restrictive per §5.2
- §4 confusable skeletons — skeleton-equality against caller-supplied protected forms (opt-in per call)
Usage
Convenience API — one call, both sides of the address checked:
let ok = EmailSyntaxValidator.correctlyFormatted(
    "alice@example.com",
    uts39: UTS39.Policy() // default: Highly Restrictive
)
Lower-level (compose closures yourself):
EmailSyntaxValidator.correctlyFormatted(
    candidate,
    domainValidator: UTS39.domainValidator(policy),
    localPartValidator: UTS39.localPartValidator(policy)
)
Main-library change (non-breaking)
One new closure parameter on correctlyFormatted and mailbox(from:):
localPartValidator: (String) -> Bool = { _ in true }
It receives the semantic local-part string (dot-atom as-is, quoted-string unescaped). Default preserves existing behavior.
Implementation notes
- All Unicode tables generated from UCD 17.0.0 via Sources/SwiftEmailValidatorUTS39/Tools/generate.py
- Skeleton algorithm iterates map+NFD to a fixed point (13 of ~6500 confusables.txt entries require up to 3 iterations for idempotence)
- Multi-scalar NFD sources (48 entries, e.g. U+01A1 → [U+006F U+031B]) handled via a longest-match prefix table
- Restriction Level classification uses per-scalar Script_Extensions ∩ target ≠ ∅ per §5.1 (not union-based)
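The difference between the per-scalar intersection test and a union-based test can be shown with toy data. In this sketch the script sets are hand-written stand-ins rather than real UCD lookups, and both function names are hypothetical.

```swift
// Per-scalar rule: every scalar's Script_Extensions must intersect the target set.
func everyScalarIntersects(_ perScalar: [Set<String>], target: Set<String>) -> Bool {
    return perScalar.allSatisfy { !$0.isDisjoint(with: target) }
}

// Union-based alternative, shown only for contrast: the union of all
// Script_Extensions must be a subset of the target set.
func unionIsSubset(_ perScalar: [Set<String>], target: Set<String>) -> Bool {
    let union = perScalar.reduce(into: Set<String>()) { $0.formUnion($1) }
    return union.isSubset(of: target)
}

// Toy input: a Hiragana letter followed by a mark shared by Hiragana and Katakana.
let toyScalars: [Set<String>] = [["Hira"], ["Hira", "Kana"]]
let toyTarget: Set<String> = ["Latn", "Hira"]
let perScalarResult = everyScalarIntersects(toyScalars, target: toyTarget)
let unionResult = unionIsSubset(toyScalars, target: toyTarget)
```

With these inputs the per-scalar test passes while the union-based test fails, because the shared mark drags Kana into the union even though it is usable in a Hiragana context.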
Test count
242 total (was 164). Bulk regression test walks every entry in the generated confusables table; restriction-level edge cases cover UTS #39 §5.2 examples and ICU itspoof.cpp patterns including Arabic-Indic digits, combining marks, and the Japanese/Korean/Chinese whitelist combos.
1.4.1 — accept RFC 4291 §2.2 format-2 IPv6 literals
Small patch release. Closes the single syntax-level gap surfaced by the
reverse-check added in 1.4.0's Benchmarks harness.
Fixed
- IPv6 literal regex now accepts RFC 4291 §2.2 format 2. Six
uncompressed hex groups followed by a trailing IPv4-in-dotted-decimal
(e.g. aaaa:aaaa:aaaa:aaaa:aaaa:aaaa:127.0.0.1) are now recognised as
valid. The upstream regex this validator was derived from only
recognised the compressed / IPv4-mapped forms (::ffff:x.x.x.x,
1::5:x.x.x.x). Email addresses such as
valid.ipv6v4.addr@[IPv6:aaaa:aaaa:aaaa:aaaa:aaaa:aaaa:127.0.0.1]
now validate as expected.
Maximum IPv6 literal length remains 45 octets, which is exactly the
existing IPAddressSyntaxValidator public-API length cap — no guard
changes needed.
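The 45-octet figure follows from the longest possible format-2 literal: six 4-digit hex groups, six colons, and a 15-character dotted-decimal IPv4 tail. A quick check, using illustrative constant names:

```swift
// Longest format-2 literal: 6 * 4 hex digits + 6 colons + "255.255.255.255".
let longestFormat2Literal = "aaaa:aaaa:aaaa:aaaa:aaaa:aaaa:" + "255.255.255.255"
let longestFormat2Length = longestFormat2Literal.utf8.count  // 24 + 6 + 15
```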
Tests
Added testIPv6Format2UncompressedWithEmbeddedIPv4 (5 positive cases
including the length boundary) and testIPv6Format2RejectsWrongGroupCount
(5 negative cases for wrong hex-group count, wrong IPv4 octet count, and
out-of-range octets). Test count is 164.
Reverse benchmark
After this fix the --reverse benchmark mode reports 140/144 agreement
with competitor test assertions. The remaining four disagreements are all
syntax-vs-policy differences (127.0.0.1, 127.0.0.256, 127.0.0.1.26,
mailserver as unbracketed hostnames) — our syntax layer accepts them
because they are legal RFC 1035 labels, and the shipped default
domainValidator rejects them as domain policy.
Full changes: 1.4.0...1.4.1
1.4.0 — IP validator DoS hardening + benchmark harness
Additive minor release. No behavior change for users of EmailSyntaxValidator.
Added
- IPAddressSyntaxValidator public length-capped wrappers. match(_:),
  matchIPv4(_:), and matchIPv6(_:) now apply a utf8.count guard
  (15 octets for IPv4, 45 for IPv6) before dispatching to the regex engine.
  Prior to 1.4.0 a caller invoking IPAddressSyntaxValidator directly with
  a multi-megabyte string would spend O(n) inside NSRegularExpression
  before the $ anchor failed, a denial-of-service vector. Internal raw
  matchers _match(_:) / _matchIPv4(_:) / _matchIPv6(_:) retain the
  pre-1.4.0 behavior for EmailSyntaxValidator's hot path, which is
  already bounded by the upstream 254 UTF-8 octet address cap.
- Benchmarks/ SPM package. A standalone harness runs the 195-case
  DemoApp corpus through every SPM-consumable Swift email validator we
  could locate (evanrobertson, MimeEmailParser, bdolewski's regex,
  jwelton-equivalent via NSDataDetector) and emits a Markdown accuracy
  table. Published results are in the new "Comparison with other Swift
  email validators" section of the README. Kept out of the main
  Package.swift so consumers don't transitively pull competitor deps.
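The length-capped wrapper pattern is small enough to sketch in full. The names rawMatchIPv4 and cappedMatchIPv4 and the regex pattern below are assumptions made for illustration; only the idea of a utf8.count guard in front of NSRegularExpression comes from the notes above.

```swift
import Foundation

// Illustrative IPv4 pattern; not the package's regex.
let ipv4Octet = "(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
let ipv4Regex = try! NSRegularExpression(
    pattern: "^(?:\(ipv4Octet)\\.){3}\(ipv4Octet)$")

// Raw matcher: hands the whole input to the regex engine.
func rawMatchIPv4(_ text: String) -> Bool {
    let range = NSRange(text.startIndex..., in: text)
    return ipv4Regex.firstMatch(in: text, options: [], range: range) != nil
}

// Capped wrapper: rejects anything longer than 15 UTF-8 octets before the regex runs.
func cappedMatchIPv4(_ text: String) -> Bool {
    guard text.utf8.count <= 15 else { return false }
    return rawMatchIPv4(text)
}
```

The guard is a cheap early rejection for oversized input, so a multi-megabyte string never reaches the regex engine.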
Security
The length-capped public wrappers close the only input-length DoS vector
found in a manual audit of the library's public API surface.
EmailSyntaxValidator users were never exposed. No crashes introduced;
correctlyFormatted(_:) behavior is unchanged.
Compatibility
- Purely additive: the new public methods wrap previously-internal
  behavior. No existing API signature or return value changed.
- Test suite: 162 passing (was 157).
- Minimum Swift / platforms unchanged (Swift 5.5, macOS 10.12+, iOS 11+, tvOS 11+).
Full changes: 1.3.1...1.4.0
1.3.1 — SwiftPublicSuffixList 3.1.0 compatibility
Changed
- SwiftPublicSuffixList dependency bumped to 3.1.0. v3.0 tightened PublicSuffixList.isUnrestricted(_:) / match(_:) to reject non-ASCII hostnames: IDN labels must be in ACE (Punycode) form. The default domainValidator closure now calls PublicSuffixList.ace(_:) on the domain before dispatching to isUnrestricted(_:), so Unicode IDN domains continue to validate exactly as they did on 1.3.0 with PSL 2.x.
- Mailbox.Host.domain(...) still carries the original user-facing string; only the validator dispatch uses the ACE form.
Migration
Callers who pass a custom domainValidator closure to correctlyFormatted(_:) / mailbox(from:) and rely on the PSL default behavior via PublicSuffixList.isUnrestricted(_:) should wrap their call site with PublicSuffixList.ace(_:) if the closure receives Unicode IDN domains — e.g. { PublicSuffixList.isUnrestricted(PublicSuffixList.ace($0), rules: myRules) }.
1.3.0 — EmailNormalizer.nfc + RFC 6532 audit
Highlights
New API: EmailNormalizer.nfc(_:)
RFC 6532 §3.1-compliant, name-preserving NFC normalization. Use this when you need a spec-compliant comparison form, or when you are normalizing an address you intend to preserve for display, forwarding, or reply-to. NFC collapses canonically-equivalent scalar sequences (e.g. decomposed e + U+0301 → precomposed é) but does not fold compatibility variants (fullwidth, ligatures, superscripts).
EmailNormalizer.nfkc(_:) remains available for anti-spoofing / account de-duplication use cases (Gmail-style), as a documented deliberate deviation from RFC 6532 §3.1's "SHOULD NOT".
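The NFC / NFKC distinction can be observed with Foundation's own normalization properties. This sketch deliberately does not call EmailNormalizer, whose internals are not described here; it only shows what the two normalization forms do to a decomposed accent and to a fullwidth letter.

```swift
import Foundation

// Decomposed "e" + COMBINING ACUTE ACCENT.
let decomposedE = "e\u{0301}"
// NFC composes it into the single scalar U+00E9.
let nfcE = decomposedE.precomposedStringWithCanonicalMapping

// FULLWIDTH LATIN SMALL LETTER A (U+FF41) is a compatibility variant of "a".
let fullwidthA = "\u{FF41}"
// NFC leaves it alone; NFKC folds it to ASCII "a".
let nfcFullwidth = fullwidthA.precomposedStringWithCanonicalMapping
let nfkcFullwidth = fullwidthA.precomposedStringWithCompatibilityMapping

// Compare scalar values: Swift's String == already treats canonically
// equivalent strings as equal, which would hide the NFC step.
let nfcEScalars = nfcE.unicodeScalars.map { $0.value }
let nfcFullwidthScalars = nfcFullwidth.unicodeScalars.map { $0.value }
let nfkcFullwidthScalars = nfkcFullwidth.unicodeScalars.map { $0.value }
```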
RFC compatibility audit
Conducted an in-depth audit of the normalizer against RFC 6532 §3.1, RFC 5198 §3, RFC 5891 §4.1 (IDNA2008), RFC 5321 §2.4, RFC 5321 §4.1.2 (quoted strings), and UAX #15. No spec violations found. nfc(_:) is fully compliant; nfkc(_:) deviations are documented in-source and pinned by tests.
Docstring improvements
- nfkc(_:) IDNA2008 section now names UTS #46 transitional mode and RFC 3492 Punycode explicitly
- Dot-folding example list extended (U+FF0E, U+3002, U+FF61)
- New section enumerating compatibility folds that introduce ASCII SPACE (U+00A0, U+2003, U+FDFA)
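Dot folding itself is a per-scalar substitution. A minimal sketch follows; foldDots is a hypothetical helper that covers only the three code points listed above, not the normalizer's full behavior.

```swift
// Replaces the ideographic, fullwidth, and halfwidth ideographic full stops with ASCII ".".
// Hypothetical helper for illustration.
func foldDots(_ text: String) -> String {
    var folded = String.UnicodeScalarView()
    for scalar in text.unicodeScalars {
        switch scalar.value {
        case 0x3002, 0xFF0E, 0xFF61:
            folded.append(".")
        default:
            folded.append(scalar)
        }
    }
    return String(folded)
}
```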
New tests
- testNfcIsIdempotent / testNfkcIsIdempotent: UAX #15 D8/D9 stability
- testNfkcOutputIsAlsoInNfc: NFKC ⊇ NFC property
Test count: 157 (was 154 in 1.2.2), 0 failures.
No breaking changes
Pure additive release. Existing nfkc(_:) callers are unaffected.
1.2.2 — internal simplification
What changed
Follow-up to 1.2.1 — post-release adversarial re-review empirically verified that the 17-case inline BMP guard in extractQuotedString was fully redundant with the per-scalar check at the end of the loop. Collapsed. Also tightened the RFC 2047 candidate gate to reject supplementary-plane Tag/PUA/noncharacter scalars upfront (previously caught after encode + re-parse).
No behavior change for valid input. All 129 tests pass.
Details
- extractQuotedString: removed the 17-case inline BMP contains block (U+00AD, U+00A0, U+1680, U+2000–U+200A, U+200B–U+200D, U+202F, U+205F, U+2060–U+2065, U+3000, U+FEFF, U+FE00–U+FE0F, U+2028–U+2029, U+FDD0–U+FDEF, U+FFFE, U+FFFF). All 26 scalars confirmed rejected by qtextUnicodeSMTPCharacterSet's per-scalar allSatisfy. The supplementary-plane guard (isRejectedSupplementaryScalar) stays in the loop: SSP scalars like U+E0100 can attach as grapheme extenders and must be rejected per-scalar, since they are unioned into the set and cannot be subtracted (Foundation .subtracting() bug on supplementary planes).
- candidateForRfc2047: first gate now also rejects isRejectedSupplementaryScalar, bringing it in line with extractDotAtom / extractQuotedString.
- Replaced a misleading comment that claimed allSatisfy "only examines the first scalar of each Character" (it iterates all scalars).
Upgrade
Drop-in patch. No API or behavior change for valid input.
Internal refactor + invariant tests
A patch release with no behavior change. Adversarial re-review of the post-1.2.0 design observations confirmed two items and retracted three; this release lands the confirmed work and adds clarifying comments around the retracted items so future reviewers reach the same conclusion faster.
Confirmed (implemented)
- Extracted isRejectedSupplementaryScalar(_:): the 6 supplementary-plane conditions genuinely shared between extractDotAtom and extractQuotedString are now in one place. The BMP block in extractQuotedString stays put; it is path-specific (per-grapheme-cluster iteration), not a duplicate of dot-atom's per-scalar check.
- Added InvariantTests.swift with 4 deterministic sweeps:
  - correctlyFormatted ⇔ mailbox != nil agreement across historically buggy scalar ranges
  - dot-atom and quoted-string parity for supplementary-plane scalars
  - ASCII acceptance implies Unicode acceptance (subset invariant)
  - No crashes on short byte strings (8000 probes)
- Runs in ~90ms with no random seed (reproducible CI)
Retracted (with anti-confusion comments)
- Options enum: doc comment now explains the [Options] array shape is intentional forward-compatibility.
- CharacterSet block: MARK section header documents that co-location with parsing is deliberate (the subtract-before-union ordering invariant is load-bearing and easy to violate if split).
- extractQuotedString BMP guard: comment now states it is not a duplicate of extractDotAtom (different iteration model, different requirements).
Tests
- 125 → 129 (all passing).