
Releases: ekscrypto/SwiftEmailValidator

1.7.0 — SwiftEmailValidatorIDNA + UTS #46 §4 V1-V7

26 Apr 14:52

A new opt-in companion target that layers UTS #46 IDNA Compatibility Processing onto the host portion of the address. It mirrors the SwiftEmailValidatorUTS39 architecture: it is imported separately (import SwiftEmailValidatorIDNA), so the ~385 KB of UCD-derived data is not bundled into callers that don't need it.

Highlights

  • Full UTS #46 §4 pipeline — Map / NFC / Break into labels / Validate / ToASCII.
  • All V1-V7 validity criteria enforced by default: V1 NFC, V2 hyphen rules (with xn-- carve-out), V3 leading combining mark rejection, V4/V5 per-scalar status + STD3 LDH gate, V6 CheckBidi, V7 CheckJoiners.
  • RFC 3492 Punycode encoder/decoder with overflow guards on every multiply/add.
  • RFC 5893 §2 Bidi rule (UTS #46 V6). Domain-wide trigger per §1.4 — pure-LTR siblings of RTL labels are still gated.
  • RFC 5892 §A.1 / §A.2 CONTEXTJ rules (UTS #46 V7). ZWJ/ZWNJ allowed only in legitimate joining contexts; misuse — a known homograph attack vector — is rejected.
  • RFC 5892 §A.3-§A.9 CONTEXTO layered on top of UTS #46 as a default-on security extension. Catches Catalan middle dot, Greek keraia, Hebrew geresh/gershayim, Katakana middle dot, and mixed Arabic-Indic / Extended Arabic-Indic digit homographs. Opt out with IDNA.Options.checkContextO: false for strict UTS #46-only conformance.
  • VerifyDnsLength + UseSTD3ASCIIRules enforced at the validator layer. STD3 is required specifically because the modern preprocessed mapping table marks non-LDH ASCII as valid (with NV8).
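To make the RFC 3492 bullet concrete, here is a compact standalone encoder. It is a sketch, not the package's implementation: `punycodeEncode`, `PunycodeError`, `checkedAdd` and `checkedMul` are hypothetical names. Every step that can grow `delta` goes through Swift's `addingReportingOverflow` / `multipliedReportingOverflow`, so oversized input throws instead of trapping. ToASCII would prefix the result with `xn--`.

```swift
// Minimal RFC 3492 Punycode encoder (sketch). Parameter values from RFC 3492 §5.
let punyBase = 36, tMin = 1, tMax = 26, skew = 38, damp = 700
let initialBias = 72, initialN = 0x80

enum PunycodeError: Error { case overflow }

// Checked arithmetic: every multiply/add on delta goes through these.
func checkedAdd(_ a: Int, _ b: Int) throws -> Int {
    let (result, overflow) = a.addingReportingOverflow(b)
    if overflow { throw PunycodeError.overflow }
    return result
}

func checkedMul(_ a: Int, _ b: Int) throws -> Int {
    let (result, overflow) = a.multipliedReportingOverflow(by: b)
    if overflow { throw PunycodeError.overflow }
    return result
}

// Bias adaptation, RFC 3492 §6.1.
func adapt(_ delta: Int, _ numPoints: Int, _ firstTime: Bool) -> Int {
    var d = firstTime ? delta / damp : delta / 2
    d += d / numPoints
    var k = 0
    while d > ((punyBase - tMin) * tMax) / 2 {
        d /= punyBase - tMin
        k += punyBase
    }
    return k + (punyBase - tMin + 1) * d / (d + skew)
}

// Digit values 0...25 map to "a"..."z"; 26...35 map to "0"..."9".
func encodeDigit(_ digit: Int) -> Character {
    let ascii = digit < 26 ? 0x61 + digit : 0x30 + digit - 26
    return Character(Unicode.Scalar(UInt8(ascii)))
}

// Encoding procedure, RFC 3492 §6.3 (no "xn--" prefix here).
func punycodeEncode(_ input: String) throws -> String {
    let codePoints = input.unicodeScalars.map { Int($0.value) }
    var output = String(codePoints.filter { $0 < 0x80 }.map { Character(Unicode.Scalar(UInt8($0))) })
    let basicCount = output.unicodeScalars.count
    if basicCount > 0 { output.append("-") }

    var n = initialN, delta = 0, bias = initialBias, handled = basicCount
    while handled < codePoints.count {
        // Smallest code point >= n; it exists because handled < count.
        let m = codePoints.filter { $0 >= n }.min()!
        delta = try checkedAdd(delta, checkedMul(m - n, handled + 1))
        n = m
        for c in codePoints {
            if c < n { delta = try checkedAdd(delta, 1) }
            if c == n {
                var q = delta
                var k = punyBase
                while true {
                    let t = k <= bias ? tMin : (k >= bias + tMax ? tMax : k - bias)
                    if q < t { break }
                    output.append(encodeDigit(t + (q - t) % (punyBase - t)))
                    q = (q - t) / (punyBase - t)
                    k += punyBase
                }
                output.append(encodeDigit(q))
                bias = adapt(delta, handled + 1, handled == basicCount)
                delta = 0
                handled += 1
            }
        }
        delta = try checkedAdd(delta, 1)
        n += 1
    }
    return output
}
```

For example, `try punycodeEncode("m\u{00FC}nchen")` yields `mnchen-3ya`, which matches the `xn--mnchen-3ya.de` ACE form shown in the API surface.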

Conformance

IdnaTestV2DriverTests runs the official Unicode IdnaTestV2.txt (v17.0.0) end-to-end across toUnicode, toAsciiN (Nontransitional), and toAsciiT (Transitional). All status code families (Pn, Vn, An, Bn, Cn, Xn, U1) are in scope: any row carrying any of them must be rejected. 0 failures across >1000 vectors × 3 operations.

API surface

import SwiftEmailValidator
import SwiftEmailValidatorIDNA

// Direct UTS #46 use:
IDNA.toAscii("münchen.de")     // "xn--mnchen-3ya.de"
IDNA.toUnicode("xn--mnchen-3ya.de")  // "münchen.de"

// Plug into EmailSyntaxValidator:
EmailSyntaxValidator.correctlyFormatted(
    "user@münchen.de",
    idna: IDNA.Options())  // defaults: nontransitional, all checks on

Compatibility

No breaking changes. Code written against 1.6.x remains source-compatible with 1.7.0; the new IDNA target is opt-in via a separate import.

Full changelog

See CHANGELOG.md for the complete entry, including the tooling and benchmark scope-grading changes.

1.6.1 — Default_Ignorable hardening + RFC compliance fixes

26 Apr 05:22

Security/correctness release. All findings reduce permissiveness; no API surface changed. Users on 1.6.0 should upgrade.

Security

Default_Ignorable spoofing closure (RFC 5892 §2.6)

CharacterSet.letters on Darwin admits a number of Default_Ignorable scalars that produce no glyph and are DISALLOWED in IDNA2008. Several slipped through both the local-part and domain-label gates in 1.6.0:

  • Domain label path now rejects U+3164 HANGUL FILLER, U+FE0F VS-16, U+E0100 VS-17 (SSP), U+180B–U+180F MONGOLIAN FREE VARIATION SELECTORS, U+115F/U+1160 HANGUL CHOSEONG/JUNGSEONG FILLERS, U+17B4–U+17B5 KHMER VOWEL INHERENT, and U+FFA0 HALFWIDTH HANGUL FILLER. PVALID combining marks (e.g. U+05B0 HEBREW POINT SHEVA) remain accepted; a canary test pins this.
  • Local-part path now rejects U+034F COMBINING GRAPHEME JOINER, the SMP U+1BCA0–U+1BCA3 SHORTHAND FORMAT controls, the U+1D173–U+1D17A MUSICAL SYMBOL formatting controls, and the reserved U+FFF0–U+FFF8 block.
  • Leading combining marks rejected. A label starting with an Mn/Mc/Me scalar (e.g. lone U+0301) is now rejected per-label. Mid-label combining marks remain accepted so legitimate diacritics still validate.
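As a rough illustration of the two label rules above, the sketch below rejects a label whose first scalar is a combining mark (Mn/Mc/Me) and any label that contains a Default_Ignorable scalar. It leans on the Swift standard library's `Unicode.Scalar.Properties` (`generalCategory`, `isDefaultIgnorableCodePoint`) instead of the explicit scalar lists named in these notes, so its coverage follows the Unicode version of the Swift runtime; the function name is hypothetical.

```swift
// Sketch: per-label gate (hypothetical name). Uses the runtime's Unicode
// property tables, not the explicit scalar lists from the release notes.
func labelPassesIgnorableAndMarkGate(_ label: String) -> Bool {
    guard let first = label.unicodeScalars.first else { return false }   // empty label
    switch first.properties.generalCategory {
    case .nonspacingMark, .spacingMark, .enclosingMark:
        return false   // a label must not start with an Mn/Mc/Me scalar
    default:
        break          // mid-label marks (e.g. U+0301 after "e") stay allowed
    }
    // Reject glyphless scalars such as U+3164 HANGUL FILLER or U+FE0F VS-16.
    return !label.unicodeScalars.contains { $0.properties.isDefaultIgnorableCodePoint }
}
```

Note that the Default_Ignorable property is coarser than the release's lists: it also covers U+200C/U+200D, which IDNA evaluates in context (CONTEXTJ) rather than rejecting outright.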

Validator behaviour fixes

  • Empty quoted local part rejected for parity with the dot-atom path (RFC 5321 §3.3).
  • .arpa rejected by TLDDomainValidator per RFC 3172 (DNS infrastructure only).
  • TLDDomainValidator.isPubliclyDeliverable(_:) two-layer split. The public form now trims surrounding whitespace and folds U+3002/U+FF0E/U+FF61 to the ASCII full stop (.) before dispatching to a raw _isPubliclyDeliverable(_:) worker. This hardens the API when it is called directly; the end-to-end pipeline through EmailSyntaxValidator was already protected.
  • IPv6 regex case + leading-zero gaps. _matchIPv6 accepted embedded-IPv4 octets with leading zeros (e.g. 192.168.001.001) and rejected [IPv6:::FFFF:1.2.3.4] because the IPv4-mapped prefix was hardcoded lowercase. Both fixed (RFC 4291 §2.2 / RFC 3986 §3.2.2).
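The two-layer split described above can be sketched as a normalizing front end wrapped around a raw check. Both function names are hypothetical stand-ins for the public and `_isPubliclyDeliverable(_:)` layers; the raw check is passed in as a closure so the example compiles on its own.

```swift
import Foundation

// Trim surrounding whitespace and fold the three non-ASCII full stops
// (U+3002, U+FF0E, U+FF61) to ".".
func foldDomainSeparators(_ domain: String) -> String {
    let trimmed = domain.trimmingCharacters(in: .whitespacesAndNewlines)
    var folded = String.UnicodeScalarView()
    for scalar in trimmed.unicodeScalars {
        switch scalar.value {
        case 0x3002, 0xFF0E, 0xFF61:
            folded.append(".")
        default:
            folded.append(scalar)
        }
    }
    return String(folded)
}

// Public layer: normalize first, then hand off to the raw worker.
func isPubliclyDeliverableSketch(_ domain: String, raw: (String) -> Bool) -> Bool {
    raw(foldDomainSeparators(domain))
}
```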

RFC 2047 encoder/decoder hardening

  • 75-octet cap enforced on encoder output. The decoder rejected over-length encoded-words but the encoder did not, so long Unicode inputs that were auto-encoded and then re-decoded silently failed to round-trip. encode() now returns nil if the assembled =?utf-8?b?<base64>?= exceeds 75 characters.
  • Base64 residue==1 (an unpadded payload length of 1 mod 4, which no byte count can produce) is now rejected explicitly instead of relying on Foundation to reject the malformed === padding.
  • Encoded-text grammar tightened to 1*<text> per RFC 2047 §2 — rejects empty payloads (=?utf-8?b??=) and literal ? in encoded-text.
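A sketch of the three RFC 2047 checks above, assuming the B-encoding with a UTF-8 charset shown in the notes. `encodeWordSketch` and `isAcceptableEncodedText` are hypothetical names. With 12 characters of `=?utf-8?b?` / `?=` framing, at most 63 characters remain for base64, which rounds down to 60 characters, i.e. 45 input bytes.

```swift
import Foundation

// Build a B-encoded word and refuse anything over 75 characters (RFC 2047 §2).
func encodeWordSketch(_ text: String) -> String? {
    let payload = Data(text.utf8).base64EncodedString()
    let word = "=?utf-8?b?\(payload)?="
    return word.count <= 75 ? word : nil
}

// Decoder-side gate on encoded-text: 1*<text> (non-empty, no "?"), and the
// unpadded base64 length must not leave a residue of 1 mod 4.
func isAcceptableEncodedText(_ encodedText: String) -> Bool {
    guard !encodedText.isEmpty, !encodedText.contains("?") else { return false }
    let unpadded = encodedText.prefix(while: { $0 != "=" })
    return unpadded.count % 4 != 1
}
```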

UTS #39 hardening

  • §5.2 Moderately Restrictive: second-script pool restricted to UAX #31 Recommended scripts. Closes a lenient-mode bypass where Latin + Phoenician/Limbu/etc. passed when rejectRestrictedIdentifiers=false.
  • §5.1 Augmented_Script_Set applied in Single Script analysis. Pure-Japanese (Han+Hira+Kana, no Latin), pure-Korean (Han+Hang), pure-Chinese-with-Bopomofo (Han+Bopo) and Hira+Kana strings were misclassified as multi-script at .singleScript.

Documentation

  • .asciiWithUnicodeExtension documented as a project convention (whole-address RFC 2047 wrap), not standards-conformant SMTPUTF8.
  • domainLabelCharacterSet documented as a coarse Letter+digit gate, not RFC 5891 §4.2.3.2 PVALID.
  • UTS #39 docstrings corrected (RestrictionLevel.highlyRestrictive combos now match §5.2.2 Table 1; non-existent §5.2.1/§5.2.2/§5.2.3 anchors switched to §5.2's named-bullet form).
  • UTS #39 out-of-scope sections documented (§5.6.1 Whole-Script Confusables, §5.7.1 Mixed-Numbers, Identifier_Type=Not_NFKC).
  • Domain-length comment now cites RFC 5321 §4.5.3.1.2 as the headline (255-octet wire cap) and explains how 253 is the derived presentation-form ceiling.
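The 253/255 relationship is simple arithmetic: in DNS wire format every label is preceded by one length octet and the name ends with a zero-length root label, so a name of n presentation octets (without trailing dot) occupies n + 2 wire octets. A hypothetical helper, assuming ASCII (A-label) input:

```swift
// Wire length of a domain name (RFC 1035 §3.1 encoding), from its presentation form.
func wireLength(ofPresentationName name: String) -> Int {
    let trimmed = name.hasSuffix(".") ? String(name.dropLast()) : name
    if trimmed.isEmpty { return 1 }   // the root name is a single zero octet
    let labels = trimmed.split(separator: ".", omittingEmptySubsequences: false)
    // 1 octet for the root label, plus (length octet + label bytes) per label.
    return labels.reduce(1) { total, label in total + 1 + label.utf8.count }
}
```

Three 63-octet labels plus a 61-octet label give a 253-octet presentation name and a 255-octet wire name.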

Tooling

  • Tools/generate_tlds.py switched to PyPI idna (IDNA2008 + RFC 3492 Punycode) from stdlib encodings.idna (deprecation-flagged). Generated output is byte-identical today; no consumer impact.

Tests

  • Test count grew from 272 to 299 (all passing).
  • Multiple weak assertions tightened across the suite.
  • DemoApp test corpus extended with 34 Default_Ignorable spoofing cases.

See CHANGELOG.md for the full entry.

1.6.0 — drop SwiftPublicSuffixList for bundled IANA TLD validator

26 Apr 05:22

Removed

  • SwiftPublicSuffixList dependency. The package no longer pulls any third-party Swift dependency. The Public Suffix List was the wrong primitive for email validation: it was designed for cookie scoping and its multi-level / PRIVATE-section entries are policy artifacts of specific registries, with weekly churn driven by non-email concerns.

Added

  • TLDDomainValidator (new public type). Default domain validator used by EmailSyntaxValidator. Confirms the rightmost DNS label is a currently-delegated IANA TLD (ACE xn--… and Unicode U-label forms both accepted) and rejects names reserved by the IETF Special-Use Domain Names registry:

    • .test (RFC 6761 §6.2)
    • .example, example.com, example.net, example.org (RFC 6761 §6.5)
    • .invalid (RFC 6761 §6.4)
    • .localhost (RFC 6761 §6.3)
    • .local (RFC 6762 — mDNS)
    • .onion (RFC 7686 — Tor)
    • .alt (RFC 9476)
    • home.arpa (RFC 8375)

    Subdomains under any of these are also rejected.

  • Sources/SwiftEmailValidator/Generated/IANATLDs.swift — bundled IANA TLD set (~1,400 ACE + ~150 U-label entries). Auto-generated; do not edit by hand.

  • Tools/generate_tlds.py — generator that fetches https://data.iana.org/TLD/tlds-alpha-by-domain.txt, expands ACE TLDs to U-labels, and writes the Swift source. Records source URL, fetch timestamp, and SHA-256.

  • .github/workflows/update-tlds.yml — nightly workflow that refreshes the bundled TLD list and opens a PR if it changed.

  • TLDDomainValidatorTests — new test class covering real TLDs, fake TLDs, special-use rejection, IDN handling, case insensitivity, trailing root dot, and wiring as the validator default.
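A toy version of the check TLDDomainValidator is described as performing: reject special-use names and their subdomains, then require the rightmost label to be in the TLD set. The three-entry TLD set stands in for the generated IANA list, and `isDeliverableSketch` is a hypothetical name; only ASCII case folding is handled, so U-label TLDs would need more care.

```swift
// Special-use names from the list above; each also blocks its subdomains.
let specialUseSuffixes = ["test", "example", "example.com", "example.net", "example.org",
                          "invalid", "localhost", "local", "onion", "alt", "home.arpa"]

func isDeliverableSketch(_ domain: String, tlds: Set<String>) -> Bool {
    var name = domain.lowercased()
    if name.hasSuffix(".") { name.removeLast() }          // accept a trailing root dot
    guard let tld = name.split(separator: ".").last else { return false }
    for suffix in specialUseSuffixes where name == suffix || name.hasSuffix("." + suffix) {
        return false                                      // special-use name or subdomain
    }
    return tlds.contains(String(tld))
}
```

Matching on `"." + suffix` keeps `myexample.com` deliverable while still rejecting `mail.example.com`.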

Changed

  • Default domainValidator closure on EmailSyntaxValidator.correctlyFormatted and mailbox(from:) switched from { PublicSuffixList.isUnrestricted(PublicSuffixList.ace($0)) } to { TLDDomainValidator.isPubliclyDeliverable($0) }.
  • UTS39.domainValidator(_:base:) default base closure likewise switched from PSL to TLDDomainValidator.
  • EmailSyntaxValidator.correctlyFormatted(_:uts39:) and mailbox(from:uts39:) convenience overloads likewise switched.
  • README & benchmark output rewritten to describe the new default and the rationale for moving off the PSL.

Migration notes

  • Drop the dependency: remove SwiftPublicSuffixList from your Package.swift. SwiftEmailValidator no longer requires it.
  • @example.com / @example.net / @example.org now fail the default validator (RFC 6761 §6.5). If your tests or sample addresses used these, switch to a real public domain (@iana.org is stable) or pass a permissive domainValidator: { _ in true }.
  • @localhost, @host.local, intranet domains also fail the default. Pass a custom domainValidator closure if your application accepts these — see "Domain validation" in the README.
  • PSL-based custom rules: if you were calling PublicSuffixList.isUnrestricted($0, rules: customRules), replace with your own closure (the test suite has examples of a simple TLD-allowlist closure in LocalPartValidatorHookTests).
  • Newly-delegated TLDs: the bundled list ships frozen at the release SHA. The nightly GitHub workflow keeps the canonical copy current; downstream consumers waiting for a tagged release can override domainValidator with their own check or run python3 Tools/generate_tlds.py and ship the regenerated file.
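For applications that must accept intranet or mDNS names, one option is to wrap the default check. The helper below is hypothetical: it accepts a caller-supplied list of internal suffixes and defers everything else to a `base` closure. In real use `base` could be `{ TLDDomainValidator.isPubliclyDeliverable($0) }`; here it is a plain closure so the sketch compiles on its own.

```swift
// Accept internal suffixes (and their subdomains), otherwise defer to `base`.
func makeDomainValidator(internalSuffixes: [String],
                         base: @escaping (String) -> Bool) -> (String) -> Bool {
    let suffixes = internalSuffixes.map { $0.lowercased() }
    return { domain in
        let name = domain.lowercased()
        if suffixes.contains(where: { name == $0 || name.hasSuffix("." + $0) }) {
            return true
        }
        return base(domain)
    }
}
```

The returned closure would then be passed as the `domainValidator:` argument to `correctlyFormatted` or `mailbox(from:)`.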

See CHANGELOG.md for the full entry.

1.5.0 — UTS #39 Unicode Security Mechanisms

23 Apr 20:40

SwiftEmailValidatorUTS39 — opt-in companion target

New second library product layering UTS #39 on top of the core validator. Import only what you need:

import SwiftEmailValidator       // unchanged, zero new cost
import SwiftEmailValidatorUTS39  // adds ~280 KB of UCD data + checks

What the addon enforces

  • Identifier_Status filter — rejects Restricted scripts (Linear B, Runic, Deseret, etc.)
  • Mixed-script detection — Single Script / Highly Restrictive / Moderately Restrictive per §5.2
  • §4 confusable skeletons — skeleton-equality against caller-supplied protected forms (opt-in per call)

Usage

Convenience API — one call, both sides of the address checked:

let ok = EmailSyntaxValidator.correctlyFormatted(
    "alice@example.com",
    uts39: UTS39.Policy()  // default: Highly Restrictive
)

Lower-level — compose closures yourself:

let policy = UTS39.Policy()
let candidate = "alice@example.com"
let isValid = EmailSyntaxValidator.correctlyFormatted(
    candidate,
    domainValidator: UTS39.domainValidator(policy),
    localPartValidator: UTS39.localPartValidator(policy)
)

Main-library change (non-breaking)

One new closure parameter on correctlyFormatted and mailbox(from:):

localPartValidator: (String) -> Bool = { _ in true }

It receives the semantic local-part string (dot-atom as-is, quoted-string unescaped). Default preserves existing behavior.
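To illustrate what "quoted-string unescaped" means for the closure's input, here is a hypothetical helper that turns a raw local part into its semantic form. It strips the surrounding quotes and resolves backslash escapes; it does not re-check the RFC 5321 §4.1.2 quoted-pairSMTP range (%d32-126).

```swift
// Returns the semantic local part, or nil for a malformed quoted string.
func semanticLocalPart(_ raw: String) -> String? {
    guard raw.hasPrefix("\"") else { return raw }          // dot-atom: passed as-is
    guard raw.count >= 2, raw.hasSuffix("\"") else { return nil }
    var result = ""
    var escaping = false
    for ch in raw.dropFirst().dropLast() {
        if escaping {
            result.append(ch)                              // keep the escaped character
            escaping = false
        } else if ch == "\\" {
            escaping = true
        } else if ch == "\"" {
            return nil                                     // unescaped quote inside
        } else {
            result.append(ch)
        }
    }
    return escaping ? nil : result                         // dangling backslash is invalid
}
```

A `localPartValidator` would therefore see `john"doe` for the raw local part `"john\"doe"`.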

Implementation notes

  • All Unicode tables generated from UCD 17.0.0 via Sources/SwiftEmailValidatorUTS39/Tools/generate.py
  • Skeleton algorithm iterates map+NFD to a fixed point (13 of ~6500 confusables.txt entries require up to 3 iterations for idempotence)
  • Multi-scalar NFD sources (48 entries, e.g. U+01A1 → [U+006F U+031B]) handled via a longest-match prefix table
  • Restriction Level classification uses per-scalar Script_Extensions ∩ target ≠ ∅ per §5.1 (not union-based)
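The fixed-point skeleton iteration can be sketched with a tiny hypothetical confusable map (not confusables.txt). The entry `"|" -> "I"` followed by `"I" -> "l"` is a deliberate two-step chain that a single map pass would leave unresolved. NFD comes from Foundation's `decomposedStringWithCanonicalMapping`.

```swift
import Foundation

// Toy map (hypothetical entries). "|" -> "I" -> "l" needs two passes.
let toyConfusables: [Unicode.Scalar: String] = [
    "1": "l",
    "|": "I",
    "I": "l",
    "\u{0430}": "a",   // CYRILLIC SMALL LETTER A
]

func applyConfusableMap(_ text: String) -> String {
    var out = ""
    for scalar in text.unicodeScalars {
        if let replacement = toyConfusables[scalar] {
            out += replacement
        } else {
            out.unicodeScalars.append(scalar)
        }
    }
    return out
}

// NFD, then repeat (map, NFD) until the scalar sequence stops changing.
func skeleton(_ input: String, maxIterations: Int = 8) -> String {
    var current = input.decomposedStringWithCanonicalMapping
    for _ in 0..<maxIterations {
        let next = applyConfusableMap(current).decomposedStringWithCanonicalMapping
        // Compare scalars: String == uses canonical equivalence, not identity.
        if next.unicodeScalars.elementsEqual(current.unicodeScalars) { return next }
        current = next
    }
    return current
}
```

Two identifiers are then treated as confusable when `skeleton(a) == skeleton(b)`, e.g. `paypa1` and `paypal`.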

Test count

242 total (was 164). Bulk regression test walks every entry in the generated confusables table; restriction-level edge cases cover UTS #39 §5.2 examples and ICU itspoof.cpp patterns including Arabic-Indic digits, combining marks, and the Japanese/Korean/Chinese whitelist combos.

1.4.1 — accept RFC 4291 §2.2 format-2 IPv6 literals

23 Apr 19:14

Small patch release. Closes the single syntax-level gap surfaced by the
reverse-check added in 1.4.0's Benchmarks harness.

Fixed

  • IPv6 literal regex now accepts RFC 4291 §2.2 format 2. Six
    uncompressed hex groups followed by a trailing IPv4-in-dotted-decimal
    (e.g. aaaa:aaaa:aaaa:aaaa:aaaa:aaaa:127.0.0.1) are now recognised as
    valid. The upstream regex this validator was derived from only
    recognised the compressed / IPv4-mapped forms (::ffff:x.x.x.x,
    1::5:x.x.x.x). Email addresses such as
    valid.ipv6v4.addr@[IPv6:aaaa:aaaa:aaaa:aaaa:aaaa:aaaa:127.0.0.1]
    now validate as expected.

Maximum IPv6 literal length remains 45 octets, which is exactly the
existing IPAddressSyntaxValidator public-API length cap — no guard
changes needed.
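A standalone sketch of recognising the format-2 form above (the text after the `IPv6:` tag). Function names are hypothetical and this is a hand-written parser, not the package's regex; compressed `::` forms are deliberately out of scope. The longest accepted input, six `ffff` groups plus `255.255.255.255`, is exactly 45 octets.

```swift
// Six uncompressed 1-4 digit hex groups, then a dotted-quad IPv4 address.
func isUncompressedFormat2IPv6(_ text: String) -> Bool {
    guard text.utf8.count <= 45 else { return false }     // longest possible form
    let parts = text.split(separator: ":", omittingEmptySubsequences: false)
    guard parts.count == 7 else { return false }
    for group in parts.prefix(6) {
        guard (1...4).contains(group.count),
              group.allSatisfy({ $0.isASCII && $0.isHexDigit }) else { return false }
    }
    return isDottedQuad(parts[6])
}

// Four decimal octets, 0...255, ASCII digits only, no leading zeros.
func isDottedQuad(_ text: Substring) -> Bool {
    let octets = text.split(separator: ".", omittingEmptySubsequences: false)
    guard octets.count == 4 else { return false }
    return octets.allSatisfy { octet in
        guard (1...3).contains(octet.count),
              octet.allSatisfy({ $0.isASCII && $0.isNumber }),
              !(octet.count > 1 && octet.first == "0"),
              let value = Int(octet) else { return false }
        return value <= 255
    }
}
```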

Tests

Added testIPv6Format2UncompressedWithEmbeddedIPv4 (5 positive cases
including the length boundary) and testIPv6Format2RejectsWrongGroupCount
(5 negative cases for wrong hex-group count, wrong IPv4 octet count, and
out-of-range octets). Test count is 164.

Reverse benchmark

After this fix the --reverse benchmark mode reports 140/144 agreement
with competitor test assertions. The remaining four disagreements are all
syntax-vs-policy differences (127.0.0.1, 127.0.0.256, 127.0.0.1.26,
mailserver as unbracketed hostnames) — our syntax layer accepts them
because they are legal RFC 1035 labels, and the shipped default
domainValidator rejects them as domain policy.

Full changes: 1.4.0...1.4.1

1.4.0 — IP validator DoS hardening + benchmark harness

23 Apr 18:59

Additive minor release. No behavior change for users of EmailSyntaxValidator.

Added

  • IPAddressSyntaxValidator public length-capped wrappers. match(_:),
    matchIPv4(_:), and matchIPv6(_:) now apply a utf8.count guard
    (15 octets for IPv4, 45 for IPv6) before dispatching to the regex engine.
    Prior to 1.4.0 a caller invoking IPAddressSyntaxValidator directly with
    a multi-megabyte string would spend O(n) inside NSRegularExpression
    before the $ anchor failed — a denial-of-service vector. Internal raw
    matchers _match(_:) / _matchIPv4(_:) / _matchIPv6(_:) retain the
    pre-1.4.0 behavior for EmailSyntaxValidator's hot path, which is
    already bounded by the upstream 254 UTF-8 octet address cap.

  • Benchmarks/ SPM package. A standalone harness runs the 195-case
    DemoApp corpus through every SPM-consumable Swift email validator we
    could locate (evanrobertson, MimeEmailParser, bdolewski's regex,
    jwelton-equivalent via NSDataDetector) and emits a Markdown accuracy
    table. Published results are in the new "Comparison with other Swift
    email validators" section of the README. Kept out of the main
    Package.swift so consumers don't transitively pull competitor deps.
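The length-capped wrapper pattern can be sketched as a raw matcher plus a public front end that rejects oversized input before the regex engine runs. The names and the IPv4 pattern are illustrative assumptions, not the package's; 15 UTF-8 octets is the length of the longest dotted quad, `255.255.255.255`.

```swift
import Foundation

// Illustrative dotted-quad pattern (0-255, no leading zeros).
let ipv4Regex = try! NSRegularExpression(
    pattern: #"^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}$"#)

// Raw matcher: no length guard (plays the role of an internal _matchIPv4).
func rawMatchIPv4(_ candidate: String) -> Bool {
    let range = NSRange(candidate.startIndex..., in: candidate)
    return ipv4Regex.firstMatch(in: candidate, options: [], range: range) != nil
}

// Public matcher: constant-time length check before any regex work.
func cappedMatchIPv4(_ candidate: String) -> Bool {
    guard candidate.utf8.count <= 15 else { return false }
    return rawMatchIPv4(candidate)
}
```

`utf8.count` is O(1) for native Swift strings, so a multi-megabyte input is rejected without the regex ever scanning it.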

Security

The length-capped public wrappers close the only input-length DoS vector
found in a manual audit of the library's public API surface.
EmailSyntaxValidator users were never exposed. No crashes introduced;
correctlyFormatted(_:) behavior is unchanged.

Compatibility

  • Purely additive: the new public methods wrap previously-internal
    behavior. No existing API signature or return value changed.
  • Test suite: 162 passing (was 157).
  • Minimum Swift / platforms unchanged (Swift 5.5, macOS 10.12+, iOS 11+, tvOS 11+).

Full changes: 1.3.1...1.4.0

1.3.1 — SwiftPublicSuffixList 3.1.0 compatibility

23 Apr 05:47

Changed

  • SwiftPublicSuffixList dependency bumped to 3.1.0. v3.0 tightened PublicSuffixList.isUnrestricted(_:) / match(_:) to reject non-ASCII hostnames — IDN labels must be in ACE (Punycode) form. The default domainValidator closure now calls PublicSuffixList.ace(_:) on the domain before dispatching to isUnrestricted(_:), so Unicode IDN domains continue to validate exactly as they did on 1.3.0 with PSL 2.x.
  • Mailbox.Host.domain(...) still carries the original user-facing string; only the validator dispatch uses the ACE form.

Migration

Callers who pass a custom domainValidator closure to correctlyFormatted(_:) / mailbox(from:) and rely on the PSL default behavior via PublicSuffixList.isUnrestricted(_:) should wrap their call site with PublicSuffixList.ace(_:) if the closure receives Unicode IDN domains — e.g. { PublicSuffixList.isUnrestricted(PublicSuffixList.ace($0), rules: myRules) }.

1.3.0 — EmailNormalizer.nfc + RFC 6532 audit

23 Apr 02:02

Highlights

New API: EmailNormalizer.nfc(_:)

RFC 6532 §3.1-compliant, name-preserving NFC normalization. Use this when you need a spec-compliant comparison form, or when you are normalizing an address you intend to preserve for display, forwarding, or reply-to. NFC collapses canonically-equivalent scalar sequences (e.g. decomposed e + U+0301 → precomposed é) but does not fold compatibility variants (fullwidth, ligatures, superscripts).

EmailNormalizer.nfkc(_:) remains available for anti-spoofing / account de-duplication use cases (Gmail-style), as a documented deliberate deviation from RFC 6532 §3.1's "SHOULD NOT".
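The NFC/NFKC distinction can be seen with Foundation's normalization accessors. The two functions below are hypothetical stand-ins, not `EmailNormalizer` itself.

```swift
import Foundation

// NFC: canonical composition only; compatibility variants survive.
//   nfcSketch("e\u{0301}") -> "\u{00E9}" (one scalar)
//   nfcSketch("\u{FF45}")  -> "\u{FF45}" (fullwidth letter kept)
func nfcSketch(_ text: String) -> String {
    text.precomposedStringWithCanonicalMapping
}

// NFKC: also folds compatibility variants (fullwidth, ligatures, NBSP).
//   nfkcSketch("\u{FF45}") -> "e"
//   nfkcSketch("\u{FB01}") -> "fi"
func nfkcSketch(_ text: String) -> String {
    text.precomposedStringWithCompatibilityMapping
}
```

When testing normalization in Swift, compare `unicodeScalars` rather than using `==`: `String` equality already uses canonical equivalence, so `"e\u{0301}" == "\u{00E9}"` is true even before normalization.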

RFC compatibility audit

Conducted an in-depth audit of the normalizer against RFC 6532 §3.1, RFC 5198 §3, RFC 5891 §4.1 (IDNA2008), RFC 5321 §2.4, RFC 5321 §4.1.2 (quoted strings), and UAX #15. No spec violations found. nfc(_:) is fully compliant; nfkc(_:) deviations are documented in-source and pinned by tests.

Docstring improvements

  • nfkc(_:) IDNA2008 section now names UTS #46 transitional mode and RFC 3492 Punycode explicitly
  • Dot-folding example list extended (U+FF0E, U+3002, U+FF61)
  • New section enumerating compatibility folds that introduce ASCII SPACE (U+00A0, U+2003, U+FDFA)

New tests

  • testNfcIsIdempotent / testNfkcIsIdempotent — UAX #15 D8/D9 stability
  • testNfkcOutputIsAlsoInNfc — NFKC output is already in NFC, i.e. NFC(NFKC(x)) == NFKC(x)

Test count: 157 (was 154 in 1.2.2), 0 failures.

No breaking changes

Pure additive release. Existing nfkc(_:) callers are unaffected.

1.2.2 — internal simplification

23 Apr 01:14

What changed

Follow-up to 1.2.1 — post-release adversarial re-review empirically verified that the 17-case inline BMP guard in extractQuotedString was fully redundant with the per-scalar check at the end of the loop. Collapsed. Also tightened the RFC 2047 candidate gate to reject supplementary-plane Tag/PUA/noncharacter scalars upfront (previously caught after encode + re-parse).

No behavior change for valid input. All 129 tests pass.

Details

  • extractQuotedString: removed the 17-case inline BMP contains block (U+00AD, U+00A0, U+1680, U+2000–U+200A, U+200B–U+200D, U+202F, U+205F, U+2060–U+2065, U+3000, U+FEFF, U+FE00–U+FE0F, U+2028–U+2029, U+FDD0–U+FDEF, U+FFFE, U+FFFF). All 26 scalars confirmed rejected by qtextUnicodeSMTPCharacterSet's per-scalar allSatisfy. The supplementary-plane guard (isRejectedSupplementaryScalar) stays in the loop — SSP scalars like U+E0100 can attach as grapheme extenders and must be rejected per-scalar, since they are unioned into the set and cannot be subtracted (Foundation .subtracting() bug on supplementary planes).
  • candidateForRfc2047: first gate now also rejects isRejectedSupplementaryScalar, bringing it in line with extractDotAtom / extractQuotedString.
  • Replaced a misleading comment that claimed allSatisfy "only examines the first scalar of each Character" (it iterates all scalars).
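The per-scalar point can be sketched as follows: iterating `unicodeScalars` checks every scalar, including extenders such as U+E0100 that attach to the previous `Character`. The four supplementary ranges below are illustrative assumptions, since the notes do not list the six conditions in `isRejectedSupplementaryScalar`; both function names are hypothetical.

```swift
// Illustrative supplementary-plane rejections (assumed, not the package's list).
func isRejectedSupplementaryScalarSketch(_ scalar: Unicode.Scalar) -> Bool {
    let v = scalar.value
    if v < 0x10000 { return false }                      // BMP handled elsewhere
    if (0xE0000...0xE007F).contains(v) { return true }   // Tags
    if (0xE0100...0xE01EF).contains(v) { return true }   // Variation Selectors Supplement
    if v >= 0xF0000 { return true }                      // Supplementary PUA (planes 15-16)
    if v & 0xFFFE == 0xFFFE { return true }              // noncharacters U+nFFFE / U+nFFFF
    return false
}

// Every scalar must pass, not just the first scalar of each Character.
func scalarsAllAllowed(_ text: String, allowed: (Unicode.Scalar) -> Bool) -> Bool {
    text.unicodeScalars.allSatisfy { !isRejectedSupplementaryScalarSketch($0) && allowed($0) }
}
```

`"e\u{E0100}"` is a single `Character`, yet `scalarsAllAllowed` still rejects it because the variation selector is examined as its own scalar.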

Upgrade

Drop-in patch. No API or behavior change for valid input.

Internal refactor + invariant tests

23 Apr 00:29

A patch release with no behavior change. Adversarial re-review of the post-1.2.0 design observations confirmed two items and retracted three; this release lands the confirmed work and adds clarifying comments around the retracted items so future reviewers reach the same conclusion faster.

Confirmed (implemented)

  • Extracted isRejectedSupplementaryScalar(_:) — the 6 supplementary-plane conditions genuinely shared between extractDotAtom and extractQuotedString are now in one place. The BMP block in extractQuotedString stays put — it is path-specific (per-grapheme-cluster iteration), not a duplicate of dot-atom's per-scalar check.
  • Added InvariantTests.swift with 4 deterministic sweeps, which run in ~90 ms with no random seed (reproducible CI):
    • correctlyFormatted ⇔ mailbox != nil agreement across historically buggy scalar ranges
    • dot-atom and quoted-string parity for supplementary-plane scalars
    • ASCII acceptance implies Unicode acceptance (subset invariant)
    • No crashes on short byte strings (8000 probes)
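The agreement-style invariant sweeps can be sketched as a seedless walk over a fixed scalar range that returns the first candidate on which two predicates disagree. `isValid` and `parse` are placeholders for `correctlyFormatted` and `mailbox(from:)`; the helper name and the template are hypothetical.

```swift
// Walk a fixed range (no randomness, so results are reproducible) and report
// the first candidate where `isValid` and `parse != nil` disagree.
func firstDisagreement(in range: ClosedRange<UInt32>,
                       template: (Unicode.Scalar) -> String,
                       isValid: (String) -> Bool,
                       parse: (String) -> String?) -> String? {
    for value in range {
        guard let scalar = Unicode.Scalar(value) else { continue }   // skips surrogates
        let candidate = template(scalar)
        if isValid(candidate) != (parse(candidate) != nil) {
            return candidate
        }
    }
    return nil
}
```

A test would then assert `firstDisagreement(...) == nil`; a non-nil result names the exact input to reproduce.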

Retracted (with anti-confusion comments)

  • Options enum: the doc comment now explains that the [Options] array shape is intentional, for forward compatibility.
  • CharacterSet block: MARK section header documents that co-location with parsing is deliberate (the subtract-before-union ordering invariant is load-bearing and easy to violate if split).
  • extractQuotedString BMP guard: comment now states it is not a duplicate of extractDotAtom — different iteration model, different requirements.

Tests

  • 125 → 129 (all passing).