Strip Unicode Cf characters in PrintableString#4593
Strip Unicode Cf characters in PrintableString#4593tnull wants to merge 2 commits intolightningdevkit:mainfrom
Cf characters in PrintableString#4593Conversation
`PrintableString` is the sanitiser LDK uses to render untrusted strings
(node aliases, BOLT-12 invoice / offer text, `UntrustedString`, LSPS
messages, `lightning-invoice` descriptions) to logs and UI. It only
replaced `char::is_control` matches (Unicode general category `Cc`)
with U+FFFD, leaving the entire `Cf` (Format) category untouched.
That is the exact category covering the bidirectional override /
isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width
characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack
family (CVE-2021-42574): a peer can set its alias / invoice description
/ offer fields to e.g. `safe\u{202E}cipsxe.exe`, which previously
passed through verbatim while a human reader sees `safeexe.cips` —
defeating the threat model `PrintableString` exists to defend against.
Replace `Cf` codepoints alongside `Cc` ones. The `Cf` ranges are
inlined as a `matches!` table sourced from Unicode 16.0 to keep the
change `no_std`-friendly with no new dependencies.
Co-Authored-By: HAL 9000
|
👋 Thanks for assigning @valentinewallace as a reviewer! |
|
No issues found. The previously flagged off-by-one in the Egyptian Hieroglyph |
| use core::fmt::Write; | ||
| for c in self.0.chars() { | ||
| let c = if c.is_control() { core::char::REPLACEMENT_CHARACTER } else { c }; | ||
| let c = if c.is_control() || is_format_char(c) { |
There was a problem hiding this comment.
Could it make sense to simplify this to just use the replacement character if !is_alphanumeric && !is_whitespace, or similar?
There was a problem hiding this comment.
I think this would be very restrictive? We would lose the capability to set our NodeAlias to "I💖LDK! ⚡" :(
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4593 +/- ##
==========================================
- Coverage 86.84% 86.18% -0.66%
==========================================
Files 161 156 -5
Lines 109260 108660 -600
Branches 109260 108660 -600
==========================================
- Hits 94882 93646 -1236
- Misses 11797 12405 +608
- Partials 2581 2609 +28
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
PrintableStringis the sanitiser LDK uses to render untrusted strings (node aliases, BOLT-12 invoice / offer text,UntrustedString, LSPS messages,lightning-invoicedescriptions) to logs and UI. It only replacedchar::is_controlmatches (Unicode general categoryCc) with U+FFFD, leaving the entireCf(Format) category untouched.That is the exact category covering the bidirectional override / isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack family (CVE-2021-42574): a peer can set its alias / invoice description / offer fields to e.g.
safe\u{202E}cipsxe.exe, which previously passed through verbatim while a human reader seessafeexe.cips— defeating the threat modelPrintableStringexists to defend against.Replace
Cfcodepoints alongsideCcones. TheCfranges are inlined as amatches!table sourced from Unicode 16.0 to keep the changeno_std-friendly with no new dependencies.Co-Authored-By: HAL 9000