fix(sanitize): preserve angle brackets inside code blocks and inline code by blackwell-systems · Pull Request #2408 · github/github-mcp-server

blackwell-systems · 2026-04-30T02:39:14Z

Description

bluemonday.StrictPolicy() treats angle brackets inside markdown code blocks and inline code spans as HTML tags and strips them. This causes content like mut_raw_ptr<int> in issue/PR bodies to become mut_raw_ptr when read through MCP endpoints.

Before

let ptr: mut_raw_ptr<int> = raw_new int;

Agent sees: let ptr: mut_raw_ptr = raw_new int;

After

Agent sees: let ptr: mut_raw_ptr<int> = raw_new int;

Root cause

FilterHTMLTags calls bluemonday.Sanitize() on the entire markdown body without distinguishing code from prose. Bluemonday treats <int>, <T>, <String>, etc. as unrecognized HTML tags and removes them.

Fix

Before HTML sanitization, replace < and > inside fenced code blocks (```) and inline code spans (`) with null-byte sentinels that bluemonday will not interpret as HTML. After sanitization, restore the sentinels to angle brackets.

This preserves XSS protection for angle brackets in prose (e.g. <script> is still stripped) while keeping angle brackets inside code intact.

Testing

Added 6 test cases covering:

Fenced code blocks with angle brackets (the reported bug)
Inline code with generic types (Vec<String>)
Angle brackets outside code still sanitized (XSS protection preserved)
Multiple inline code spans
Fenced code with language info string

Fixes #2202

…code bluemonday's StrictPolicy treats angle brackets inside markdown code blocks and inline code spans as HTML tags and strips them. This causes content like `mut_raw_ptr<int>` to become `mut_raw_ptr` when read through MCP issue/PR endpoints. The fix protects angle brackets inside fenced code blocks (```) and inline code spans (`) with sentinels before HTML sanitization, then restores them after. Angle brackets outside code are still sanitized normally, preserving XSS protection. Fixes github#2202 Signed-off-by: Dayna Blackwell <dayna@blackwell-systems.com>

jluocsa

Thanks for picking up #2202 — generic-type angle brackets disappearing from code blocks is a long-standing irritation. The CommonMark-aware fence/inline detection is much better than a regex-based approximation.

Before merge, I think there's a security-sensitive issue worth addressing:

Sentinel collision can be used to bypass the HTML sanitizer.

The current sentinels are:

const (
    ltSentinel = "\x00LT\x00"
    gtSentinel = "\x00GT\x00"
)

Sanitize runs in this order:

cleaned   := FilterCodeFenceMetadata(FilterInvisibleCharacters(input))
protected := protectCodeAngleBrackets(cleaned)
sanitized := FilterHTMLTags(protected)
return restoreCodeAngleBrackets(sanitized)

FilterInvisibleCharacters strips a specific list of Unicode codepoints (see shouldRemoveRune in pkg/sanitize/sanitize.go — zero-width spaces, BiDi controls, tag characters, etc.) but does not strip \x00 (NUL is not in the list).

That means an attacker who controls input — e.g. an issue title, PR body, search-result snippet, etc. — can submit:

\x00LT\x00script\x00GT\x00alert(1)\x00LT\x00/script\x00GT\x00

…outside any code block. The flow is:

FilterInvisibleCharacters → passes through (NUL not stripped)
FilterCodeFenceMetadata → passes through (no fence)
protectCodeAngleBrackets → passes through (no backticks, the sentinel literal is treated as ordinary text)
FilterHTMLTags (bluemonday) → passes through (\x00LT\x00script... is not HTML)
restoreCodeAngleBrackets → substitutes back into <script>alert(1)</script>

The output now contains raw HTML/JS that should have been stripped, defeating the purpose of FilterHTMLTags.

A few mitigation options, in increasing order of robustness:

Add \x00 to shouldRemoveRune. Cheapest fix; consistent with the existing "strip control / invisible chars" policy. Adding it before protectCodeAngleBrackets guarantees no pre-existing sentinels.
Pre-scan for the literal sentinel and refuse / escape it. A bit more code, no change to filter semantics.
Use a per-call randomized sentinel (generate two crypto-random byte sequences once per Sanitize invocation). Defeats collision by construction; only downside is a small allocation per call.

Option 1 seems most consistent with what FilterInvisibleCharacters is already doing, and the smallest possible change. A unit test mirroring TestSanitizePreservesAngleBracketsInCodeBlocks with input containing the literal sentinel would lock the behavior in.

Minor: closing-fence detection. protectCodeAngleBrackets treats any run of >= fenceLen backticks as a closing fence (anywhere on the line). CommonMark requires the closing fence to be on its own line and not have an info string. The current implementation is more permissive, which in practice means a stray ``` inside what was meant to be the fenced content will end protection early. This is a soft-fail (some < get stripped that shouldn't be) rather than a security issue, so probably acceptable, but worth a comment near the loop documenting the deviation.

Tests are nicely focused, the inline-vs-fenced split is handled, and [T comparable]-style generics outside code (where they really might be HTML-ish) are still cleaned. Once the sentinel exposure is closed, this is a great fix.

Security fix: Add NUL byte (0x00) to shouldRemoveRune so that FilterInvisibleCharacters strips NUL bytes before protectCodeAngleBrackets runs. Without this, an attacker can inject literal sentinel strings (\x00LT\x00script\x00GT\x00) that bypass FilterHTMLTags and get restored to <script> by restoreCodeAngleBrackets. Also add a comment documenting the CommonMark closing-fence deviation: the implementation treats any run of >= fenceLen backticks as a closing fence even mid-line, which is more permissive than CommonMark (requires own line, no info string). This is a soft-fail (some angle brackets may be unprotected) rather than a security issue. Tests added: - sentinel collision: verifies NUL-byte injection does not produce <script> - NUL bytes in code blocks: verifies code content is preserved after stripping - NUL byte in shouldRemoveRune: verifies 0x00 is in the removal set

blackwell-systems requested a review from a team as a code owner April 30, 2026 02:39

blackwell-systems force-pushed the fix-angle-brackets-in-code-blocks branch from 3722fe4 to 680c63b Compare April 30, 2026 02:41

jluocsa reviewed May 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sanitize): preserve angle brackets inside code blocks and inline code#2408

fix(sanitize): preserve angle brackets inside code blocks and inline code#2408
blackwell-systems wants to merge 2 commits into
github:mainfrom
blackwell-systems:fix-angle-brackets-in-code-blocks

blackwell-systems commented Apr 30, 2026

Uh oh!

jluocsa left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

blackwell-systems commented Apr 30, 2026

Description

Before

After

Root cause

Fix

Testing

Uh oh!

jluocsa left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants