Merged
@@ -206,11 +206,7 @@ class SummarizeService(feedRepository: FeedRepository, client: Client[IO], apiKe
  .getOrElse("")
  }
  .map { text =>
- if (text.startsWith("```html")) {
- text.stripPrefix("```html").stripSuffix("```").trim
- } else {
- text.trim
- }
+ text.stripPrefix("```html").stripSuffix("```")
  }
  .filter(_.nonEmpty)
  .map(SummaryEvent.Content(_))
security-high high

The summarizeStream function generates summaries from LLM output, which is explicitly instructed to be in HTML format. The raw HTML output from the LLM is then encapsulated within a SummaryEvent.Content object and emitted. There is no server-side sanitization of this HTML content to prevent potential Cross-Site Scripting (XSS) vulnerabilities. Although the prompt instructs the LLM to "Never use <script> tags", relying solely on LLM instructions for security is insufficient. If the LLM were to generate malicious HTML (e.g., containing <script> tags or other XSS payloads), and this SummaryEvent.Content is rendered directly on the client-side without proper sanitization, it could lead to stored XSS.

Remediation: Implement robust HTML sanitization on the LLM's output before it is sent to the client. This should be done server-side using a well-maintained HTML sanitization library (e.g., OWASP Java HTML Sanitizer for Scala/Java applications). The sanitization should remove any potentially malicious tags, attributes, or JavaScript. The client-side rendering should also treat this content as untrusted and apply appropriate escaping or sanitization if it's not already guaranteed to be safe by the server.
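As a rough illustration of the remediation above, the following is a minimal sketch of server-side sanitization using the OWASP Java HTML Sanitizer's prebuilt policies. The `SummarySanitizer` object and its `sanitize` method are hypothetical names, not part of this PR, and the sketch assumes the `owasp-java-html-sanitizer` artifact is on the classpath. The allow-list chosen here (formatting, block elements, links) is an assumption about which tags the summaries need and would have to be adjusted to the actual prompt.

```scala
import org.owasp.html.{PolicyFactory, Sanitizers}

// Hypothetical helper, not part of the PR under review.
// Assumes the OWASP Java HTML Sanitizer library is a project dependency.
object SummarySanitizer {

  // Allow-list policy built from the library's predefined policies:
  // inline formatting (b, i, em, ...), block elements (p, ul, li, h1..h6, ...)
  // and links. Elements and attributes outside the allow-list, such as
  // <script> or onclick handlers, are dropped from the output.
  private val policy: PolicyFactory =
    Sanitizers.FORMATTING
      .and(Sanitizers.BLOCKS)
      .and(Sanitizers.LINKS)

  // Returns a sanitized copy of the untrusted LLM output.
  def sanitize(untrustedHtml: String): String =
    policy.sanitize(untrustedHtml)
}
```

In the stream shown in the diff, this could be applied as an extra `.map(SummarySanitizer.sanitize)` step before `.filter(_.nonEmpty)`. One caveat: if the stream emits partial chunks, a tag may be split across two events, so sanitizing each chunk in isolation may not be sufficient. Sanitizing the accumulated text, or sanitizing again on the client, would likely be safer.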
