fix(server): preserve spaces in AI summary stream by removing .trim #149

Merged
trett merged 1 commit into main from fix-spaces
Mar 10, 2026

@trett trett commented Mar 10, 2026

This PR fixes the issue where spaces were missing between words in the AI summary stream. The issue was caused by calling .trim on each individual chunk received from the Gemini API stream, leading to merged words when chunks were concatenated on the client.

Key changes:

  • Removed .trim() from the summary chunks in SummarizeService.scala.
  • Switched to .stripPrefix and .stripSuffix to remove markdown wrappers without affecting text content.
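
To illustrate the failure mode described above, here is a minimal sketch of how trimming each chunk merges words once the chunks are concatenated. The chunk values and the names `ChunkJoinDemo`, `joinTrimmed`, and `joinRaw` are made up for illustration and are not taken from SummarizeService.scala.

```scala
object ChunkJoinDemo {
  // Hypothetical chunks: one sentence split the way a streaming API might split it.
  val chunks: List[String] = List("The quick", " brown", " fox")

  // Old behaviour: trimming every chunk drops the spaces at chunk boundaries.
  def joinTrimmed(cs: List[String]): String = cs.map(_.trim).mkString

  // New behaviour: chunks are concatenated exactly as received.
  def joinRaw(cs: List[String]): String = cs.mkString

  def main(args: Array[String]): Unit = {
    println(joinTrimmed(chunks)) // The quickbrownfox
    println(joinRaw(chunks))     // The quick brown fox
  }
}
```

The leading space of each chunk is the only separator between words, so it has to survive until the client joins the chunks.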

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where AI-generated summaries, streamed from the Gemini API, were losing spaces between words. The problem stemmed from the .trim() method being applied to individual data chunks, leading to words merging when concatenated. The fix involves removing the problematic .trim() call and refining the markdown stripping logic to correctly preserve text formatting and spacing, thereby improving the readability and accuracy of the AI summaries.

Highlights

  • AI Summary Stream: Removed .trim() from the summary chunks in SummarizeService.scala to prevent spaces from being lost.
  • Markdown Handling: Switched to .stripPrefix and .stripSuffix for removing markdown wrappers, ensuring text content remains unaffected.
Changelog
  • server/src/main/scala/ru/trett/rss/server/services/SummarizeService.scala
    • Removed the .trim() method call from the text processing pipeline for AI summary chunks.
    • Consolidated the logic for stripping markdown prefixes and suffixes using .stripPrefix and .stripSuffix.
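
As a rough sketch of why .stripPrefix and .stripSuffix leave the text content alone: both remove only an exact match at the very start or end of the string and are no-ops otherwise. The expression mirrors the one quoted in the review of this PR; the object name `FenceStripDemo`, the helper `stripFence`, and the sample chunks are assumptions for illustration.

```scala
object FenceStripDemo {
  // Same expression as in the reviewed snippet, wrapped in a hypothetical helper.
  def stripFence(text: String): String =
    text.stripPrefix("```html").stripSuffix("```")

  def main(args: Array[String]): Unit = {
    println("[" + stripFence("```html<p>Hi") + "]")  // [<p>Hi]
    println("[" + stripFence(" there</p>```") + "]") // [ there</p>]
    println("[" + stripFence(" plain text ") + "]")  // [ plain text ]
  }
}
```

Note that a wrapper split across two chunks would not match either call and would pass through unchanged.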
Activity
  • No activity has been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request simplifies the processing of LLM-generated text by removing a conditional trim operation and consistently stripping the "```html" prefix and "```" suffix. A critical security concern was raised regarding the summarizeStream function, which emits unsanitized HTML content from the LLM. This poses a high-severity Cross-Site Scripting (XSS) vulnerability, as malicious HTML from the LLM could lead to stored XSS if rendered directly by the client. Server-side HTML sanitization is recommended to mitigate this risk.

text.stripPrefix("```html").stripSuffix("```")
}
.filter(_.nonEmpty)
.map(SummaryEvent.Content(_))

security-high high

The summarizeStream function generates summaries from LLM output, which is explicitly instructed to be in HTML format. The raw HTML output from the LLM is then encapsulated within a SummaryEvent.Content object and emitted. There is no server-side sanitization of this HTML content to prevent potential Cross-Site Scripting (XSS) vulnerabilities. Although the prompt instructs the LLM to "Never use <script> tags", relying solely on LLM instructions for security is insufficient. If the LLM were to generate malicious HTML (e.g., containing <script> tags or other XSS payloads), and this SummaryEvent.Content is rendered directly on the client-side without proper sanitization, it could lead to stored XSS.

Remediation: Implement robust HTML sanitization on the LLM's output before it is sent to the client. This should be done server-side using a well-maintained HTML sanitization library (e.g., OWASP Java HTML Sanitizer for Scala/Java applications). The sanitization should remove any potentially malicious tags, attributes, or JavaScript. The client-side rendering should also treat this content as untrusted and apply appropriate escaping or sanitization if it's not already guaranteed to be safe by the server.
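
A minimal sketch of the server-side step suggested above, assuming the OWASP Java HTML Sanitizer is on the classpath and that the input is a complete HTML fragment rather than a partial stream chunk (a tag split across chunks would need buffering first). The object name `SummarySanitizer`, the helper `sanitize`, and the choice of allowed elements are assumptions, not part of this PR.

```scala
import org.owasp.html.{PolicyFactory, Sanitizers}

object SummarySanitizer {
  // Assumed allow-list: basic inline formatting plus block elements.
  // Anything outside the policy, such as <script>, is removed.
  private val policy: PolicyFactory =
    Sanitizers.FORMATTING.and(Sanitizers.BLOCKS)

  // Hypothetical helper: sanitize LLM output before it is sent to the client.
  def sanitize(html: String): String = policy.sanitize(html)
}
```

A call such as `SummarySanitizer.sanitize(text)` could then run before the content is wrapped in SummaryEvent.Content, with the policy widened (for example with Sanitizers.LINKS) if the summaries need more markup.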

@trett trett merged commit 4db25af into main Mar 10, 2026
1 check passed
@trett trett deleted the fix-spaces branch March 10, 2026 18:44