fix(server): always fetch unread feeds from offset 0 in AI mode#147
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request addresses a bug in the AI summarization feature where subsequent requests for unread feeds could use an incorrect offset, potentially causing feeds to be missed. By ensuring that the system always fetches unread feeds from the beginning (offset 0) in AI mode, it guarantees that all available unread content is processed: previously summarized feeds are marked as read and no longer appear in subsequent fetches.
Code Review
This pull request intends to fix how unread feeds are fetched in AI mode. However, it introduces a critical security vulnerability: the summary generation process is susceptible to prompt injection, potentially leading to stored Cross-Site Scripting (XSS) due to unsanitized LLM-generated HTML output. Furthermore, the changes unconditionally apply AI mode behavior, causing a regression for non-AI users by breaking pagination and incorrectly marking feeds as read. It is crucial to sanitize LLM output and restore conditional logic for AI mode, along with adding a test case for non-AI mode in SummarizeServiceSpec.scala.
```scala
.drain ++ summarizeStream(
  strippedText,
  validatedLanguage.displayName,
  selectedModel.modelId
)
```
The summarizeStream function at these lines is vulnerable to prompt injection, which can lead to stored Cross-Site Scripting (XSS) as the LLM-generated HTML output is not sanitized. It is crucial to sanitize this output using a library like Jsoup with a strict allow-list. Additionally, the change in this area unconditionally applies 'AI mode' behavior, causing a regression for non-AI users by breaking pagination and incorrectly marking feeds as read. The conditional logic based on user.settings.isAiMode must be restored.
Suggested change:

```scala
.drain ++ summarizeStream(
  strippedText,
  validatedLanguage.displayName,
  selectedModel.modelId
).map {
  case ru.trett.rss.models.SummaryEvent.Content(text) =>
    ru.trett.rss.models.SummaryEvent.Content(
      org.jsoup.Jsoup.clean(text, org.jsoup.safety.Safelist.relaxed())
    )
  case event => event
}
```

The new test added in SummarizeServiceSpec.scala:

```scala
test("streamSummary should always fetch feeds from offset 0 regardless of provided offset") {
  val feedRepository = mock[FeedRepository]
  val client = mock[Client[IO]]
  val user = User("user-id", "User", "user@example.com", User.Settings())

  implicit val loggerFactory: LoggerFactory[IO] = NoOpFactory[IO]

  // Mock getTotalUnreadCount
  (feedRepository.getTotalUnreadCount)
    .expects("user-id")
    .returning(IO.pure(60))

  // This is the CRITICAL part: even if streamSummary is called with offset 30,
  // it MUST call feedRepository.getUnreadFeeds with offset 0 because
  // it's in AI mode and feeds from the previous batch were already marked as read.
  (feedRepository
    .getUnreadFeeds(_: User, _: Int, _: Int))
    .expects(user, 30, 0) // batchSize is 30, expected offset is 0
    .returning(IO.pure(List.empty))

  val service = new SummarizeService(feedRepository, client, "api-key")

  // Call with offset 30 (simulating "Load More" click)
  service.streamSummary(user, 30).compile.toList.unsafeRunSync()
}
```
This is a good test for the 'AI mode' behavior. To improve test coverage and prevent regressions like the one I mentioned in SummarizeService.scala, it would be beneficial to add another test case for when AI mode is disabled (isAiMode is false).
This test would verify that:
- The `offset` parameter passed to `streamSummary` is correctly passed through to `feedRepository.getUnreadFeeds`.
- `feedRepository.markFeedAsRead` is not called.

This would ensure that the non-AI mode functionality remains correct and prevent future regressions.
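A possible shape for that complementary test, sketched against the same ScalaMock/cats-effect setup as the existing spec. Note this is only a sketch: `User.Settings(isAiMode = false)` and `markFeedAsRead` are names taken from the review comments and may not match the actual signatures in this repository.

```scala
// Sketch only — assumes the mocking setup of the existing SummarizeServiceSpec;
// isAiMode and markFeedAsRead are hypothetical names from the review comments.
test("streamSummary should respect the given offset when AI mode is disabled") {
  val feedRepository = mock[FeedRepository]
  val client = mock[Client[IO]]
  val user = User("user-id", "User", "user@example.com", User.Settings(isAiMode = false))

  implicit val loggerFactory: LoggerFactory[IO] = NoOpFactory[IO]

  (feedRepository.getTotalUnreadCount)
    .expects("user-id")
    .returning(IO.pure(60))

  // In non-AI mode the caller's offset (30) must be passed straight through,
  // not reset to 0.
  (feedRepository
    .getUnreadFeeds(_: User, _: Int, _: Int))
    .expects(user, 30, 30)
    .returning(IO.pure(List.empty))

  // No expectation is set on feedRepository.markFeedAsRead: with strict
  // ScalaMock mocks, any unexpected call to it will fail the test.

  val service = new SummarizeService(feedRepository, client, "api-key")
  service.streamSummary(user, 30).compile.toList.unsafeRunSync()
}
```

Pairing this with the existing offset-0 test would pin down both branches of the `isAiMode` conditional the review asks to restore.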
No description provided.