feat(sdk): implement LargeFileSurgicalCondenser for optimized context management
Feature/surgical condenser #2564
Conversation
Remove editable dependency for litellm.
```python
)
summary = f"[Condensation]Viewed {file_info}"

return Condensation(
```
Unfortunately I don't think this works the way you might expect. If we see the pattern:
<prefix>
Action event: retrieve large file
Observation event: the large file
<suffix>
You probably want the resulting sequence to look like:
<prefix>
Action event: retrieve large file
Condensation: summary of large file
<suffix>
But LLM APIs expect every action event to have a matching observation event and will throw an exception if that isn't the case. We prevent these exceptions by filtering out unmatched actions and observations when constructing the View, so the actual resulting sequence is probably something like:
<prefix>
Condensation: summary of large file
<suffix>
Actually I was trying to implement something like this:
<prefix>
Action event: retrieve large file
Observation event: [Condensation] summary of large file
<suffix>
Here the actual observation event is replaced with a condensed observation that keeps the same event id.
The rationale is to make minimal changes to the original event list while generating the new view.
So it felt unnecessary to condense the preceding action event.
(I've updated the code to return observation event instead of condensation event)
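As a rough illustration of that approach, the in-place swap might look like the sketch below. `Observation` and `condense_observation` are hypothetical stand-ins (not the PR's actual code), and the 10KB threshold mirrors the default stated in the PR description; the key point is that the condensed event reuses the original id, so the action/observation pairing is preserved.

```python
from dataclasses import dataclass, replace

THRESHOLD_BYTES = 10 * 1024  # mirrors the condenser's stated default


@dataclass(frozen=True)
class Observation:
    id: str
    action_id: str
    content: str


def condense_observation(obs: Observation, file_info: str) -> Observation:
    if len(obs.content.encode("utf-8")) <= THRESHOLD_BYTES:
        return obs  # small enough: leave untouched
    # Same id and action_id; only the payload is replaced.
    return replace(obs, content=f"[Condensation] Viewed {file_info}")
```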
csmith49 left a comment
I like the idea! Reminds me of the condensers we had in v0 (like this one to limit the data produced by the old browser tool). There are two problems/concerns I'd like to see addressed before approval:
Condenser Robustness
As implemented, I believe this approach is modifying more of the event history than is expected. I left some comments on the changes highlighting my concerns.
My recommendation is to tweak the LargeFileSurgicalCondenser to use the CondenserBase interface, which will simplify things:
```python
class LargeFileSurgicalCondenser(CondenserBase):
    def condense(
        self,
        view: View,
        agent_llm: LLM | None = None,
    ) -> View | Condensation:
        # The view is a list of events we want to show the LLM.
        # Instead of trying to replace existing events with a
        # Condensation, just do the surgery on the events
        # directly while constructing a new view.
        ...
```
No need for condensation_requirement or get_condensation; you can just always return a modified View to get the behavior you want. Doing surgery directly on the events in the view will help prevent the action/observation mismatch I noted.
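A fuller, runnable sketch of that recommendation, using stand-in types (a plain list stands in for View; the real CondenserBase, View, and LLM types live in the openhands SDK, and `Obs`/`SurgicalCondenser` here are illustrative names only):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Obs:
    id: str
    tool: str
    content: str


class SurgicalCondenser:
    """Builds a new view with oversized observations masked."""

    def __init__(self, target_tool: str = "file_editor",
                 threshold_bytes: int = 10 * 1024) -> None:
        self.target_tool = target_tool
        self.threshold_bytes = threshold_bytes

    def condense(self, view: list[Obs]) -> list[Obs]:
        # Construct a fresh list; the underlying events are untouched,
        # so action/observation pairing in the history stays intact.
        out: list[Obs] = []
        for ev in view:
            too_big = len(ev.content.encode("utf-8")) > self.threshold_bytes
            if ev.tool == self.target_tool and too_big:
                ev = replace(ev, content=f"[Condensation] Viewed {ev.id}")
            out.append(ev)
        return out
```

Because `condense` returns a new list and never mutates the input, the original event history (and its action/observation matching) is left intact.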
Looping
When we had a similar masking behavior for browser outputs, the agents would often get stuck in a loop. They'd find a page they want, follow a few links to build context, and by the time they were ready to act the first page had been masked. But attempting to reload that context would mask the second page, and reloading that would mask the third, and by the time everything was reloaded the first would be masked again.
We tried a lot of things to fix the looping, but nothing really did the trick (which is part of the reason why the browser output masking condenser wasn't ported from v0 to v1).
This implementation only keeps large files around for a single agent step, and that's probably fine for a single image. Have you noticed any looping with multiple images? Or large text files linked together?
```python
:param target_tool: Only condense observations from this tool
    (default: 'file_editor').
:param threshold_bytes: For TextContent, the byte size threshold to
    trigger condensation (default: 10KB).
```
Minor nit: this repo uses Google style doc-strings, not reST
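For example, the quoted reST parameters rewritten in Google style (signature and defaults inferred from the quoted snippet, so treat the exact names as illustrative) might read:

```python
class LargeFileSurgicalCondenser:
    def __init__(self, target_tool: str = "file_editor",
                 threshold_bytes: int = 10 * 1024) -> None:
        """Condense oversized tool observations.

        Args:
            target_tool: Only condense observations from this tool
                (default: 'file_editor').
            threshold_bytes: For TextContent, the byte size threshold to
                trigger condensation (default: 10KB).
        """
        self.target_tool = target_tool
        self.threshold_bytes = threshold_bytes
```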
[Automatic Post]: It has been a while since there was any activity on this PR. @vivekvjnk, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

Hi All,
- Instead of returning a CondensationEvent, create an observation event with the same event id and replace the original large observation event
- Updated tests
This reverts commit eeed57d.
please review
Summary
This PR introduces the LargeFileSurgicalCondenser, a specialized context management tool designed to mitigate context window bloat caused by large tool outputs (specifically images and large file reads from the file_editor tool).
Unlike standard windowing or summarization condensers, this "surgical" condenser follows a Post-Inference Cleanup strategy. It preserves raw, high-fidelity data (like base64 images) during the turn it is first viewed to ensure the agent has the necessary information for analysis. Crucially, it only replaces that data with a concise summary after the agent has responded, ensuring subsequent turns benefit from a lean context and increased KV cache efficiency without sacrificing the quality of the initial analysis.
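A minimal sketch of that turn-gating idea, with illustrative stand-in types (`Obs` and `cleanup` are not the PR's actual names): an observation is shown raw during the turn it is produced, and only replaced with a summary on later turns.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Obs:
    id: str
    turn: int      # turn on which this observation was produced
    content: str


def cleanup(view: list[Obs], current_turn: int,
            threshold_bytes: int = 10 * 1024) -> list[Obs]:
    out = []
    for ev in view:
        fresh = ev.turn == current_turn  # still being analyzed this turn
        big = len(ev.content.encode("utf-8")) > threshold_bytes
        if big and not fresh:
            ev = replace(ev, content=f"[Condensation] Viewed {ev.id}")
        out.append(ev)
    return out
```

On the turn a large observation first appears it passes through unchanged; from the next turn onward, only the short summary occupies the context.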
Key Features
- Targets outputs from the file_editor tool (configurable).
- Replaces large observations with a short summary (e.g. [Condensation] Viewed sample_image.png).
- Applies a configurable byte-size threshold to TextContent.

Changes
- New LargeFileSurgicalCondenser in openhands/sdk/context/condenser/.
- New example under examples/06_custom_examples/vision_agent/ demonstrating the condenser in a multi-turn conversation.

Performance Impact
By surgically removing prefix-heavy data like base64 images after their first use, we significantly increase prefix-matching for KV caches in multi-turn interactions, reducing latency and token costs for long-running sessions.
Checklist