fix(sdk): resize Anthropic many-image inputs #2552
Zheng-Lu wants to merge 11 commits into OpenHands:main from
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands pls merge from main, resolve all conflicts. Then do /codereview-roasted /github-pr-review

I'm on it! xingyaoww can track my progress at all-hands.dev
Co-authored-by: openhands <openhands@all-hands.dev>
xingyaoww
left a comment
Taste Rating: 🟡 Acceptable — Works, but the structure needs improvement
Linus's Three Questions:
- Is this solving a real problem? — Yes. Anthropic's many-image limit is a real production failure.
- Is there a simpler way? — Yes. This is ~80 lines of image manipulation code jammed into a 1500-line god-class. Extract it.
- What will this break? — Adding pillow as a hard runtime dependency to the core SDK is the biggest concern. Every user now pays for PIL whether they use images or not.
VERDICT:
❌ Needs rework — The fix is directionally correct, but the dependency strategy and code placement need redesign before merging.
KEY INSIGHT:
The core problem is treating PIL as a hard SDK dependency and stuffing image-processing plumbing into the LLM class, when this should be a lazy-loaded utility module.
```python
def _apply_outgoing_image_resize(
    self, messages: list[Message], *, vision_enabled: bool
) -> None:
    max_dimension = self._get_outgoing_image_max_dimension(
        messages=messages, vision_enabled=vision_enabled
    )
    if max_dimension is None:
        return

    for message in messages:
        for content_item in message.content:
            if isinstance(content_item, ImageContent):
                content_item.image_urls = [
                    self._resize_base64_data_image_url(
                        url, max_dimension=max_dimension
                    )
                    for url in content_item.image_urls
                ]

def _get_outgoing_image_max_dimension(
    self, messages: list[Message], *, vision_enabled: bool
) -> int | None:
    if not vision_enabled or self._infer_litellm_provider() != "anthropic":
        return None

    total_images = sum(
        len(content_item.image_urls)
        for message in messages
        for content_item in message.content
        if isinstance(content_item, ImageContent)
    )
    if total_images <= ANTHROPIC_MANY_IMAGE_THRESHOLD:
        return None

    return ANTHROPIC_MANY_IMAGE_MAX_DIMENSION

@staticmethod
def _resize_base64_data_image_url(url: str, *, max_dimension: int) -> str:
    if not url.startswith("data:image/"):
        return url

    header, sep, encoded = url.partition(";base64,")
    if not sep:
        return url

    mime_type = header.removeprefix("data:")

    try:
        raw_bytes = base64.b64decode(encoded)
        with Image.open(io.BytesIO(raw_bytes)) as image:
            if max(image.size) <= max_dimension:
                return url

            resized_image = image.copy()
            resized_image.thumbnail(
                (max_dimension, max_dimension), Image.Resampling.LANCZOS
            )
            image_format = image.format or mime_type.split("/", 1)[1].upper()

            if image_format == "JPG":
                image_format = "JPEG"

            if image_format == "JPEG" and resized_image.mode not in ("RGB", "L"):
                resized_image = resized_image.convert("RGB")

            buffer = io.BytesIO()
            resized_image.save(buffer, format=image_format)
    except Exception:
        logger.warning(
            "Failed to resize base64 data image for outgoing LLM request",
            exc_info=True,
        )
        return url

    resized_encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return f"data:{mime_type};base64,{resized_encoded}"
```
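For reference, the data-URL handling in `_resize_base64_data_image_url` needs only `str.partition` and `str.removeprefix`; a stdlib-only illustration of that split (the sample URL and payload are made up):

```python
# How the helper splits a base64 data URL into MIME type and payload.
# A URL without the ";base64," separator yields an empty `sep`, which is
# the early-return signal in the PR's code.
url = "data:image/png;base64,iVBORw0KGgo="
header, sep, encoded = url.partition(";base64,")
mime_type = header.removeprefix("data:")

print(mime_type)  # image/png
print(encoded)    # iVBORw0KGgo=
print(bool("not-a-data-url".partition(";base64,")[1]))  # False
```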
🟠 Important — 80 lines of image manipulation don't belong in LLM
llm.py is already 1500+ lines. These three methods (_apply_outgoing_image_resize, _get_outgoing_image_max_dimension, _resize_base64_data_image_url) are pure image-processing utilities with zero dependency on self state (one is already a @staticmethod, the other two only call _infer_litellm_provider()).
Extract to a standalone module, e.g. openhands/sdk/llm/utils/image_resize.py:

```python
def resize_base64_data_url(url: str, *, max_dimension: int) -> str: ...
def maybe_resize_images(messages, provider, vision_enabled): ...
```

Then the LLM method becomes a one-liner call. Keep the god-class from getting godlier.
HUMAN: ^agree with the judgement here
Yes, fixed in last commit.
I have moved the image-resize logic out of llm.py into openhands/sdk/llm/utils/image_resize.py, including the dimension selection and base64 resize helper.
@Zheng-Lu did you push it up to this PR? I didn't see the changes here..
@xingyaoww Sorry, I thought I did but I actually didn't; the change is committed now.
OpenHands encountered an error: Request timeout after 30 seconds to https://ypvqnvwqmvbumzax.prod-runtime.all-hands.dev/api/conversations/9650ebc3-b326-46ce-88ee-b683ce63e259/ask_agent See the conversation for more information.
Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands do /codereview-roasted /github-pr-review

I'm on it! xingyaoww can track my progress at all-hands.dev
xingyaoww
left a comment
🟡 Acceptable — Core logic is correct and the tests are solid, but there are design issues worth addressing before merge.
Linus-Style Analysis:
The fundamental idea is sound: intercept oversized base64 images before they hit Anthropic's API limits. The Anthropic docs research is correct (20-image threshold, 2000px vs 8000px caps), and the resize logic itself is clean.
But the plumbing around it — threading a PIL module as Any through three functions, silent in-place mutation — adds unnecessary complexity that a simpler design would eliminate entirely.
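A hedged sketch of the mutation-free alternative: a pure function that returns fresh content objects instead of editing the caller's messages in place. `FakeImageContent` below is a stand-in for the SDK's ImageContent, purely for illustration:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeImageContent:
    """Stand-in for the SDK's ImageContent, just for this sketch."""
    image_urls: tuple[str, ...]


def with_resized_images(items, resize):
    """Return a new list with resized URLs; callers' objects stay untouched.

    `resize` is any str -> str transform (here, the base64 resize helper).
    Non-image items pass through unchanged.
    """
    return [
        replace(item, image_urls=tuple(resize(u) for u in item.image_urls))
        if isinstance(item, FakeImageContent)
        else item
        for item in items
    ]
```

Because the inputs are never mutated, a failed or skipped resize cannot leave messages half-modified, which removes the class of bug the review is worried about.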
This review was generated by an AI agent (OpenHands).
Summary

The request was to run /codereview-roasted /github-pr-review on this PR.

Checklist

- Key Findings Posted

No extraneous changes were made — this was purely a review action with no code modifications.
[Automatic Post]: This PR seems to be currently waiting for review. @xingyaoww @Zheng-Lu @openhands-ai[bot], could you please take a look when you have a chance?
Co-authored-by: openhands <openhands@all-hands.dev>
#2467
Summary
Reproduces and fixes the Anthropic many-image failure by resizing oversized base64 images during LLM message formatting.
What Changed
- LLM.format_messages_for_llm now resizes oversized base64 images before sending
- Adds pillow as a runtime dependency for in-memory image resizing

Validation
| Command | Result |
| --- | --- |
| pytest tests/sdk/llm/test_llm_image_resizing.py | Passed |
| pytest tests/sdk/llm/test_llm_image_resizing.py tests/sdk/llm/test_vision_support.py | Passed |
| ruff check openhands-sdk/openhands/sdk/llm/llm.py tests/sdk/llm/test_llm_image_resizing.py | Passed |
| pyright openhands-sdk/openhands/sdk/llm/llm.py tests/sdk/llm/test_llm_image_resizing.py | Passed |

Proof
Now a multiple-images request with at least one image larger than 2000px no longer throws litellm.BadRequestError.
