fix(sdk): resize Anthropic many-image inputs#2552

Open
Zheng-Lu wants to merge 11 commits into OpenHands:main from Zheng-Lu:fix/2467-image-downscale

@Zheng-Lu Zheng-Lu commented Mar 23, 2026

#2467

Summary

Reproduces and fixes the Anthropic many-image failure by resizing oversized base64 images during LLM message formatting.

What Changed

  • Added an Anthropic-only resize path in LLM.format_messages_for_llm
  • Resize only triggers when the outgoing request crosses the many-image threshold
  • Preserves aspect ratio and leaves URL images unchanged
  • Added pillow as a runtime dependency for in-memory image resizing
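
As a rough illustration of the "preserves aspect ratio" and "leaves URL images unchanged" items above, the sketch below computes a target size that keeps proportions and checks whether a URL is an inline base64 image at all. The helper names `fit_within` and `is_base64_data_image` are hypothetical and do not come from the PR, which delegates the actual resize to Pillow; Pillow's own rounding may differ by a pixel.

```python
def fit_within(width: int, height: int, max_dimension: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_dimension."""
    longest = max(width, height)
    if longest <= max_dimension:
        return width, height  # already small enough; never upscale
    scale = max_dimension / longest
    # Round to the nearest pixel, but never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


def is_base64_data_image(url: str) -> bool:
    """True only for inline images such as 'data:image/png;base64,...'."""
    return url.startswith("data:image/") and ";base64," in url
```

Under these assumptions a 4000x2000 image capped at 2000 becomes 2000x1000, while an `https://` image URL is never touched.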

Validation

  • pytest tests/sdk/llm/test_llm_image_resizing.py Passed
  • pytest tests/sdk/llm/test_llm_image_resizing.py tests/sdk/llm/test_vision_support.py Passed
  • ruff check openhands-sdk/openhands/sdk/llm/llm.py tests/sdk/llm/test_llm_image_resizing.py Passed
  • pyright openhands-sdk/openhands/sdk/llm/llm.py tests/sdk/llm/test_llm_image_resizing.py Passed

Proof

With this change, a multi-image request that includes at least one image larger than 2000px no longer raises litellm.BadRequestError.
[Image attachment]

Co-authored-by: openhands <openhands@all-hands.dev>
@xingyaoww
Collaborator

@OpenHands pls merge from main, resolve all conflicts. Then do /codereview-roasted /github-pr-review

@openhands-ai

openhands-ai Bot commented Mar 27, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

Co-authored-by: openhands <openhands@all-hands.dev>
Collaborator

@xingyaoww xingyaoww left a comment

Taste Rating: 🟡 Acceptable — Works, but the structure needs improvement

Linus's Three Questions:

  1. Is this solving a real problem? — Yes. Anthropic's many-image limit is a real production failure.
  2. Is there a simpler way? — Yes. This is ~80 lines of image manipulation code jammed into a 1500-line god-class. Extract it.
  3. What will this break? — Adding pillow as a hard runtime dependency to the core SDK is the biggest concern. Every user now pays for PIL whether they use images or not.

VERDICT:
Needs rework — The fix is directionally correct, but the dependency strategy and code placement need redesign before merging.

KEY INSIGHT:
The core problem is treating PIL as a hard SDK dependency and stuffing image-processing plumbing into the LLM class, when this should be a lazy-loaded utility module.

Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated
Comment thread openhands-sdk/pyproject.toml
Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated
Comment on lines +1268 to +1343
def _apply_outgoing_image_resize(
    self, messages: list[Message], *, vision_enabled: bool
) -> None:
    max_dimension = self._get_outgoing_image_max_dimension(
        messages=messages, vision_enabled=vision_enabled
    )
    if max_dimension is None:
        return

    for message in messages:
        for content_item in message.content:
            if isinstance(content_item, ImageContent):
                content_item.image_urls = [
                    self._resize_base64_data_image_url(
                        url, max_dimension=max_dimension
                    )
                    for url in content_item.image_urls
                ]

def _get_outgoing_image_max_dimension(
    self, messages: list[Message], *, vision_enabled: bool
) -> int | None:
    if not vision_enabled or self._infer_litellm_provider() != "anthropic":
        return None

    total_images = sum(
        len(content_item.image_urls)
        for message in messages
        for content_item in message.content
        if isinstance(content_item, ImageContent)
    )
    if total_images <= ANTHROPIC_MANY_IMAGE_THRESHOLD:
        return None

    return ANTHROPIC_MANY_IMAGE_MAX_DIMENSION

@staticmethod
def _resize_base64_data_image_url(url: str, *, max_dimension: int) -> str:
    if not url.startswith("data:image/"):
        return url

    header, sep, encoded = url.partition(";base64,")
    if not sep:
        return url

    mime_type = header.removeprefix("data:")

    try:
        raw_bytes = base64.b64decode(encoded)
        with Image.open(io.BytesIO(raw_bytes)) as image:
            if max(image.size) <= max_dimension:
                return url

            resized_image = image.copy()
            resized_image.thumbnail(
                (max_dimension, max_dimension), Image.Resampling.LANCZOS
            )
            image_format = image.format or mime_type.split("/", 1)[1].upper()

            if image_format == "JPG":
                image_format = "JPEG"

            if image_format == "JPEG" and resized_image.mode not in ("RGB", "L"):
                resized_image = resized_image.convert("RGB")

            buffer = io.BytesIO()
            resized_image.save(buffer, format=image_format)
    except Exception:
        logger.warning(
            "Failed to resize base64 data image for outgoing LLM request",
            exc_info=True,
        )
        return url

    resized_encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return f"data:{mime_type};base64,{resized_encoded}"
Collaborator

🟠 Important — 80 lines of image manipulation don't belong in LLM

llm.py is already 1500+ lines. These three methods (_apply_outgoing_image_resize, _get_outgoing_image_max_dimension, _resize_base64_data_image_url) are pure image-processing utilities with zero dependency on self state (one is already a @staticmethod, the other two only call _infer_litellm_provider()).

Extract to a standalone module, e.g. openhands/sdk/llm/utils/image_resize.py:

def resize_base64_data_url(url: str, *, max_dimension: int) -> str: ...
def maybe_resize_images(messages, provider, vision_enabled): ...

Then the LLM method becomes a one-liner call. Keep the god-class from getting godlier.
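
One possible shape for the extracted helper, combined with the lazy-loading idea from the review summary, is sketched below. It is not the code that landed in the PR: the function name follows the reviewer's suggestion, the fallback format `"PNG"` and the log message are assumptions, and the second suggested function (`maybe_resize_images`) is left out. Pillow is imported inside the function, so a missing install only matters once an inline image actually needs resizing.

```python
import base64
import io
import logging

logger = logging.getLogger(__name__)


def resize_base64_data_url(url: str, *, max_dimension: int) -> str:
    """Return url with its inline image shrunk to fit max_dimension, else url."""
    header, sep, encoded = url.partition(";base64,")
    if not url.startswith("data:image/") or not sep:
        return url  # remote URLs and non-base64 data URLs pass through

    try:
        # Lazy import: Pillow is only needed once an inline image shows up.
        from PIL import Image

        with Image.open(io.BytesIO(base64.b64decode(encoded))) as image:
            if max(image.size) <= max_dimension:
                return url
            image_format = image.format or "PNG"  # assumed fallback
            resized = image.copy()
            resized.thumbnail((max_dimension, max_dimension))
            if image_format == "JPEG" and resized.mode not in ("RGB", "L"):
                resized = resized.convert("RGB")
            buffer = io.BytesIO()
            resized.save(buffer, format=image_format)
    except Exception:
        # Covers a missing Pillow install as well as undecodable image data.
        logger.warning(
            "Could not resize inline image; sending it unchanged", exc_info=True
        )
        return url

    payload = base64.b64encode(buffer.getvalue()).decode("ascii")
    return f"{header};base64,{payload}"
```

With a helper like this in its own module, the `LLM` class would only need to call it for each image URL when the provider and image count require it.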

Collaborator

HUMAN: ^agree with the judgement here

Collaborator

@Zheng-Lu is this fixed in any commit? 👀

Author

Yes, fixed in the last commit.

I have moved the image-resize logic out of llm.py into openhands/sdk/llm/utils/image_resize.py, including the dimension selection and base64 resize helper.

Collaborator

@Zheng-Lu did you push it up to this PR? I didn't see the changes here..

Author

@Zheng-Lu Zheng-Lu Apr 2, 2026

@xingyaoww Sorry, I thought I had pushed it, but I actually hadn't. The change is committed now.

Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated
Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated
Comment thread openhands-sdk/openhands/sdk/llm/llm.py Outdated
Comment thread tests/sdk/llm/test_llm_image_resizing.py
@openhands-ai

openhands-ai Bot commented Mar 27, 2026

OpenHands encountered an error: Request timeout after 30 seconds to https://ypvqnvwqmvbumzax.prod-runtime.all-hands.dev/api/conversations/9650ebc3-b326-46ce-88ee-b683ce63e259/ask_agent

See the conversation for more information.

@Zheng-Lu Zheng-Lu requested a review from xingyaoww April 6, 2026 22:31
@xingyaoww
Collaborator

@OpenHands do /codereview-roasted /github-pr-review

@openhands-ai

openhands-ai Bot commented Apr 10, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

Collaborator

@xingyaoww xingyaoww left a comment

🟡 Acceptable — Core logic is correct and the tests are solid, but there are design issues worth addressing before merge.

Linus-Style Analysis:

The fundamental idea is sound: intercept oversized base64 images before they hit Anthropic's API limits. The Anthropic docs research is correct (20-image threshold, 2000px vs 8000px caps), and the resize logic itself is clean.
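
The limits mentioned above can be written down as a small selection function. This is only a sketch: the three constants take their values from this review comment rather than from the PR's source, and the function name is hypothetical. Note that the code quoted earlier in this thread returns `None` (no resize) at or below the threshold instead of applying the larger cap.

```python
# Values quoted in the review comment above; treat them as assumptions here.
MANY_IMAGE_THRESHOLD = 20
MANY_IMAGE_MAX_DIMENSION = 2000
DEFAULT_MAX_DIMENSION = 8000


def max_dimension_for_request(image_counts: list[int]) -> int:
    """Pick the per-side pixel cap for a request.

    image_counts holds the number of images in each outgoing message.
    """
    total_images = sum(image_counts)
    if total_images > MANY_IMAGE_THRESHOLD:
        return MANY_IMAGE_MAX_DIMENSION
    return DEFAULT_MAX_DIMENSION
```

The strict `>` comparison mirrors the `total_images <= ANTHROPIC_MANY_IMAGE_THRESHOLD` early return in the quoted method, so exactly 20 images still gets the larger cap in this sketch.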

But the plumbing around it — threading a PIL module as Any through three functions, silent in-place mutation — adds unnecessary complexity that a simpler design would eliminate entirely.
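
One way to avoid the silent in-place mutation is to return new objects and leave the inputs alone. The sketch below assumes a frozen stand-in class, `ImagePart`, in place of the SDK's `ImageContent`, and a hypothetical function name; it shows the pattern only, not the PR's API.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class ImagePart:
    """Hypothetical stand-in for the SDK's ImageContent."""

    image_urls: tuple[str, ...]


def with_resized_images(
    parts: list[ImagePart], resize: Callable[[str], str]
) -> list[ImagePart]:
    """Return new parts with every URL passed through resize; inputs stay as-is."""
    return [
        replace(part, image_urls=tuple(resize(url) for url in part.image_urls))
        for part in parts
    ]
```

Because the function returns its result, the caller no longer depends on an earlier deepcopy to protect the original messages.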

⚠️ Eval-risk note: This PR changes image processing in the LLM message pipeline, which could affect agent behavior on vision benchmarks. Flagging for a human maintainer to decide after running lightweight evals.

This review was generated by an AI agent (OpenHands).

Comment thread openhands-sdk/openhands/sdk/llm/utils/image_resize.py Outdated
Comment thread openhands-sdk/openhands/sdk/llm/utils/image_resize.py Outdated
Comment thread openhands-sdk/openhands/sdk/llm/utils/image_resize.py
Comment thread openhands-sdk/pyproject.toml Outdated
Comment thread openhands-sdk/openhands/sdk/llm/utils/image_resize.py Outdated
Comment thread tests/sdk/llm/test_llm_image_resizing.py
@openhands-ai

openhands-ai Bot commented Apr 10, 2026

Summary

The request was to perform a /codereview-roasted and /github-pr-review on PR #2552 ("fix(sdk): resize Anthropic many-image inputs"). Here's what was done:

Checklist

  • Retrieved PR diff and context — Fetched the full diff (5 files changed), PR description, and linked issue SDK should auto-resize oversized images before sending to LLM providers #2467
  • Performed roasted code review — Analyzed the code through the Linus Torvalds-style lens focusing on data structures, complexity, pragmatism, and breaking changes
  • Posted structured GitHub PR review — Submitted a single COMMENT review with 6 inline comments using priority labels (🟠🟠🟡🟡🟢🟢) via the GitHub API
  • Used correct review event — Left as COMMENT (not APPROVE) because the PR changes image processing in the LLM message pipeline, which falls under the eval-risk policy requiring human maintainer sign-off
  • No code modifications made — Review-only, as required by the codereview skill

Key Findings Posted

  1. 🟠 image_module: Any threading — Unnecessary complexity; Python's sys.modules cache makes repeated imports free
  2. 🟠 Silent in-place mutation: maybe_resize_messages_for_provider mutates inputs with no return value, relying on an invisible deepcopy contract
  3. 🟡 pillow>=12.1.1 floor too high — The APIs used are stable since Pillow 9.1+
  4. 🟡 Leaky public API: resize_base64_data_url exposes an Any-typed PIL parameter
  5. 🟢 Good dimension logic — Clean early returns, correct Anthropic doc mirroring
  6. 🟢 Solid tests — Real image creation and dimension assertions, not mock-only
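
The first finding relies on Python caching imported modules. The small demonstration below uses the standard-library `json` module as a stand-in for PIL, so it runs without Pillow installed; the function name is hypothetical. After the first call, a function-level import is a lookup in `sys.modules` rather than a reload, which is why passing the module object between functions is not needed.

```python
import sys


def load_json_module():
    # Function-level import: the first call loads the module, later calls
    # just look it up in sys.modules.
    import json

    return json


first = load_json_module()
second = load_json_module()
```

Both calls return the same module object, and that object is the one stored under `sys.modules["json"]`.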

No extraneous changes were made — this was purely a review action with no code modifications.

@all-hands-bot
Collaborator

[Automatic Post]: This PR seems to be currently waiting for review. @xingyaoww @Zheng-Lu @openhands-ai[bot], could you please take a look when you have a chance?

Comment thread openhands-sdk/openhands/sdk/llm/utils/image_resize.py Outdated
@Zheng-Lu Zheng-Lu requested a review from xingyaoww April 19, 2026 00:00
Co-authored-by: openhands <openhands@all-hands.dev>

4 participants