bug: Batch API result file download fails for large outputs (>200MB) with ConnectionResetError #2959

@xwang049

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug
When using the Batch API to download large result files (specifically when the output .jsonl exceeds ~200-300MB, e.g., 50k embedding rows), the download connection is prematurely closed by the peer.

The error typically manifests as:
httpcore.RemoteProtocolError: peer closed connection without sending complete message body (received X bytes, expected Y bytes)

To Reproduce
Create a Batch job with 50,000 embedding requests (text-embedding-3-small).
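For step 1, generating the ~50k-row input file can be sketched with the stdlib alone; the per-line request schema follows the documented Batch API input format (the `document {i}` input text is just a placeholder):

```python
import json

def embedding_request_line(i: int, model: str = "text-embedding-3-small") -> str:
    """Serialize one Batch API request for /v1/embeddings as a JSONL line."""
    return json.dumps({
        "custom_id": f"row-{i}",
        "method": "POST",
        "url": "/v1/embeddings",
        "body": {"model": model, "input": f"document {i}"},
    })

# Write 50,000 requests, one per line
with open("batch_input.jsonl", "w") as f:
    for i in range(50_000):
        f.write(embedding_request_line(i) + "\n")
```

The resulting file is uploaded with `purpose="batch"` and submitted via `client.batches.create(...)`.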

Wait for the batch to complete (status: completed).

Attempt to download the result file using the SDK:

```python
# Standard SDK approach that fails
content = client.files.content(batch.output_file_id)

# Streaming variant fails the same way:
with client.files.with_streaming_response.content(batch.output_file_id) as response:
    for chunk in response.iter_bytes():
        ...
```

Or using the `requests` library directly with the file URL.

The download will consistently fail after receiving a few hundred megabytes.

Expected behavior
The SDK should handle large file streaming robustly, or provide a built-in chunked download/retry mechanism for massive Batch outputs (1GB+).

Environment
OS: macOS 15.x

Python Version: 3.12.x

Additional context
Workaround: I've confirmed that splitting the 50k requests into smaller batches of 10k (resulting in ~200MB files) allows for stable downloads. This suggests a potential timeout or buffer limitation on the server-side proxy or within the SDK's streaming implementation for payloads exceeding a certain threshold.
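For reference, the splitting itself is trivial (`chunk` is just a helper name I'm using here):

```python
def chunk(rows, size=10_000):
    """Split the request list into sub-batches small enough to download reliably."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]
```

Each sub-list then becomes its own input file and batch job, which keeps every output file around the ~200MB size that downloads stably.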

I am currently implementing a manual chunked-download helper with exponential backoff to bypass this, but I don't think it's optimal.
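This is roughly what my helper looks like. The Range-based resume is an assumption on my part (I have not verified that the `/v1/files/{id}/content` endpoint honors HTTP Range headers), and `download_with_resume` / `backoff_delay` are my own names:

```python
import os
import time

import requests  # same library as in the snippets below


def backoff_delay(attempt: int, cap: float = 60.0) -> float:
    """Exponential backoff (1s, 2s, 4s, ...), capped."""
    return min(2 ** attempt, cap)


def download_with_resume(url: str, api_key: str, dest: str,
                         max_retries: int = 8,
                         chunk_size: int = 1024 * 1024) -> None:
    """Stream `url` to `dest`, resuming from the last byte written whenever
    the peer drops the connection, and backing off between attempts.

    ASSUMPTION: the server honors HTTP Range requests; if it does not,
    drop the Range header and truncate `dest` before each retry instead.
    """
    for attempt in range(max_retries):
        start = os.path.getsize(dest) if os.path.exists(dest) else 0
        headers = {"Authorization": f"Bearer {api_key}"}
        if start:
            headers["Range"] = f"bytes={start}-"
        try:
            with requests.get(url, headers=headers, stream=True,
                              timeout=(10, 300)) as resp:
                resp.raise_for_status()
                with open(dest, "ab") as f:
                    for chunk in resp.iter_content(chunk_size=chunk_size):
                        f.write(chunk)
            return  # completed without an early close
        except (requests.exceptions.ConnectionError,
                requests.exceptions.ChunkedEncodingError):
            time.sleep(backoff_delay(attempt))
    raise RuntimeError(f"download failed after {max_retries} attempts")
```

Usage would be `download_with_resume(f"https://api.openai.com/v1/files/{result_file_id}/content", api_key, "result.jsonl")`, but a built-in SDK equivalent would clearly be better.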

To Reproduce

  1. Prepare a batch with ~50,000 rows (near the OpenAI limit)
  2. Submit to the Batch API with endpoint /v1/embeddings
  3. Wait for status = completed
  4. Attempt to download the result file via the Files API

Sample code:

```python
from openai import OpenAI

client = OpenAI()

# Result file is ~1GB when the batch has 50,000 embeddings
result_content = client.files.content(result_file_id)
result_text = result_content.text  # fails silently or raises

# Streaming also fails:
with client.files.with_streaming_response.content(result_file_id) as resp:
    with open("result.jsonl", "wb") as f:
        for chunk in resp.iter_bytes(chunk_size=1024 * 1024):
            f.write(chunk)
```

Raises:

```
peer closed connection without sending complete message body (received 940216419 bytes, expected 1056098116)
```

The same happens with `requests` directly:

```python
import requests

response = requests.get(
    f"https://api.openai.com/v1/files/{result_file_id}/content",
    headers={"Authorization": f"Bearer {api_key}"},
    stream=True,
    timeout=(10, 300),
)
```

Same error; the connection drops somewhere around 500-1000MB.

OS

macOS

Python version

3.12.x

Library version

2.24.0
