fix: add explicit encoding="utf-8" to .txt read in _validators.py by Ghraven · Pull Request #3288 · openai/openai-python

Ghraven · 2026-05-20T18:49:58Z

What

src/openai/lib/_validators.py reads a user-supplied .txt fine-tuning file in text mode without an explicit encoding:

with open(fname, "r") as f:
    content = f.read()

In text mode, open() without an encoding argument falls back to the locale's preferred encoding (locale.getpreferredencoding(False)). On Windows that is typically cp1252, not UTF-8.
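A quick way to see which encoding a given machine would pick is sketched below. It is only an illustration: the variable names are made up for this example, and os.devnull stands in for the user's .txt file so the snippet runs anywhere.

```python
import locale
import os

# Encoding that text-mode open() falls back to when encoding= is omitted.
default_encoding = locale.getpreferredencoding(False)

# Without encoding=, the file object reports the platform-dependent default.
with open(os.devnull, "r") as f:
    implicit_encoding = f.encoding

# With encoding="utf-8", the result no longer depends on the platform.
with open(os.devnull, "r", encoding="utf-8") as f:
    explicit_encoding = f.encoding

print("locale default:", default_encoding)
print("implicit open():", implicit_encoding)
print("explicit open():", explicit_encoding)
```

On a machine whose locale default is already UTF-8, the first two values name UTF-8 as well, which is why the problem tends to go unnoticed outside Windows.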

Why it matters

.txt datasets frequently contain non-ASCII characters (smart quotes, accented text, CJK, emoji). On a platform that defaults to cp1252, reading a valid UTF-8 file either raises UnicodeDecodeError (when the file contains a byte that cp1252 leaves undefined, such as 0x9D) or silently decodes to the wrong characters.
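Both failure modes can be shown without touching the filesystem by decoding UTF-8 bytes as cp1252 directly. This is a minimal sketch with sample strings chosen for illustration, not code from the library.

```python
# Failure mode 1: silent corruption.
# "é" is b"\xc3\xa9" in UTF-8; cp1252 maps both bytes to other characters.
accented = "café".encode("utf-8")
mojibake = accented.decode("cp1252")
print(repr(mojibake))

# Failure mode 2: hard error.
# The right double quotation mark (U+201D) is b"\xe2\x80\x9d" in UTF-8,
# and byte 0x9D has no mapping in Python's cp1252 codec.
smart_quote = "\u201d".encode("utf-8")
try:
    smart_quote.decode("cp1252")
    raised = False
except UnicodeDecodeError as exc:
    raised = True
    print("decode failed:", exc)
```

The silent case is arguably the more harmful of the two, since a corrupted dataset passes through validation without any signal.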

Before / After

# Before - platform-dependent
with open(fname, "r") as f:
    content = f.read()

# After - consistent across platforms
with open(fname, "r", encoding="utf-8") as f:
    content = f.read()

How I verified

Reproduced the failure on a simulated cp1252 default: the explicit utf-8 read preserves content, while a cp1252 read of the same valid UTF-8 file raises UnicodeDecodeError. With encoding="utf-8" the read succeeds regardless of platform. No behavior change on systems that already default to UTF-8.
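A reproduction along these lines might look like the sketch below. It is an assumption about how such a check could be written, not the exact script used for this PR: the file name and sample text are hypothetical, and the cp1252 default is simulated by passing encoding="cp1252" explicitly.

```python
import os
import tempfile

# Sample content with smart quotes and accented letters (illustrative only).
text = "prompt: \u201cna\u00efve caf\u00e9\u201d\n"

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "dataset.txt")  # hypothetical file name

    # Write a valid UTF-8 file, as a user's fine-tuning dataset would be.
    with open(fname, "w", encoding="utf-8") as f:
        f.write(text)

    # Explicit UTF-8 read: content round-trips unchanged.
    with open(fname, "r", encoding="utf-8") as f:
        utf8_content = f.read()

    # Simulated cp1252 default: the closing smart quote contains byte 0x9D,
    # which cp1252 cannot decode.
    try:
        with open(fname, "r", encoding="cp1252") as f:
            f.read()
        cp1252_error = None
    except UnicodeDecodeError as exc:
        cp1252_error = exc

print("utf-8 read preserved content:", utf8_content == text)
print("cp1252 read error:", cp1252_error)
```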

Scope

Single-line change in src/openai/lib/, a hand-maintained (non-generated) path per CONTRIBUTING. Happy to adjust if you would prefer a different approach.

The text-mode open() for user-supplied .txt fine-tuning files relied on the platform default encoding (cp1252 on Windows), which raises UnicodeDecodeError or corrupts non-ASCII content on valid UTF-8 files. Pinning encoding="utf-8" makes the read consistent across platforms.

Ghraven requested a review from a team as a code owner May 20, 2026 18:49

Ghraven wants to merge 1 commit into openai:main from Ghraven:fix/validators-txt-encoding
