Fix grader compatibility with OpenClaw transcripts by jijivski · Pull Request #86 · pinchbench/skill

jijivski · 2026-04-01T05:42:00Z

Improve grader compatibility with current OpenClaw transcripts

The grader currently assumes a narrower transcript format than the one produced by current OpenClaw runtime, which can lead to false negatives.

Changes:

read tool inputs from toolCall.arguments
support file alongside path / file_path
improve judge score parsing robustness

These changes do not alter task requirements; they only make grading align with real transcript output.

ScuttleBot

ScuttleBot review 🦀

Solid defensive fix. The grader was too rigid about transcript formats, causing false negatives on valid runs.

What's good:

_coerce_score_value() handles the full zoo of judge response formats (nested dicts, string numbers, boolean rejection)
Supporting file alongside path/file_path aligns with how OpenClaw actually emits tool calls
The refactor into _extract_named_scores() and _extract_total_score() is cleaner than the previous inline conditionals

One question:

Task file changes (task_08, task_10, task_18) — are these tested against transcripts from multiple agents? The file param support looks correct but I want to confirm this doesn't break Cursor/Windsurf/Claude Code grading.

Otherwise LGTM. This will reduce the "score 0 but the agent clearly did the work" cases.

ScuttleBot · 2026-04-06T18:12:57Z

Merge conflict resolution available

I've rebased this PR onto main and resolved the conflict in lib_grading.py. The conflict was between the new _parse_judge_text() function (added in main via #87) and the helper functions in this PR (_coerce_score_value, _extract_named_scores, _extract_total_score).

Resolution: Keep both — _parse_judge_text() first, then the helper functions. Both are needed.

@jijivski — could you rebase your branch onto main? The resolution is straightforward:

git fetch upstream
git rebase upstream/main
# Resolve lib_grading.py by keeping both function sets
git add scripts/lib_grading.py
git rebase --continue
git push --force-with-lease

Alternatively, @olearycrew has admin access and can use GitHub's "Update branch" button if the repo allows maintainer edits on this PR.

olearycrew · 2026-04-06T18:21:09Z

@jijivski can you take a look at the conflicts here?

olearycrew

@jijivski can you fix the issues from the linter as well as fix the conflicts with main? Thanks!

…ript-compat-v2

kilo-code-bot · 2026-04-15T07:26:01Z

Code Review Summary

Status: No Issues Found | Recommendation: Merge

This PR is well-implemented. The refactoring of _parse_judge_response to prefer parsing the most recent assistant text (skipping NO_REPLY and noise) is a robust improvement. The extracted helpers (_coerce_score_value, _extract_named_scores, _extract_total_score) are clean and well-tested. The task grader fixes (arguments fallback, file param support) are correct and backwards-compatible.

Files Reviewed (4 files)

scripts/lib_grading.py
tasks/task_market_research.md
tasks/task_memory.md
tasks/task_workflow.md
tests/test_lib_grading.py

_{Reviewed by claude-4.6-sonnet-20260217 · 157,987 tokens}

jijivski · 2026-04-15T07:29:14Z

@olearycrew Hi, I've synced with latest main in one commit, then fixed judge transcript parsing in a follow-up commit.

For Claude Opus 4.5 multi-part judge transcripts, we now prefer the final assistant judgment JSON over earlier echoed tool JSON / waiting messages. Verified locally on the previously failing transcripts.

Fix grader compatibility with OpenClaw transcripts

257cda6

ScuttleBot reviewed Apr 6, 2026

View reviewed changes

ScuttleBot mentioned this pull request Apr 6, 2026

Clean up some recent changes #83

Closed

olearycrew requested changes Apr 14, 2026

View reviewed changes

ZhuChenghao added 2 commits April 15, 2026 15:22

Merge remote-tracking branch 'upstream/main' into fix/openclaw-transc…

a278f68

…ript-compat-v2

Prefer final assistant judgment JSON over echoed tool JSON

0674881

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix grader compatibility with OpenClaw transcripts#86

Fix grader compatibility with OpenClaw transcripts#86
jijivski wants to merge 3 commits intopinchbench:mainfrom
jijivski:fix/openclaw-transcript-compat-v2

jijivski commented Apr 1, 2026

Uh oh!

ScuttleBot left a comment

Uh oh!

ScuttleBot commented Apr 6, 2026

Uh oh!

olearycrew commented Apr 6, 2026

Uh oh!

olearycrew left a comment

Uh oh!

kilo-code-bot bot commented Apr 15, 2026 •

edited

Loading

Uh oh!

jijivski commented Apr 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jijivski commented Apr 1, 2026

Uh oh!

ScuttleBot left a comment

Choose a reason for hiding this comment

Uh oh!

ScuttleBot commented Apr 6, 2026

Uh oh!

olearycrew commented Apr 6, 2026

Uh oh!

olearycrew left a comment

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Summary

Uh oh!

jijivski commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kilo-code-bot bot commented Apr 15, 2026 •

edited

Loading

jijivski commented Apr 15, 2026 •

edited

Loading