feat: early dedup of optimization candidates before benchmark loop#1884

Open
KRRT7 wants to merge 5 commits into main from early-dedup

Conversation


@KRRT7 KRRT7 commented Mar 22, 2026

Summary

  • Dedup candidates in CandidateProcessor when each batch arrives (initial, line profiler, repair, refinement, adaptive) instead of only catching duplicates one-by-one during the benchmark loop
  • Persistent seen_normalized set tracks queued candidates across batches, so duplicates between sources (e.g. initial optimization vs line profiler) are caught even before they're benchmarked
  • Gives accurate candidate counts in progress display ("Candidate X/10" not "X/13")
  • Simplifies handle_duplicate_candidate/register_new_candidate to accept original_flat_code: str instead of full CodeOptimizationContext
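The batch-level dedup described above can be sketched roughly as follows. The names `CandidateProcessor`, `dedup_candidates`, `seen_normalized`, and `original_flat_code` come from this PR, but the signatures, the `Candidate` stand-in, and the whitespace-based `normalize` helper are illustrative assumptions, not the real implementation (which uses language-specific AST normalizers):

```python
# Hypothetical sketch of the early dedup this PR adds; not the real codeflash code.
from dataclasses import dataclass, field


@dataclass
class Candidate:
    optimization_id: str
    source_code: str


def normalize(code: str) -> str:
    # Stand-in for the language-specific AST normalizer.
    return "\n".join(line.strip() for line in code.splitlines() if line.strip())


@dataclass
class CandidateProcessor:
    original_flat_code: str
    seen_normalized: set[str] = field(default_factory=set)

    def dedup_candidates(self, batch: list[Candidate]) -> list[Candidate]:
        """Filter a batch before it reaches the benchmark loop."""
        unique: list[Candidate] = []
        original_norm = normalize(self.original_flat_code)
        for cand in batch:
            norm = normalize(cand.source_code)
            if norm == original_norm:
                continue  # identical to the original: nothing to benchmark
            if norm in self.seen_normalized:
                continue  # duplicate of a previously queued candidate (any batch)
            self.seen_normalized.add(norm)
            unique.append(cand)
        return unique
```

Because `seen_normalized` lives on the processor rather than being rebuilt per batch, a line-profiler candidate that matches an earlier initial-optimization candidate is dropped before it ever inflates the progress count.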

Motivation

Observed in CI run on PR #2485 (codeflash-internal): 3 of 13 candidates were duplicates across sources (initial optimization vs line profiler). While the existing per-candidate dedup correctly skipped benchmarking them, the candidate count was inflated and minor per-candidate overhead was wasted.

Test plan

  • 14 unit tests in tests/test_early_dedup.py covering:
    • Unique candidates pass through
    • Identical-to-original removal
    • Intra-batch and cross-batch dedup
    • Persistent seen_normalized catches dups across batches
    • Result copying for duplicates (success + failure paths)
    • Shorter source code tracking
  • All 1226 existing tests pass (one pre-existing failure, unrelated to this change)


claude bot commented Mar 22, 2026

Claude finished @KRRT7's task in 2m 26s


  • Triage PR size — LARGE (113 lines production code changed)
  • Run lint/typecheck (prek) — passes clean
  • Resolve stale threads — no unresolved threads
  • Review diff
  • Duplicate detection
  • Post summary

PR Review Summary

Prek Checks

uv run prek run --from-ref origin/main passes clean. All 14 new unit tests pass.

Code Review

Bug: is_correct.get() type violation — still present from previous review

File: codeflash/models/models.py:553

self.is_correct[candidate.optimization_id] = self.is_correct.get(past_opt_id)

is_correct is declared dict[str, bool] (line 522), but .get() without a default returns bool | None. This stores None into the dict when past_opt_id was registered but never benchmarked — a type contract violation that could trip up future code expecting bool. This was flagged in the previous review and is still unaddressed.

Fix:

self.is_correct[candidate.optimization_id] = self.is_correct.get(past_opt_id, False)


Nit: unused import pytest in test file — still present from previous review

File: tests/test_early_dedup.py:8

import pytest is imported but never used (no pytest.raises, pytest.mark, etc.). Flagged in the previous review and not yet removed.


Duplicate Detection

No duplicates detected. dedup_candidates is new logic centralized in CandidateProcessor; the normalizer dispatch (normalize_fn) correctly delegates to language-specific implementations, so there's no parallel reimplementation in languages/python/ or languages/javascript/.

Test Coverage

14 new unit tests in tests/test_early_dedup.py cover all dedup paths: unique pass-through, identical-to-original, intra-batch, cross-batch, result copying, and shorter-source tracking. All pass. The aiservice.py and function_optimizer.py coverage is low (14%), but that's expected for integration-level files that require a full orchestration environment.


Overall: The implementation is correct and the design is sound. Two issues from the previous review cycle remain open — the is_correct.get() type violation (minor bug) and the unused import pytest (nit). Both are straightforward one-line fixes.
· Branch: early-dedup

Dedup candidates in CandidateProcessor when each batch arrives (initial,
line profiler, repair, refinement, adaptive) instead of only catching
duplicates one-by-one during the benchmark loop.

Changes:
- Add dedup_candidates() to CandidateProcessor with persistent
  seen_normalized set that tracks queued candidates across batches
- Simplify handle_duplicate_candidate/register_new_candidate to accept
  original_flat_code string instead of full CodeOptimizationContext
- 14 unit tests covering all dedup paths
…alization

Candidates that pass through dedup_candidates already have their
normalized code computed — cache it and pass to process_single_candidate
to skip redundant AST parsing.
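The redundant-parse savings described in that commit can be illustrated with a minimal sketch, assuming the normalizer round-trips code through the AST to erase formatting-only differences. `normalize_code` and the `lru_cache` stand in for the cached-normalization hand-off to `process_single_candidate`; they are not the actual codeflash functions:

```python
# Hedged sketch: AST round-trip normalization with a cache, so code that was
# normalized during dedup is not re-parsed in the benchmark loop.
import ast
from functools import lru_cache


@lru_cache(maxsize=None)
def normalize_code(code: str) -> str:
    # Parsing and unparsing strips comments and normalizes whitespace,
    # so two formatting-only variants compare equal after this step.
    return ast.unparse(ast.parse(code))
```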

KRRT7 commented Mar 22, 2026

@claude fix the issues identified in #1884 (comment)


claude bot commented Mar 22, 2026

Claude finished @KRRT7's task in 1m 13s


Fixed issues from PR review

  • Gather context
  • Fix bug: handle_duplicate_candidate dict access raises KeyError → use .get()
  • Minor: Add comment for intra-batch duplicate silent drop behavior
  • Nit: Add comment explaining dead-code defensive fallbacks in process_single_candidate
  • Run prek to verify — passes clean
  • Commit and push

Changes made

codeflash/models/models.py — bug fix

speedup_ratios, is_correct, and optimized_runtimes now use .get(past_opt_id) instead of direct dict access. Guards against the edge case where a candidate was registered via register_new_candidate but never benchmarked (e.g. unhandled exception in process_single_candidate before record_* was called).

codeflash/languages/function_optimizer.py:307 — comment added

Explains that intra-batch duplicates are intentionally dropped with no result back-fill, since no benchmark results exist yet to copy.

codeflash/languages/function_optimizer.py:1150 — comment added

Explains that the normalized_code == normalized_original and normalized_code in eval_ctx.ast_code_to_id checks are defensive fallbacks that should not fire in normal operation (since dedup_candidates already filters these before the benchmark loop).
· Branch: early-dedup

…comments

- Use .get() for speedup_ratios/is_correct/optimized_runtimes when copying
  results from past_opt_id, guarding against the case where a candidate was
  registered but never benchmarked due to an unhandled exception
- Add comment explaining intra-batch duplicates are intentionally dropped
  with no result back-fill (no results exist yet to copy)
- Add comment explaining the normalized_code checks in process_single_candidate
  are defensive fallbacks that should not fire in normal operation

Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>

KRRT7 commented Mar 22, 2026

@claude review this PR


claude bot commented Mar 22, 2026

Claude finished @KRRT7's task in 2m 43s


PR Review

  • Gather context
  • Read changed files
  • Analyze implementation
  • Check test coverage
  • Post review findings

Overall Assessment

The design is sound and well-implemented. The core invariant — that _handle_empty_queue is only reached when the queue is empty (i.e., all prior candidates have been benchmarked sequentially) — means seen_normalized and ast_code_to_id cover all dedup scenarios without races. Test coverage is thorough.

Two issues worth addressing before merge:


Bug: is_correct.get() introduces a type violation

File: codeflash/models/models.py:553

self.is_correct[candidate.optimization_id] = self.is_correct.get(past_opt_id)

is_correct is declared as dict[str, bool] (line 522), but .get() returns bool | None. If past_opt_id was registered but never benchmarked, this stores None into the dict. The consequence is subtle: codeflash/languages/function_optimizer.py:2894 does if is_correct which works with None (falsy), but it's a type contract violation that could trip up future code expecting bool.

The fix is a one-line change — pass False as the default so the annotation holds:

self.is_correct[candidate.optimization_id] = self.is_correct.get(past_opt_id, False)
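The difference the review is pointing at can be shown in isolation — `dict.get` without a default silently yields `None` for a missing key, which then leaks into a dict annotated `dict[str, bool]` (the dict and keys below are illustrative, not the real model state):

```python
# Minimal illustration of the type-contract issue: .get() without a default
# returns None for a missing key, violating a dict[str, bool] annotation.
is_correct: dict[str, bool] = {"benchmarked": True}

leaked = is_correct.get("never_benchmarked")       # None — not a bool
safe = is_correct.get("never_benchmarked", False)  # False — honors the annotation
```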



Nit: import pytest is unused in tests/test_early_dedup.py

File: tests/test_early_dedup.py:8

import pytest

pytest is imported but never referenced in the file (no pytest.raises, pytest.mark, etc.). With select = ["ALL"] and no F401 in the ignore list in pyproject.toml, this would normally be flagged by ruff. Worth removing to keep the file clean.



Minor: misleading dedup log message for a specific edge case

File: codeflash/languages/function_optimizer.py:307-311

When an initial-batch candidate fails before register_new_candidate is called (e.g., did_update=False at line 1185, or syntax error at line 1189), it is in seen_normalized but not in ast_code_to_id. If an async batch later has an identical candidate, it hits the seen_normalized branch and is counted in removed_duplicate with the log message "duplicates" — not "already-benchmarked duplicates". This is correct behavior (the dup would also fail), but the comment on line 308 ("Intra-batch duplicate") is slightly misleading in this cross-batch scenario. Low priority, no code change needed.


Design observations (no action needed)

  • seen_normalized persists after candidates are benchmarked: This causes async batch duplicates of already-benchmarked initial candidates to be caught by seen_normalized first (before ast_code_to_id), falling through to the silent-drop path instead of the result-copying path. In practice this can't happen because by the time the queue empties and async batches are fetched, all initial candidates are in ast_code_to_id. The order of checks is therefore safe.

  • candidate_len denominator grows mid-run: total_candidates passed to process_single_candidate is processor.candidate_len which grows as async batches arrive. Users will see "Candidate 1/5" then "Candidate 6/8". This is intentional and better than the previous inflated-by-dupes count.

  • Intra-batch seen_normalized check correctly precedes ast_code_to_id check: The ordering matters — ast_code_to_id catches "already benchmarked" dups and copies results; seen_normalized catches "same batch, not yet benchmarked" dups and drops them. The check order correctly handles both.
    · Branch: early-dedup

KRRT7 added 2 commits March 22, 2026 07:10
…hang

_get_valid_candidates had no try/except around OptimizedCandidate
construction. A Pydantic ValidationError from one bad candidate would
crash the entire batch, propagate as an unhandled exception, and block
on executor.shutdown(wait=True) in the finally block.
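The guard this commit describes can be sketched as follows. The real code constructs pydantic `OptimizedCandidate` models; here a plain class raising `ValueError` stands in (pydantic's `ValidationError` subclasses `ValueError`, so the same `except` shape applies), and the function name mirrors the `_get_valid_candidates` mentioned above without claiming its real signature:

```python
# Hedged sketch: per-candidate try/except so one malformed payload is skipped
# instead of crashing the whole batch and blocking executor shutdown.
class OptimizedCandidate:
    def __init__(self, optimization_id: str, source_code: str):
        if not optimization_id or not source_code:
            raise ValueError("invalid candidate payload")
        self.optimization_id = optimization_id
        self.source_code = source_code


def get_valid_candidates(payloads: list[dict]) -> list[OptimizedCandidate]:
    valid: list[OptimizedCandidate] = []
    for payload in payloads:
        try:
            valid.append(OptimizedCandidate(**payload))
        except (TypeError, ValueError):
            # One bad candidate no longer takes down the batch.
            continue
    return valid
```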
