feat: early dedup of optimization candidates before benchmark loop #1884
Conversation
Claude finished @KRRT7's task in 2m 26s
PR Review Summary
Prek Checks
Code Review
Bug:
Dedup candidates in CandidateProcessor when each batch arrives (initial, line profiler, repair, refinement, adaptive) instead of only catching duplicates one-by-one during the benchmark loop. Changes:
- Add dedup_candidates() to CandidateProcessor with a persistent seen_normalized set that tracks queued candidates across batches
- Simplify handle_duplicate_candidate/register_new_candidate to accept an original_flat_code string instead of the full CodeOptimizationContext
- 14 unit tests covering all dedup paths
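The batch-level dedup described above can be sketched roughly as follows. The names dedup_candidates and seen_normalized come from the PR description; normalize_code and the string-based candidate representation are simplifying assumptions, not the actual codeflash internals.

```python
# Hypothetical sketch of batch-level dedup, assuming candidates are plain
# source strings and normalization is an AST parse/unparse round-trip.
import ast


def normalize_code(source: str) -> str:
    """Normalize source so formatting-only differences compare equal."""
    return ast.unparse(ast.parse(source))


class CandidateProcessor:
    def __init__(self) -> None:
        # Persists across batches: initial, line profiler, repair,
        # refinement, adaptive.
        self.seen_normalized: set[str] = set()

    def dedup_candidates(self, batch: list[str]) -> list[str]:
        """Return only candidates whose normalized form was not queued yet."""
        unique = []
        for source in batch:
            normalized = normalize_code(source)
            if normalized not in self.seen_normalized:
                self.seen_normalized.add(normalized)
                unique.append(source)
        return unique
```

Because seen_normalized lives on the processor rather than inside the loop, a duplicate arriving in a later batch (say, from the line profiler) is dropped before it ever reaches the benchmark loop.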
…alization
Candidates that pass through dedup_candidates already have their normalized code computed — cache it and pass it to process_single_candidate to skip redundant AST parsing.
@claude fix the issues identified in #1884 (comment)
Claude finished @KRRT7's task in 1m 13s
Fixed issues from PR review
Changes made
Explains that intra-batch duplicates are intentionally dropped with no result back-fill, since no benchmark results exist yet to copy.
Explains that the …
…comments
- Use .get() for speedup_ratios/is_correct/optimized_runtimes when copying results from past_opt_id, guarding against the case where a candidate was registered but never benchmarked due to an unhandled exception
- Add a comment explaining that intra-batch duplicates are intentionally dropped with no result back-fill (no results exist yet to copy)
- Add a comment explaining that the normalized_code checks in process_single_candidate are defensive fallbacks that should not fire in normal operation

Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
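The .get() guard can be illustrated with a small sketch. The three dict names mirror the commit message, but the surrounding state object is a stand-in, not the actual codeflash data structure.

```python
# Hedged sketch: copy benchmark results from a previously seen duplicate.
# .get() returns None when past_opt_id was registered but never benchmarked
# (e.g. an unhandled exception aborted its run), instead of raising KeyError.
def copy_past_results(state: dict, past_opt_id: str, new_opt_id: str) -> None:
    for table in ("speedup_ratios", "is_correct", "optimized_runtimes"):
        state[table][new_opt_id] = state[table].get(past_opt_id)
```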
@claude review this PR
Claude finished @KRRT7's task in 2m 43s
PR Review
Overall Assessment
The design is sound and well-implemented. The core invariant — that … Two issues worth addressing before merge:
Bug:
…hang
_get_valid_candidates had no try/except around OptimizedCandidate construction. A Pydantic ValidationError from one bad candidate would crash the entire batch, propagate as an unhandled exception, and block on executor.shutdown(wait=True) in the finally block.
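A sketch of the fix, using a plain dataclass in place of the Pydantic OptimizedCandidate model: catching the validation error per candidate keeps one malformed payload from aborting the whole batch and leaving the executor shutdown blocked. Names here are illustrative.

```python
# Hedged sketch: validate each candidate individually so one bad payload
# cannot crash the entire batch. A dataclass stands in for the Pydantic
# model; Pydantic would raise ValidationError where TypeError appears here.
from dataclasses import dataclass


@dataclass
class Candidate:  # stand-in for OptimizedCandidate
    source_code: str
    explanation: str


def get_valid_candidates(raw_items: list[dict]) -> list[Candidate]:
    valid = []
    for item in raw_items:
        try:
            valid.append(Candidate(**item))
        except TypeError:
            # Skip the malformed candidate instead of aborting the batch.
            continue
    return valid
```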
Summary
- Dedup candidates in CandidateProcessor when each batch arrives (initial, line profiler, repair, refinement, adaptive) instead of only catching duplicates one-by-one during the benchmark loop
- A persistent seen_normalized set tracks queued candidates across batches, so duplicates between sources (e.g. initial optimization vs line profiler) are caught even before they're benchmarked
- Simplify handle_duplicate_candidate/register_new_candidate to accept original_flat_code: str instead of the full CodeOptimizationContext

Motivation
Observed in a CI run on PR #2485 (codeflash-internal): 3 of 13 candidates were duplicates across sources (initial optimization vs line profiler). The existing per-candidate dedup correctly skipped benchmarking them, but the candidate count was inflated and minor per-candidate overhead was wasted.
Test plan
- tests/test_early_dedup.py covering:
  - seen_normalized catches dups across batches
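One of the listed tests could look like the sketch below. The processor is a minimal stand-in built from the PR description, so everything beyond the dedup_candidates/seen_normalized names is an assumption about the real test file.

```python
# Hedged sketch of the cross-batch dedup test: the persistent
# seen_normalized set must drop an equivalent candidate resubmitted
# by a later batch (e.g. the line profiler batch).
import ast


class CandidateProcessor:  # minimal stand-in for the real class
    def __init__(self) -> None:
        self.seen_normalized: set[str] = set()

    def dedup_candidates(self, batch: list[str]) -> list[str]:
        out = []
        for src in batch:
            key = ast.unparse(ast.parse(src))
            if key not in self.seen_normalized:
                self.seen_normalized.add(key)
                out.append(src)
        return out


def test_seen_normalized_persists_across_batches():
    proc = CandidateProcessor()
    assert proc.dedup_candidates(["x = 1"]) == ["x = 1"]
    # Second batch resubmits a formatting-only variant plus a new candidate.
    assert proc.dedup_candidates(["x=1", "y = 2"]) == ["y = 2"]
```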