Skip to content

Fix phase advance check#34

Merged
vas3a merged 4 commits intodevelopfrom
fix-phase-advance-check
Apr 15, 2026
Merged

Fix phase advance check#34
vas3a merged 4 commits intodevelopfrom
fix-phase-advance-check

Conversation

@vas3a
Copy link
Copy Markdown
Collaborator

@vas3a vas3a commented Mar 27, 2026

The biggest problem is a key-format mismatch in sync.

  • In autopilot-v6/src/autopilot/services/scheduler.service.ts, scheduled job IDs are stored as challengeId|phaseId.
  • But autopilot-v6/src/sync/sync.service.ts reads those raw keys, then later looks schedules up using challengeId:phaseId and removes them by splitting on : in autopilot-v6/src/sync/sync.service.ts and autopilot-v6/src/sync/sync.service.ts.

That explains these log patterns:

  • repeated “New active phase found … Scheduling …” for the same phase every 3 minutes
  • immediate “Obsolete schedule found …”
  • No active schedule found for challenge ...|..., phase undefined

So that part is a real bug.

Second issue: the same scheduled transition is being executed twice.

  • autopilot-v6/src/autopilot/services/scheduler.service.ts does both:
    • triggerKafkaEvent(...)
    • advancePhase(...)

If the Kafka consumer also processes that same event and advances the phase, you get exactly what the logs show:

  • event consumed
  • Advancing phase ...
  • same phase advanced again
  • one succeeds, the other errors with “already closed”

That also lines up with the BullMQ removal race:

  • Job ... could not be removed because it is locked by another worker

Third issue: open review phases are reprocessed on every challenge update.

  • autopilot-v6/src/autopilot/services/phase-schedule-manager.service.ts scans already-open review/screening/approval phases and calls handlePhaseOpenedForChallenge() again.

That matches:

  • review phase opens and creates 1 pending review
  • later manual phase detection processes the same open review phase
  • then it creates 2 more pending reviews

That may be intentional as backfill, but it is definitely another re-entry path.

Fourth issue: review opportunity creation is noisy and likely non-idempotent.

  • autopilot-v6/src/autopilot/services/phase-schedule-manager.service.ts calls review opportunity creation on every ACTIVE challenge update.
  • The create flow is list-then-create in autopilot-v6/src/autopilot/services/phase-schedule-manager.service.ts.

That makes 409 conflicts unsurprising under races or stale reads. It should probably treat HTTP 409 as “already exists” instead of an error.

So: yes, there are real issues here, not just noisy logs.

Fixes implemented:

  • Fixed sync key mismatch in sync.service.ts:
    • Uses scheduler job IDs (challenge|phase) for map lookups/comparisons.
    • Adds robust job-id parsing for cleanup, with safe skip + warning on malformed IDs.
    • Removes the phase undefined cancellation pattern caused by : splitting.
  • Removed duplicate execution path in scheduler.service.ts:
    • runScheduledTransition() now emits the Kafka transition event and no longer also calls local advancePhase().
    • Prevents “same END processed twice / already closed” behavior from scheduler path.
  • Added manual open-phase dedupe in phase-schedule-manager.service.ts:
    • Tracks per challenge+phase fingerprint for open-phase processing.
    • Skips reprocessing unchanged open review/screening/approval phases on repeated challenge updates.
    • Prunes stale phase entries when phases are no longer open.

Tests

  • Added regression test in sync.service.spec.ts for pipe-delimited job IDs + obsolete cancellation parsing.
  • Added regression test in phase-schedule-manager.service.spec.ts to ensure unchanged open phases are processed once.
  • Verified: targeted suites pass (2/2).
  • Note: the broad scheduler suite is currently noisy/flaky in this env (Redis/connect + timeout), but this was outside the targeted change verification.

Open with Devin

@vas3a vas3a requested a review from jmgasper March 27, 2026 14:27
{
setPhaseChainCallback: jest.fn(),
} as any,
schedulerService as any,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[⚠️ maintainability]
Casting schedulerService and phaseReviewService to any may hide type errors. Consider defining proper interfaces for these services to ensure type safety.

challengeApiService as any,
{} as any,
phaseReviewService as any,
{} as any,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[⚠️ maintainability]
The use of as any for dependency injection can lead to runtime errors if the injected objects do not conform to expected interfaces. Consider defining and using specific interfaces for these dependencies.

}

private buildOpenPhaseFingerprint(phase: IPhase): string {
const effectiveStart = phase.actualStartDate ?? phase.scheduledStartDate;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[⚠️ correctness]
The buildOpenPhaseFingerprint method uses a mix of actual and scheduled dates to build the fingerprint. If both actualStartDate and scheduledStartDate are null, it defaults to 'UNKNOWN'. Consider whether this is the desired behavior or if it might lead to unintended deduplication issues.

),
);

for (const key of this.processedOpenPhaseFingerprints.keys()) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[⚠️ performance]
In pruneOpenPhaseProcessingState, the loop iterates over all keys in processedOpenPhaseFingerprints. If the map grows large, this could become a performance bottleneck. Consider optimizing this by maintaining a separate structure to track keys by challenge ID.

buildJobId: jest
.fn()
.mockImplementation((challengeId: string, phaseId: string) => {
return `${challengeId}|${phaseId}`;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[💡 style]
Consider using a template literal directly in the return statement for simplicity: return ${challengeId}|${phaseId};.

});
});

it('matches scheduled jobs by BullMQ job ID and cancels obsolete job IDs safely', async () => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[💡 readability]
The test case name 'matches scheduled jobs by BullMQ job ID and cancels obsolete job IDs safely' is quite long and could be simplified for better readability. Consider shortening it while maintaining clarity.

Comment thread src/sync/sync.service.ts
);
const [challengeId, phaseId] = scheduledPhaseKey.split(':');

const parsedJob = this.parseScheduledJobId(scheduledJobId);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[💡 maintainability]
The parseScheduledJobId method returns null if the job ID does not contain a delimiter or if the split results in empty challengeId or phaseId. Consider logging these cases for better traceability and debugging.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes repeated rescheduling/cancellation churn and duplicate phase-transition execution by aligning scheduler job-id handling across sync + scheduler, and adding dedupe for manual open-phase processing.

Changes:

  • Sync now matches/cancels scheduled transitions using BullMQ job IDs (challengeId|phaseId) and adds safer parsing for cleanup.
  • Scheduler no longer advances phases locally when a scheduled transition fires; it only emits the Kafka transition event.
  • Manual open-phase processing is deduped per challenge+phase fingerprint, with state pruning for phases that are no longer open.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/sync/sync.service.ts Uses scheduler job IDs for lookups/comparisons; adds parsing helper for cancellation cleanup.
src/sync/sync.service.spec.ts Adds regression coverage for pipe-delimited job IDs and obsolete cancellation behavior.
src/autopilot/services/scheduler.service.ts Removes local advancePhase() call from scheduled transition execution to avoid double-processing.
src/autopilot/services/phase-schedule-manager.service.ts Adds dedupe state for processing open review/screening/approval phases and pruning helper.
src/autopilot/services/phase-schedule-manager.service.spec.ts Adds regression coverage to ensure unchanged open phases are only processed once.
.circleci/config.yml Adjusts workflow filters to include dev-* tag builds for the DEV workflow.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/sync/sync.service.ts
Comment on lines +436 to +438
`[MANUAL PHASE DETECTION] Skipping unchanged open phase ${phase.id} (${phase.name}) for challenge ${challengeDetails.id}; already processed for current phase-open fingerprint.`,
);
continue;
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The open-phase dedupe fingerprint is based only on phase fields, but the processing logic also depends on challenge.reviewers (via resolveScorecardIdForOpenPhase). If reviewer configs/scorecard IDs change while the phase remains open, the fingerprint may be unchanged and this block will skip both pending-review scorecard updates and handlePhaseOpenedForChallenge, leaving stale pending reviews. Consider including the relevant reviewer/scorecard inputs in the fingerprint (e.g., selected scorecardId(s) for the phase) or running scorecard updates even when the phase fingerprint is unchanged.

Suggested change
`[MANUAL PHASE DETECTION] Skipping unchanged open phase ${phase.id} (${phase.name}) for challenge ${challengeDetails.id}; already processed for current phase-open fingerprint.`,
);
continue;
`[MANUAL PHASE DETECTION] Open phase ${phase.id} (${phase.name}) for challenge ${challengeDetails.id} has unchanged fingerprint; proceeding to re-run reviewer/scorecard processing.`,
);

Copilot uses AI. Check for mistakes.
Comment on lines 411 to +416
if (openPhasesRequiringScorecards.length > 0) {
this.pruneOpenPhaseProcessingState(
challengeDetails.id,
openPhasesRequiringScorecards,
);

Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

processedOpenPhaseFingerprints entries are only pruned when openPhasesRequiringScorecards.length > 0. If a challenge previously had an open review/screening/approval phase and later has none (or transitions out of ACTIVE and returns early), the per-challenge entries will never be deleted, so this map can grow unbounded over time. Consider pruning/clearing the challenge's entries when there are no open phases requiring scorecards (and/or when skipping non-ACTIVE challenges).

Copilot uses AI. Check for mistakes.
Comment thread .circleci/config.yml
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Open in Devin Review

@kkartunov kkartunov self-requested a review March 30, 2026 12:39
Copy link
Copy Markdown
Contributor

@jmgasper jmgasper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - what's the best way to test this out in dev?

@vas3a vas3a merged commit 5108fd7 into develop Apr 15, 2026
6 of 7 checks passed
@vas3a vas3a deleted the fix-phase-advance-check branch April 15, 2026 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants