Skip to content

fix: emit telemetry eagerly in dev --logs to avoid SIGINT race#1372

Closed
Hweinstock wants to merge 1 commit into
aws:mainfrom
Hweinstock:fix/dev-logs-telemetry
Closed

fix: emit telemetry eagerly in dev --logs to avoid SIGINT race#1372
Hweinstock wants to merge 1 commit into
aws:mainfrom
Hweinstock:fix/dev-logs-telemetry

Conversation

@Hweinstock
Copy link
Copy Markdown
Contributor

Description

dev --logs never emits telemetry because the global SIGINT handler in cli.ts (setupGlobalCleanup) calls process.exit(0) synchronously, terminating the process before withCommandRunTelemetry can flush. The dev command's own SIGINT handler (registered later) never runs.

This change emits telemetry eagerly before the blocking server loop, matching the existing browser-mode pattern which has the same constraint.

Related Issue

Closes #

Documentation PR

N/A

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change?

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint
  • If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

The global SIGINT handler in cli.ts calls process.exit(0) synchronously,
which terminates the process before withCommandRunTelemetry can flush.
Emit telemetry before the blocking server loop, matching the existing
browser-mode pattern.
@github-actions github-actions Bot added the size/s PR size: S label May 22, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label May 22, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label May 22, 2026
@agentcore-devx-automation
Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label May 22, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Package Tarball

aws-agentcore-0.14.2.tgz

How to install

gh release download pr-1372-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.14.2.tgz

Copy link
Copy Markdown

@agentcore-cli-automation agentcore-cli-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix — diagnosis of the SIGINT race is correct, and the eager-emit pattern matches the existing browser-mode code path so it's reasonable to follow that precedent.

Two concerns before merging:

  1. Failure telemetry regression. The previous withCommandRunTelemetry wrapper recorded exit_reason: 'failure' (with classified error_name / error_source) when server.start() rejected or onExit fired with a non-zero code. After this change, those failures are emitted as success because the eager emit happens before startup, and the outer catch (error) at the bottom of the handler doesn't emit any failure telemetry. We not only lose failure signal, we actively mislabel crashes as successes.

  2. No test for the fix. The existing integ test for dev --logs (integ-tests/dev-server.test.ts) spawns the dev server without telemetry.env and only asserts telemetry on dev <prompt> invocations, so nothing prevents this from regressing again. Since the whole point of the PR is to make telemetry actually fire on this code path, it would be good to extend that test (spawn with telemetry.env, send SIGTERM/SIGINT, assert the dev_action: 'server', ui_mode: 'terminal' metric was emitted).

Details inline.

client.emit('cli.command_run', 0, {
command_group: 'dev',
command: 'dev',
exit_reason: 'success',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding exit_reason: 'success' here loses the failure-telemetry behavior the previous withCommandRunTelemetry wrapper provided. If server.start() rejects (e.g. binary not found, port grabbed between findAvailablePort and server.start), or onExit fires with a non-zero code, the await new Promise(...) below throws, the outer catch (error) at line 513 prints the error and exits 1 — but telemetry has already recorded this as a successful run. Previously trackCommandRun/classifyError would have emitted exit_reason: 'failure' with error_name and error_source.

A few ways to fix:

  • (a) Move the eager success emit to just before the await new Promise and wrap server.start() / startup in a try-block such that failures emit a separate failure event (and skip the success emit).
  • (b) Don't emit eagerly — instead, refactor so the dev command's SIGINT handler can perform async cleanup (e.g. teach setupGlobalCleanup to await registered async cleanups before process.exit), then keep withCommandRunTelemetry and get correct success/failure telemetry for free. Bigger change but fixes the root cause and would also help browser-mode.
  • (c) Accept the regression and document it in a comment, mirroring the note above runBrowserMode. Least work but means we lose error classification on dev-server startup failures.

My preference would be (a) since it's localized and preserves the existing failure-telemetry contract.

}
);
if (!devResult.success) throw devResult.error;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No test coverage for this fix. integ-tests/dev-server.test.ts already spawns agentcore dev --logs but doesn't pass telemetry.env and only asserts telemetry from the separate dev <prompt> invoke calls, so nothing currently verifies that the SIGINT fix works or guards against it regressing. Could you extend that test to spawn with telemetry.env, send SIGINT, wait for exit, and telemetry.assertMetricEmitted({ command: 'dev', dev_action: 'server', ui_mode: 'terminal', exit_reason: 'success' })?

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label May 22, 2026
@Hweinstock Hweinstock closed this May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/s PR size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants