
[reliability] Daily Reliability Review - 2026-05-22 #34137

@github-actions


Executive Summary

Over the last 24 hours in the gh-aw Sentry project (org github), telemetry shows a small, focused set of operational failures and several observability gaps that carry more signal than the failures themselves.

  • Total spans: 19,985 (4,016 gen_ai + 7,776 http.server + 5,834 http.client + 2,380 default; these per-op counts sum to 20,006, so the breakdown and the total do not reconcile exactly).
  • Conclusion spans (have gh-aw.run.status): 1,987, of which 1,954 success, 30 failure, 0 timeout, 0 cancelled (these four buckets account for 1,984; the remaining 3 are not broken out).
  • errors dataset: 0 events; logs dataset: 0 events. Both datasets were queried and came back empty; they were not skipped.
  • OTLP self-reported export failures (gh-aw.otlp.export_errors:>0): 0.
  • Confirmed operational failures: all 30 are safe-output validation rejections (item-count and body-length rules), concentrated in Smoke Copilot (20), LintMonster (4), Daily CLI Tools Exploratory Tester (4), Deployment Incident Monitor (2).
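  • Instrumentation gaps (high signal): span.status, gen_ai.response.finish_reasons, release, and service.version are null on every span checked in the 24h window (span.status on all 1,987 conclusion spans; gen_ai.response.finish_reasons on all 4,016 gen_ai spans and all 1,987 conclusion spans; release and service.version on all 19,985 spans), including agent conclusion spans where the emitter is supposed to populate them.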

Overall health: operationally green (1.5% failure rate on conclusion spans, all due to a known validation pattern), but observability is degraded. Runtime outcome on the OTLP status.code channel, agent stop reason on gen_ai.response.finish_reasons, and release correlation are all unreadable in the Sentry query layer, so traces look healthier than they are and length truncation cannot be detected.
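The headline figures can be re-derived from the summary counts. The TypeScript sketch below uses only numbers stated in this summary; sumCounts and failureRatePct are illustrative helpers, not part of gh-aw or Sentry. It recomputes the conclusion-span failure rate (30 / 1,987) and reconciles each breakdown against its reported total, a cheap sanity check before figures go into the report.

```typescript
// Counts copied from the Executive Summary (24h window).
const spansByOp = {
  gen_ai: 4016,
  "http.server": 7776,
  "http.client": 5834,
  default: 2380,
};
const reportedTotalSpans = 19985;

const conclusionByStatus = { success: 1954, failure: 30, timeout: 0, cancelled: 0 };
const reportedConclusionSpans = 1987;

// Sum every value in a count map.
function sumCounts(counts: Record<string, number>): number {
  return Object.keys(counts).reduce((acc, key) => acc + counts[key], 0);
}

// Failure rate over conclusion spans, as a percentage.
function failureRatePct(failures: number, conclusions: number): number {
  return (failures / conclusions) * 100;
}

const opSum = sumCounts(spansByOp);
const statusSum = sumCounts(conclusionByStatus);
const rate = failureRatePct(conclusionByStatus.failure, reportedConclusionSpans);

console.log(`per-op sum ${opSum} vs reported total ${reportedTotalSpans}`); // per-op sum 20006 vs reported total 19985
console.log(`status sum ${statusSum} vs reported conclusions ${reportedConclusionSpans}`); // status sum 1984 vs reported conclusions 1987
console.log(`failure rate ${rate.toFixed(2)}%`); // failure rate 1.51%
```

The 1.5% in the health line is this 1.51% rounded to one decimal; the two reconciliation lines are worth checking whenever the per-op or per-status queries are run at a different time from the totals.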

Top Reliability Findings

| Priority | Workflow | Problem | Evidence | Next action |
|---|---|---|---|---|
| P1 | Smoke Copilot | 20 conclusion spans with gh-aw.run.status:failure; safe-output validation rejects them with "create_discussion 'body' is too short (minimum 64 characters)" and "Too many items of type 'add_comment'. Maximum allowed: 2." | gh-aw.run.status:failure gh-aw.workflow.name:"Smoke Copilot" returned 20 spans; traces b6215698c10f141728d30cc4ef48fe93 and ced2ef961311a5be0aafbfab945933c0 both repeat the same body-length and item-count rejections | Fix the Copilot smoke prompt to emit create_discussion bodies ≥ 64 chars and respect the add_comment cap of 2; the same payload was retried at least 4 times across the window |
| P1 | Daily CLI Tools Exploratory Tester | 4 failures: "Too many items of type 'create_issue'. Maximum allowed: 1." | Trace 42c90277cf86f99e7ffd6d26f00f65cf (2026-05-22T06:29:58Z) | Tighten the exploratory tester prompt, or raise the create_issue safe-output limit if 1 is too aggressive for this workflow |
| P2 | LintMonster | 4 failures | Trace 56d87d5d7b58fc8386917605b5b35b53 (2026-05-22T03:44–03:50Z, four spans on the same trace) | Inspect that one run for repeated errors; the pattern looks like one bad run, not a recurring class |
| P2 | Deployment Incident Monitor | 2 failures | gh-aw.run.status:failure gh-aw.workflow.name:"Deployment Incident Monitor" count=2 | Confirm whether this is expected under the monitor's contract; low volume |
| P1 (observability) | all workflows | span.status is null on 1,987/1,987 conclusion spans | Aggregating by span.status with has:gh-aw.run.status returns a single bucket {null: 1987} | OTLP status.code is set in actions/setup/js/send_otlp_span.cjs:305,1837. Investigate whether Sentry's spans dataset surfaces OTLP status under a different field (e.g. span.status_code) or whether the OTLP exporter is stripping it; until then, dashboards must rely on gh-aw.run.status |
| P1 (observability) | agent conclusion spans | gen_ai.response.finish_reasons is null on all 1,987 conclusion spans and all 4,016 span.op:gen_ai spans | Aggregating on gen_ai.response.finish_reasons returns only null for both span.op:gen_ai (4016 null) and has:gh-aw.run.status (1987 null) | The emitter at actions/setup/js/send_otlp_span.cjs:1899-1900 claims the array is always emitted on jobName === "agent" conclusion spans (with an unknown/timeout sentinel). Either (a) jobName === "agent" is never matched in the live emit path, or (b) buildArrayAttr array values are dropped by the exporter or not indexed by Sentry. Length truncation is currently undetectable; fix this before chasing further runtime issues |
| P2 (observability) | all spans | release and service.version are null on 19,985/19,985 spans | Aggregating by release and by service.version returns a single null bucket for each | The resource attribute is emitted only when scopeVersion && scopeVersion !== "unknown" (send_otlp_span.cjs:321-323); set a real version (the gh-aw release tag) on the setup action's scope so release correlation works in Sentry |
| P3 | multiple long-running agents | Several gen_ai spans run 15–22 min: Copilot Agent Prompt Clustering Analysis max 1,317,663 ms; Copilot Session Insights max 1,300,291 ms; [aw] Failure Investigator (6h) 40 spans, avg 74 s, max 1,140,848 ms | Sort by -max(span.duration) on span.op:gen_ai; example trace 9a9796b0ea3622fee6d4f9b26f8930c2 (Failure Investigator, 19 min) | Likely normal for daily agents; flagged only because finish_reasons is missing, so a deliberate long run cannot be distinguished from a length-truncated run |
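The two rejection classes in the table, a minimum body length and a per-type item cap, can be reproduced with a small pre-flight validator. The TypeScript sketch below is hypothetical: validateSafeOutputs, SafeOutputItem, and SafeOutputLimits are illustrative names, not gh-aw's safe-output implementation, and the limits (64 characters for create_discussion bodies, 2 add_comment items, 1 create_issue item) are taken from the rejection messages quoted in the table.

```typescript
// Hypothetical safe-output item; real gh-aw items may carry more fields.
interface SafeOutputItem {
  type: string;
  body?: string;
}

// Hypothetical per-workflow limits, keyed by safe-output type.
interface SafeOutputLimits {
  maxItemsByType: Record<string, number>;
  minBodyLengthByType: Record<string, number>;
}

// Returns every rejection message; an empty array means the payload passes.
function validateSafeOutputs(items: SafeOutputItem[], limits: SafeOutputLimits): string[] {
  const errors: string[] = [];
  const counts: Record<string, number> = {};

  for (const item of items) {
    counts[item.type] = (counts[item.type] ?? 0) + 1;

    // Body-length rule, e.g. create_discussion bodies of at least 64 characters.
    const minLen = limits.minBodyLengthByType[item.type];
    if (minLen !== undefined && (item.body ?? "").length < minLen) {
      errors.push(`${item.type} 'body' is too short (minimum ${minLen} characters)`);
    }
  }

  // Item-count rule, e.g. at most 2 add_comment items.
  for (const itemType of Object.keys(counts)) {
    const max = limits.maxItemsByType[itemType];
    const count = counts[itemType] ?? 0;
    if (max !== undefined && count > max) {
      errors.push(`Too many items of type '${itemType}'. Maximum allowed: ${max}.`);
    }
  }
  return errors;
}

// Limits taken from the rejection messages quoted in the table.
const smokeLimits: SafeOutputLimits = {
  maxItemsByType: { add_comment: 2, create_issue: 1 },
  minBodyLengthByType: { create_discussion: 64 },
};

// Synthetic payload that trips both Smoke Copilot rules.
const payload: SafeOutputItem[] = [
  { type: "create_discussion", body: "Smoke test passed." },
  { type: "add_comment", body: "ok" },
  { type: "add_comment", body: "ok" },
  { type: "add_comment", body: "ok" },
];

const smokeErrors = validateSafeOutputs(payload, smokeLimits);
console.log(smokeErrors);
```

For the synthetic payload, the validator returns the same two messages seen on the failing Smoke Copilot spans. Running a prompt's expected output through a check like this in a test, rather than discovering the rejection at run time, is one way to stop the repeated retries noted in the first row.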


Recommendations

  1. Fix the Smoke Copilot safe-output prompt contract first (smallest fix, largest noise reduction). Twenty of the 30 failures are the same two validation messages: either lengthen the discussion body template in the workflow to ≥ 64 chars or relax the rule, and constrain the agent to ≤ 2 add_comment items. This alone would cut the daily failure count from 30 to 10.
  2. Restore gen_ai.response.finish_reasons on agent conclusion spans. The emitter contract at actions/setup/js/send_otlp_span.cjs:1899-1900 says this attribute is always emitted, but Sentry sees null on 100% of conclusion spans. Add a unit-level assertion that attributes contains the key when jobName === "agent", and verify that the OTLP exporter is not dropping buildArrayAttr payloads. Without this attribute, we cannot detect length truncation or distinguish timeouts from clean exits.
  3. Populate release / service.version on the resource scope. send_otlp_span.cjs:321-323 skips the attribute when scopeVersion is missing, empty, or "unknown". Pass the gh-aw action version (or commit SHA) into setup so release correlation works; otherwise a regression cannot be bisected by deploy.
  4. Decide whether to keep relying on gh-aw.run.status for failure queries. Sentry's span.status is null for all conclusion spans even though OTLP status.code is set in the emit payload. Either confirm that Sentry's spans dataset uses a different field (e.g. span.status_code), or document gh-aw.run.status as the canonical failure attribute and stop documenting OTLP status.code as queryable.
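Recommendations 2 and 3 both concern attribute contracts in the emitter. The TypeScript sketch below models those contracts as this report describes them; it is not the code in send_otlp_span.cjs. The helper names (toArrayAttr, conclusionAttributes, resourceAttributes, assertFinishReasonsPresent) are hypothetical, the attribute shapes follow the OTLP/JSON KeyValue and AnyValue encoding, and mapping a timeout run to a "timeout" sentinel and every other run to "unknown" is an assumption about how the unknown/timeout sentinel is chosen.

```typescript
// Subset of the OTLP/JSON AnyValue and KeyValue encodings.
interface AnyValue {
  stringValue?: string;
  arrayValue?: { values: AnyValue[] };
}
interface KeyValue {
  key: string;
  value: AnyValue;
}

// Hypothetical stand-in for the emitter's buildArrayAttr (not its real code).
function toArrayAttr(key: string, items: string[]): KeyValue {
  return { key, value: { arrayValue: { values: items.map((s) => ({ stringValue: s })) } } };
}

// Contract described for send_otlp_span.cjs:1899-1900: agent conclusion spans
// always carry gen_ai.response.finish_reasons, using a sentinel when the real
// stop reason is unknown. The sentinel choice below is an assumption.
function conclusionAttributes(jobName: string, runStatus: string, finishReason?: string): KeyValue[] {
  const attrs: KeyValue[] = [{ key: "gh-aw.run.status", value: { stringValue: runStatus } }];
  if (jobName === "agent") {
    const sentinel = runStatus === "timeout" ? "timeout" : "unknown";
    attrs.push(toArrayAttr("gen_ai.response.finish_reasons", [finishReason ?? sentinel]));
  }
  return attrs;
}

// Gate described for send_otlp_span.cjs:321-323: no version attribute unless
// scopeVersion is set and is not the literal "unknown".
function resourceAttributes(scopeVersion: string | undefined): KeyValue[] {
  if (scopeVersion && scopeVersion !== "unknown") {
    return [{ key: "service.version", value: { stringValue: scopeVersion } }];
  }
  return [];
}

// Unit-level check: throws unless finish_reasons is present and non-empty.
function assertFinishReasonsPresent(attrs: KeyValue[]): void {
  let reasonCount = 0;
  for (const attr of attrs) {
    if (attr.key === "gen_ai.response.finish_reasons" && attr.value.arrayValue) {
      reasonCount += attr.value.arrayValue.values.length;
    }
  }
  if (reasonCount === 0) {
    throw new Error("agent conclusion span is missing gen_ai.response.finish_reasons");
  }
}

assertFinishReasonsPresent(conclusionAttributes("agent", "success")); // passes via the "unknown" sentinel
console.log(resourceAttributes("unknown").length); // 0: version attribute suppressed
```

If an assertion like this passes against the real attribute builder but Sentry still shows null, the gap is downstream (exporter or indexing), which narrows Recommendation 2 to option (b) in the table.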

Notes

  • The Sentry MCP build used here exposes list_events but not search_events or get_trace_details; trace continuity was verified by filtering list_events on trace:<id> and ordering by timestamp.
  • The zero-event result from the errors and logs datasets is an explicit observability finding, not a skipped check: either no SDK is configured to send errors/logs for this project, or no error/log events were emitted in the window.
  • gh-aw.run.status:timeout and gh-aw.run.status:cancelled both returned 0 events. The emitter at send_otlp_span.cjs:1820-1830 distinguishes these states; the absence is meaningful (no runs timed out or were cancelled in the window), not a missing attribute.
  • gen_ai.response.finish_reasons:length could not be queried for truncation because the attribute is null everywhere. The result is an inconclusive runtime outcome plus a confirmed instrumentation gap, not a confirmed clean run.
  • All latency outliers cited include count, max, and trace ID per the evidence-first contract; one-off long runs are flagged but not promoted to P1 because they cluster on long-by-design daily agents.
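The trace-continuity check in the first note (filter list_events on trace:<id>, then order by timestamp) can also be done client-side once events are fetched. The TypeScript sketch below assumes a minimal, hypothetical SpanEvent shape rather than the Sentry MCP response format, and measures continuity as the largest gap between consecutive events in a trace, which is an interpretation, not a documented Sentry metric. The sample records are synthetic.

```typescript
// Hypothetical minimal event shape; the real Sentry MCP response differs.
interface SpanEvent {
  trace: string;
  timestamp: string; // ISO 8601, e.g. "2026-05-22T03:44:00Z"
  op: string;
}

// Client-side equivalent of "filter on trace:<id>, order by timestamp".
function eventsForTrace(events: SpanEvent[], traceId: string): SpanEvent[] {
  return events
    .filter((e) => e.trace === traceId)
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}

// Largest gap in milliseconds between consecutive events of an ordered trace.
// A gap much longer than the workflow's normal step spacing may indicate
// events missing from the query result.
function largestGapMs(ordered: SpanEvent[]): number {
  let gap = 0;
  for (let i = 1; i < ordered.length; i++) {
    const delta = Date.parse(ordered[i].timestamp) - Date.parse(ordered[i - 1].timestamp);
    gap = Math.max(gap, delta);
  }
  return gap;
}

// Synthetic sample records (not real telemetry).
const sample: SpanEvent[] = [
  { trace: "trace-a", timestamp: "2026-05-22T03:50:00Z", op: "default" },
  { trace: "trace-b", timestamp: "2026-05-22T03:45:00Z", op: "gen_ai" },
  { trace: "trace-a", timestamp: "2026-05-22T03:44:00Z", op: "gen_ai" },
  { trace: "trace-a", timestamp: "2026-05-22T03:46:30Z", op: "http.client" },
];

const ordered = eventsForTrace(sample, "trace-a");
console.log(ordered.map((e) => e.op).join(" -> ")); // gen_ai -> http.client -> default
console.log(largestGapMs(ordered)); // 210000
```

Because filter returns a new array, the sort never reorders the fetched events, so the same result set can be reused for other trace IDs.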


Generated by 🚨 Daily Reliability Review · ● 9.2M

  • expires on May 24, 2026, 11:21 PM UTC
