Executive Summary

For the last 24 hours in the `gh-aw` Sentry project (org `github`), telemetry shows a small, focused set of operational failures and several observability gaps that are higher-signal than the failures themselves.

- Spans by operation: … (… `gen_ai` + 7,776 `http.server` + 5,834 `http.client` + 2,380 `default`).
- Conclusion spans (`gh-aw.run.status`): 1,987 (1,954 `success`, 30 `failure`, 0 `timeout`, 0 `cancelled`).
- `errors` dataset: 0 events.
- `logs` dataset: 0 events. (Reported explicitly: these datasets are empty, not skipped.)
- Spans reporting OTLP export errors (`gh-aw.otlp.export_errors:>0`): 0.
- `span.status`, `gen_ai.response.finish_reasons`, `release`, and `service.version` are null across 100% of spans in the 24h window, including agent conclusion spans where the emitter is supposed to populate them.

Overall health: operationally green (1.5% failure rate on conclusion spans, all due to a known validation pattern), but observability is degraded: runtime outcome on the OTLP `status.code` channel, agent stop-reason on `gen_ai.response.finish_reasons`, and release correlation are all unreadable in the Sentry query layer, so traces look healthier than they are and length-truncation cannot be detected.

Top Reliability Findings

| Priority | Workflow | Finding | Evidence | Recommended action |
| --- | --- | --- | --- | --- |
| P1 | Smoke Copilot | 20 conclusion spans with `gh-aw.run.status:failure`; safe-output validation rejects (`create_discussion 'body' is too short (minimum 64 characters)`, `Too many items of type 'add_comment'. Maximum allowed: 2.`) | `gh-aw.run.status:failure gh-aw.workflow.name:"Smoke Copilot"` returned 20 spans; traces `b6215698c10f141728d30cc4ef48fe93` and `ced2ef961311a5be0aafbfab945933c0` both repeat the same body-length and item-count rejection | Fix the Smoke Copilot prompt to emit `create_discussion` bodies ≥ 64 chars and respect the `add_comment` cap of 2; the same payload was retried at least 4 times across the window |
| P1 | Daily CLI Tools Exploratory Tester | 4 failures: `Too many items of type 'create_issue'. Maximum allowed: 1.` | Trace `42c90277cf86f99e7ffd6d26f00f65cf` (2026-05-22T06:29:58Z) | Tighten the exploratory tester prompt, or raise the `create_issue` safe-output limit if 1 is too aggressive for this workflow |
| P2 | LintMonster | 4 failures | Trace `56d87d5d7b58fc8386917605b5b35b53` (2026-05-22T03:44–03:50Z, four spans on the same trace) | Inspect that one run for repeated errors; the pattern looks like one bad run, not a recurring class |
| P2 | Deployment Incident Monitor | 2 failures | `gh-aw.run.status:failure gh-aw.workflow.name:"Deployment Incident Monitor"` count=2 | Confirm whether this is expected for this monitor's contract; low volume |
| P1 (observability) | all workflows | `span.status` is null on 1,987/1,987 conclusion spans | Aggregate by `span.status` with `has:gh-aw.run.status` → single bucket `{null: 1987}` | OTLP `status.code` is set in `actions/setup/js/send_otlp_span.cjs:305,1837`. Investigate whether Sentry's spans dataset surfaces OTLP status under a different field (e.g. `span.status_code`) or whether the OTLP exporter is stripping it; until then dashboards must rely on `gh-aw.run.status` |
| P1 (observability) | agent conclusion spans | `gen_ai.response.finish_reasons` is null on all 1,987 conclusion spans and all 4,016 `span.op:gen_ai` spans | Aggregate on `gen_ai.response.finish_reasons` for both `span.op:gen_ai` (4016 null) and `has:gh-aw.run.status` (1987 null) | The emitter at `actions/setup/js/send_otlp_span.cjs:1899-1900` claims the array is always emitted on `jobName === "agent"` conclusion spans (with an `unknown`/`timeout` sentinel). Either (a) `jobName === "agent"` is never matched in the live emit path, or (b) `buildArrayAttr` array values are being dropped by the exporter or not indexed by Sentry. Length-truncation is currently undetectable; fix this before chasing further runtime issues |
| P2 (observability) | all spans | `release` and `service.version` are null on 19,985/19,985 spans | Aggregate by `release` and `service.version` → single null bucket each | The resource attribute is emitted only when `scopeVersion && scopeVersion !== "unknown"` (`send_otlp_span.cjs:321-323`); set a real version (gh-aw release tag) on the setup action's scope so release correlation works in Sentry |
| P3 | multiple long-running agents | Several `gen_ai` spans run 15–22 min: Copilot Agent Prompt Clustering Analysis max 1,317,663 ms; Copilot Session Insights max 1,300,291 ms; `[aw] Failure Investigator (6h)` 40 spans, avg 74 s, max 1,140,848 ms | Sort by `-max(span.duration)` on `span.op:gen_ai`; example trace `9a9796b0ea3622fee6d4f9b26f8930c2` (Failure Investigator, 19 min) | Likely normal for daily agents; flagged only because `finish_reasons` is missing, so we cannot distinguish a deliberate long run from a length-truncated run |
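The two safe-output rejections reported for Smoke Copilot and the exploratory tester can be reproduced offline with a small check. The following is a minimal sketch, not the actual gh-aw validator: the function name `validateSafeOutputs`, the item shape (`type`, `body`), and the assumption that "Line N" refers to the item's position are all hypothetical. Only the limits and message wording come from the traces in this report.

```javascript
// Hypothetical pre-flight check mirroring the rejection messages seen in the traces.
// Limits are the ones quoted in this report; everything else is an assumption.
const LIMITS = { add_comment: 2, create_issue: 1 };
const MIN_DISCUSSION_BODY = 64;

function validateSafeOutputs(items) {
  const errors = [];
  const counts = {};
  items.forEach((item, index) => {
    const line = index + 1; // assumed: "Line N" is the 1-based item position
    counts[item.type] = (counts[item.type] || 0) + 1;

    // Per-type item cap (only enforced for types listed in LIMITS).
    const max = LIMITS[item.type];
    if (max !== undefined && counts[item.type] > max) {
      errors.push(`Line ${line}: Too many items of type '${item.type}'. Maximum allowed: ${max}.`);
    }

    // Minimum body length for discussions.
    if (item.type === "create_discussion" && (item.body || "").length < MIN_DISCUSSION_BODY) {
      errors.push(`Line ${line}: create_discussion 'body' is too short (minimum ${MIN_DISCUSSION_BODY} characters)`);
    }
  });
  return errors;
}

// A payload shaped like the failing Smoke Copilot output: short body, three comments.
const sample = [
  { type: "create_discussion", body: "Smoke test passed." },
  { type: "add_comment", body: "first" },
  { type: "add_comment", body: "second" },
  { type: "add_comment", body: "third" },
];
console.log(validateSafeOutputs(sample));
```

Running a check of this kind against the prompt's output before submission would catch both rejection classes without waiting for a failed run.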
Representative Traces
- `b6215698c10f141728d30cc4ef48fe93` (https://github.sentry.io/explore/traces/trace/b6215698c10f141728d30cc4ef48fe93): `gh-aw.error.messages: Line 2: create_discussion 'body' is too short (minimum 64 characters)`
- `ced2ef961311a5be0aafbfab945933c0` (https://github.sentry.io/explore/traces/trace/ced2ef961311a5be0aafbfab945933c0): `gh-aw.error.messages: Line 4: create_discussion 'body' is too short (minimum 64 characters) | Line 10: Too many items of type 'add_comment'. Maximum allowed: 2.`
- `20f72a27241e4d2f3fd6df073a612f9c` (https://github.sentry.io/explore/traces/trace/20f72a27241e4d2f3fd6df073a612f9c): `Too many items of type 'add_comment'. Maximum allowed: 2.`
- `42c90277cf86f99e7ffd6d26f00f65cf` (https://github.sentry.io/explore/traces/trace/42c90277cf86f99e7ffd6d26f00f65cf), `create_issue` cap exceeded: `gh-aw.error.messages: Line 2: Too many items of type 'create_issue'. Maximum allowed: 1.`
- `56d87d5d7b58fc8386917605b5b35b53` (https://github.sentry.io/explore/traces/trace/56d87d5d7b58fc8386917605b5b35b53)
- `9a9796b0ea3622fee6d4f9b26f8930c2`: `[aw] Failure Investigator (6h)`, single `gen_ai` span of 1,140,848 ms (~19 min); `finish_reasons` is null, so it cannot be determined whether the run completed or was length-truncated

Recommendations
1. Fix the Smoke Copilot prompt so that it emits `create_discussion` bodies of at least 64 characters and no more than 2 `add_comment` items. This alone drops the daily failure count from 30 to ~10.
2. Fix `gen_ai.response.finish_reasons` on agent conclusion spans. The emitter contract at `actions/setup/js/send_otlp_span.cjs:1899-1900` says these are always emitted, but Sentry sees null for 100% of conclusion spans. Add a unit-level assertion that `attributes` contains the key when `jobName === "agent"`, and verify the OTLP exporter is not dropping `buildArrayAttr` payloads. Without this we cannot detect length-truncation or distinguish timeouts from clean exits.
3. Populate `release` / `service.version` on the resource scope. `send_otlp_span.cjs:321-323` skips the emit when `scopeVersion` is empty or `"unknown"`. Pass the gh-aw action version (or commit SHA) into setup so release correlation works; otherwise we cannot bisect a regression by deploy.
4. Rely on `gh-aw.run.status` for failure queries. Sentry's `span.status` is null for all conclusion spans even though OTLP `status.code` is set in the emit payload. Either confirm that Sentry's spans dataset uses a different field (e.g. `span.status_code`), or document `gh-aw.run.status` as the canonical failure attribute and stop documenting OTLP `status.code` as queryable.

Notes
- Only `list_events` was available, not `search_events` or `get_trace_details`; trace continuity was verified by filtering `list_events` on `trace:<id>` and ordering by timestamp.
- The `errors` and `logs` datasets returning zero events is an explicit observability finding, not a skipped check: either no SDK is configured to send errors/logs for this project, or no error/log events were emitted in the window.
- `gh-aw.run.status:timeout` and `gh-aw.run.status:cancelled` both returned 0 events. The emitter at `send_otlp_span.cjs:1820-1830` distinguishes these states; the absence is meaningful (no runs timed out or were cancelled in the window), not a missing attribute.
- `gen_ai.response.finish_reasons:length` could not be queried for truncation because the attribute is null everywhere. The check therefore yields an inconclusive runtime outcome plus a confirmed instrumentation gap, not a confirmed clean run.

References: