Set PCS staging alert noDataState to OK (fixes perpetual firing) #6581

Merged
premun merged 1 commit into main from fix/pcs-staging-nodata-alert on May 25, 2026

Conversation

@missymessa
Member

Problem

The PCS Background Worker Stopped alert has been firing in staging since March 13 (~10 weeks). The staging PCS environment does not process work items, so `WorkItemExecuted` telemetry is always empty. With `noDataState: Alerting`, the alert fires indefinitely.
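
To make the failure mode concrete, here is a minimal sketch of how a no-data setting decides the rule's state when the query returns nothing. This is a simplified model, not Grafana's evaluation code: the function name `evaluate_rule`, the `threshold` parameter, and the sample values are assumptions for illustration only.

```python
from typing import Sequence

# Maps the configured noDataState value to the state the rule ends up in.
# "OK" resolves to the Normal state; the other values map to themselves.
NO_DATA_RESULT = {"OK": "Normal", "Alerting": "Alerting", "NoData": "NoData"}


def evaluate_rule(samples: Sequence[float], threshold: float, no_data_state: str) -> str:
    """Hypothetical model of a 'worker stopped' rule.

    samples: counts of executed work items in the evaluation window.
    threshold: minimum total below which the rule alerts (assumed).
    no_data_state: the configured noDataState value.
    """
    if not samples:
        # The query returned no series at all, so the threshold is never
        # checked; the configured no-data behaviour decides the state.
        return NO_DATA_RESULT[no_data_state]
    return "Alerting" if sum(samples) < threshold else "Normal"


# Staging never emits the telemetry, so every evaluation sees no samples.
print(evaluate_rule([], threshold=1, no_data_state="Alerting"))  # Alerting
print(evaluate_rule([], threshold=1, no_data_state="OK"))        # Normal
```

Because staging always takes the empty-samples branch, the outcome depends only on the no-data setting, which is why the alert never cleared on its own.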

Premek's team silenced it manually, but rollouts reverted the silence since it was stored in Grafana's runtime state, not in the provisioned config.

Fix

Change `noDataState` from `Alerting` to `OK` in the staging alert rule file only.

  • Staging: `OK`, so no alert fires when there is no telemetry data (the expected state)
  • Production: unchanged at `Alerting`, so it still alerts if prod stops emitting

This persists across rollouts because it's in the source-of-truth config file.
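
One way to keep this split from drifting would be a small check over the rule definitions. The sketch below is not part of this PR and assumes the rules have already been loaded into plain dictionaries; `find_mismatches`, the environment keys, and the sample rule are hypothetical.

```python
from typing import Dict, List

# Intended per-environment setting, taken from the description above.
EXPECTED_NO_DATA_STATE: Dict[str, str] = {"staging": "OK", "production": "Alerting"}


def find_mismatches(rules_by_env: Dict[str, List[dict]]) -> List[str]:
    """Return one message per rule whose noDataState differs from the
    value expected for its environment. Unknown environments are skipped."""
    problems: List[str] = []
    for env, rules in rules_by_env.items():
        expected = EXPECTED_NO_DATA_STATE.get(env)
        if expected is None:
            continue
        for rule in rules:
            actual = rule.get("noDataState")
            if actual != expected:
                title = rule.get("title", "?")
                problems.append(
                    f"{env}/{title}: noDataState={actual!r}, expected {expected!r}"
                )
    return problems


# Hypothetical rule data mirroring the state after this change.
rules = {
    "staging": [{"title": "PCS Background Worker Stopped", "noDataState": "OK"}],
    "production": [{"title": "PCS Background Worker Stopped", "noDataState": "Alerting"}],
}
print(find_mismatches(rules))  # []
```

Running something like this in CI would flag a staging rule that is set back to `Alerting`, or a production rule that is accidentally relaxed to `OK`.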

Related

The staging PCS environment does not process work items, so the
WorkItemExecuted telemetry is always empty. With noDataState set to
Alerting, this causes the alert to fire indefinitely (since March 13).

Changing to OK for staging only — production keeps Alerting so we are
still notified if prod telemetry stops.

Fixes #10774

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@premun premun merged commit 813969e into main May 25, 2026
5 of 6 checks passed
@premun premun deleted the fix/pcs-staging-nodata-alert branch May 25, 2026 07:55