Improve opentelemetry collector #794

Draft
weiiwang01 wants to merge 7 commits into main from opentelemetry

@weiiwang01
Collaborator

What this PR does

This PR enhances the OpenTelemetry Collector configuration in the pre-job script so that it exposes local endpoints for collecting metrics, logs, and traces, and automatically labels them with GitHub workflow context. It also adds the system.cpu.logical.count and system.cpu.physical.count metrics.

Why we need it

This change makes it easier for users to upload custom workload metrics, logs, and traces to the action COS. It will also help us understand and plan resource usage for self-hosted runners.

Checklist

  • I followed the contributing guide
  • I added or updated the documentation (if applicable)
  • I updated docs/changelog.md with user-relevant changes
  • I used AI to assist with preparing this PR
  • I added or updated tests as needed (unit and integration)
  • If this is a Grafana dashboard: I added a screenshot of the dashboard
  • If this is Terraform: terraform fmt passes and tflint reports no errors
  • If the github-runner-manager application has been changed: The application version number is updated in github-runner-manager/pyproject.toml.

@weiiwang01 weiiwang01 marked this pull request as draft May 20, 2026 06:15
@cbartz cbartz requested a review from Copilot May 20, 2026 06:15
Contributor

Copilot AI left a comment

Pull request overview

This PR enhances the self-hosted runner’s OpenTelemetry Collector setup generated by the pre-job hook, aiming to accept locally-pushed telemetry (OTLP + Loki) and enrich it with GitHub workflow context labels before exporting to the configured upstream OTLP endpoint.

Changes:

  • Extend the collector config to expose OTLP (gRPC/HTTP) and Loki HTTP receivers and add new metrics/logs/traces pipelines.
  • Add GitHub workflow/run context labeling via attributes/resource processors, plus a Loki-labels transform.
  • Enable system.cpu.logical.count and system.cpu.physical.count hostmetrics.
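
As a rough illustration of what the new local OTLP endpoint would allow, the sketch below builds an OTLP/JSON gauge metric and wraps it in an HTTP POST request. It is not taken from the PR: the port 44318 comes from the config excerpt in the review comment, while the `/v1/metrics` path is the standard OTLP/HTTP default and is assumed to apply here. The host `localhost`, the metric name, and the helper names `build_gauge_payload` and `build_request` are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: the collector's OTLP/HTTP receiver is reachable on localhost at
# port 44318 (port taken from the PR's config excerpt) and uses the standard
# OTLP/HTTP metrics path.
OTLP_HTTP_METRICS_URL = "http://localhost:44318/v1/metrics"


def build_gauge_payload(name, value, service_name, time_unix_nano=None):
    """Build a minimal OTLP/JSON payload carrying one gauge data point."""
    if time_unix_nano is None:
        time_unix_nano = time.time_ns()
    return {
        "resourceMetrics": [{
            "resource": {
                "attributes": [
                    {"key": "service.name",
                     "value": {"stringValue": service_name}},
                ],
            },
            "scopeMetrics": [{
                "scope": {"name": "workflow-example"},  # hypothetical scope name
                "metrics": [{
                    "name": name,
                    "gauge": {
                        "dataPoints": [{
                            "asDouble": float(value),
                            # 64-bit integers are encoded as strings in OTLP/JSON.
                            "timeUnixNano": str(time_unix_nano),
                        }],
                    },
                }],
            }],
        }],
    }


def build_request(payload, url=OTLP_HTTP_METRICS_URL):
    """Wrap the payload in a POST request with a JSON content type."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example use inside a workflow step (requires a running collector):
#   req = build_request(build_gauge_payload("build.duration.seconds", 12.5, "my-step"))
#   urllib.request.urlopen(req, timeout=5)
```

Under these assumptions, a workflow step would not need to attach GitHub workflow labels itself, since the PR describes the collector adding them before export.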

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| github-runner-manager/src/github_runner_manager/templates/pre-job.j2 | Expands the generated otelcol config to add OTLP/Loki receivers, new pipelines, and richer GitHub context labels. |
| docs/changelog.md | Adds a changelog entry describing the new local telemetry endpoints and CPU-count metrics. |

Comment on lines +160 to +166
endpoint: 0.0.0.0:44317
http:
endpoint: 0.0.0.0:44318
loki:
protocols:
http:
endpoint: 0.0.0.0:43100
otlp/mimir:
endpoint: {{ otel_collector_endpoint }}
otlp/self_hosted_runner:
endpoint: "$ACTION_OTEL_EXPORTER_OTLP_ENDPOINT"
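
For the Loki HTTP receiver in the excerpt above, a log line could in principle be pushed as sketched below. This is an illustration, not code from the PR: port 43100 comes from the excerpt, but the `/loki/api/v1/push` path is the usual Loki push API path and is only assumed to be what this receiver serves. The host `localhost`, the label names, and the helper names `build_loki_payload` and `build_loki_request` are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: the Loki receiver listens on localhost:43100 (port from the
# excerpt) and accepts the standard Loki push API path.
LOKI_PUSH_URL = "http://localhost:43100/loki/api/v1/push"


def build_loki_payload(line, labels, time_unix_nano=None):
    """Build a Loki push API JSON body with one stream and one log line."""
    if time_unix_nano is None:
        time_unix_nano = time.time_ns()
    return {
        "streams": [{
            "stream": dict(labels),  # label set identifying the stream
            # Each entry is [timestamp in nanoseconds as a string, log line].
            "values": [[str(time_unix_nano), line]],
        }],
    }


def build_loki_request(payload, url=LOKI_PUSH_URL):
    """Wrap the payload in a POST request with a JSON content type."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example use inside a workflow step (requires a running collector):
#   req = build_loki_request(build_loki_payload("tests passed", {"job": "ci"}))
#   urllib.request.urlopen(req, timeout=5)
```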
Collaborator

@cbartz cbartz left a comment

LGTM
