
Helioviewer Integration #18

Merged
mudhoney merged 41 commits into main from feature/helioviewer-integration
May 12, 2026

@mudhoney mudhoney commented Apr 9, 2026

Summary
Integrates the events-api with Helioviewer.org, adding the Helioviewer-legacy endpoints, a distributions aggregate system, scheduled collection, reprocess tooling, and Sentry error/performance reporting.

Major changes
Helioviewer integration
New /helioviewer/* endpoints via HelioviewerController serving the legacy response shape the helioviewer.org frontend expects (src/Api/Legacy.php).
/helioviewer/events/{source}/observation/{timestamp} — single-observation lookup
POST /helioviewer/events/from/{from}/to/{to} — time-range batch query
POST /helioviewer/distributions/size/{size}/from/{from}/to/{to} — timeline counts
POST /helioviewer/events/{sources}/observations — batch observations with rotation
Batch coordinate rotation per frame to keep request latency bounded.
Distributions
New distributions table + DistributionPostgres repository + Distribution model producing pre-aggregated bucket counts (30m / h / D / W / M / Y) per path.
bin/build-distribution.php + make distribution-build to rebuild from events.
legacy_event_type column (with index) so timeline responses can be filtered by two-letter code (AR, FL, CE, …) without a join.
Distribution updates are wired into the collector so insert/update/remove events keep aggregates consistent in near-real-time.
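The bucket arithmetic behind the pre-aggregated counts can be sketched as follows. This is an illustrative Python sketch, not the project's PHP code: `bucket_start`, `apply_event`, the Monday week start, and counting each event at a single timestamp are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

SIZES = ("30m", "h", "D", "W", "M", "Y")  # bucket sizes named in the PR


def bucket_start(ts: datetime, size: str) -> datetime:
    """Floor a timestamp to the start of its bucket, in UTC."""
    ts = ts.astimezone(timezone.utc)
    if size == "30m":
        return ts.replace(minute=ts.minute - ts.minute % 30, second=0, microsecond=0)
    if size == "h":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if size == "D":
        return day
    if size == "W":
        return day - timedelta(days=day.weekday())  # assumption: weeks start on Monday
    if size == "M":
        return day.replace(day=1)
    if size == "Y":
        return day.replace(month=1, day=1)
    raise ValueError(f"unknown bucket size: {size}")


def apply_event(counts: Counter, path: str, legacy_type: str, ts: datetime, delta: int) -> None:
    """Adjust every bucket size for one event: delta=+1 on insert, -1 on remove."""
    for size in SIZES:
        counts[(path, legacy_type, size, bucket_start(ts, size))] += delta
```

An update would be a remove at the old timestamp followed by an insert at the new one, which is one way to keep aggregates consistent without a full rebuild.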
Scheduler
bin/scheduler.php adds two full-sweep jobs: a weekly job (Mondays 01:00 UTC, covering the previous Monday→Sunday week) and a monthly job (1st of each month 03:00 UTC, covering the previous calendar month). These complement the existing every-6-minutes and daily 2 AM jobs.
Job completion logs now include duration: Collection completed in {s}s with total {n} events…
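The sweep windows for the weekly and monthly jobs can be derived from the run time alone. A minimal Python sketch, assuming half-open `[start, end)` windows aligned to UTC midnight; the function names are hypothetical and do not come from bin/scheduler.php.

```python
from datetime import datetime, timedelta, timezone


def weekly_window(run_at: datetime) -> tuple[datetime, datetime]:
    """Previous Monday 00:00 UTC up to (not including) this week's Monday 00:00 UTC."""
    today = run_at.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    this_monday = today - timedelta(days=today.weekday())
    return this_monday - timedelta(days=7), this_monday


def monthly_window(run_at: datetime) -> tuple[datetime, datetime]:
    """First day of the previous calendar month up to the first day of this month."""
    first_of_this_month = run_at.astimezone(timezone.utc).replace(
        day=1, hour=0, minute=0, second=0, microsecond=0
    )
    last_day_of_prev = first_of_this_month - timedelta(days=1)
    return last_day_of_prev.replace(day=1), first_of_this_month
```

Stepping back one day from the first of the month avoids special-casing January and varying month lengths.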
Reprocess tooling
bin/reprocess.php replays every event's stored sources/.json through the current processor, rewriting the event row, views/.json and links/.json without re-fetching upstream. Useful whenever processor logic changes.
Extracted the Collector's per-record upsert into Collector::processRawRecord() so it's shared between live collection and reprocess.
Env-var flags to sidestep Make's shell-escape quirks with >> in paths:
make reprocess # dry run, all events
make reprocess APPLY=1 # apply, all events
make reprocess PATHS="CCMC>>Solar Flare Predictions" # filtered dry run
make reprocess PATHS="HEK,HEK>>Flare" APPLY=1 # filtered apply
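How the PATHS and APPLY flags might be interpreted can be sketched in a few lines. This Python sketch is an assumption about the semantics, not a description of bin/reprocess.php: it treats PATHS as a comma-separated list, treats a filter as matching a path and its `>>` descendants, and treats anything other than APPLY=1 as a dry run. All function names are hypothetical.

```python
def parse_paths(raw: str) -> list[str]:
    """Split a comma-separated PATHS value, dropping empty entries."""
    return [p.strip() for p in raw.split(",") if p.strip()]


def path_matches(event_path: str, filters: list[str]) -> bool:
    """No filters means all events; otherwise match a path or its '>>' descendants."""
    if not filters:
        return True
    return any(event_path == f or event_path.startswith(f + ">>") for f in filters)


def reprocess_plan(event_paths: list[str], env: dict[str, str]) -> tuple[str, list[str]]:
    """Return the run mode and the event paths that would be replayed."""
    filters = parse_paths(env.get("PATHS", ""))
    mode = "apply" if env.get("APPLY") == "1" else "dry-run"
    return mode, [p for p in event_paths if path_matches(p, filters)]
```

Reading the filter from an environment variable means the `>>` separator never has to survive Make's own command-line quoting.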
Sentry integration (errors + performance)
sentry/sentry package + new src/Sentry/ (Client, VoidClient, ClientInterface) registered as 'sentry' in the DI container.
New src/Events/Processors/BaseProcessor.php — all processors extend it and get $this->sentry + $this->logger for free.
Error capture sites:
HTTP errors (Slim default error handler) — tag Type=web
CLI fatal errors in bin/collect.php, bin/scheduler.php, bin/build-distribution.php
Scheduler job failures — tag Type=scheduler, Job=
Per-event InvalidEventException / CoordinateResolutionException in the collector (with EventFailure context pointing at the stored failure JSON)
"No processor found for event" warnings
JSOC NOAA service failures from DaffProcessor
Primary + backup coordinator failures, tagged by tier (primary/backup) and gated by primaryFailed so Sentry fires at most once per request
Performance transactions:
scheduler.every_6_minutes, scheduler.daily_2am, scheduler.weekly_monday_1am_utc, scheduler.monthly_first_3am_utc (op=cron)
cli.collect, cli.build-distribution (op=cli)
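The Client / VoidClient split is a null-object arrangement: callers always hold something that satisfies the interface, so no call site needs an "is Sentry enabled?" check. A minimal Python sketch of that idea follows. The class names echo the PR, but the method name `capture_exception`, the `make_client` factory, and `RecordingClient` (a stand-in for the real SDK-backed client) are assumptions for illustration.

```python
from typing import Callable, Protocol


class ClientInterface(Protocol):
    def capture_exception(self, exc: BaseException, tags: dict[str, str]) -> None: ...


class VoidClient:
    """Used when reporting is disabled: accepts every call and does nothing."""

    def capture_exception(self, exc: BaseException, tags: dict[str, str]) -> None:
        pass


class RecordingClient:
    """Stand-in for the real client; stores events in memory instead of sending them."""

    def __init__(self) -> None:
        self.events: list[tuple[str, dict[str, str]]] = []

    def capture_exception(self, exc: BaseException, tags: dict[str, str]) -> None:
        self.events.append((type(exc).__name__, dict(tags)))


def make_client(env: dict[str, str]) -> ClientInterface:
    """Pick the implementation once, at container build time."""
    enabled = env.get("SENTRY_ENABLED", "").lower() == "true" and bool(env.get("SENTRY_DSN"))
    return RecordingClient() if enabled else VoidClient()


def run_job(job: Callable[[], None], sentry: ClientInterface, tags: dict[str, str]) -> bool:
    """Run a job and report any failure with the given tags; never re-raises."""
    try:
        job()
        return True
    except Exception as exc:
        sentry.capture_exception(exc, tags)
        return False
```

With this shape, a scheduler job failure would be reported through something like `run_job(job, sentry, {"Type": "scheduler"})` regardless of whether reporting is switched on.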
API routing
/api/v2/* is now a route alias for the /api/v1/* endpoints (events, regions, stats), so any persisted v2 link stays resolvable.
Controllers' url / source_url in responses build from APIURL at request time, so they auto-switch if APIURL is changed.
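Both routing changes reduce to small string operations. The Python sketch below is illustrative only: `canonical_path` and `event_url` are hypothetical names, and the assumption that APIURL is a bare host to which `/api/v1/...` is appended is not confirmed by the PR.

```python
def canonical_path(path: str) -> str:
    """Map an aliased /api/v2/... path onto its /api/v1/... equivalent."""
    prefix = "/api/v2/"
    if path.startswith(prefix):
        return "/api/v1/" + path[len(prefix):]
    return path


def event_url(api_url: str, uuid: str) -> str:
    """Build a response URL from the configured base at request time."""
    # Assumption: api_url is a base such as "https://example.invalid" with no path.
    return f"{api_url.rstrip('/')}/api/v1/events/{uuid}"
```

Because the URL is computed per request rather than stored, changing the configured base changes every subsequent response without touching persisted data.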
Coordinator hardening
Primary → backup failover now uses a 3-second connection timeout and flips primaryFailed on any error so subsequent batches skip the primary in that request.
New CoordinatorConnectionException distinguishes unreachable (timeout, DNS, refused) from HTTP-error responses.
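The failover behaviour described above can be sketched as a small wrapper that remembers, for the lifetime of one request, that the primary has already failed. This Python sketch is an assumption-laden illustration: the callables, the `report` hook, and the `FailoverCoordinator` class are hypothetical, and only the 3-second timeout, the primaryFailed flag, and the exception name come from the PR.

```python
from typing import Callable


class CoordinatorConnectionException(Exception):
    """Coordinator unreachable (timeout, DNS failure, connection refused)."""


class FailoverCoordinator:
    """Create one instance per request so primary_failed resets between requests."""

    def __init__(
        self,
        primary: Callable[[list, float], list],
        backup: Callable[[list], list],
        report: Callable[[str, Exception], None],
        timeout: float = 3.0,
    ) -> None:
        self.primary = primary
        self.backup = backup
        self.report = report
        self.timeout = timeout
        self.primary_failed = False

    def rotate(self, batch: list) -> list:
        if not self.primary_failed:
            try:
                return self.primary(batch, self.timeout)
            except Exception as exc:
                # Any error flips the flag, so the primary is reported at most once.
                self.primary_failed = True
                self.report("primary", exc)
        try:
            return self.backup(batch)
        except Exception as exc:
            self.report("backup", exc)
            raise
```

After the first failure, later batches in the same request go straight to the backup and pay no further timeout penalty.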
Event data cleanup
Processors now emit the canonical legacy_type / legacy_pin directly (HEK uses the raw HEK event_type, RHESSI F2→FL, DONKI CME/Flare C3 split into proper CE/FL).
Distributions copy legacy_event_type from the event instead of deriving it via path-prefix lookup — single source of truth.
Removed the redundant src/Api/events_paths_legacy_event_types.php lookup table and Distribution::getLegacyEventType().
Infrastructure
nginx — 5-minute fastcgi_read_timeout for phpfpm (long-running reprocess jobs, large time-range collects).
Fixed make root-shell; added make reprocess and make distribution-build.
Operational notes
.env.example gained SENTRY_ENABLED, SENTRY_DSN, SENTRY_SAMPLE_RATE, SENTRY_TRACES_SAMPLE_RATE. Copy into .env and set SENTRY_ENABLED=true with a DSN to activate.
Run make migrate-run to apply the two distribution migrations.
First-time data population: make distribution-build (truncates + rebuilds).
To fix persisted v2 URLs in links/*.json, either run the parallel host-side sed or make reprocess APPLY=1.
TODOs (completed on this branch)
Migrate persisted /api/v2/events/{uuid} links in views/links JSONs to /api/v1/ (or rely on the new v2→v1 route alias).
Move EP = SEPs, IC = ICMEs, SR = SIRs into the canonical legacy type map.
Re-pull events for the last month, during which CCMC was intermittently failing.

…nt to compact all helioviewer endpoints under same controller
…any paths and also to stabilize unreliable frm inputs
…fails, directly switch to backup coordinator
@mudhoney mudhoney merged commit 827c07a into main May 12, 2026