feat: system backups and alerts pages by andre8244 · Pull Request #101 · NethServer/my

andre8244 · 2026-05-12T15:41:57Z

📋 Description

Refactor System backups page
Refactor Alerts page

Related Issue: #83

🚀 Testing Environment

To trigger a fresh deployment of all services in the PR preview environment, comment:

update deploy

Automatic PR environments:

Proxy: https://my-proxy-qa-pr-101.onrender.com

✅ Merge Checklist

Code Quality:

Builds:

github-actions · 2026-05-12T15:42:21Z

🔗 Redirect URIs Added to Logto

The following redirect URIs have been automatically added to the Logto application configuration:

Redirect URIs:

https://my-proxy-qa-pr-101.onrender.com/login-redirect

Post-logout redirect URIs:

https://my-proxy-qa-pr-101.onrender.com/login

These will be automatically removed when the PR is closed or merged.

edospadoni · 2026-05-20T09:20:06Z

update deploy

github-actions · 2026-05-20T09:20:20Z

🚀 Build triggers updated!

All .render-build-trigger files have been automatically updated to ensure fresh deployments of all services in the PR preview environment.

github-actions · 2026-05-21T12:31:33Z

🚨 Breaking My API change detected

Preview documentation

Structural change details

Modified (5)

GET /filters/alerts
- Response modified: 200
  - Content type modified: application/json
    - Property modified: data
      - [Breaking] Properties removed: systems, severities, organizations
        
        Removing a resource is always breaking unless it was deprecated before [Breaking]
- [Breaking] Query parameters removed: organization_id, include
  - Removing a resource is always breaking unless it was deprecated before [Breaking]
GET /filters/applications
- Response modified: 200
  - Content type modified: application/json
    - Property modified: data
      - [Breaking] Properties removed: systems, organizations
        
        Removing a resource is always breaking unless it was deprecated before [Breaking]
GET /filters/systems
- Response modified: 200
  - Content type modified: application/json
    - Property modified: data
      - [Breaking] Property removed: organizations
        
        Removing a resource is always breaking unless it was deprecated before [Breaking]
GET /filters/users
- Response modified: 200
  - Content type modified: application/json
    - Property modified: data
      - [Breaking] Property removed: organizations
        
        Removing a resource is always breaking unless it was deprecated before [Breaking]
GET /organizations
- Query parameters added: page, page_size, search, name, description, type, created_by

Powered by Bump.sh

Auto-updated .render-build-trigger files to ensure all services are deployed in PR preview environments. 🤖 Generated by GitHub Actions

Builds the operational alerts surface on top of Mimir Alertmanager: a single paginated list endpoint plus per-system silence management, resolved-alert history, and aggregations the UI uses to render the overview page.

Endpoints:
- GET /alerts (cross-hierarchy / single-tenant / sub-tree scoping, multi-value label filters, sorting on starts_at/severity/alertname, pagination with stable fingerprint tiebreaker)
- GET /alerts/history (paginated alert_history rows with date range)
- GET /alerts/totals / /trend / /stats (severity buckets, time-series deltas, top-N alertname/system_key, MTTR/MTBF)
- GET /alerts/{fingerprint}/activity (silence/unsilence audit timeline, populated transparently by the silence endpoints)
- GET /systems/{id}/alerts and friends scoped to a single system

Each alert in the list is enriched with a local-DB system object (id/name/type) so the frontend doesn't need a per-row round-trip. Per-tenant fan-out failures are surfaced as warnings rather than failing the whole request.

Gated on the existing read:systems / manage:systems permissions: read for the list endpoints, manage for silence create/update/delete.
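To illustrate why GET /alerts needs a fingerprint tiebreaker, here is a minimal Python sketch (not the backend's code). The function name `page_alerts` and the dict-shaped alerts are assumptions made for the example. Sorting first on the tiebreaker and then on the primary key with a stable sort guarantees that alerts sharing a `starts_at` always land on the same page, whatever order the upstream returned them in.

```python
from datetime import datetime, timezone


def page_alerts(alerts, page, page_size, descending=True):
    """Sort by starts_at, break ties on fingerprint, return one page.

    Hypothetical helper: field names mirror the PR text, the shape is assumed.
    """
    # Python's sort is stable, so sorting on the secondary key first and the
    # primary key second yields (starts_at, fingerprint) ordering.
    by_fingerprint = sorted(alerts, key=lambda a: a["fingerprint"])
    ordered = sorted(by_fingerprint, key=lambda a: a["starts_at"], reverse=descending)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]


t0 = datetime(2026, 5, 20, 9, 0, tzinfo=timezone.utc)
t1 = datetime(2026, 5, 20, 10, 0, tzinfo=timezone.utc)
alerts = [
    {"fingerprint": "c3", "starts_at": t0},
    {"fingerprint": "a1", "starts_at": t0},
    {"fingerprint": "z9", "starts_at": t1},
    {"fingerprint": "b2", "starts_at": t0},
]
first = [a["fingerprint"] for a in page_alerts(alerts, 1, 2)]
second = [a["fingerprint"] for a in page_alerts(alerts, 2, 2)]
```

Without the tiebreaker, the three alerts that share `t0` could swap places between requests, so a client paging through the list might see one alert twice and miss another.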

…GET /alerts

Stamp system_type at ingest (collect) alongside the other system_* labels and drop the per-request DB lookup that enriched each alert with a separate system object. Saves a SELECT on every GET /alerts and removes a redundant field the frontend never read.

…UNT shortcut, in-process cache

POST /alerts/config used to 500 on invalid emails (slices lacked binding `dive`). Now binding, semantic, and entity-layer failures all go through response.ValidationFailed with JSON-path keys (email_recipients.2.address) and stable codes. Same envelope on the four silence endpoints. Fix reflect.Value.String() leaking "<int Value>" in response.ParseValidationErrors. OpenAPI: reusable SilenceValidationFailed response + inline examples.
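The JSON-path keys mentioned above can be sketched as follows. This is a Python illustration only: the backend is not written in Python, and the `key` / `message` / `value` field names, the `"email"` code, and the loose regular expression are all assumptions, not the actual response.ValidationFailed envelope.

```python
import re

# Deliberately loose pattern, for illustration only (assumption).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_recipients(payload):
    """Return one error per invalid recipient, keyed by its JSON path."""
    errors = []
    for index, recipient in enumerate(payload.get("email_recipients", [])):
        address = recipient.get("address", "")
        if not EMAIL_RE.match(str(address)):
            errors.append({
                # Path into the request body, e.g. email_recipients.2.address
                "key": f"email_recipients.{index}.address",
                "message": "email",      # hypothetical stable error code
                "value": str(address),   # always rendered as a plain string
            })
    return errors


payload = {
    "email_recipients": [
        {"address": "ops@example.com"},
        {"address": "noc@example.com"},
        {"address": "not-an-email"},
    ]
}
errors = validate_recipients(payload)
```

Validating each element of the slice, rather than only the slice itself, is what lets the error point at the exact offending entry.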

The status-gated loop in db-migrate/db-migrate-qa called run_migration.sh status, which only checks for a schema_migrations row and ignores the recorded checksum. Drift on an already-applied migration was silently skipped. Call apply directly so report_checksum_drift fires, and exit the loop on the first non-zero status.
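The difference between a status check and a checksum check can be shown with a small Python sketch. It is not the logic of run_migration.sh: the function name `find_drift`, the dict inputs, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib


def find_drift(recorded, on_disk):
    """Return migration numbers whose file no longer matches the recorded checksum.

    recorded: {number: hex checksum stored when the migration was applied}
    on_disk:  {number: current file contents as bytes}
    """
    drifted = []
    for number, sql in on_disk.items():
        expected = recorded.get(number)
        if expected is None:
            continue  # not applied yet, nothing to compare against
        if hashlib.sha256(sql).hexdigest() != expected:
            drifted.append(number)
    return sorted(drifted)


original = b"CREATE TABLE alert_activity (id INT);"
recorded = {23: hashlib.sha256(original).hexdigest()}
on_disk = {
    23: b"CREATE TABLE alert_activity (id BIGINT);",  # edited after being applied
    24: b"CREATE TABLE alert_config_layers (id INT);",  # not applied yet
}
drifted = find_drift(recorded, on_disk)
```

A check that only asks "is there a row for migration 23?" would report it as applied and move on, which is the silent skip described above.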

GET /api/filters/alerts ignored alerts firing in Mimir, so systems and organizations with active-but-unresolved alerts never reached the dropdowns. Fan-out to Mimir alongside alert_history, dedupe by system_key / logto_id, resolve org names in a single unified_organizations lookup. Cache per-scope with singleflight (TTL 15s) mirroring /alerts/totals, and surface per-tenant Mimir failures in a warnings[] field.

The scope-aware aggregation of systems/severities/organizations from alert_history — and the per-tenant Mimir fan-out it grew in 4a37666 — could take 17 s on large hierarchies (and returned thousands of systems anyway, defeating the dropdown). Frontend will populate the systems and organizations dropdowns from /api/systems and /api/organizations; severities are a fixed enum the UI hardcodes.

…nizations pagination

/filters/{systems,applications,users} no longer return systems or organizations; those dropdowns are populated by /api/systems and /api/organizations, which support search and pagination and scale past the embedded DISTINCT lists. /filters/systems keeps products, created_by and versions (small, bounded). Dead helpers removed.

/organizations was returning broken pages: GetAllOrganizationsPaginated fetched up to pageSize*10 from each org table, then paginated in memory, so on tenants past a few thousand orgs total_count was wrong and pages past the first were truncated. Now: single SQL UNION ALL across the three org tables with RBAC scope, filters, search, ORDER and LIMIT/OFFSET pushed to the database; COUNT(*) for true totals.

OpenAPI updated: /filters/{systems,applications,users} response shapes, plus /organizations query parameters (page, page_size, search, name, description, type, created_by) that the handler already accepted.
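The database-side pagination described above can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the backend query: the table names `distributors`, `resellers` and `customers`, their columns, and the helper names are hypothetical, and the RBAC scope and the other filters are left out.

```python
import sqlite3

# One UNION ALL over the three (hypothetical) org tables.
UNION_SQL = """
    SELECT id, name, 'distributor' AS type FROM distributors
    UNION ALL
    SELECT id, name, 'reseller' AS type FROM resellers
    UNION ALL
    SELECT id, name, 'customer' AS type FROM customers
"""


def list_orgs(conn, page, page_size, search=""):
    """Return (rows, total_count) with search, ORDER and LIMIT/OFFSET done in SQL."""
    pattern = f"%{search}%"
    total = conn.execute(
        f"SELECT COUNT(*) FROM ({UNION_SQL}) WHERE name LIKE ?", (pattern,)
    ).fetchone()[0]
    rows = conn.execute(
        f"SELECT id, name, type FROM ({UNION_SQL}) WHERE name LIKE ? "
        "ORDER BY name, id LIMIT ? OFFSET ?",
        (pattern, page_size, (page - 1) * page_size),
    ).fetchall()
    return rows, total


def make_demo_db():
    """Build an in-memory database with seven organizations."""
    conn = sqlite3.connect(":memory:")
    for table in ("distributors", "resellers", "customers"):
        conn.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO distributors VALUES ('d1', 'Alpha')")
    conn.executemany(
        "INSERT INTO resellers VALUES (?, ?)", [("r1", "Bravo"), ("r2", "Charlie")]
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("c1", "Delta"), ("c2", "Echo"), ("c3", "Foxtrot"), ("c4", "Golf")],
    )
    return conn


conn = make_demo_db()
last_page, total = list_orgs(conn, page=3, page_size=3)
```

Because the count and the page both come from the same filtered set, `total` stays correct on every page, which is the property the in-memory approach lost once a table exceeded the pre-fetch cap.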

…ir command

Refuse to replay a migration when later siblings are already recorded (was failing on CREATE OR REPLACE VIEW after a later migration converted the object to a MATERIALIZED VIEW). The error now lists the missing numbers and the exact repair command. Add 'repair <num...>' subcommand to run_migration.sh and a matching 'make db-repair MIGRATIONS=...' target.
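A rough Python sketch of the "later siblings already recorded" check follows. The function name and the integer sets are assumptions; the real check lives in run_migration.sh and may differ in detail.

```python
def missing_before_latest(applied, on_disk):
    """Return on-disk migration numbers that are unrecorded but older than
    the newest recorded migration, i.e. the ones that must not be replayed."""
    if not applied:
        return []  # nothing recorded, so nothing can be out of order
    latest = max(applied)
    return sorted(n for n in on_disk if n < latest and n not in applied)


# 010 was never recorded, but 012 (which supersedes its object) was.
applied = {9, 11, 12}
on_disk = {9, 10, 11, 12, 13}
missing = missing_before_latest(applied, on_disk)
```

Migration 13 is not reported because it is newer than everything recorded and can simply be applied; only 10 would need the repair path.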

…fig_layers

Migrations 023 (alert_activity) and 024 (alert_config_layers) had been applied but never reflected in schema.sql, leaving the file out of sync with the cumulative DB state. Append the equivalent CREATE TABLE / INDEX / COMMENT statements so a fresh-from-schema install matches a fully-migrated DB.

The /api/alerts/totals endpoint used to fan out to Mimir once per tenant on every request, with bounded concurrency and a 10s timeout. On owner dashboards covering hundreds of tenants this was 21s in QA.

Move the fan-out off the user request path:
- New alerts_totals_by_org table (migration 025) carries per-org counts by severity and muted state.
- New AlertsTotalsRefresher cron in collect refreshes the table every 60s with one Mimir call per tenant (concurrency 50, 30s timeout). Per-tenant failures are aggregated into a single warn line per cycle to avoid log spam when Mimir is down.
- GetAlertsTotals now answers with a single SUM scoped by the same resolveOrgScope that gated the old code path. RBAC and hierarchy semantics are unchanged. The history COUNT keeps the bare-COUNT shortcut for the owner-all path. When the freshest row in scope is older than 5 min the response carries a stale-data warning.

The fan-out constants (timeout, concurrency) are renamed to mimirFanout* since they're shared with the silences fan-out and were no longer totals-specific.

The Redis-less in-process cache and singleflight infrastructure are removed: the new path is fast enough that they add no value.

Tested end-to-end against a local Mimir + 3 systems across 3 customers: counts match Mimir's view exactly for owner, reseller subtrees, customer-pinned scopes, and include=descendants drill-downs, before and after resolve/silence events. Endpoint latency ~5ms.

(Schema.sql also adds the new alerts_totals_by_org table next to the 023/024 entries.)
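The read path described above (a scoped SUM plus a staleness check) can be sketched in Python. The row fields `organization_id`, `severity`, `count` and `refreshed_at`, the function name, and the warning text are assumptions; the real table is alerts_totals_by_org and its columns are not shown in this PR page.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=5)  # threshold taken from the commit message


def totals_for_scope(rows, org_ids, now):
    """Sum per-severity counts for the orgs in scope and flag stale data."""
    in_scope = [r for r in rows if r["organization_id"] in org_ids]
    totals = {}
    for row in in_scope:
        totals[row["severity"]] = totals.get(row["severity"], 0) + row["count"]
    warnings = []
    if in_scope:
        freshest = max(r["refreshed_at"] for r in in_scope)
        # Warn only when even the newest row in scope is too old.
        if now - freshest > STALE_AFTER:
            warnings.append("stale data")
    return totals, warnings


now = datetime(2026, 5, 21, 12, 0, tzinfo=timezone.utc)
rows = [
    {"organization_id": "a", "severity": "critical", "count": 2,
     "refreshed_at": now - timedelta(minutes=1)},
    {"organization_id": "a", "severity": "warning", "count": 1,
     "refreshed_at": now - timedelta(minutes=1)},
    {"organization_id": "b", "severity": "critical", "count": 3,
     "refreshed_at": now - timedelta(minutes=10)},
    {"organization_id": "c", "severity": "critical", "count": 100,
     "refreshed_at": now - timedelta(minutes=1)},
]
totals, warnings = totals_for_scope(rows, {"a", "b"}, now)
```

Org "c" is outside the scope, so its 100 critical alerts never reach the caller; the scope filter plays the role that resolveOrgScope plays in the real query.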

…system

POST /api/systems/register is public (the secret IS the credential) but until now testers had to curl it by hand after every create-system, otherwise collect would reject the appliance with 401 invalid system credentials.
- New `register-system <system_secret>` subcommand completes the handshake against /systems/register and prints the canonical system_key. Skips the OIDC login since the endpoint is unauthed.
- New `--register` flag on create-system chains the registration to the create step so a single command yields a system that's immediately usable for pushing alerts through collect.

…lated the DB

On a fresh dev environment the boot path is dev-up -> run -> db-migrate. The backend's database.Init() applies schema.sql when it sees no tables, so by the time db-migrate runs the schema is already at head but schema_migrations is empty. Running the migrations from 001 then trips on non-idempotent statements: 010's CREATE OR REPLACE VIEW unified_organizations errors with "is not a view" because 012's MATERIALIZED VIEW already sits on that name.

When apply_migration sees an empty schema_migrations alongside an already-populated public schema, treat the DB as freshly baselined from schema.sql and INSERT a row for every on-disk migration without running it. The very first apply call in a make db-migrate run does this; the rest see "already applied" and no-op. Existing DBs and truly empty DBs are unaffected.

This relies on the policy that schema.sql is always kept in sync with migrations/: a new migration whose effect is NOT folded into schema.sql would be silently marked applied on a fresh init. The runner and the migrations README spell out the invariant; the schema_sync project memory enforces it on the workflow side.

Also fixes a stdin-leak in the baseline loop: podman run -i consumed the while-read pipe, so only the first migration was inserted. Added </dev/null on the inner call.
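The three cases the baseline logic distinguishes can be written down as a small decision function. This Python sketch is illustrative only: `plan_baseline` and its arguments are hypothetical names, and the real logic is implemented in the shell runner.

```python
def plan_baseline(recorded, public_tables, on_disk):
    """Return the migration numbers to mark as applied without running them.

    recorded:      numbers already present in schema_migrations
    public_tables: table names currently in the public schema
    on_disk:       numbers of the migration files in migrations/
    """
    if recorded:
        return []  # existing DB with history: use the normal apply path
    populated = set(public_tables) - {"schema_migrations"}
    if not populated:
        return []  # truly empty DB: run every migration normally
    # Schema was created from schema.sql: record everything, run nothing.
    return sorted(on_disk)


on_disk = {1, 2, 3}
fresh_from_schema = plan_baseline(set(), {"systems", "schema_migrations"}, on_disk)
existing_db = plan_baseline({1, 2}, {"systems", "schema_migrations"}, on_disk)
empty_db = plan_baseline(set(), {"schema_migrations"}, on_disk)
```

The sketch also makes the stated risk visible: in the first case every on-disk migration is recorded unconditionally, so one whose effect is missing from schema.sql would never run.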

The /alerts/totals endpoint's history figure is a dashboard counter ('Alerts in history'), and exact accuracy is invisible to the user. On QA the table holds 352k rows and an unconditional COUNT(*) takes ~4.8s: an index-only scan over a stale visibility map (100k+ heap fetches). Combined with Render's network latency and post-deploy cold pool warmup, the endpoint hit the 30s gateway timeout on the first few calls even after we'd already moved the active counts off Mimir.

Replace the owner-all-scope branch with pg_class.reltuples, the planner's row estimate maintained by autovacuum. Sub-millisecond (0.06ms vs 4810ms in QA), lags reality by at most the autovacuum interval (measured ~9% on QA between checkpoints), and is exactly what this counter is for.

Scoped queries (single tenant / IN-list) keep their exact COUNT(*); those are bounded by the index on organization_id and don't suffer the same pathology.

If reltuples is negative (table never analyzed), fall back to the exact COUNT(*) so fresh installs still render a sensible number on the first hit.
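The estimate-with-fallback rule in the last paragraph is small enough to sketch directly. This Python version is an illustration, not the backend code: `history_count` is a hypothetical name, and the exact count is passed in as a callable so that the expensive query only runs when the estimate is unusable.

```python
def history_count(reltuples, exact_count):
    """Return the planner's row estimate, or the exact count when the
    table has never been analyzed (negative reltuples)."""
    if reltuples < 0:
        return exact_count()  # slow path, only on never-analyzed tables
    return int(reltuples)


calls = []


def slow_exact_count():
    """Stand-in for SELECT COUNT(*); records that it was invoked."""
    calls.append(1)
    return 352_000


estimated = history_count(320_000.0, slow_exact_count)
fallback = history_count(-1.0, slow_exact_count)
```

In the first call the slow query is never executed, which is the whole point of the change; in the second it runs once because no estimate exists yet.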

edospadoni deployed to alerts-and-backup-ui - my-backend-qa PR #101 May 12, 2026 15:42 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-collect-qa PR #101 May 12, 2026 15:42 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-frontend-qa PR #101 May 12, 2026 15:42 — with Render Active

andre8244 changed the base branch from main to feat/alerts-config-refactor May 12, 2026 15:44

andre8244 force-pushed the alerts-and-backup-ui branch from 80211a6 to f7d17d2 Compare May 12, 2026 15:45

edospadoni deployed to alerts-and-backup-ui - my-frontend-qa PR #101 May 12, 2026 15:45 — with Render Active

andre8244 force-pushed the alerts-and-backup-ui branch from f7d17d2 to a9c986f Compare May 13, 2026 11:52

edospadoni deployed to alerts-and-backup-ui - my-frontend-qa PR #101 May 13, 2026 11:52 — with Render Active

edospadoni force-pushed the feat/alerts-config-refactor branch from 74168a3 to 08a07dd Compare May 13, 2026 13:42

andre8244 force-pushed the alerts-and-backup-ui branch from a9c986f to eff1f04 Compare May 15, 2026 16:03

edospadoni deployed to alerts-and-backup-ui - my-frontend-qa PR #101 May 15, 2026 16:03 — with Render Active

andre8244 changed the title ~~refactor: system backups page~~ feat: system backups and alerts pages May 15, 2026

andre8244 self-assigned this May 15, 2026

edospadoni force-pushed the feat/alerts-config-refactor branch from feea17d to c4c7f2b Compare May 20, 2026 08:04

Base automatically changed from feat/alerts-config-refactor to main May 20, 2026 08:05

edospadoni force-pushed the alerts-and-backup-ui branch from eff1f04 to 9d111a7 Compare May 20, 2026 09:16

edospadoni deployed to alerts-and-backup-ui - my-frontend-qa PR #101 May 20, 2026 09:16 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-frontend-qa PR #101 May 20, 2026 09:20 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-backend-qa PR #101 May 20, 2026 09:20 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-collect-qa PR #101 May 20, 2026 09:20 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-mimir-qa PR #101 May 20, 2026 09:20 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-proxy-qa PR #101 May 20, 2026 09:20 — with Render View deployment

edospadoni deployed to alerts-and-backup-ui - my-backend-qa PR #101 May 20, 2026 10:14 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-backend-qa PR #101 May 20, 2026 15:37 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-backend-qa PR #101 May 21, 2026 12:31 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-frontend-qa PR #101 May 21, 2026 15:19 — with Render Active

andre8244 and others added 24 commits May 26, 2026 18:05

chore(agents): improve Frontend & Accessibility agent

d732c9c

feat: add alerts page (wip)

fc761c7

chore: update build triggers for PR deployment

17521ad

Auto-updated .render-build-trigger files to ensure all services are deployed in PR preview environments. 🤖 Generated by GitHub Actions

perf(alerts): speed up /alerts/totals with HTTP pool tuning, owner CO…

dbb0dac

…UNT shortcut, in-process cache

feat: add alert notifications configuration (wip)

72c9266

feat: add alert notifications configuration (wip)

bba2e32

fix: frontend agent configuration for figma mcp

82c0b9b

feat: add alert notifications configuration (wip)

ab475b6

feat: add alert notifications configuration (wip)

8e6e0f9

fix(makefile): restart existing stopped containers in db-up/redis-up

4a5907b

fix: improve company and systems filter

f895e38

andre8244 force-pushed the alerts-and-backup-ui branch from 661c1ea to f895e38 Compare May 26, 2026 16:05

edospadoni deployed to alerts-and-backup-ui - my-backend-qa PR #101 May 26, 2026 16:05 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-collect-qa PR #101 May 26, 2026 16:05 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-mimir-qa PR #101 May 26, 2026 16:05 — with Render Active

edospadoni deployed to alerts-and-backup-ui - my-proxy-qa PR #101 May 26, 2026 16:05 — with Render View deployment

edospadoni deployed to alerts-and-backup-ui - my-frontend-qa PR #101 May 26, 2026 16:05 — with Render Active

feat: system backups and alerts pages #101
andre8244 wants to merge 25 commits into main from alerts-and-backup-ui

2 participants
