diff --git a/CLAUDE.md b/CLAUDE.md index 2c69a5d..d3e1eb4 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -45,6 +45,7 @@ All under `/v1/`: | Endpoint | Purpose | |----------|---------| +| `GET /` | Greeting JSON `{name, docs, api}` so bare-hostname hits don't 404. Cached `max-age=3600, s-maxage=86400`. | | `GET /health` | Health check (Postgres + Meilisearch status) | | `GET /search?q=&platform=&sort=&limit=&offset=` | Meilisearch-powered search. Auto-triggers GitHub passthrough if <5 results. `sort` ∈ {`relevance` (default), `stars`, `recent` / `releases` (alias, by latest stable release date), `updated` (by repo `updated_at_gh`)}. `relevance` requires `q`; the others allow empty `q` for browse-mode listings. `sort=updated` is routed directly to Postgres FTS until the fetcher repo's `meili_sync.py` adds `updated_at_gh` to Meili's sortable-attributes. Reads optional `X-GitHub-Token` header to run passthrough on the user's 5000/hr quota instead of the backend's fallback quota. Response carries `passthroughAttempted: Boolean` so clients can distinguish "index was warm but returned nothing" from "GitHub also has nothing". | | `GET /search/explore?q=&platform=&page=` | User-triggered deep GitHub search, paginated, ingests into index. Also reads `X-GitHub-Token`. Cold-path latency is 10–30s — clients must use a 30s timeout. | @@ -57,13 +58,15 @@ All under `/v1/`: | `GET /user/{username}` | Proxied GitHub user/org profile. Reads optional `X-GitHub-Token`. Cached 7d. | | `GET /users/{username}/repos?type=&sort=&direction=&page=&per_page=` | Proxied list of a user/org's repos. `type` ∈ {all, owner, member}, `sort` ∈ {created, updated, pushed, full_name}, `direction` ∈ {asc, desc}. Whitelisted to block SSRF via query injection. Cached 1h server-side, edge `s-maxage=1800`. Reads `X-GitHub-Token`. | | `GET /users/{username}/starred?sort=&direction=&page=&per_page=` | Proxied list of a user's starred repos (the public form -- the OAuth viewer-self form is intentionally NOT proxied). 
`sort` ∈ {created, updated}. Cached 30min server-side, edge `s-maxage=900`. Reads `X-GitHub-Token`. | -| `POST /events` | Batched telemetry (opt-in, max 50 per batch). These rows drive `SignalAggregationWorker` — ranking only improves if clients send events. | +| `POST /events` | **Deprecated 2026-04-26 — telemetry was killed in the E6 audit.** Returns `204 No Content` and silently discards the batch — pre-1.8.3 clients (`TelemetryRepositoryImpl`) treat any non-2xx as failure and retry, so a 410 here triggers an error-log + retry storm. 204 lets old clients see success and back off. The `Events` table and `SignalAggregationWorker` remain wired for historical rows but no new data is ingested. Once 1.8.3 ships a sticky-disable-on-non-2xx flag on the client (telemetry cleanup task), flip this back to `410 Gone` with the proper JSON deprecation notice so laggard clients get a real signal. | | `GET /announcements` | Public, anonymous announcements feed. Same byte-identical envelope for every caller. Backed by JSON files in `src/main/resources/announcements/.json` (or `ANNOUNCEMENTS_DIR` env override). Validator enforces every rule from `docs/backend/announcements-endpoint.md` §2 at load time; expired items are filtered at serve time. `Cache-Control: public, max-age=600` + ETag revalidation. No auth, no per-user logic, no logging beyond standard access. | | `POST /auth/device/start` | Stateless proxy for `github.com/login/device/code`. Client used to call GitHub directly; some user networks (documented in OpenHub-Store/GitHub-Store#433, #395) can't reach GitHub reliably. Backend adds `client_id`, forwards GitHub's body verbatim. 10 req/hr/IP. | -| `POST /auth/device/poll` | Stateless proxy for `github.com/login/oauth/access_token`. Reads `device_code` from form body, adds `client_id` + `grant_type`, forwards GitHub's body verbatim (including tokens on success). The backend never logs, caches, or persists the token. 200 req/hr/IP. 
| +| `POST /auth/device/poll` | Stateless proxy for `github.com/login/oauth/access_token`. Reads `device_code` from form body, adds `client_id` + `grant_type`, forwards GitHub's body verbatim (including tokens on success). The backend never logs, caches, or persists the token. 200 req/hr/IP. Per-request diagnostic line `[auth-poll rid=… ] dch= ghs= gh_err= lat_ms= ua=` is logged for auth-stuck triage (GitHub-Store#433, #395) — the raw `device_code`, response body, and every token field are explicitly excluded; only the `error` key is parsed off the upstream body, via a DTO that doesn't even declare `access_token`/`refresh_token`. | | `GET /internal/metrics` | Operator-only. Gated by `X-Admin-Token` matching the `ADMIN_TOKEN` env var (open if unset, for local dev). Returns per-source search counters, P-latency, worker queue depth, and top 20 misses (8-char `query_hash` prefix only) in last 7 days. | | `POST /internal/backfill-stale?limit=N` | Operator-only. Spawns a paced background job that refreshes every curated row whose new metadata columns are still at their migration defaults (currently keyed on `license_spdx_id IS NULL`). One concurrent run; returns 409 on re-trigger. Uses `searchClient.refreshRepo` + persist; respects the quiet window so the daily fetcher's pool stays free. Run after a column-add deploy; no-ops afterwards once the filter no longer matches. | | `GET /badge/...` | M3-styled SVG badges. Per-repo: `/badge/{owner}/{name}/{kind}/{style}/{variant}` for kind ∈ {release, stars, downloads}. Global: `/badge/{kind}/{style}/{variant}` for kind ∈ {users, fdroid}. Static: `/badge/static/{style}/{variant}?label=&icon=`. Style 1-12 hue, variant 1-3 shade. Vectorized glyph rendering — no font dependency at SVG embed time. | +| `GET /mirrors/list` | Curated catalog of GitHub mirrors with hourly-probed health. 
Each entry carries `traffic_kinds: ["release_asset", "raw_file"]` for whole-URL proxies (template ends `/{url}`) and `["raw_file"]` for jsDelivr's `/gh/` path-based mirror (template `https://fastly.jsdelivr.net/gh/{owner}/{repo}@{ref}/{path}`). Clients MUST consult `traffic_kinds` before routing a download — sending a release-asset URL through a `raw_file`-only mirror 404s. Cached `max-age=300, s-maxage=3600`. | +| `{GET,POST} /repo/login/{device,oauth}` | **Deprecated 2024-09-01** — tombstone for pre-1.6 builds that wired the device-flow under `/repo/`. Returns `410 Gone` with `use_instead` pointing at `/v1/auth/device/start` + `/v1/auth/device/poll`. Cached `max-age=86400`. Declared **before** `repoRoutes` so the static segments win over `/repo/{owner}/{name}`. | Client-facing API contract and migration history live in `internal/` (gitignored, operator-only). The client repo at `OpenHub-Store/GitHub-Store` is the public source of truth for client behavior. @@ -113,7 +116,7 @@ RepoRefreshWorker (hourly) — re-fetches passthrough repos by oldest indexed_a - **Meilisearch partial-update gotcha — PUT, never POST.** `MeilisearchClient.addDocuments()` is POST, which on Meili *replaces* the document with whatever fields you send (everything else becomes null). `MeilisearchClient.updateScores()` is PUT, which merges. Pushing just `{id, search_score}` with POST will wipe every other field on 3000+ docs. If you add a new "partial update" path, verify the HTTP verb before deploying. - **Dynamic category/topic ordering.** `RepoRepository.findByCategory()` picks a category-specific primary sort column (`trending_score` for trending, `popularity_score` for most-popular, `latest_release_date` for new-releases), falls back to global `searchScore`, then static `rank` as final tie-breaker. Without category-specific primary, both trending and most-popular collapse onto the same global score — the bug fix in PR #12. 
`findByTopicBucket()` keeps the simpler `searchScore DESC NULLS LAST, rank ASC` order because topics are flat lists, not flavour-segmented like the categories. - **Exposed `Repos` table uses `array("topics", TextColumnType())`** for the Postgres `TEXT[]` column. The Python fetcher writes these via psycopg2's automatic list-to-array conversion. -- **Cache headers are set per endpoint**, not globally. Announcements: 600s/3600s. Categories/topics: 60s/600s. Repo detail: 30s/300s. Search: 15s/30s. Readme proxy: 3600s/21600s. User proxy: 86400s/604800s. Badges (fresh): 3600s/3600s with `stale-while-revalidate=86400`; (degraded) 300s/300s. Edge respects `s-maxage`; the larger `s-maxage` lets Gcore's shield/tiered cache topology absorb origin load while browsers stay fresher via the smaller `max-age`. `/internal/metrics` is uncached. +- **Cache headers are set per endpoint**, not globally. Announcements: 600s/3600s. Categories/topics: 60s/600s. Repo detail: 30s/300s. Search: 15s/30s. Readme proxy: 3600s/21600s. User proxy: 86400s/604800s. Signing-seeds: 86400s/604800s with `stale-while-revalidate=86400` and a strong ETag for 304 revalidation — content only changes on the daily F-Droid sync cron, so the long edge TTL is paired with an operator-side Cloudflare purge when the seeds rotate. Badges (fresh): 3600s/3600s with `stale-while-revalidate=86400`; (degraded) 300s/300s. Unmatched-route 404s: 300s/300s — `Plugins.kt:respondNotFound` returns `ApiError("not_found")` and logs `[404 rid=… ] METHOD /path` (no query string) so scanner traffic and old-client paths are classifiable from the application log without hammering origin. Edge respects `s-maxage`; the larger `s-maxage` lets Gcore's shield/tiered cache topology absorb origin load while browsers stay fresher via the smaller `max-age`. `/internal/metrics` is uncached. - **HEAD routes to GET** via the `AutoHeadResponse` plugin (`Plugins.kt`). 
Without it, Ktor 3 returns 404 for HEAD on every GET handler — confusing for `curl -I`, monitoring, and CDN origin probes. - **Owner / repo-name path-param validation.** Every GitHub-proxy route (`/readme/`, `/user/`, `/release/`, `/repo/`, `/badge/{owner}/{name}/...`) calls `util/GitHubIdentifiers.validOwner` / `validName` at the top of the handler. Owner regex matches GitHub's actual username rule (`^[A-Za-z0-9](?:[A-Za-z0-9-]{0,38})$`), name allows a slightly broader set up to 100 chars. Reject early with 400 — keeps SSRF-by-path-trickery off the upstream URL. - **Badge service** lives under `badge/` (`BadgeRenderer`, `BadgeColors`, `BadgeIcons`, `BadgeService`, `FdroidVersionClient`, `TtlCache`, `BadgeGlyphs`). Text is rendered as vectorized `<path>` elements composed from glyph data extracted at startup from `src/main/resources/fonts/Inter-Bold.ttf` (SIL OFL 1.1). The renderer is deliberately font-independent at SVG embed time — every browser, markdown viewer, and feed reader sees byte-identical glyphs. Color palette mirrors `ziadOUA/m3-Markdown-Badges` hex-for-hex (12 hues × 3 shade variants). diff --git a/src/main/kotlin/zed/rainxch/githubstore/Plugins.kt b/src/main/kotlin/zed/rainxch/githubstore/Plugins.kt index c320293..84e38d6 100644 --- a/src/main/kotlin/zed/rainxch/githubstore/Plugins.kt +++ b/src/main/kotlin/zed/rainxch/githubstore/Plugins.kt @@ -40,7 +40,7 @@ fun Application.configureSerialization() { } } -private val REQUEST_ID_KEY = AttributeKey<String>("RequestId") +internal val REQUEST_ID_KEY = AttributeKey<String>("RequestId") private val REQUEST_ID_PATTERN = Regex("^[A-Za-z0-9\\-]{1,64}$") // Reject oversized or unknown-size bodies before reading them. @@ -102,6 +102,29 @@ private fun searchBucketKey(call: io.ktor.server.application.ApplicationCall): S } } +// Shared 404 responder. 
Logs the unmatched method + path (NOT the query +// string — query can carry user search terms), sets a short edge cache so +// scanners and broken clients can't pin origin, and returns the same JSON +// shape every other 4xx uses. Path is bracketed so `grep '\[404 ...]'` finds +// only 404 lines on a noisy log. +// +// Called by: +// - The global `status(NotFound)` StatusPages handler (unmatched routes +// and any route-level 404 — Ktor 3's StatusPages overrides route-level +// bodies, see StatusPagesOverrideTest). +// - Routes that want the same body shape + caching + log without going +// through StatusPages (`InternalRoutes`). +internal suspend fun respondNotFound(call: io.ktor.server.application.ApplicationCall) { + val rid = call.attributes.getOrNull(REQUEST_ID_KEY) + val method = call.request.httpMethod.value + val path = call.request.path() + call.application.environment.log.info( + "[404 rid={}] {} {}", rid ?: "-", method, path, + ) + call.response.header(HttpHeaders.CacheControl, "public, max-age=300, s-maxage=300") + call.respond(HttpStatusCode.NotFound, ApiError("not_found")) +} + fun Application.configureHTTP() { install(DefaultHeaders) { header("X-Engine", "github-store-backend") @@ -125,8 +148,9 @@ fun Application.configureHTTP() { // CORS is only useful for browser-based callers. The KMP client never sends // Origin (native HttpClient), so this only affects the admin dashboard (same // origin as the API — doesn't need CORS) and any future web surface. Pinning - // to our own domains removes a CSRF foothold on /v1/events from malicious - // third-party pages without breaking anything we actually serve. + // to our own domains removes a CSRF foothold on state-changing POSTs (e.g. + // /v1/repo/{owner}/{name}/refresh) from malicious third-party pages without + // breaking anything we actually serve. 
install(CORS) { allowHost("github-store.org", subDomains = listOf("api", "api-direct", "www")) // localhost dev origins are only useful when developing the admin @@ -190,13 +214,6 @@ fun Application.configureHTTP() { rateLimiter(limit = 360, refillPeriod = 1.minutes) requestKey(::forwardedFor) } - // Events endpoint: 3/min/IP (tightened 10× for direct-path abuse). - // 50 events/batch × 3 batches/min = 150 events/min/IP — comfortably - // covers any realistic session. - register(RateLimitName("events")) { - rateLimiter(limit = 3, refillPeriod = 1.minutes) - requestKey(::forwardedFor) - } // Search bucket: 240/min/key. Covers /search, /search/explore, // /releases, /readme, /user, /users/{u}/repos, /users/{u}/starred -- // every route that fans out to the GitHub API. Keyed by token-hash @@ -317,7 +334,17 @@ fun Application.configureHTTP() { call.respond(HttpStatusCode.BadRequest, ApiError("invalid_request")) } exception { call, _ -> - call.respond(HttpStatusCode.NotFound, ApiError("not_found")) + respondNotFound(call) + } + // Catch every unmatched-route 404 (Ktor's default response has no body + // and no Cache-Control). One handler gives us: + // - consistent JSON shape ({error, message}) across the API + // - structured access log entry with method + path (no query) so we + // can classify scanner traffic vs old-client paths from Cloudflare + // analytics + the application log + // - short edge cache so repeat scanner hits don't slam origin + status(HttpStatusCode.NotFound) { call, _ -> + respondNotFound(call) } // 429s come out of the RateLimit plugin with Retry-After but an empty // body. 
Replace that with a JSON body the client can parse + display diff --git a/src/main/kotlin/zed/rainxch/githubstore/ingest/GitHubDeviceClient.kt b/src/main/kotlin/zed/rainxch/githubstore/ingest/GitHubDeviceClient.kt index 9e6b3f7..751dec1 100644 --- a/src/main/kotlin/zed/rainxch/githubstore/ingest/GitHubDeviceClient.kt +++ b/src/main/kotlin/zed/rainxch/githubstore/ingest/GitHubDeviceClient.kt @@ -9,15 +9,20 @@ import io.ktor.client.statement.* import io.ktor.http.* import org.slf4j.LoggerFactory -class GitHubDeviceClient { - private val log = LoggerFactory.getLogger(GitHubDeviceClient::class.java) - +// `open` so route tests can swap in a fake client that returns canned +// GitHubDeviceResponse values without touching real HTTP. `clientId` is a +// constructor parameter (defaulted to the env var) for the same reason — +// tests don't need to set GITHUB_OAUTH_CLIENT_ID just to construct an +// override. +open class GitHubDeviceClient( private val clientId: String = System.getenv("GITHUB_OAUTH_CLIENT_ID")?.takeIf { it.isNotBlank() } ?: error( "GITHUB_OAUTH_CLIENT_ID env var is required to serve /v1/auth/device/* routes. " + "Set it to the same OAuth App client_id the KMP client has in BuildKonfig." 
- ) + ), +) { + private val log = LoggerFactory.getLogger(GitHubDeviceClient::class.java) private val http = HttpClient(CIO) { install(HttpTimeout) { @@ -30,12 +35,12 @@ class GitHubDeviceClient { expectSuccess = false } - suspend fun startDeviceFlow(): GitHubDeviceResponse = + open suspend fun startDeviceFlow(): GitHubDeviceResponse = proxyCall("https://github.com/login/device/code") { append("client_id", clientId) } - suspend fun pollDeviceToken(deviceCode: String): GitHubDeviceResponse = + open suspend fun pollDeviceToken(deviceCode: String): GitHubDeviceResponse = proxyCall("https://github.com/login/oauth/access_token") { append("client_id", clientId) append("device_code", deviceCode) diff --git a/src/main/kotlin/zed/rainxch/githubstore/mirrors/Mirror.kt b/src/main/kotlin/zed/rainxch/githubstore/mirrors/Mirror.kt index 504e413..7663895 100644 --- a/src/main/kotlin/zed/rainxch/githubstore/mirrors/Mirror.kt +++ b/src/main/kotlin/zed/rainxch/githubstore/mirrors/Mirror.kt @@ -22,12 +22,22 @@ enum class MirrorStatus { // known-stable release-asset checksum file at cli/cli@v2.40.0 -- pinned // release means the URL won't 404 on us. Range: bytes=0-0 keeps actual // transfer at 1 byte regardless of whether the mirror honors the Range header. +// +// `trafficKinds` tells the client what kinds of traffic this mirror is fit +// for, since not every mirror covers every kind. Two kinds matter today: +// - "release_asset": URLs under github.com/.../releases/download/... +// - "raw_file" : repo source-tree files (READMEs, icons, raw.githubusercontent +// content, anything served by jsDelivr's `/gh/` path) +// Whole-URL proxies serve both. jsDelivr's `/gh/` endpoint serves repo files +// only — sending a release-asset URL there 404s. The client MUST inspect this +// list before routing a download. data class MirrorPreset( val id: String, val name: String, val urlTemplate: String?, val type: MirrorType, val pingUrl: String, + val trafficKinds: List<String>, ) // Hardcoded catalog. 
Adding/removing a mirror is a code change + deploy -- @@ -42,6 +52,8 @@ object MirrorPresets { private const val PROBE_ASSET = "https://github.com/cli/cli/releases/download/v2.40.0/gh_2.40.0_checksums.txt" + private val FULL_PROXY_KINDS = listOf("release_asset", "raw_file") + val ALL: List<MirrorPreset> = listOf( MirrorPreset( id = "direct", @@ -49,6 +61,7 @@ object MirrorPresets { urlTemplate = null, type = MirrorType.OFFICIAL, pingUrl = "https://api.github.com/zen", + trafficKinds = FULL_PROXY_KINDS, ), MirrorPreset( id = "ghfast_top", @@ -56,6 +69,7 @@ object MirrorPresets { urlTemplate = "https://ghfast.top/{url}", type = MirrorType.COMMUNITY, pingUrl = "https://ghfast.top/$PROBE_ASSET", + trafficKinds = FULL_PROXY_KINDS, ), MirrorPreset( id = "moeyy_xyz", @@ -63,6 +77,7 @@ object MirrorPresets { urlTemplate = "https://github.moeyy.xyz/{url}", type = MirrorType.COMMUNITY, pingUrl = "https://github.moeyy.xyz/$PROBE_ASSET", + trafficKinds = FULL_PROXY_KINDS, ), MirrorPreset( id = "gh_proxy_com", @@ -70,6 +85,7 @@ object MirrorPresets { urlTemplate = "https://gh-proxy.com/{url}", type = MirrorType.COMMUNITY, pingUrl = "https://gh-proxy.com/$PROBE_ASSET", + trafficKinds = FULL_PROXY_KINDS, ), MirrorPreset( id = "ghps_cc", @@ -77,6 +93,7 @@ object MirrorPresets { urlTemplate = "https://ghps.cc/{url}", type = MirrorType.COMMUNITY, pingUrl = "https://ghps.cc/$PROBE_ASSET", + trafficKinds = FULL_PROXY_KINDS, ), MirrorPreset( id = "gh_99988866_xyz", @@ -84,6 +101,30 @@ object MirrorPresets { urlTemplate = "https://gh.api.99988866.xyz/{url}", type = MirrorType.COMMUNITY, pingUrl = "https://gh.api.99988866.xyz/$PROBE_ASSET", + trafficKinds = FULL_PROXY_KINDS, + ), + // jsDelivr's Fastly-fronted endpoint. jsDelivr publishes per-CDN + // entrypoints (cdn.jsdelivr.net is multi-CDN, fastly.jsdelivr.net is + // Fastly-only, gcore.jsdelivr.net is Gcore); the Fastly one is the + // commonly-used escape hatch when Cloudflare paths are blocked in + // Mainland China. 
urlTemplate uses jsDelivr's native `/gh/` path + // shape — the client dispatches by inspecting placeholders. Marked + // raw-file-only because jsDelivr does NOT serve release assets + // (release tarballs aren't under /gh/). + // + // Sprint 3 Task #13 originally requested fastgit.cc alongside this, + // but fastgit.cc could not be verified as a legitimate successor of + // the (now-defunct) fastgit.org project — no public artifact ties + // the .cc TLD to the FastGitORG team — so it is intentionally NOT + // added here. Re-add only with operator sign-off after lineage is + // confirmed. + MirrorPreset( + id = "fastly_jsdelivr", + name = "fastly.jsdelivr.net", + urlTemplate = "https://fastly.jsdelivr.net/gh/{owner}/{repo}@{ref}/{path}", + type = MirrorType.COMMUNITY, + pingUrl = "https://fastly.jsdelivr.net/gh/cli/cli@v2.40.0/LICENSE", + trafficKinds = listOf("raw_file"), ), ) @@ -105,4 +146,8 @@ data class MirrorEntry( val status: MirrorStatus, @SerialName("latency_ms") val latencyMs: Long?, @SerialName("last_checked_at") val lastCheckedAt: String?, + // `traffic_kinds` is additive — pre-1.8.3 clients ignore the unknown field + // and continue to use whole-URL-proxy mirrors against any github.com URL. + // 1.8.3+ clients MUST consult this list before routing a download. 
+ @SerialName("traffic_kinds") val trafficKinds: List<String>, ) diff --git a/src/main/kotlin/zed/rainxch/githubstore/routes/AuthRoutes.kt b/src/main/kotlin/zed/rainxch/githubstore/routes/AuthRoutes.kt index 9cae051..bde8876 100644 --- a/src/main/kotlin/zed/rainxch/githubstore/routes/AuthRoutes.kt +++ b/src/main/kotlin/zed/rainxch/githubstore/routes/AuthRoutes.kt @@ -5,16 +5,31 @@ import io.ktor.server.plugins.ratelimit.* import io.ktor.server.request.* import io.ktor.server.response.* import io.ktor.server.routing.* +import kotlinx.serialization.Serializable +import kotlinx.serialization.json.Json import org.slf4j.LoggerFactory +import zed.rainxch.githubstore.REQUEST_ID_KEY import zed.rainxch.githubstore.ingest.GitHubDeviceClient +import zed.rainxch.githubstore.ingest.GitHubDeviceResponse import zed.rainxch.githubstore.requireMaxBody import zed.rainxch.githubstore.util.ApiError +import zed.rainxch.githubstore.util.PrivacyHash private val log = LoggerFactory.getLogger("AuthRoutes") private const val START_MAX_BODY = 1L * 1024 private const val POLL_MAX_BODY = 4L * 1024 +// Used only to extract the `error` field from device-flow error-shaped 200 +// responses (`{"error":"authorization_pending"}` etc). The `access_token` and +// `refresh_token` fields on success responses are intentionally absent from +// this DTO so they can NEVER end up in the deserialized object — even if we +// accidentally log it, there is nothing sensitive to leak. +@Serializable +private data class DeviceErrorProbe(val error: String? 
= null) + +private val errorProbeJson = Json { ignoreUnknownKeys = true; isLenient = true } + fun Route.authRoutes(deviceClient: GitHubDeviceClient) { route("/auth/device") { rateLimit(RateLimitName("auth-start")) { @@ -59,32 +74,83 @@ fun Route.authRoutes(deviceClient: GitHubDeviceClient) { ApiError("missing_device_code"), ) - try { - val result = deviceClient.pollDeviceToken(deviceCode) - // Device-flow pending/error states (authorization_pending, - // slow_down, access_denied, expired_token, ...) arrive from - // GitHub as HTTP 200 with an error-shaped body. Forward - // 200→200 verbatim; the client already string-matches these. - // Only non-2xx flips to 502 so the client's infrastructure - // fallback predicate fires cleanly. - val outStatus = if (result.status.isSuccess()) { - HttpStatusCode.OK - } else { - HttpStatusCode.BadGateway - } - call.respondText( - text = result.body, - contentType = ContentType.Application.Json, - status = outStatus, - ) + // Diagnostics for the auth-stuck reports (GitHub-Store#433, #395 + // and the Sprint 3 Task #8 user-survey reports). We log exactly + // the metadata needed to correlate a user-reported failed flow + // with the backend's view: a stable hash of device_code (so the + // user can paste the 16-char prefix and we can grep logs), the + // upstream HTTP status, GitHub's `error` code if the body was + // an error-shaped 200, latency, and the client UA. Never the + // raw device_code, never the upstream body, never any token + // field. See CLAUDE.md: "The backend must never log the access + // token returned by a successful poll". 
+ val deviceCodeHash = PrivacyHash.hash(deviceCode).take(16) + val userAgent = call.request.headers[HttpHeaders.UserAgent] + ?.replace('\n', ' ') + ?.replace('\r', ' ') + ?.take(120) + ?: "-" + val rid = call.attributes.getOrNull(REQUEST_ID_KEY) ?: "-" + val start = System.currentTimeMillis() + + val result: GitHubDeviceResponse = try { + deviceClient.pollDeviceToken(deviceCode) } catch (e: Exception) { + val latency = System.currentTimeMillis() - start + log.info( + "[auth-poll rid={}] dch={} ghs=- gh_err=upstream_exception lat_ms={} ua={}", + rid, deviceCodeHash, latency, userAgent, + ) log.warn("auth/device/poll upstream error: {}", e.message) - call.respond( + return@post call.respond( HttpStatusCode.BadGateway, ApiError("github_unreachable"), ) } + val latency = System.currentTimeMillis() - start + + val githubErrorCode = parseErrorCode(result.body) + log.info( + "[auth-poll rid={}] dch={} ghs={} gh_err={} lat_ms={} ua={}", + rid, + deviceCodeHash, + result.status.value, + githubErrorCode ?: "-", + latency, + userAgent, + ) + + // Device-flow pending/error states (authorization_pending, + // slow_down, access_denied, expired_token, ...) arrive from + // GitHub as HTTP 200 with an error-shaped body. Forward + // 200→200 verbatim; the client already string-matches these. + // Only non-2xx flips to 502 so the client's infrastructure + // fallback predicate fires cleanly. + val outStatus = if (result.status.isSuccess()) { + HttpStatusCode.OK + } else { + HttpStatusCode.BadGateway + } + call.respondText( + text = result.body, + contentType = ContentType.Application.Json, + status = outStatus, + ) } } } } + +// Parses the `error` field out of a device-flow response. Returns null when +// the body isn't JSON, doesn't contain an `error` field, or any other parse +// failure. 
Crucially, only DeviceErrorProbe is deserialized — token fields +// from a successful response are never materialised in JVM memory beyond the +// raw `result.body` string we forward verbatim to the client. +private fun parseErrorCode(body: String): String? { + if (body.isBlank()) return null + return try { + errorProbeJson.decodeFromString(DeviceErrorProbe.serializer(), body).error + } catch (_: Exception) { + null + } +} diff --git a/src/main/kotlin/zed/rainxch/githubstore/routes/BadgeRoutes.kt b/src/main/kotlin/zed/rainxch/githubstore/routes/BadgeRoutes.kt index fb905d0..988f9b6 100644 --- a/src/main/kotlin/zed/rainxch/githubstore/routes/BadgeRoutes.kt +++ b/src/main/kotlin/zed/rainxch/githubstore/routes/BadgeRoutes.kt @@ -57,7 +57,12 @@ fun Route.badgeRoutes(badgeService: BadgeService) { val height = parseHeight(call.request.queryParameters["height"]) val rendered = badgeService.renderRepoBadge(owner, name, kind, styleIndex, variant, labelOverride, height) - ?: return@get call.respond(HttpStatusCode.NotFound, mapOf("error" to "unknown kind: $kind")) + // `kind` is the URL path segment; an unknown value is an + // input-validation error, not a missing resource. 400 also + // sidesteps the global StatusPages NotFound handler that + // would otherwise overwrite this diagnostic body with the + // generic `not_found` envelope (see StatusPagesOverrideTest). + ?: return@get call.respond(HttpStatusCode.BadRequest, mapOf("error" to "unknown kind: $kind")) respondSvg(rendered.svg, degraded = rendered.degraded) } @@ -77,7 +82,10 @@ fun Route.badgeRoutes(badgeService: BadgeService) { val height = parseHeight(call.request.queryParameters["height"]) val rendered = badgeService.renderGlobalBadge(kind, styleIndex, variant, labelOverride, height) - ?: return@get call.respond(HttpStatusCode.NotFound, mapOf("error" to "unknown kind: $kind (global kinds are users, fdroid; use /v1/badge/{owner}/{name}/{kind}/... 
for per-repo)")) + // Same rationale as the per-repo route above: `kind` is + // input, not a missing resource; 400 keeps the diagnostic + // body intact against the global NotFound handler. + ?: return@get call.respond(HttpStatusCode.BadRequest, mapOf("error" to "unknown kind: $kind (global kinds are users, fdroid; use /v1/badge/{owner}/{name}/{kind}/... for per-repo)")) respondSvg(rendered.svg, degraded = rendered.degraded) } diff --git a/src/main/kotlin/zed/rainxch/githubstore/routes/DeprecationRoutes.kt b/src/main/kotlin/zed/rainxch/githubstore/routes/DeprecationRoutes.kt new file mode 100644 index 0000000..e836e30 --- /dev/null +++ b/src/main/kotlin/zed/rainxch/githubstore/routes/DeprecationRoutes.kt @@ -0,0 +1,43 @@ +package zed.rainxch.githubstore.routes + +import io.ktor.http.* +import io.ktor.server.response.* +import io.ktor.server.routing.* +import kotlinx.serialization.Serializable + +// Old client builds (pre-1.6) wired the device-flow URLs under `/repo/` by +// mistake. Cloudflare analytics show ~110 hits/week to `/v1/repo/login/device` +// and `/v1/repo/login/oauth` returning 404 — they fall through the generic +// `/repo/{owner}/{name}` route into a GitHub lookup that legitimately 404s. +// Replace with a 410 Gone tombstone so old clients get a real signal and a +// pointer to the correct path, and so CF can cache the response and stop +// hammering origin. +// +// Note: declared BEFORE `repoRoutes` in the routing block so the static +// segments win over the parameterized `/repo/{owner}/{name}` route. +@Serializable +private data class DeprecatedAuthNotice( + val error: String = "endpoint_deprecated", + val message: String = "This path was used by pre-1.6 builds. 
Use POST /v1/auth/device/start and POST /v1/auth/device/poll for the device-flow.", + val deprecated_at: String = "2024-09-01", + val use_instead: List<String> = listOf( + "/v1/auth/device/start", + "/v1/auth/device/poll", + ), +) + +private val NOTICE = DeprecatedAuthNotice() +private const val GONE_CACHE_CONTROL = "public, max-age=86400" + +fun Route.deprecationRoutes() { + listOf("/repo/login/device", "/repo/login/oauth").forEach { path -> + get(path) { + call.response.header(HttpHeaders.CacheControl, GONE_CACHE_CONTROL) + call.respond(HttpStatusCode.Gone, NOTICE) + } + post(path) { + call.response.header(HttpHeaders.CacheControl, GONE_CACHE_CONTROL) + call.respond(HttpStatusCode.Gone, NOTICE) + } + } +} diff --git a/src/main/kotlin/zed/rainxch/githubstore/routes/EventRoutes.kt b/src/main/kotlin/zed/rainxch/githubstore/routes/EventRoutes.kt index 4c128bf..34735f4 100644 --- a/src/main/kotlin/zed/rainxch/githubstore/routes/EventRoutes.kt +++ b/src/main/kotlin/zed/rainxch/githubstore/routes/EventRoutes.kt @@ -1,88 +1,21 @@ package zed.rainxch.githubstore.routes import io.ktor.http.* -import io.ktor.server.request.* import io.ktor.server.response.* import io.ktor.server.routing.* -import zed.rainxch.githubstore.db.EventRepository -import zed.rainxch.githubstore.model.EventRequest -import zed.rainxch.githubstore.requireMaxBody -import zed.rainxch.githubstore.util.ApiError -private const val MAX_BATCH_SIZE = 50 -private const val EVENTS_MAX_BODY = 256L * 1024 - -// Per-field length caps. Events feeds the training pipeline and goes into -// the `events` table forever — without these, a buggy or malicious client -// can push multi-megabyte strings in and bloat the table. 
-private const val MAX_DEVICE_ID_LEN = 128 -private const val MAX_APP_VERSION_LEN = 32 -private const val MAX_ERROR_CODE_LEN = 128 -private const val MAX_QUERY_HASH_LEN = 64 - -private val VALID_EVENT_TYPES = setOf( - "search_performed", - "search_result_clicked", - "repo_viewed", - "release_downloaded", - "install_started", - "install_succeeded", - "install_failed", - "app_opened_after_install", - "uninstalled", - "favorited", - "unfavorited", -) - -fun Route.eventRoutes(eventRepository: EventRepository) { +// Telemetry was killed in the 2026-04 audit. Endpoint accepts and silently +// discards the batch — returns 204 No Content so pre-1.8.3 clients (which +// treat any non-2xx as failure and retry) stop spamming origin and Sentry. +// +// The data goes nowhere: the Events table and SignalAggregationWorker are +// still wired up for historical rows, but no new ingestion happens here. +// +// Once 1.8.3+ has propagated and TelemetryRepositoryImpl on the client has +// shipped a sticky-disable-on-410 flag, flip this back to `410 Gone` with a +// proper JSON deprecation notice so laggard clients get a real signal. 
+fun Route.eventRoutes() {
     post("/events") {
-        if (!call.requireMaxBody(EVENTS_MAX_BODY)) return@post
-
-        val events = call.receive<List<EventRequest>>()
-
-        if (events.isEmpty()) {
-            call.respond(HttpStatusCode.BadRequest, ApiError("empty_event_list"))
-            return@post
-        }
-
-        if (events.size > MAX_BATCH_SIZE) {
-            call.respond(
-                HttpStatusCode.BadRequest,
-                ApiError("batch_too_large", message = "Max $MAX_BATCH_SIZE events per batch"),
-            )
-            return@post
-        }
-
-        val invalid = events.filter { it.eventType !in VALID_EVENT_TYPES }
-        if (invalid.isNotEmpty()) {
-            call.respond(
-                HttpStatusCode.BadRequest,
-                ApiError(
-                    "unknown_event_type",
-                    message = "Unknown event types: ${invalid.map { it.eventType }.distinct()}",
-                ),
-            )
-            return@post
-        }
-
-        val oversized = events.firstOrNull { e ->
-            e.deviceId.length > MAX_DEVICE_ID_LEN ||
-                (e.appVersion?.length ?: 0) > MAX_APP_VERSION_LEN ||
-                (e.errorCode?.length ?: 0) > MAX_ERROR_CODE_LEN ||
-                (e.queryHash?.length ?: 0) > MAX_QUERY_HASH_LEN
-        }
-        if (oversized != null) {
-            call.respond(
-                HttpStatusCode.BadRequest,
-                ApiError(
-                    "field_too_long",
-                    message = "Field too long. 
Limits: deviceId $MAX_DEVICE_ID_LEN, appVersion $MAX_APP_VERSION_LEN, errorCode $MAX_ERROR_CODE_LEN, queryHash $MAX_QUERY_HASH_LEN chars", - ), - ) - return@post - } - - eventRepository.insertBatch(events) call.respond(HttpStatusCode.NoContent) } } diff --git a/src/main/kotlin/zed/rainxch/githubstore/routes/InternalRoutes.kt b/src/main/kotlin/zed/rainxch/githubstore/routes/InternalRoutes.kt index f642dfc..bc1d680 100644 --- a/src/main/kotlin/zed/rainxch/githubstore/routes/InternalRoutes.kt +++ b/src/main/kotlin/zed/rainxch/githubstore/routes/InternalRoutes.kt @@ -5,6 +5,7 @@ import io.ktor.server.auth.* import io.ktor.server.request.* import io.ktor.server.response.* import io.ktor.server.routing.* +import zed.rainxch.githubstore.respondNotFound import kotlinx.coroutines.CoroutineScope import kotlinx.coroutines.Dispatchers import kotlinx.coroutines.SupervisorJob @@ -61,7 +62,7 @@ fun Route.internalRoutes( if (isProduction && adminToken == null) { route("/internal") { get("{...}") { - call.respond(HttpStatusCode.NotFound, mapOf("error" to "Not found")) + respondNotFound(call) } } return @@ -76,7 +77,7 @@ fun Route.internalRoutes( authenticate(ADMIN_BASIC_AUTH, optional = true) { get("/metrics") { if (!authorized(call, adminToken)) { - return@get call.respond(HttpStatusCode.NotFound, mapOf("error" to "Not found")) + return@get respondNotFound(call) } // An authenticated endpoint must never be edge-cached. call.response.header(HttpHeaders.CacheControl, "no-store, private") @@ -118,7 +119,7 @@ fun Route.internalRoutes( // current job finishes. 
post("/backfill-stale") { if (!authorized(call, adminToken)) { - return@post call.respond(HttpStatusCode.NotFound, mapOf("error" to "Not found")) + return@post respondNotFound(call) } val limit = call.request.queryParameters["limit"] ?.toIntOrNull() diff --git a/src/main/kotlin/zed/rainxch/githubstore/routes/MirrorRoutes.kt b/src/main/kotlin/zed/rainxch/githubstore/routes/MirrorRoutes.kt index 8df0ecf..c4637c3 100644 --- a/src/main/kotlin/zed/rainxch/githubstore/routes/MirrorRoutes.kt +++ b/src/main/kotlin/zed/rainxch/githubstore/routes/MirrorRoutes.kt @@ -32,6 +32,7 @@ fun Route.mirrorRoutes(registry: MirrorStatusRegistry) { // a stale latency from before the mirror went down is misleading. latencyMs = if (snap.status == MirrorStatus.OK || snap.status == MirrorStatus.DEGRADED) snap.latencyMs else null, lastCheckedAt = snap.lastCheckedAt?.toString(), + trafficKinds = preset.trafficKinds, ) } diff --git a/src/main/kotlin/zed/rainxch/githubstore/routes/RootRoutes.kt b/src/main/kotlin/zed/rainxch/githubstore/routes/RootRoutes.kt new file mode 100644 index 0000000..e93fe83 --- /dev/null +++ b/src/main/kotlin/zed/rainxch/githubstore/routes/RootRoutes.kt @@ -0,0 +1,29 @@ +package zed.rainxch.githubstore.routes + +import io.ktor.http.* +import io.ktor.server.response.* +import io.ktor.server.routing.* +import kotlinx.serialization.Serializable + +// Visiting the bare hostname used to 404. Scanners, monitoring tooling, and the +// occasional curious browser hit `/`, so a tiny greeting is cheaper at the edge +// than a 404 — and a clear pointer to the docs avoids confusion about whether +// the service is live. Long edge TTL: this body is byte-stable. 
+@Serializable +private data class RootGreeting( + val name: String = "github-store-backend", + val docs: String = "https://github-store.org", + val api: String = "https://api.github-store.org/v1/", +) + +private val ROOT_GREETING = RootGreeting() + +fun Route.rootRoutes() { + get("/") { + call.response.header( + HttpHeaders.CacheControl, + "public, max-age=3600, s-maxage=86400", + ) + call.respond(ROOT_GREETING) + } +} diff --git a/src/main/kotlin/zed/rainxch/githubstore/routes/Routing.kt b/src/main/kotlin/zed/rainxch/githubstore/routes/Routing.kt index d90ce45..977494f 100644 --- a/src/main/kotlin/zed/rainxch/githubstore/routes/Routing.kt +++ b/src/main/kotlin/zed/rainxch/githubstore/routes/Routing.kt @@ -5,7 +5,6 @@ import io.ktor.server.plugins.ratelimit.* import io.ktor.server.routing.* import org.koin.ktor.ext.inject import zed.rainxch.githubstore.announcements.AnnouncementsRegistry -import zed.rainxch.githubstore.db.EventRepository import zed.rainxch.githubstore.db.MeilisearchClient import zed.rainxch.githubstore.db.RepoRepository import zed.rainxch.githubstore.db.SearchMissRepository @@ -22,7 +21,6 @@ import zed.rainxch.githubstore.match.SigningFingerprintRepository import zed.rainxch.githubstore.mirrors.MirrorStatusRegistry fun Application.configureRouting() { - val eventRepository by inject() val repoRepository by inject() val searchRepository by inject() val searchMissRepository by inject() @@ -40,13 +38,15 @@ fun Application.configureRouting() { val repoRefreshCoordinator by inject() routing { + rootRoutes() route("/v1") { healthRoutes(meilisearchClient, announcementsRegistry) - rateLimit(RateLimitName("events")) { - eventRoutes(eventRepository) - } + eventRoutes() categoryRoutes(repoRepository) topicRoutes(repoRepository) + // Tombstones for pre-1.6 auth paths under /repo/. Declared before + // repoRoutes so the static segments win over /repo/{owner}/{name}. 
+ deprecationRoutes() repoRoutes(repoRepository, resourceClient) rateLimit(RateLimitName("search")) { searchRoutes(meilisearchClient, searchRepository, githubSearchClient, searchMissRepository, searchMetrics) diff --git a/src/main/kotlin/zed/rainxch/githubstore/routes/SigningSeedsRoutes.kt b/src/main/kotlin/zed/rainxch/githubstore/routes/SigningSeedsRoutes.kt index 7e8127d..366b813 100644 --- a/src/main/kotlin/zed/rainxch/githubstore/routes/SigningSeedsRoutes.kt +++ b/src/main/kotlin/zed/rainxch/githubstore/routes/SigningSeedsRoutes.kt @@ -4,13 +4,29 @@ import io.ktor.http.* import io.ktor.server.request.* import io.ktor.server.response.* import io.ktor.server.routing.* +import kotlinx.serialization.json.Json import zed.rainxch.githubstore.match.SigningFingerprintRepository import zed.rainxch.githubstore.match.SigningSeedsResponse import zed.rainxch.githubstore.util.ApiError +import java.security.MessageDigest private const val DEFAULT_PAGE_SIZE = 1000 private const val MAX_PAGE_SIZE = 5000 +// Cache budget: signing seeds change only on the daily F-Droid sync cron, and +// the operator purges the Cloudflare cache when that lands. Until then the +// content is byte-stable for the same (since, cursor, limit) inputs, so push +// a long edge TTL plus a strong ETag for conditional revalidation. 
+// +// max-age=86400 — clients hold for 1 day +// s-maxage=604800 — Cloudflare holds for 7 days (purged on seed update) +// stale-while-revalidate=86400 +// — serve stale up to 1 day while the edge re-fetches +private const val SIGNING_SEEDS_CACHE_CONTROL = + "public, max-age=86400, s-maxage=604800, stale-while-revalidate=86400" + +private val EtagJson = Json { encodeDefaults = true } + // Per E1_BACKEND_HANDOFF.md: // GET /v1/signing-seeds?since=&platform=android&cursor= // @@ -55,11 +71,39 @@ fun Route.signingSeedsRoutes(repository: SigningFingerprintRepository) { nextCursor = page.nextCursor?.encode(), ) - // Paginated dump -- clients fetch incrementally with their own `since` - // cursor, so a short edge cache is fine. 5 minutes balances freshness - // (new F-Droid index data lands within minutes of a daily cron run) - // against repeat fetches from the same client during a sync session. - call.response.header(HttpHeaders.CacheControl, "public, max-age=60, s-maxage=300") + val etag = etagOf(response) + call.response.header(HttpHeaders.CacheControl, SIGNING_SEEDS_CACHE_CONTROL) + call.response.header(HttpHeaders.ETag, etag) + + val ifNoneMatch = call.request.headers[HttpHeaders.IfNoneMatch]?.trim() + if (ifNoneMatch != null && etagsMatch(ifNoneMatch, etag)) { + call.respond(HttpStatusCode.NotModified) + return@get + } + call.respond(response) } } + +// Strong ETag over the canonical JSON of the response. Same (since, cursor, +// limit) inputs produce the same rows + nextCursor, so identical bytes -> +// identical tag. New rows arriving for the same `since` produce a fresh tag, +// which is correct: the response body really did change. 
+private fun etagOf(response: SigningSeedsResponse): String { + val canonical = EtagJson.encodeToString(SigningSeedsResponse.serializer(), response) + val md = MessageDigest.getInstance("SHA-256") + val digest = md.digest(canonical.toByteArray(Charsets.UTF_8)) + val hex = digest.joinToString("") { "%02x".format(it) } + return "\"${hex.take(16)}\"" +} + +// Lenient match: clients (and CDNs) sometimes send the weak prefix `W/` or a +// list of comma-separated tags. Accept either an exact match against ours or +// any token in a comma-separated list. We never emit weak ETags ourselves, so +// strip the `W/` prefix on the incoming side before comparing. +private fun etagsMatch(header: String, ours: String): Boolean { + if (header == "*") return true + return header.split(",") + .map { it.trim().removePrefix("W/") } + .any { it == ours } +} diff --git a/src/test/kotlin/zed/rainxch/githubstore/StatusPagesOverrideTest.kt b/src/test/kotlin/zed/rainxch/githubstore/StatusPagesOverrideTest.kt new file mode 100644 index 0000000..132d927 --- /dev/null +++ b/src/test/kotlin/zed/rainxch/githubstore/StatusPagesOverrideTest.kt @@ -0,0 +1,51 @@ +package zed.rainxch.githubstore + +import io.ktor.client.request.* +import io.ktor.client.statement.* +import io.ktor.http.* +import io.ktor.server.application.* +import io.ktor.server.plugins.statuspages.* +import io.ktor.server.response.* +import io.ktor.server.routing.* +import io.ktor.server.testing.* +import kotlin.test.Test +import kotlin.test.assertEquals + +// Regression pin for Ktor 3's StatusPages behaviour: +// +// `status(HttpStatusCode.NotFound)` in StatusPages OVERRIDES route-level +// `call.respond(HttpStatusCode.NotFound, body)` bodies. Both unmatched-route +// 404s AND explicit route 404s flow through the global handler — there is +// no built-in way to scope the handler to "unmatched routes only". 
+// +// Consequence: every route that wants a 404 with a custom body must either +// (a) route through the shared `Plugins.respondNotFound` helper so the body +// is consistent, or (b) use a different status code (e.g. 400 BadRequest +// for input-validation cases). The global handler emits ApiError("not_found") +// with a 300s edge-cache header and a privacy-safe `[404 …]` log line. +// +// If a future Ktor upgrade changes this behaviour, this test fails and we +// learn before InternalRoutes/BadgeRoutes/RepoRefreshRoutes start serving +// inconsistent 404 bodies. +class StatusPagesOverrideTest { + + @Test + fun `status NotFound handler overrides explicit route-level 404 bodies`() = testApplication { + application { + install(StatusPages) { + status(HttpStatusCode.NotFound) { call, _ -> + call.respondText("global-handler", status = HttpStatusCode.NotFound) + } + } + routing { + get("/route-with-body") { + call.respondText("route-level-body", status = HttpStatusCode.NotFound) + } + } + } + + val response = client.get("/route-with-body") + assertEquals(HttpStatusCode.NotFound, response.status) + assertEquals("global-handler", response.bodyAsText()) + } +} diff --git a/src/test/kotlin/zed/rainxch/githubstore/match/SigningSeedsRouteTest.kt b/src/test/kotlin/zed/rainxch/githubstore/match/SigningSeedsRouteTest.kt index 2337a60..781e3b6 100644 --- a/src/test/kotlin/zed/rainxch/githubstore/match/SigningSeedsRouteTest.kt +++ b/src/test/kotlin/zed/rainxch/githubstore/match/SigningSeedsRouteTest.kt @@ -12,6 +12,7 @@ import kotlinx.serialization.json.Json import zed.rainxch.githubstore.routes.signingSeedsRoutes import kotlin.test.Test import kotlin.test.assertEquals +import kotlin.test.assertNotNull import kotlin.test.assertNull import kotlin.test.assertTrue @@ -39,6 +40,16 @@ class SigningSeedsRouteTest { } } + // Returns the same rows on every call. 
Used for cache-validation tests
+    // where repeated requests must yield byte-identical responses (so the
+    // ETag stays stable).
+    private class StaticFakeRepo(
+        private val rows: List<SigningSeedRow>,
+    ) : SigningFingerprintRepository() {
+        override suspend fun page(sinceMs: Long?, cursor: PageCursor?, limit: Int): SigningSeedPage =
+            SigningSeedPage(rows = rows, nextCursor = null)
+    }
+
     private fun ApplicationTestBuilder.installPlugins() {
         application {
             install(ContentNegotiation) { json(Json { ignoreUnknownKeys = true }) }
@@ -147,4 +158,111 @@ class SigningSeedsRouteTest {
         client.get("/v1/signing-seeds?platform=android&limit=0")
         assert(repo.lastLimit >= 1) { "limit not clamped to >=1: ${repo.lastLimit}" }
     }
+
+    @Test
+    fun `response carries long-lived Cache-Control and ETag headers`() = testApplication {
+        val repo = FakeRepo(pages = listOf(listOf(
+            SigningSeedRow("AB:CD", "octocat", "hello-world", 100L),
+        )))
+        installPlugins()
+        application { routing { route("/v1") { signingSeedsRoutes(repo) } } }
+
+        val response = client.get("/v1/signing-seeds?platform=android")
+        assertEquals(HttpStatusCode.OK, response.status)
+        assertEquals(
+            "public, max-age=86400, s-maxage=604800, stale-while-revalidate=86400",
+            response.headers[HttpHeaders.CacheControl],
+        )
+        val etag = response.headers[HttpHeaders.ETag]
+        assertNotNull(etag, "ETag header missing")
+        assertTrue(etag.startsWith("\"") && etag.endsWith("\""), "ETag must be a quoted string: $etag")
+    }
+
+    @Test
+    fun `matching If-None-Match returns 304 Not Modified`() = testApplication {
+        val repo = StaticFakeRepo(listOf(
+            SigningSeedRow("AB:CD", "octocat", "hello-world", 100L),
+        ))
+        installPlugins()
+        application { routing { route("/v1") { signingSeedsRoutes(repo) } } }
+
+        val first = client.get("/v1/signing-seeds?platform=android")
+        val etag = first.headers[HttpHeaders.ETag]
+        assertNotNull(etag)
+
+        val second = client.get("/v1/signing-seeds?platform=android") {
+            header(HttpHeaders.IfNoneMatch, etag)
+        }
+
assertEquals(HttpStatusCode.NotModified, second.status) + assertEquals(etag, second.headers[HttpHeaders.ETag]) + assertEquals( + "public, max-age=86400, s-maxage=604800, stale-while-revalidate=86400", + second.headers[HttpHeaders.CacheControl], + ) + } + + @Test + fun `wildcard If-None-Match returns 304`() = testApplication { + val repo = FakeRepo(pages = listOf(listOf( + SigningSeedRow("AB:CD", "octocat", "hello-world", 100L), + ))) + installPlugins() + application { routing { route("/v1") { signingSeedsRoutes(repo) } } } + + val response = client.get("/v1/signing-seeds?platform=android") { + header(HttpHeaders.IfNoneMatch, "*") + } + assertEquals(HttpStatusCode.NotModified, response.status) + } + + @Test + fun `weak-prefix If-None-Match still matches our strong tag`() = testApplication { + val repo = StaticFakeRepo(listOf( + SigningSeedRow("AB:CD", "octocat", "hello-world", 100L), + )) + installPlugins() + application { routing { route("/v1") { signingSeedsRoutes(repo) } } } + + val first = client.get("/v1/signing-seeds?platform=android") + val etag = first.headers[HttpHeaders.ETag] + assertNotNull(etag) + + val second = client.get("/v1/signing-seeds?platform=android") { + header(HttpHeaders.IfNoneMatch, "W/$etag") + } + assertEquals(HttpStatusCode.NotModified, second.status) + } + + @Test + fun `different page contents produce different ETags`() = testApplication { + val repo = FakeRepo(pages = listOf( + listOf(SigningSeedRow("AB:CD", "octocat", "hello-world", 100L)), + listOf(SigningSeedRow("EF:01", "rsms", "inter", 200L)), + )) + installPlugins() + application { routing { route("/v1") { signingSeedsRoutes(repo) } } } + + val first = client.get("/v1/signing-seeds?platform=android") + val second = client.get("/v1/signing-seeds?platform=android") + val firstTag = first.headers[HttpHeaders.ETag] + val secondTag = second.headers[HttpHeaders.ETag] + assertNotNull(firstTag) + assertNotNull(secondTag) + assert(firstTag != secondTag) { "ETag should change when rows 
change: $firstTag" } + } + + @Test + fun `non-matching If-None-Match still returns 200 with body`() = testApplication { + val repo = FakeRepo(pages = listOf(listOf( + SigningSeedRow("AB:CD", "octocat", "hello-world", 100L), + ))) + installPlugins() + application { routing { route("/v1") { signingSeedsRoutes(repo) } } } + + val response = client.get("/v1/signing-seeds?platform=android") { + header(HttpHeaders.IfNoneMatch, "\"deadbeefdeadbeef\"") + } + assertEquals(HttpStatusCode.OK, response.status) + assertTrue(response.bodyAsText().contains("\"fingerprint\":\"AB:CD\"")) + } } diff --git a/src/test/kotlin/zed/rainxch/githubstore/mirrors/MirrorRoutesTest.kt b/src/test/kotlin/zed/rainxch/githubstore/mirrors/MirrorRoutesTest.kt index 23a54ac..61adad8 100644 --- a/src/test/kotlin/zed/rainxch/githubstore/mirrors/MirrorRoutesTest.kt +++ b/src/test/kotlin/zed/rainxch/githubstore/mirrors/MirrorRoutesTest.kt @@ -97,20 +97,77 @@ class MirrorRoutesTest { } @Test - fun `direct mirror has null url_template, community ones expose their template`() = testApplication { + fun `direct mirror has null url_template, community ones expose an https template with a placeholder`() = testApplication { setupApp(MirrorStatusRegistry()) val body = Json.parseToJsonElement(client.get("/v1/mirrors/list").bodyAsText()).jsonObject val byId = body["mirrors"]!!.jsonArray.associateBy { it.jsonObject["id"]!!.jsonPrimitive.content } assertEquals("null", byId["direct"]!!.jsonObject["url_template"]!!.toString()) - // Every non-direct entry must have a non-null template ending in {url}. + // Every non-direct entry must have a non-null https template containing + // at least one placeholder. Whole-URL proxies use `/{url}`; specialised + // mirrors (e.g. jsDelivr) use a multi-placeholder template such as + // `/{owner}/{repo}@{ref}/{path}` — clients dispatch by placeholder set + // and `traffic_kinds`. 
+ val placeholder = Regex("\\{[a-z]+\\}") byId.filterKeys { it != "direct" }.forEach { (id, entry) -> val tpl = entry.jsonObject["url_template"]!!.jsonPrimitive.content - assertTrue(tpl.endsWith("/{url}"), "$id template must end with /{url}: $tpl") assertTrue(tpl.startsWith("https://"), "$id template must be https://: $tpl") + assertTrue(placeholder.containsMatchIn(tpl), "$id template must include a placeholder: $tpl") } } + @Test + fun `every mirror exposes traffic_kinds and whole-url proxies cover both kinds`() = testApplication { + setupApp(MirrorStatusRegistry()) + val body = Json.parseToJsonElement(client.get("/v1/mirrors/list").bodyAsText()).jsonObject + val byId = body["mirrors"]!!.jsonArray.associateBy { it.jsonObject["id"]!!.jsonPrimitive.content } + + byId.forEach { (id, entry) -> + val kinds = entry.jsonObject["traffic_kinds"]?.jsonArray + ?.map { it.jsonPrimitive.content } + ?: error("$id missing traffic_kinds") + assertTrue(kinds.isNotEmpty(), "$id traffic_kinds must be non-empty") + + val tpl = entry.jsonObject["url_template"]?.takeIf { it.toString() != "null" } + ?.jsonPrimitive?.content + if (tpl == null || tpl.endsWith("/{url}")) { + // Direct + whole-URL proxies handle every github.com URL. 
+ assertTrue("release_asset" in kinds, "$id must list release_asset: $kinds") + assertTrue("raw_file" in kinds, "$id must list raw_file: $kinds") + } + } + } + + @Test + fun `fastly_jsdelivr mirror is included, raw-file-only, with jsdelivr path template`() = testApplication { + setupApp(MirrorStatusRegistry()) + val body = Json.parseToJsonElement(client.get("/v1/mirrors/list").bodyAsText()).jsonObject + val byId = body["mirrors"]!!.jsonArray.associateBy { it.jsonObject["id"]!!.jsonPrimitive.content } + + val entry = byId["fastly_jsdelivr"] + assertNotNull(entry, "fastly_jsdelivr mirror missing from /v1/mirrors/list") + val tpl = entry.jsonObject["url_template"]!!.jsonPrimitive.content + assertEquals( + "https://fastly.jsdelivr.net/gh/{owner}/{repo}@{ref}/{path}", + tpl, + ) + val kinds = entry.jsonObject["traffic_kinds"]!!.jsonArray + .map { it.jsonPrimitive.content } + assertEquals(listOf("raw_file"), kinds, "jsdelivr must be raw_file-only — release-asset URLs are not under /gh/") + } + + @Test + fun `fastgit_cc is intentionally absent — lineage unverified`() = testApplication { + // CodeRabbit flagged fastgit.cc as a trust concern: no public artifact + // ties the .cc TLD to the FastGitORG team that ran the (now-defunct) + // fastgit.org. Shipping it would be a supply-chain risk. Re-add only + // after operator sign-off. 
+ setupApp(MirrorStatusRegistry()) + val body = Json.parseToJsonElement(client.get("/v1/mirrors/list").bodyAsText()).jsonObject + val ids = body["mirrors"]!!.jsonArray.map { it.jsonObject["id"]!!.jsonPrimitive.content } + assertTrue("fastgit_cc" !in ids, "fastgit_cc shipped without lineage verification: $ids") + } + @Test fun `response shape includes generated_at`() = testApplication { setupApp(MirrorStatusRegistry()) diff --git a/src/test/kotlin/zed/rainxch/githubstore/routes/AuthPollDiagnosticsTest.kt b/src/test/kotlin/zed/rainxch/githubstore/routes/AuthPollDiagnosticsTest.kt new file mode 100644 index 0000000..bdf121d --- /dev/null +++ b/src/test/kotlin/zed/rainxch/githubstore/routes/AuthPollDiagnosticsTest.kt @@ -0,0 +1,200 @@ +package zed.rainxch.githubstore.routes + +import ch.qos.logback.classic.Logger +import ch.qos.logback.classic.spi.ILoggingEvent +import ch.qos.logback.core.read.ListAppender +import io.ktor.client.request.* +import io.ktor.client.statement.* +import io.ktor.http.* +import io.ktor.serialization.kotlinx.json.* +import io.ktor.server.application.* +import io.ktor.server.plugins.contentnegotiation.* +import io.ktor.server.plugins.ratelimit.* +import io.ktor.server.routing.* +import io.ktor.server.testing.* +import kotlinx.serialization.json.Json +import org.slf4j.LoggerFactory +import zed.rainxch.githubstore.ingest.GitHubDeviceClient +import zed.rainxch.githubstore.ingest.GitHubDeviceResponse +import kotlin.test.AfterTest +import kotlin.test.BeforeTest +import kotlin.test.Test +import kotlin.test.assertEquals +import kotlin.test.assertFalse +import kotlin.test.assertNotNull +import kotlin.test.assertTrue + +class AuthPollDiagnosticsTest { + + private class FakeDeviceClient( + private val response: GitHubDeviceResponse, + ) : GitHubDeviceClient(clientId = "test-client-id") { + var lastDeviceCode: String? 
= null
+
+        override suspend fun pollDeviceToken(deviceCode: String): GitHubDeviceResponse {
+            lastDeviceCode = deviceCode
+            return response
+        }
+
+        override suspend fun startDeviceFlow(): GitHubDeviceResponse =
+            error("startDeviceFlow not used in these tests")
+    }
+
+    private lateinit var appender: ListAppender<ILoggingEvent>
+    private lateinit var logger: Logger
+
+    @BeforeTest
+    fun attachAppender() {
+        logger = LoggerFactory.getLogger("AuthRoutes") as Logger
+        appender = ListAppender<ILoggingEvent>().apply { start() }
+        logger.addAppender(appender)
+    }
+
+    @AfterTest
+    fun detachAppender() {
+        logger.detachAppender(appender)
+        appender.stop()
+    }
+
+    private fun ApplicationTestBuilder.setupApp(client: GitHubDeviceClient) {
+        application {
+            install(ContentNegotiation) { json(Json { ignoreUnknownKeys = true }) }
+            install(RateLimit) {
+                // Tests focus on the route's logging + status-mapping behaviour;
+                // the production buckets (10/hr/IP start, 200/hr/IP poll) would
+                // be flaky here and slow each test by ~minutes. Register the
+                // same bucket names with an unbounded ceiling so authRoutes
+                // wires up without changing under test.
+ register(RateLimitName("auth-start")) { + rateLimiter(limit = Int.MAX_VALUE, refillPeriod = kotlin.time.Duration.parse("1m")) + } + register(RateLimitName("auth-poll")) { + rateLimiter(limit = Int.MAX_VALUE, refillPeriod = kotlin.time.Duration.parse("1m")) + } + } + routing { route("/v1") { authRoutes(client) } } + } + } + + private fun formBody(deviceCode: String?): String = + if (deviceCode == null) "" else "device_code=$deviceCode" + + @Test + fun `pending poll forwards 200 verbatim and logs authorization_pending error code`() = testApplication { + val fake = FakeDeviceClient( + GitHubDeviceResponse( + status = HttpStatusCode.OK, + body = """{"error":"authorization_pending"}""", + ), + ) + setupApp(fake) + + val response = client.post("/v1/auth/device/poll") { + header(HttpHeaders.ContentType, ContentType.Application.FormUrlEncoded.toString()) + setBody(formBody("dc_secret_value")) + } + assertEquals(HttpStatusCode.OK, response.status) + assertTrue(response.bodyAsText().contains("authorization_pending")) + + val line = pollLogLine() + assertTrue(line.contains("gh_err=authorization_pending"), line) + assertTrue(line.contains("ghs=200"), line) + assertTrue(line.contains("lat_ms="), line) + assertTrue(line.contains("dch="), line) + } + + @Test + fun `successful poll never logs the raw device_code or any token body field`() = testApplication { + val accessToken = "gho_test_token_must_not_leak" + val fake = FakeDeviceClient( + GitHubDeviceResponse( + status = HttpStatusCode.OK, + body = """{"access_token":"$accessToken","token_type":"bearer","scope":""}""", + ), + ) + setupApp(fake) + + val rawDeviceCode = "dc_must_not_be_logged" + val response = client.post("/v1/auth/device/poll") { + header(HttpHeaders.ContentType, ContentType.Application.FormUrlEncoded.toString()) + setBody(formBody(rawDeviceCode)) + } + assertEquals(HttpStatusCode.OK, response.status) + // Body forwarded verbatim — that's the contract with the client. 
+ assertTrue(response.bodyAsText().contains(accessToken)) + + val line = pollLogLine() + assertFalse(line.contains(accessToken), "access_token leaked into log: $line") + assertFalse(line.contains(rawDeviceCode), "raw device_code leaked into log: $line") + assertTrue(line.contains("gh_err=-"), line) + assertTrue(line.contains("ghs=200"), line) + } + + @Test + fun `non-2xx upstream flips client response to 502 and logs upstream status`() = testApplication { + val fake = FakeDeviceClient( + GitHubDeviceResponse( + status = HttpStatusCode.BadRequest, + body = """{"error":"invalid_request"}""", + ), + ) + setupApp(fake) + + val response = client.post("/v1/auth/device/poll") { + header(HttpHeaders.ContentType, ContentType.Application.FormUrlEncoded.toString()) + setBody(formBody("dc_x")) + } + assertEquals(HttpStatusCode.BadGateway, response.status) + + val line = pollLogLine() + assertTrue(line.contains("ghs=400"), line) + assertTrue(line.contains("gh_err=invalid_request"), line) + } + + @Test + fun `missing device_code returns 400 and emits no auth-poll log line`() = testApplication { + val fake = FakeDeviceClient( + GitHubDeviceResponse(HttpStatusCode.OK, """{"error":"authorization_pending"}"""), + ) + setupApp(fake) + + val response = client.post("/v1/auth/device/poll") { + header(HttpHeaders.ContentType, ContentType.Application.FormUrlEncoded.toString()) + setBody("") + } + assertEquals(HttpStatusCode.BadRequest, response.status) + // No upstream call, no auth-poll log entry. 
+ assertEquals(0, appender.list.count { it.formattedMessage.contains("[auth-poll") }) + } + + @Test + fun `same device_code always hashes to the same dch prefix`() = testApplication { + val fake = FakeDeviceClient( + GitHubDeviceResponse(HttpStatusCode.OK, """{"error":"authorization_pending"}"""), + ) + setupApp(fake) + + client.post("/v1/auth/device/poll") { + header(HttpHeaders.ContentType, ContentType.Application.FormUrlEncoded.toString()) + setBody(formBody("repeatable_code")) + } + client.post("/v1/auth/device/poll") { + header(HttpHeaders.ContentType, ContentType.Application.FormUrlEncoded.toString()) + setBody(formBody("repeatable_code")) + } + + val dchValues = appender.list + .map { it.formattedMessage } + .filter { it.contains("[auth-poll") } + .map { Regex("dch=([0-9a-f]+)").find(it)?.groupValues?.get(1) } + assertEquals(2, dchValues.size) + assertNotNull(dchValues[0]) + assertEquals(dchValues[0], dchValues[1]) + } + + private fun pollLogLine(): String { + val msgs = appender.list.map { it.formattedMessage }.filter { it.contains("[auth-poll") } + assertEquals(1, msgs.size, "expected exactly one auth-poll log line, got: $msgs") + return msgs.single() + } +} diff --git a/src/test/kotlin/zed/rainxch/githubstore/routes/DeprecationRoutesTest.kt b/src/test/kotlin/zed/rainxch/githubstore/routes/DeprecationRoutesTest.kt new file mode 100644 index 0000000..944bdcb --- /dev/null +++ b/src/test/kotlin/zed/rainxch/githubstore/routes/DeprecationRoutesTest.kt @@ -0,0 +1,63 @@ +package zed.rainxch.githubstore.routes + +import io.ktor.client.request.* +import io.ktor.client.statement.* +import io.ktor.http.* +import io.ktor.serialization.kotlinx.json.* +import io.ktor.server.application.* +import io.ktor.server.plugins.contentnegotiation.* +import io.ktor.server.routing.* +import io.ktor.server.testing.* +import kotlinx.serialization.json.Json +import kotlin.test.Test +import kotlin.test.assertEquals +import kotlin.test.assertTrue + +class DeprecationRoutesTest { + + 
private fun ApplicationTestBuilder.installPlugins() { + application { + install(ContentNegotiation) { + json(Json { ignoreUnknownKeys = true; encodeDefaults = true }) + } + } + } + + @Test + fun `GET legacy device path returns 410 Gone with hint`() = testApplication { + installPlugins() + application { routing { route("/v1") { deprecationRoutes() } } } + + val response = client.get("/v1/repo/login/device") + assertEquals(HttpStatusCode.Gone, response.status) + val body = response.bodyAsText() + assertTrue(body.contains("\"error\":\"endpoint_deprecated\""), body) + assertTrue(body.contains("/v1/auth/device/start"), body) + assertTrue(body.contains("/v1/auth/device/poll"), body) + assertEquals( + "public, max-age=86400", + response.headers[HttpHeaders.CacheControl], + ) + } + + @Test + fun `POST legacy oauth path also returns 410 Gone`() = testApplication { + installPlugins() + application { routing { route("/v1") { deprecationRoutes() } } } + + val response = client.post("/v1/repo/login/oauth") { + header(HttpHeaders.ContentType, ContentType.Application.Json.toString()) + setBody("{}") + } + assertEquals(HttpStatusCode.Gone, response.status) + } + + @Test + fun `GET legacy oauth path returns 410 Gone`() = testApplication { + installPlugins() + application { routing { route("/v1") { deprecationRoutes() } } } + + val response = client.get("/v1/repo/login/oauth") + assertEquals(HttpStatusCode.Gone, response.status) + } +} diff --git a/src/test/kotlin/zed/rainxch/githubstore/routes/EventRoutesTest.kt b/src/test/kotlin/zed/rainxch/githubstore/routes/EventRoutesTest.kt new file mode 100644 index 0000000..30350b2 --- /dev/null +++ b/src/test/kotlin/zed/rainxch/githubstore/routes/EventRoutesTest.kt @@ -0,0 +1,50 @@ +package zed.rainxch.githubstore.routes + +import io.ktor.client.request.* +import io.ktor.client.statement.* +import io.ktor.http.* +import io.ktor.serialization.kotlinx.json.* +import io.ktor.server.application.* +import 
io.ktor.server.plugins.contentnegotiation.* +import io.ktor.server.routing.* +import io.ktor.server.testing.* +import kotlinx.serialization.json.Json +import kotlin.test.Test +import kotlin.test.assertEquals +import kotlin.test.assertTrue + +class EventRoutesTest { + + private fun ApplicationTestBuilder.installPlugins() { + application { + install(ContentNegotiation) { json(Json { ignoreUnknownKeys = true }) } + } + } + + @Test + fun `POST events returns 204 No Content so pre-1_8_3 clients see success and stop retrying`() = testApplication { + installPlugins() + application { routing { route("/v1") { eventRoutes() } } } + + val response = client.post("/v1/events") { + header(HttpHeaders.ContentType, ContentType.Application.Json.toString()) + setBody("[]") + } + + assertEquals(HttpStatusCode.NoContent, response.status) + assertTrue(response.bodyAsText().isEmpty(), "204 must have an empty body") + } + + @Test + fun `body content is ignored - even invalid payloads still get 204`() = testApplication { + installPlugins() + application { routing { route("/v1") { eventRoutes() } } } + + val response = client.post("/v1/events") { + header(HttpHeaders.ContentType, ContentType.Application.Json.toString()) + setBody("not-json-at-all") + } + + assertEquals(HttpStatusCode.NoContent, response.status) + } +} diff --git a/src/test/kotlin/zed/rainxch/githubstore/routes/RootRoutesTest.kt b/src/test/kotlin/zed/rainxch/githubstore/routes/RootRoutesTest.kt new file mode 100644 index 0000000..28fa2ea --- /dev/null +++ b/src/test/kotlin/zed/rainxch/githubstore/routes/RootRoutesTest.kt @@ -0,0 +1,50 @@ +package zed.rainxch.githubstore.routes + +import io.ktor.client.request.* +import io.ktor.client.statement.* +import io.ktor.http.* +import io.ktor.serialization.kotlinx.json.* +import io.ktor.server.application.* +import io.ktor.server.plugins.contentnegotiation.* +import io.ktor.server.routing.* +import io.ktor.server.testing.* +import kotlinx.serialization.json.Json +import 
kotlin.test.Test +import kotlin.test.assertEquals +import kotlin.test.assertTrue + +class RootRoutesTest { + + private fun ApplicationTestBuilder.installPlugins() { + application { + install(ContentNegotiation) { + json(Json { ignoreUnknownKeys = true; encodeDefaults = true }) + } + } + } + + @Test + fun `GET root returns greeting JSON`() = testApplication { + installPlugins() + application { routing { rootRoutes() } } + + val response = client.get("/") + assertEquals(HttpStatusCode.OK, response.status) + val body = response.bodyAsText() + assertTrue(body.contains("\"name\":\"github-store-backend\""), body) + assertTrue(body.contains("\"docs\":\"https://github-store.org\""), body) + assertTrue(body.contains("\"api\":\"https://api.github-store.org/v1/\""), body) + } + + @Test + fun `GET root sets a long Cache-Control so CF holds the greeting`() = testApplication { + installPlugins() + application { routing { rootRoutes() } } + + val response = client.get("/") + assertEquals( + "public, max-age=3600, s-maxage=86400", + response.headers[HttpHeaders.CacheControl], + ) + } +}
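
The ETag handling that `SigningSeedsRoutes.kt` adds in this diff can be tried outside Ktor. The following is a minimal sketch, not the route's actual code: `etagOfBody` is a hypothetical stand-in that hashes an already-serialized JSON string (the real `etagOf` serializes a `SigningSeedsResponse` first), and the sample body is invented for illustration. `etagsMatch` mirrors the lenient comparison shown in the diff.

```kotlin
import java.security.MessageDigest

// Hypothetical stand-in for the route's etagOf: takes the canonical JSON
// as a string instead of serializing a SigningSeedsResponse.
fun etagOfBody(canonicalJson: String): String {
    val digest = MessageDigest.getInstance("SHA-256")
        .digest(canonicalJson.toByteArray(Charsets.UTF_8))
    // Two lowercase hex digits per byte; keep the first 16 and quote them.
    val hex = digest.joinToString("") { "%02x".format(it) }
    return "\"${hex.take(16)}\""
}

// Lenient If-None-Match comparison: accepts `*`, a comma-separated list,
// and tags carrying the weak `W/` prefix.
fun etagsMatch(header: String, ours: String): Boolean {
    if (header.trim() == "*") return true
    return header.split(",")
        .map { it.trim().removePrefix("W/") }
        .any { it == ours }
}

fun main() {
    // Sample body is an assumption, not the real response shape.
    val body = """{"rows":[{"fingerprint":"AB:CD"}],"nextCursor":null}"""
    val tag = etagOfBody(body)

    println(tag.length)                                    // 18: 16 hex chars plus two quotes
    println(etagsMatch(tag, tag))                          // true: exact match
    println(etagsMatch("W/$tag", tag))                     // true: weak prefix stripped
    println(etagsMatch("\"deadbeefdeadbeef\", $tag", tag)) // true: one token in the list matches
    println(etagsMatch("\"deadbeefdeadbeef\"", tag))       // false: caller should send 200 with body
    println(etagsMatch("*", tag))                          // true: wildcard
}
```

Stripping `W/` before comparing is consistent with HTTP semantics: `If-None-Match` is evaluated with the weak comparison function, under which a weak and a strong tag with the same opaque value are considered a match.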