Implement Edge Cookie identity system with KV graph and integration tests#621
Implement Edge Cookie identity system with KV graph and integration tests#621ChristianPavilonis wants to merge 68 commits intomainfrom
Conversation
de0525d to
0940c34
Compare
aram356
left a comment
There was a problem hiding this comment.
Summary
Comprehensive EC identity subsystem: HMAC-based ID generation, KV identity graph with CAS concurrency, partner sync (pixel, batch, pull), identify endpoint, and auction bidstream decoration. The architecture is well-designed — clean separation of concerns, strong input validation, and solid concurrency model. A few cleartext logging issues and a docs inconsistency need addressing before merge.
Blocking
🔧 wrench
- Cleartext EC ID logging: Full EC IDs are logged at
warn!/error!/trace!levels in 5 locations acrossec/mod.rs(lines 128, 211, 296),sync_pixel.rs(line 91), andpull_sync.rs(line 91). Thelog_id()truncation helper was introduced in this PR for exactly this purpose but is not used consistently. See inline comments for fixes. - Stale env var in docs:
TRUSTED_SERVER__EDGE_COOKIE__SECRET_KEYstill appears inconfiguration.md(lines 40, 956) anderror-reference.md(line 154). The PR renamed the TOML section from[edge_cookie]to[ec]and updated some env var references but missed these. Users following the docs will set a variable that is silently ignored, resulting in a startup failure from the placeholder passphrase. Should beTRUSTED_SERVER__EC__PASSPHRASE.
❓ question
HttpOnlyomitted from EC cookie (ec/cookies.rs:11): Intentionally omitted for a hypothetical future JS use case. Is there a concrete plan? The identify endpoint already exposes the EC ID. If not,HttpOnlywould be cheap XSS defense-in-depth.
Non-blocking
🤔 thinking
- Uppercase EC ID rejection in batch sync (
batch_sync.rs:209):is_valid_ec_idrejects uppercase hex, but batch sync validates before normalizing (line 225). Partners submitting uppercase EC IDs getinvalid_ec_idinstead of normalization.
♻️ refactor
_pull_enabledindex lacks CAS (partner.rs:564): Read-modify-write without generation markers. Concurrent partner registrations can overwrite each other's index updates. Self-healing via fallback, but inconsistent with the CAS discipline used elsewhere.
🌱 seedling
- Integration tests don't verify identify response shape:
FullLifecycleandConcurrentPartnerSyncsonly assertuids.<partner>. Theec,consent,degraded,eids, andcluster_sizefields from the API reference are never checked. - Pixel sync failure paths untested end-to-end:
ts_reason=no_ec,no_consent, domain mismatch redirects are unit-tested but not covered in integration tests.
📝 note
trusted-server.tomlships banned placeholder:passphrase = "trusted-server"is rejected byreject_placeholder_secrets. Works in CI via env override, but new contributors will hit a confusing startup error. A TOML comment would help.
⛏ nitpick
Eid/UidmissingDeserialize:openrtb.rstypes deriveSerializeonly. If the auction ever needs to parse EIDs from provider responses,Deserializewill be needed.
CI Status
- cargo fmt: PASS
- clippy: PASS
- cargo test: PASS
- vitest: PASS
- integration tests: FAIL
- CodeQL: FAIL (likely related to cleartext logging findings above)
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
Comprehensive EC identity subsystem: HMAC-based ID generation, KV identity graph with CAS concurrency, partner sync (pixel, batch, pull), identify endpoint, and auction bidstream decoration. The architecture is well-designed with clean separation of concerns, strong concurrency discipline, and solid security choices throughout. Four new blocking findings join the existing ones — three are input-validation gaps and one is a PII leak in the pull sync URL builder that should be resolved before merge.
Blocking
🔧 wrench
- PII leak: client IP forwarded to partners —
pull_sync.rs:273:build_pull_request_urlappends the raw user IP as theipquery parameter on every outbound pull-sync request, exposing it in partner access logs. Contradicts the privacy-preserving design intent and is likely a GDPR Article 5 concern. Remove theipparameter or gate it behind an explicit per-partnerallow_ip_forwardingconfig flag. - Pull sync response body unbounded —
pull_sync.rs:325:take_body_bytes()with no size cap before JSON parsing. A malicious partner can exhaust WASM memory. Batch sync and admin endpoints both haveMAX_BODY_SIZEguards; this path needs one too (64 KiB is generous for a{"uid":"..."}response). - Whitespace-only UIDs bypass validation —
batch_sync.rs:217andsync_pixel.rs:41:is_empty()passes" "and"\t", which get stored as garbage EID values in the KV graph. Replace withtrim().is_empty()at both sites. rand::thread_rng()WASM compatibility unverified —generation.rs:52: requiresgetrandomwith thewasifeature onwasm32-wasip1. Nativecargo testpassing does not prove the WASM build is safe; integration tests are already failing so this may not have been caught. Switch toOsRngor addgetrandom = { version = "0.2", features = ["wasi"] }explicitly.
Non-blocking
🤔 thinking
process_mappingsUpsertResult branches untested —batch_sync.rs:232:NotFound,ConsentWithdrawn, andStalehave zero unit test coverage.Staleis documented as "counted as accepted" with no regression test.- Shared error messages make pull sync validation tests non-isolating —
partner.rs:152,165: both missing-URL and missing-token conditions return the identical error string, so the tests asserting on substrings pass even if the wrong branch fires. Use distinct messages per condition.
♻️ refactor
ec_consent_grantedhas no tests —consent.rs:20: the entry-point gate for all EC creation has no#[cfg(test)]section. Add smoke tests for granted and denied paths.- KV tombstone path never exercised in finalize tests —
finalize.rs:149: all finalize tests passkv: None, so the tombstone write and the cookie-EC ≠ active-EC case inwithdrawal_ec_idsare untested. - Missing HMAC prefix stability test —
generation.rs:228: no test asserts that the same IP + passphrase always produces the same 64-char hash prefix, which is the core deduplication guarantee for the KV graph. normalize_ec_idduplicated in integration tests —integration-tests/tests/common/ec.rs:374: reimplementsnormalize_ec_id_for_kvfrom core. Re-export and use the canonical implementation.
⛏ nitpick
- Use
saturating_subfor consistency —kv.rs:605: the subtraction is safe due to the guard above but inconsistent withsaturating_subused throughout the rest of the module. log_idshould encapsulate the…suffix —mod.rs:51: every call site manually appends"…"in the format string; move it inside the function.
Praises 👍
- CAS-based optimistic concurrency (
kv.rs): bounded retries, gracefulItemPreconditionFailedhandling, andMAX_CAS_RETRIESpreventing infinite loops — textbook correct for a single-writer KV model. - Constant-time API key comparison (
partner.rs):subtle::ConstantTimeEqfor timing attack prevention on key lookups. Many implementations miss this entirely. KvMetadatafast-path consent check (kv_types.rs): mirroringok/country/cluster_sizein metadata to avoid streaming the full KV body is an excellent performance optimisation with the right constraint test.evaluate_known_browserwithOnceLock-cached hash table (device.rs): pre-hashing fingerprints once and caching viaOnceLockis the right WASM-compatible lazy-init pattern.- HMAC stability explicitly documented (
generation.rs:30-33): noting that the output format is a "stable contract" that would invalidate all existing identities if changed is exactly the kind of correctness annotation that prevents future breakage.
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test: PASS
- vitest: PASS
- integration tests: FAIL
- CodeQL: FAIL (cleartext logging — covered in prior review)
aram356
left a comment
There was a problem hiding this comment.
Summary
This PR lands a large, well-partitioned Edge Cookie identity subsystem, and the prior blocking feedback (PII logging via log_id(), whitespace UID checks, pull-sync response size cap, dropping raw client IP from pull-sync query strings, constant-time auth comparisons) has been materially addressed. Blocking issues remaining are: a CI-red integration test (deterministic, root cause in the test client, not the server), an unverified rand::thread_rng() on wasm32-wasip1 that CI cannot catch because no wasm build runs, two new CodeQL alerts on ec/kv.rs riding the pre-existing settings-taint false-positive flow, one untested backend-state machine in batch_sync, and two questions on the auction / partner surfaces.
Blocking
🔧 wrench
- CI red —
test_ec_lifecycle_fastlydeterministic failure: the test client omitsSec-Fetch-Dest: documentandAccept: text/html, sois_navigation_request()correctly returnsfalseand EC never generates (crates/integration-tests/tests/common/ec.rs:93). rand::thread_rng()onwasm32-wasip1unverified; CI builds no wasm target (crates/trusted-server-core/src/ec/generation.rs:52). Add a wasm build to CI and/or switch to an explicitgetrandomwith thewasifeature.- Two new CodeQL alerts on
ec/kv.rs:637and:789ride the samesettings.reject_placeholder_secrets()taint flow that already produces noise onmain. Both are false positives but block the CodeQL gate. Fix with a scoped suppression or a repo-level CodeQL config filter onrust/cleartext-loggingfor settings-derived values. UpsertResult::{NotFound, ConsentWithdrawn, Stale}untested inbatch_sync::process_mappings(crates/trusted-server-core/src/ec/batch_sync.rs:231-244). Prior reviewer flagged this; still unresolved.
❓ question
- Auction
user.id = ""on the no-consent path is emitted into the OpenRTB bid request JSON (crates/trusted-server-core/src/auction/endpoints.rs:60-77,formats.rs:186). Has this been validated against the live Prebid Server target? If not, changeUserInfo.idtoOption<String>withskip_serializing_if. update_pull_enabled_indexrace self-heal path untested (crates/trusted-server-core/src/ec/partner.rs:552-604). The documented fallback atpartner.rs:350is the only thing keeping the index correct under concurrent upserts — please add a concurrency test or file a tracked follow-up.
Non-blocking
🤔 thinking
MAX_CAS_RETRIES = 3, no backoff — likely to starve under contention on hot prefixes; consider exponential backoff + jitter (crates/trusted-server-core/src/ec/kv.rs:300-339).HashMap<String, KvPartnerId>non-deterministic serialization — breaks future hash-based dedup and byte-diffing of stored values; useBTreeMaporIndexMap(crates/trusted-server-core/src/ec/kv_types.rs:59).seen_domainsdrop-newest eviction freezes long-lived ECs on their first 50 domains, defeating thelasttimestamp; switch to LRU or document (crates/trusted-server-core/src/ec/kv.rs:627-642).
♻️ refactor
- EID gating failures log at
debug!— bump towarn!so ad-stack anomalies surface in production (crates/trusted-server-core/src/auction/endpoints.rs:80). ec_consent_granted()is a 1-line pass-through — inline or document the sealing-point intent (crates/trusted-server-core/src/ec/consent.rs:20-22).- Unvalidated domain strings in EC graph writes — add a 255-byte length cap and hostname-shape check at the write boundary (
crates/trusted-server-core/src/ec/kv.rs update_last_seen,kv_types.rs).
🌱 seedling
- No integration test for the GPC / consent-denied identify path. Once the
Sec-Fetch-Destfix lands, addec_full_lifecycle_with_gpcsendingSec-GPC: 1and asserting/_ts/api/v1/identifyreturns 403 / no EID payload (crates/integration-tests/tests/frameworks/scenarios.rs:501-565). - No wasm-target build in CI — root cause of how the
rand::thread_rng()concern reached this point unnoticed. Addingcargo build -p trusted-server-adapter-fastly --target wasm32-wasip1would catch an entire class of deps-pin regressions.
📌 out of scope
- Pull-sync EC ID still travels as a URL query parameter — acknowledged in the PR description as deferred. Please file the follow-up issue and reference it from the commit that ships this PR so it is not lost (
crates/trusted-server-core/src/ec/pull_sync.rs:260-263).
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test (native): PASS
- vitest: PASS
- integration tests: FAIL —
test_ec_lifecycle_fastly(see blocking #1) - CodeQL: FAIL — 15 high alerts; 13 are pre-existing
settings.reject_placeholder_secrets()taint-flow false positives already present onmain, 2 are new onec/kv.rsriding the same flow (see blocking #3)
7009838 to
43df212
Compare
aram356
left a comment
There was a problem hiding this comment.
Summary
Large PR (98 files, +11,974 / −3,099). Architecture and consent plumbing are solid; this review focuses on concrete defects and hardening gaps verified against 109c929c. Three CI checks currently fail (integration tests, format-typescript, CodeQL); two of those are trivially fixable and inline-commented below. The CodeQL failure appears to be a false positive that needs a team decision.
Blocking
🔧 wrench
- Integration tests fail to compile:
crates/integration-tests/Cargo.tomlmissingtrusted-server-coredev-dependency whiletests/common/ec.rs:19andtests/frameworks/scenarios.rs:5import it. (inline) - format-typescript CI failure:
crates/js/lib/src/integrations/prebid/index.ts:413fails prettier — one-line fix vianpm run format. (inline) - Unbounded cookie payloads:
ec/prebid_eids.rs:43(ts-eids) andec/prebid_eids.rs:111(sharedId) decode/clone attacker-controlled cookie values with no size ceiling. (inline)
❓ question
- Consent-withdrawal split-brain at
ec/finalize.rs:56: consent-store delete failures are logged-and-continued while KV tombstones still write. Intentional (KV = source of truth) or should withdrawal fail-closed? (inline) - CodeQL cross-cutting: 15 high-severity
rust/cleartext-loggingalerts all namesettings.reject_placeholder_secrets()as the tainted source, pinned to log-call lines that do not actually reference that method (e.g.auction/orchestrator.rs:125is a timeout warning). This is taint-analyzer propagation through a boolean validator that uses.expose()internally; the logged bytes are field names, not secret material. How would the team like to resolve: (a)#[allow]on the validator with a comment explaining the false positive, (b) renamereject_placeholder_secrets→validate_secrets_not_placeholdersto break the taint-name heuristic, or (c) configure a CodeQL suppression?
Non-blocking
🤔 thinking
- Bot gate is presence-only (
ec/device.rs:76) — attackers with spoofed JA4 pass. Document threat model. (inline) user.id = Nonewhen EC not allowed (auction/endpoints.rs:60) — correct for consent, but may depress bidder dedup/yield. Consider a session-scoped ephemeral ID. (inline)- Pull-sync only on organic routes (
trusted-server-adapter-fastly/src/main.rs:373) — auction-heavy SPA publishers never refresh KV. Document or extend. (inline)
♻️ refactor
- Duplicated bearer parser in
ec/identify.rs:34andec/batch_sync.rs:101— extract toec/auth.rs. (inline) - Rate-limiter hourly→minute conversion is lossy by up to ~2× (
ec/rate_limiter.rs:59) — document and test the overshoot bound. (inline)
🌱 seedling
- Non-deterministic EID order on timestamp ties (
ec/eids.rs:64) — addpartner_idas secondary sort key. (inline) - No post-deserialize bounds on
KvEntry(ec/kv.rs:158) — legacy/corrupt oversized entries will round-trip through. (inline) platform/test_support.rsdefines stubs (NoopConfigStore,NoopSecretStore,noop_services) that no unit test exercises, so trait drift will silently break them. Either consume in ≥1 unit test or remove.- Auction ext metadata unsrialized:
auction/formats.rscomputes strategy/provider-count/timing intoresult.metadatabut never emits it intoext.trustedServeron the OpenRTB response — a useful debug surface.
⛏ nitpick
has_cookie_mismatchnaming (ec/mod.rs:398) — ambiguous; preferheader_overrides_cookie. (inline)
📌 out of scope
ec_id/client_ipas query parameters in pull sync — PR description acknowledges as follow-up.
CI Status
- cargo fmt: PASS
- clippy: PASS
- cargo test (workspace): PASS
- vitest: PASS
- format-typescript: FAIL (see inline)
- integration tests: FAIL — compile error (see inline)
- CodeQL: FAIL — 15 high (likely false positives) (see question above)
- browser integration tests: PASS
- format-docs: PASS
|
Addressed the remaining merge blockers in 4d2ef302:
Validation run locally:
|
|
Follow-up update in 6016c48c to address the remaining open review threads:
Validation re-run locally:
I also replied to and resolved the remaining open review threads tied to these changes. |
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
This PR lands the Edge Cookie identity system end to end, but I found two merge blockers in the request lifecycle.
Blocking
🔧 wrench
- Identify GET still missing browser-direct CORS support:
OPTIONS /_ts/api/v1/identifyreflects CORS headers, butGET /_ts/api/v1/identifynever applies them, so the documented browser-direct flow still fails after a successful preflight.classify_origin()also accepts matching hosts regardless of scheme, while the spec only allowshttpsorigins.
Fix: classify origin in the GET path too, return 403 for denied origins, reflect CORS headers on allowed GET responses, and require origin_url.scheme() == "https" in classify_origin(). (crates/trusted-server-core/src/ec/identify.rs:31, crates/trusted-server-core/src/ec/identify.rs:171)
CI Status
- cargo fmt: PASS
- cargo test: PASS
- vitest: PASS
- browser integration tests: PASS
- integration tests: FAIL (
test_ec_lifecycle_fastly: organicGET /should setts-eccookie) - CodeQL: FAIL (12 open alerts on the merge ref)
aram356
left a comment
There was a problem hiding this comment.
Summary
Edge Cookie identity subsystem is materially complete and the prior review's blockers have been addressed: navigation-headers fix unblocks test_ec_lifecycle_fastly, UpsertResult variants are tested, user.id is Option<String> with skip_serializing_if, the partner-registry race is gone (architecture is now config-derived), MAX_CAS_RETRIES is 5 with backoff+jitter, raw client IP is no longer forwarded to pull-sync partners, and the WASM build now runs in CI.
Remaining blocking issues: a plaintext partner pull-token sitting on a Debug-deriving struct, an unintentional-looking read/write asymmetry in the bot gate, and a still-failing CodeQL gate riding the same Settings Debug taint-flow false positive that was flagged on the prior review.
Blocking
🔧 wrench
- Plaintext
ts_pull_tokenonPartnerConfig(crates/trusted-server-core/src/ec/registry.rs:44, :208)
❓ question
- Bot-gate read/write asymmetry intentional? (
crates/trusted-server-adapter-fastly/src/main.rs:208-212vs.:272-276) - CodeQL CI gate is failing on 12 new
rust/cleartext-loggingalerts, all riding thesettings.reject_placeholder_secrets()→log::debug!("Settings {settings:?}")taint flow that was flagged on the prior review.SettingsDebug is redacted viaRedacted<T>, so the alerts are benign — but they block the gate. Sites:crates/trusted-server-adapter-fastly/src/main.rs:71,crates/trusted-server-core/src/publisher.rs:178/:341/:382,crates/trusted-server-core/src/html_processor.rs:70,crates/trusted-server-core/src/consent/mod.rs:173,crates/trusted-server-core/src/auction/orchestrator.rs:125/:278/:338/:408,crates/trusted-server-core/src/integrations/nextjs/html_post_process.rs:123/:456. Plan to suppress at PR level or add a repo-wide CodeQL filter onRedacted-derived flows before merge?
Non-blocking
🤔 thinking
std::thread::sleepin CAS retry loop on wasm32-wasip1 (crates/trusted-server-core/src/ec/kv.rs:343, :426, :471, :558)seen_domainsstill usesHashMapwhileidswas converted toBTreeMap(crates/trusted-server-core/src/ec/kv_types.rs:133)MIN_PASSPHRASE_LENGTH = 8is weak entropy for the HMAC-SHA256 anchor (crates/trusted-server-core/src/settings.rs:353)seen_domainsdrop-newest eviction freezes long-lived ECs at the 50-domain cap (crates/trusted-server-core/src/ec/kv.rs:667-683) — repeat from prior review
♻️ refactor
PartnerRegistry::all()exposesHashMapiteration order to callers (crates/trusted-server-core/src/ec/registry.rs:174-176)RateLimiter::exceeded_per_minuteround-trips through hourly math (crates/trusted-server-core/src/ec/rate_limiter.rs:42-46, :49-52)
🌱 seedling
- No end-to-end integration tests for batch sync 429 / 400 / 413 paths (
crates/integration-tests/tests/frameworks/scenarios.rs:761-779) - Bot-gate observability hook (
crates/trusted-server-core/src/ec/device.rs:84-86) — operators have no metric for what fraction of traffic is excluded
📌 out of scope
- Repo-level CodeQL filter for the
Redacted-derivedrust/cleartext-loggingtaint flow so this stops gating PRs. Second consecutive PR review where the same false-positive flow blocks CI.
⛏ nitpick
derive(Debug)onPartnerRegistry/PartnerConfig(crates/trusted-server-core/src/ec/registry.rs:17, :53) — combined with the plaintext token wrench, this is the loaded gun.
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test (workspace): PASS
- vitest: PASS
- integration tests: PASS
- browser integration tests: PASS
- WASM release build: PASS
- CodeQL: FAIL — 12 new alerts, all on the same
SettingsDebug taint-flow false-positive flow (see ❓ above)
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Clearing an old pending local review so I can reply to the current threads from the branch update.
… module Origin/main introduced RuntimeServices platform abstraction (PR2-PR6) after our branch diverged. The rebase left inconsistencies between the trait definitions from our branch and the implementations from origin/main. Restore source files to their ORIG_HEAD state to reconcile: - Remove services: &RuntimeServices from IntegrationProxy::handle implementations (datadome, didomi, gpt, lockr, permutive, testlight, gtm) - Remove services from run_auction, proxy_request, AuctionContext - Restore publisher, proxy, orchestrator, types to branch-correct state - Restore streaming/html/rsc_flight modules to branch state
… sensitive information' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
… sensitive information' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
… sensitive information' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
f9d95fc to
0c542e3
Compare
|
@ChristianPavilonis Can you please resolve conflict before I continue with reviews? |
Summary
Changes
crates/trusted-server-core/src/ec/(new module)mod.rs(lifecycle orchestration),generation.rs(HMAC-SHA256),cookies.rs(cookie read/write),consent.rs(consent gating),device.rs(device signal derivation, bot gate),kv.rs+kv_types.rs(KV identity graph with CAS),partner.rs(partner registry + admin),sync_pixel.rs(pixel sync),batch_sync.rs(S2S batch sync),pull_sync.rs(background pull sync),identify.rs(identity lookup with CORS),eids.rs(EID encoding),finalize.rs(response finalization middleware),admin.rs(admin endpoints)crates/integration-tests/tests/common/ec.rsandtests/frameworks/scenarios.rs; updatedintegration.rstest runner and Viceroy config fixturescrates/trusted-server-adapter-fastly/src/main.rs/_ts/api/v1/and/_ts/admin/;send_to_client()pattern with background pull sync dispatchcrates/trusted-server-core/src/auction/endpoints.rsandformats.rsupdated to decorate bid requests with partner EIDs from the KV identity graph (user.id,user.eids,user.consent)crates/trusted-server-core/src/integrations/prebid.rscrates/js/lib/src/integrations/prebid/index.tscrates/trusted-server-core/src/synthetic.rscrates/trusted-server-core/src/cookies.rsec/cookies.rscrates/trusted-server-core/src/settings.rs+settings_data.rscrates/trusted-server-core/src/consent/crates/trusted-server-core/src/http_util.rscrates/trusted-server-core/src/proxy.rscrates/trusted-server-core/src/publisher.rscrates/trusted-server-core/src/constants.rsdocs/guide/ec-setup-guide.md(new)docs/guide/edge-cookies.md(new)docs/guide/api-reference.mddocs/guide/configuration.mddocs/guide/synthetic-ids.mddocs/(various)docs/internal/superpowers/specs/2026-03-24-ssc-technical-spec-design.mdfastly.tomltrusted-server.tomlCloses
Closes #532
Closes #533
Closes #534
Closes #535
Closes #536
Closes #537
Closes #538
Closes #539
Closes #540
Closes #541
Closes #542
Closes #543
Closes #544
Closes #611
Closes #612
Follow-up
Test plan
cargo test --workspacecargo clippy --workspace --all-targets --all-features -- -D warningscargo fmt --all -- --checkcd crates/js/lib && npx vitest runcd crates/js/lib && npm run formatcd docs && npm run formatChecklist
unwrap()in production code — useexpect("should ...")tracingmacros (notprintln!)