fix(tsan): atomic reads in findCallTrace to eliminate data race#559
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes a TSan-reported race in CallTraceHashTable::findCallTrace by making reads from concurrently modified hash-table slots atomic, and adds a stress regression test intended to reproduce the expansion/prev-table race.
Changes:
- Uses atomic acquire loads for
keys[slot]infindCallTrace. - Uses
acquireTrace()and handlesPREPARINGslots as not found. - Adds a concurrent stress test targeting the expansion race scenario.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
ddprof-lib/src/main/cpp/callTraceHashTable.cpp |
Updates previous-table lookup to use atomic key and trace reads. |
ddprof-lib/src/test/cpp/stress_callTraceStorage.cpp |
Adds a TSan regression stress test for concurrent expansion and lookup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
CI Test ResultsRun: #26630635589 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Summary: Total: 32 | Passed: 32 | Failed: 0 Updated: 2026-05-29 10:12:29 UTC |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 475981b613
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
What does this PR do?:
Replaces two plain (non-atomic) reads of
keys[slot]and one plain.tracefield access inCallTraceHashTable::findCallTracewith proper atomic loads, and adds a regression test that exercises the race scenario.Motivation:
TSan on aarch64 reported a data race between:
findCallTracereadingkeys[slot]with a plain load (line 235)putwritingkeys[slot]with an atomic CAS (line 319)The race is real: when a table is promoted to
prevafter expansion, threads that loaded the old table pointer before the CAS swap continue inserting into it atomically, while threads working on the new table callfindCallTrace(prev, hash)with non-atomic reads. On weakly-ordered architectures such as aarch64 this is observable.Two fixes in
findCallTrace:keys[slot]reads →__atomic_load_n(&keys[slot], __ATOMIC_ACQUIRE)(pairs with the ACQ_REL CAS input)values()[slot].traceplain access →values()[slot].acquireTrace()(pairs with the RELEASE store insetTrace). If the slot is stillPREPARING, treat it as not found so callers create a fresh entry.Additional Notes:
The
PREPARINGsentinel is a non-null pointer; returningnullptrfor an in-progress slot means the caller (put) creates a newCallTracefor that hash. Duplicate entries across the old and new table for the same hash are benign becausecollect()deduplicates by inserting into astd::unordered_set.How to test the change?:
New regression test
FindCallTraceAtomicReadRaceTestinstress_callTraceStorage.cpp:Run with TSan enabled:
For Datadog employees: