
cmd/compile/internal,internal,runtime: add lightweight probabilistic data race detection #77793

Open

VladSaiocUber wants to merge 40 commits into golang:master from VladSaiocUber:dev.sw-race

Conversation

@VladSaiocUber
Contributor

DO NOT REVIEW (yet)

Expands upon lightweight probabilistic data race detection (Racelite): https://go-review.googlesource.com/c/go/+/718640

thepudds and others added 30 commits February 25, 2026 14:47
…robabilistic race detection

This is a runnable but rough prototype. DO NOT REVIEW.

Earlier today, Milind Chabbi described adding micropauses to detect
concurrent memory accesses via hardware debug registers and PMUs.

This CL is inspired by those micropauses, but instead attempts cheap,
software-only probabilistic race detection via just the Go compiler & runtime.

Sample report:

  racelite: multiple concurrent writes to same object detected
  racelite: write flag was changed for addr 0x17578e4f6230 at word 2 of object at base 0x17578e4f6220
  panic: racelite: multiple concurrent writes to same object detected

This CL updates the compiler to instrument writes (and eventually
reads) with some new runtime support for software-based best-effort
race detection.

That report was generated by running this program after building it
with the prototype:

	type Foo struct{ a, b, c int }
	p := &Foo{}

	// Do two racy writes.
	var wg sync.WaitGroup
	for range 2 {
		wg.Go(func() {
			for range 10000 {
				// Without a mutex, both goroutines have a data race here.
				// We detect the race in ~1% of these short program executions
				// (with higher or lower detection percentages depending on how
				// we adjust our sampling parameters).
				p.c = 2
			}
		})
	}
	wg.Wait()

The high-level approach is that we attempt to have a global agreement
across all user goroutines about which words are being monitored at
any given time (in the prototype, currently one out of 1024 addresses at
any moment, but that is tunable based on overhead vs. detection rates),
and then have a scheme to map a word being written to a location where
we can record that it is being written. Given there is global agreement
on how to do that, two distinct goroutines can both record that they
are writing to the same word of an object and throw a best-effort
fatal error, similar to how the runtime maps can throw a best-effort
fatal error on concurrent map writes. (The maps have the advantage
that there is a convenient struct to store the 'writing' field; we do
not have that for arbitrary heap objects, hence the scheme described
in this CL.)

In addition, that global agreement is not fixed, but rather updated
over time so that a continually different subset of addresses can be
monitored over the lifetime of a program. In particular, we update the
global agreement during STW so that the agreement can change
coherently across all user goroutines.

Some additional details about the prototype:

* We have two levels of sampling to reduce overhead.
* There are two random 32-bit numbers initialized at program start.
* When a write occurs, we use a mask of the address to compare against
  one of the random numbers to decide whether to do the work of
  finding the heap object's bounds.
* If that happens, we then use the second 32-bit random number to decide
  if we are monitoring the accessed word of the object. Each heap object
  can only have one word monitored at a time. This allows the storage
  location for the monitoring state to be per object, without conflict
  between words of the same object. (Currently, this is a footer added
  to the heap object, but it could perhaps be per span instead, or a
  different scheme.)
* These two random numbers are updated every STW. This is a
  convenient moment in time for us, thanks to the GC.
* Writes XOR 1 into the monitored object's footer. The same bit is
  written regardless of which word in the object is being written.
* To help with detection, it also does a microsleep for currently 1 out
  of every 1M writes.
* The prototype currently only detects concurrent writes, but it is
  hopefully a small step to add in reads as well.

This is very much just a first cut. All the numbers were chosen
for now with the desire to see something work via short/quick tests,
and are not necessarily reflective of what real numbers should be.

Returning to our sample program, if we bracket the racy write with a
mu.Lock and mu.Unlock, no race is reported, as expected.

Because we are tracking each word of each allocated object separately,
writing to separate words of the same heap object does not report
a race either, also as desired:

	wg.Go(func() {
		for range 100000 {
			// Not a race because this goroutine is only writing to field b.
			p.b = 1
		}
	})
	wg.Go(func() {
		for range 100000 {
			// Not a race because this goroutine is only writing to field c.
			p.c = 1
		}
	})

Some possible refinements:

For simplicity, that mask check on the address happens
in the runtime in my quick prototype, but I suspect
it would not be too bad for it to instead be a small amount
of code emitted by the compiler (do the mask on the addr,
check the result, and do nothing the large majority of
the time; rarely, the mask check would cause a call
into the runtime).

Some ideas for reducing the memory overhead are that
rather than having a word per heap object (which might
be something like 5-10% memory overhead for a typical
app), it could be one word per span, perhaps stored
on the mspan struct or in the foot of the span, which
might then be more like 0.1% memory overhead
(e.g., 8 bytes for an 8KiB span).

Some ideas for reducing the CPU overhead could be that
the instrumentation in the compiler itself could be
probabilistic. For example, if each location had
a 1/8 chance of being instrumented based on some
randomization flag, then the CPU overhead could likely
drop proportionally at the cost then of needing to
deploy multiple binaries to get a mix of instrumentation
sites.

Another approach might be a simpler boolean check
in the instrumentation site, which could then be enabled
globally only some percentage of the time (say, 1% or 5%),
where checking the boolean might place it in the ballpark
of the current disabled write barrier check, which is considered
small. Or perhaps the instrumentation could be behind the
write barrier check, so it only checks when the write barrier
is enabled but has ~0 additional cost when the write barrier
is disabled.

Change-Id: Id95f5a3db1f1b56f13362f2529b0887ddcbe6971
@gopherbot
Contributor

This PR (HEAD: ec63211) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/748980.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

@gopherbot
Contributor

Message from Gopher Robot:

Patch Set 1:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/748980.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Contributor

Message from Georgian-Vlad Saioc:

Patch Set 1:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/748980.
After addressing review feedback, remember to publish your drafts!

rsc and others added 2 commits February 26, 2026 13:33
In some misguided attempt at "cleanup", Google Cloud has
decided to retire 'gsutil' in favor of 'gcloud storage' instead
of leaving an entirely backwards-compatible wrapper so
that client scripts and muscle memory keep working.

In addition to breaking customers this way, they are also
sending AI bots around "cleaning up" old usages with scary
warnings that maybe the changes will break your entire world.
This is even more misguided, of course, and resulted in us
receiving CL 748820 (originally GitHub PR golang#77787)
and then me receiving a private email asking for it to be merged.

It was easier to recreate the 3-line CL myself than to
enumerate everything that was wrong with that CL's
commit message.

I hope that only Google teams are being subjected to this.

Fixes golang#77787.

Change-Id: I624ad4aff7f488b191e195b72e564dbd78440883
Reviewed-on: https://go-review.googlesource.com/c/go/+/748900
TryBot-Bypass: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
@gopherbot
Contributor

This PR (HEAD: 0d44bf5) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/748980.

