Fix flaky TestBlocksCleaner by awaiting HeartBeat goroutine completion#7386
Fix flaky TestBlocksCleaner by awaiting HeartBeat goroutine completion#7386archy-rock3t-cloud wants to merge 1 commit intocortexproject:masterfrom
Conversation
Signed-off-by: Artem Muterko <artem@sopho.tech>
CharlieTLe
left a comment
There was a problem hiding this comment.
Review
Overview
This PR fixes a goroutine leak / race in BlocksCleaner. The HeartBeat goroutine is launched to periodically update visit markers, and when the parent function returns, it signals the goroutine to stop via errChan <- nil. However, HeartBeat still does meaningful work after receiving from errChan — it calls MarkWithStatus(ctx, Completed), logs, and (since deleteOnExit=true) calls DeleteVisitMarker. Without this fix, the parent function could return and the test could tear down resources while HeartBeat is still accessing them, causing flaky failures.
The fix adds a doneChan that is closed when HeartBeat returns, and the defer waits on <-doneChan after sending the nil error. This ensures the goroutine has fully completed before the function returns.
Analysis
Correctness: The fix is correct. Both operations (errChan <- nil then <-doneChan) are in the same defer closure, so they execute sequentially in the right order. The errChan is buffered (size 1), so the send is non-blocking even if HeartBeat already exited via ctx.Done(). In that case, doneChan is already closed, so <-doneChan returns immediately. All code paths are safe.
Scope: Minimal and focused — only the two call sites in cleanUpActiveUsers and cleanDeletedUsers are changed, with identical fixes.
Looks good to me. Two minor suggestions:
- CHANGELOG: The PR checklist shows CHANGELOG as not updated. Consider adding a
[BUGFIX]entry — this addresses a real goroutine leak (not just a test issue). TheHeartBeatgoroutine could outlive its parent and access stale resources in production too. - Optional refactor: See inline comment about extracting the duplicated pattern into a helper.
| defer func() { | ||
| errChan <- nil | ||
| <-doneChan | ||
| }() |
There was a problem hiding this comment.
Nit: This exact pattern is duplicated verbatim in cleanDeletedUsers below. Consider extracting a small helper to reduce the duplication, e.g.:
func runWithHeartBeat(ctx context.Context, mgr *VisitMarkerManager, interval time.Duration, fn func() error) error {
errChan := make(chan error, 1)
doneChan := make(chan struct{})
go func() {
mgr.HeartBeat(ctx, errChan, interval, true)
close(doneChan)
}()
defer func() {
errChan <- nil
<-doneChan
}()
return fn()
}This is optional — the duplication is small and the current code is clear.
What this PR does:
Fixes flaky
TestBlocksCleanerby ensuring theHeartBeatgoroutine completes before the function returns.Which issue(s) this PR fixes:
Fixes flaky
TestBlocksCleanerCI failures.Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]