Skip to content

Fix flaky TestBlocksCleaner by awaiting HeartBeat goroutine completion#7386

Open
archy-rock3t-cloud wants to merge 1 commit intocortexproject:masterfrom
sophotechlabs:fix/flaky-blocks-cleaner-heartbeat
Open

Fix flaky TestBlocksCleaner by awaiting HeartBeat goroutine completion#7386
archy-rock3t-cloud wants to merge 1 commit intocortexproject:masterfrom
sophotechlabs:fix/flaky-blocks-cleaner-heartbeat

Conversation

@archy-rock3t-cloud
Copy link
Copy Markdown
Contributor

What this PR does:
Fixes flaky TestBlocksCleaner by ensuring the HeartBeat goroutine completes before the function returns.

Which issue(s) this PR fixes:
Fixes flaky TestBlocksCleaner CI failures.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Artem Muterko <artem@sopho.tech>
Copy link
Copy Markdown
Member

@CharlieTLe CharlieTLe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Overview

This PR fixes a goroutine leak / race in BlocksCleaner. The HeartBeat goroutine is launched to periodically update visit markers, and when the parent function returns, it signals the goroutine to stop via errChan <- nil. However, HeartBeat still does meaningful work after receiving from errChan — it calls MarkWithStatus(ctx, Completed), logs, and (since deleteOnExit=true) calls DeleteVisitMarker. Without this fix, the parent function could return and the test could tear down resources while HeartBeat is still accessing them, causing flaky failures.

The fix adds a doneChan that is closed when HeartBeat returns, and the defer waits on <-doneChan after sending the nil error. This ensures the goroutine has fully completed before the function returns.

Analysis

Correctness: The fix is correct. Both operations (errChan <- nil then <-doneChan) are in the same defer closure, so they execute sequentially in the right order. The errChan is buffered (size 1), so the send is non-blocking even if HeartBeat already exited via ctx.Done(). In that case, doneChan is already closed, so <-doneChan returns immediately. All code paths are safe.

Scope: Minimal and focused — only the two call sites in cleanUpActiveUsers and cleanDeletedUsers are changed, with identical fixes.

Looks good to me. Two minor suggestions:

  1. CHANGELOG: The PR checklist shows CHANGELOG as not updated. Consider adding a [BUGFIX] entry — this addresses a real goroutine leak (not just a test issue). The HeartBeat goroutine could outlive its parent and access stale resources in production too.
  2. Optional refactor: See inline comment about extracting the duplicated pattern into a helper.

defer func() {
errChan <- nil
<-doneChan
}()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This exact pattern is duplicated verbatim in cleanDeletedUsers below. Consider extracting a small helper to reduce the duplication, e.g.:

func runWithHeartBeat(ctx context.Context, mgr *VisitMarkerManager, interval time.Duration, fn func() error) error {
    errChan := make(chan error, 1)
    doneChan := make(chan struct{})
    go func() {
        mgr.HeartBeat(ctx, errChan, interval, true)
        close(doneChan)
    }()
    defer func() {
        errChan <- nil
        <-doneChan
    }()
    return fn()
}

This is optional — the duplication is small and the current code is clear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants