Fix compactor concurrency limit not enforced across compaction levels #7303

adamalexandru4 wants to merge 1 commit into cortexproject:master
Conversation
Force-pushed from 72a39f1 to 352b8db
Hey @adamalexandru4, can you fix the changelog and lint?

I'll handle it ASAP.
Signed-off-by: Alexandru Adam <aadam@adobe.com>
Force-pushed from 4f3f454 to f8995cb
Umm, I don't know if this is a real bug. We have run the partitioning compactor in prod for several years and I am pretty sure it runs at most 1 job with concurrency set to 1, so it should block and wait for the current jobs to finish before compacting more.

I ran into this example, but the compactor should stop and wait after each iteration.
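The loop behavior under discussion can be reproduced with a minimal Go sketch (hypothetical, simplified types — not the actual Cortex code): a grouper that caps the groups returned per call but keeps no state across calls still yields more than `concurrency` compactions when the caller loops on rerun.

```go
package main

import "fmt"

// group is a stand-in for a planned compaction group (hypothetical type).
type group struct{ name string }

// grouper mimics a Groups() implementation that caps the number of
// groups returned per call but keeps no memory across calls.
type grouper struct {
	concurrency int
	pending     []group
}

func (g *grouper) Groups() []group {
	n := g.concurrency
	if n > len(g.pending) {
		n = len(g.pending)
	}
	out := g.pending[:n]
	g.pending = g.pending[n:]
	return out
}

func main() {
	g := &grouper{concurrency: 1, pending: []group{{"6x2h->12h"}, {"3x12h->24h"}}}
	total := 0
	for {
		groups := g.Groups() // capped per call, not per pass
		if len(groups) == 0 {
			break
		}
		for _, grp := range groups {
			fmt.Println("compacting", grp.name)
			total++
		}
		// shouldRerun=true: the loop immediately asks for more groups.
	}
	fmt.Println("compactions in one pass:", total) // 2, despite concurrency=1
}
```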
What this PR does
`BucketCompactor.Compact()` has a `for` loop that repeatedly calls `grouper.Groups()` after each successful compaction (`shouldRerun=true`). The grouper returns up to `compactionConcurrency` groups per call, but since the loop calls it multiple times, the effective limit is unbounded.

With `concurrency=1` and blocks `[6×2h, 2×12h]`:

- `Groups()` → 1 group (6×2h → 12h), compacts, `shouldRerun=true`
- `Groups()` → 1 group (3×12h → 24h), compacts, `shouldRerun=true`
- `Groups()` → 0 groups, loop breaks

Result: 2 compactions in one pass despite `concurrency=1`. Each downloads blocks to local disk, causing unexpected disk usage spikes.

Fix
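As a rough illustration of the counting idea (hypothetical names and simplified types, not the actual patch), a grouper can remember how many groups it has already handed out during the current pass and return nothing once that budget is spent:

```go
package main

import "fmt"

// group is a stand-in for a planned compaction group (hypothetical type).
type group struct{ name string }

// limitedGrouper tracks the cumulative number of groups returned across
// Groups() calls, so the pass-wide total never exceeds concurrency.
type limitedGrouper struct {
	concurrency        int
	totalGroupsPlanned int // cumulative across Groups() calls in one pass
	pending            []group
}

func (g *limitedGrouper) Groups() []group {
	// Once the pass-wide budget is spent, return empty so the
	// Compact() loop terminates instead of planning more work.
	if g.totalGroupsPlanned >= g.concurrency {
		return nil
	}
	n := g.concurrency - g.totalGroupsPlanned
	if n > len(g.pending) {
		n = len(g.pending)
	}
	out := g.pending[:n]
	g.pending = g.pending[n:]
	g.totalGroupsPlanned += len(out)
	return out
}

func main() {
	g := &limitedGrouper{concurrency: 1, pending: []group{{"a"}, {"b"}}}
	fmt.Println("first call:", len(g.Groups()))  // 1
	fmt.Println("second call:", len(g.Groups())) // 0: budget already spent
}
```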
Track the cumulative number of groups returned across `Groups()` calls within one `Compact()` invocation. Once the total reaches `compactionConcurrency`, return empty. This is safe because the grouper is created fresh in each `compactUser()` call, so `totalGroupsPlanned` resets every pass.

The fix currently covers:
- `ShuffleShardingGrouper` (`sharding_strategy: shuffle-sharding`)
- `PartitionCompactionGrouper` (`sharding_strategy: shuffle-sharding` + `compaction_strategy: partitioning`)

Gap: the `DefaultBlocksGrouperFactory` path (used when `sharding_strategy: default` or when sharding is disabled) delegates to the Thanos `DefaultGrouper`, which is not patched. This affects single-tenant / non-sharded setups. The plan is to wrap it in a concurrency-limiting adapter.

Tests

- `ShuffleShardingGrouper` and `PartitionCompactionGrouper`: call `Groups()` twice with `concurrency=1`, assert the second call returns 0 groups

Which issue(s) this PR fixes:
Fixes #7298
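For the gap noted above, one way to avoid patching the Thanos `DefaultGrouper` itself is a wrapper that enforces the pass-wide cap from the outside. A hypothetical sketch (`Grouper` interface and types simplified, not the real Cortex/Thanos API):

```go
package main

import "fmt"

// group is a stand-in for a planned compaction group (hypothetical type).
type group struct{ name string }

// Grouper is a simplified stand-in for the grouper interface.
type Grouper interface {
	Groups() []group
}

// inner mimics an unpatched grouper: capped per call, no memory across calls.
type inner struct {
	limit   int
	pending []group
}

func (g *inner) Groups() []group {
	n := g.limit
	if n > len(g.pending) {
		n = len(g.pending)
	}
	out := g.pending[:n]
	g.pending = g.pending[n:]
	return out
}

// concurrencyLimitedGrouper wraps any Grouper and enforces a pass-wide
// cap: once `planned` reaches `limit`, further calls return nothing.
type concurrencyLimitedGrouper struct {
	inner   Grouper
	limit   int
	planned int
}

func (g *concurrencyLimitedGrouper) Groups() []group {
	if g.planned >= g.limit {
		return nil
	}
	out := g.inner.Groups()
	if rem := g.limit - g.planned; len(out) > rem {
		out = out[:rem] // trim to the remaining pass-wide budget
	}
	g.planned += len(out)
	return out
}

func main() {
	wrapped := &concurrencyLimitedGrouper{
		inner: &inner{limit: 1, pending: []group{{"a"}, {"b"}}},
		limit: 1,
	}
	fmt.Println(len(wrapped.Groups()), len(wrapped.Groups())) // 1 0
}
```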
Checklist

- [ ] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`