CBL-8161: Timeout when closing the database with an active MultipeerReplicator running #482
Merged
…Replicator running (#470)
This is a release branch and commits are restricted. Please confirm this PR is one of the following:
Author: This fixes a blocking-test issue reported by QE.
borrrden approved these changes on May 15, 2026
Backported from the master branch (3fb6a7b)
—
Problems:
Closing the database while a MultipeerReplicator was active failed with a timeout error. Three related issues contributed to this bug:
1. Incorrect CountDownLatch reset in AbstractDatabase.shutdown(): When close() returns a BUSY error, the latch was reset to 2 on the assumption that a new process would be found in verifyActiveProcesses(). That assumption does not always hold (e.g., a background thread may still be accessing the database during close). As a result, the latch never reaches zero and close() is never retried correctly.
2. MultipeerReplicator.onSyncStatusChanged reports inactive too early (fixed in EE repo): MultipeerReplicator.onSyncStatusChanged marked the process offline immediately but deferred unregisterProcess to an async callback. As a result, AbstractDatabase could treat the process as stopped, attempt to close the database too early, and receive BUSY from LiteCore.
3. Deadlock when close() is called from the Android main thread (fixed in EE repo): Waiting for all active processes to unregister blocks the main thread, while shutdownConflictResolverService is also scheduled on the main thread (the default executor), causing a deadlock.
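To make problem 1 concrete, here is a minimal, self-contained illustration using `java.util.concurrent.CountDownLatch`. It does not reproduce the actual `AbstractDatabase.shutdown()` code; the class `LatchResetDemo` and the method `awaitAfterReset` are hypothetical. Because `CountDownLatch` has no reset operation, "reset to 2" is modeled as creating a new latch with a count of 2. The sketch shows that when fewer than two `countDown()` calls actually arrive (for example, because no new process turns up in `verifyActiveProcesses()`), `await` can only end by timing out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchResetDemo {
    /**
     * Hypothetical model of the faulty "reset to 2" step: a new latch expects two
     * countDown() calls, of which only `arrivals` actually happen. Returns true only
     * if the count reaches zero before the timeout.
     */
    public static boolean awaitAfterReset(int arrivals, long timeoutMs) {
        // CountDownLatch cannot be reset, so "reset" means a fresh instance.
        CountDownLatch latch = new CountDownLatch(2);
        for (int i = 0; i < arrivals; i++) {
            latch.countDown();
        }
        try {
            // await(long, TimeUnit) returns false if the timeout elapses first.
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(awaitAfterReset(1, 200)); // false: count stuck at 1, the wait can only time out
        System.out.println(awaitAfterReset(2, 200)); // true: both expected events arrived
    }
}
```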
Fixes:
Simplified the shutdown logic by splitting it into two independent phases, which also eliminates the problematic CountDownLatch reset:
Phase 1 (drain active processes): Shut down all active processes, then wait until they finish or a 10-second timeout expires.
Phase 2 (close the database): Attempt close(); if it returns BUSY, wait briefly (2 seconds) and retry, up to 5 times.
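The two phases can be sketched in Java as follows. This is a hypothetical model, not the code from this PR: `TwoPhaseShutdown`, `ActiveProcess`, `Closer` and `CloseResult` are invented names, BUSY is modeled as a return value rather than a LiteCore error, "5 retries" is read as up to five retries after the first attempt, and the sketch proceeds to phase 2 even if phase 1 times out. Only the timeout, delay, and retry values come from the description above.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public final class TwoPhaseShutdown {
    /** Hypothetical stand-in for anything that keeps the database open (e.g. a replicator). */
    public interface ActiveProcess {
        /** Asks the process to stop; it must call onStopped exactly once when done. */
        void stop(Runnable onStopped);
    }

    /** Hypothetical outcome of one close attempt; BUSY mirrors LiteCore refusing to close. */
    public enum CloseResult { OK, BUSY }

    public interface Closer {
        CloseResult close();
    }

    // Values taken from the PR description.
    public static final long DRAIN_TIMEOUT_MS = 10_000;
    public static final long BUSY_RETRY_DELAY_MS = 2_000;
    public static final int MAX_CLOSE_RETRIES = 5;

    private TwoPhaseShutdown() {}

    /** Phase 1: stop every process, then wait until all report stopped or the timeout expires. */
    static boolean drain(List<ActiveProcess> processes, long timeoutMs) throws InterruptedException {
        // Sized once from the processes actually being stopped; never reset afterwards.
        CountDownLatch stopped = new CountDownLatch(processes.size());
        for (ActiveProcess p : processes) {
            p.stop(stopped::countDown);
        }
        return stopped.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Phase 2: try close(); on BUSY, sleep and retry, at most maxRetries times after the first try. */
    static boolean closeWithRetry(Closer closer, int maxRetries, long delayMs) throws InterruptedException {
        for (int attempt = 0; ; attempt++) {
            if (closer.close() == CloseResult.OK) {
                return true;
            }
            if (attempt >= maxRetries) {
                return false; // still BUSY after the last retry
            }
            Thread.sleep(delayMs);
        }
    }

    /** Runs phase 1, then phase 2 (even if phase 1 timed out). Returns true if the database closed. */
    public static boolean shutdown(List<ActiveProcess> processes, Closer closer,
                                   long drainTimeoutMs, long retryDelayMs, int maxRetries) {
        try {
            drain(processes, drainTimeoutMs);
            return closeWithRetry(closer, maxRetries, retryDelayMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static boolean shutdown(List<ActiveProcess> processes, Closer closer) {
        return shutdown(processes, closer, DRAIN_TIMEOUT_MS, BUSY_RETRY_DELAY_MS, MAX_CLOSE_RETRIES);
    }

    public static void main(String[] args) {
        // One process that stops on a background thread; close() is BUSY twice, then succeeds.
        ActiveProcess replicator = onStopped -> new Thread(onStopped).start();
        int[] calls = {0};
        Closer closer = () -> ++calls[0] <= 2 ? CloseResult.BUSY : CloseResult.OK;
        boolean closed = shutdown(List.of(replicator), closer, 1_000, 10, MAX_CLOSE_RETRIES);
        System.out.println(closed + " after " + calls[0] + " close attempts"); // true after 3 close attempts
    }
}
```

Because the phase 1 latch is sized once and the phase 2 retry loop never touches it, a BUSY result in this model can no longer leave a latch waiting for events that will never arrive.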
Previously, the two phases were interleaved in a single loop: shut down processes → wait → close → on BUSY, reset latch to 2 and repeat from the top (max 5 retries). This coupling was the root cause of the latch never reaching zero.