CBL-8160: Timeout when closing the database with an active MultipeerReplicator running #483

Merged
pasin merged 1 commit into release/3.3 from CBL-8260 on May 15, 2026

Conversation

pasin (Collaborator) commented May 15, 2026

Backported from the master branch (3fb6a7b).

Problems:

Closing the database while a MultipeerReplicator was still running failed with a timeout error. Three related issues contributed to this bug:

  1. Incorrect CountDownLatch reset in AbstractDatabase.shutdown(): When close() returned a BUSY error, the latch was reset to 2 on the assumption that verifyActiveProcesses() would find a new process. That assumption does not always hold (e.g. when a background thread is still accessing the database during close). As a result, the latch never reached zero and close() was never retried correctly.

  2. MultipeerReplicator.onSyncStatusChanged reports inactive early (fixed in EE repo): MultipeerReplicator.onSyncStatusChanged marked the process offline immediately but deferred unregisterProcess to an async callback. As a result, AbstractDatabase could treat the process as stopped, attempt to close the database too early, and receive BUSY from LiteCore.

  3. Deadlock when close() is called from the Android main thread (fixed in EE repo): Waiting for all active processes to unregister blocks the main thread, while shutdownConflictResolverService is also scheduled on the main thread (the default executor), causing a deadlock.
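The deadlock in item 3 can be reproduced in isolation. The following is a minimal sketch, not code from this PR: a single-thread executor stands in for the Android main thread, and the class and method names are hypothetical. A task blocks on a latch whose count-down is queued on the same thread, so the count-down cannot run until the waiter gives up. A timeout is used only so the demo terminates; without it the wait would never return.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Couchbase Lite.
class SameThreadWaitDemo {
    // Returns true if the waiter saw the latch reach zero before the timeout.
    static boolean waitOnSameExecutor(long timeoutMs) {
        // Single worker thread: a stand-in for the main thread / default executor.
        ExecutorService mainLike = Executors.newSingleThreadExecutor();
        CountDownLatch done = new CountDownLatch(1);
        try {
            Future<Boolean> waiter = mainLike.submit(() -> {
                // The count-down is queued behind the task that is running right now.
                mainLike.execute(done::countDown);
                // Blocking here keeps the only thread busy, so the count-down never runs.
                return done.await(timeoutMs, TimeUnit.MILLISECONDS);
            });
            return waiter.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            mainLike.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("latch released in time: " + waitOnSameExecutor(200));
        // prints "latch released in time: false"
    }
}
```

Running the count-down on a different executor, or not blocking the shared thread, lets the latch reach zero.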

Fixes

Simplified the shutdown logic by splitting it into two independent phases, which also eliminates the problematic CountDownLatch reset:

Phase 1 (drain active processes): Shut down all active processes, then wait for them to finish or for the 10-second timeout to expire.

Phase 2 (close the database): Attempt close(); if BUSY is returned, wait 2 seconds and retry (max 5 retries).
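As a rough illustration of the two phases, here is a self-contained sketch. It is not the code in AbstractDatabase: the ActiveProcess interface, the method names, and the BooleanSupplier standing in for close() are all assumptions made for this example. The description does not say whether the first close() attempt counts toward the 5 retries, so the sketch simply takes a maximum number of attempts. It also does not say what happens when Phase 1 times out; the demo proceeds to Phase 2 regardless, which is likewise an assumption.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the two-phase shutdown; names are illustrative only.
class TwoPhaseShutdown {
    /** Hypothetical stand-in for an active process such as a replicator. */
    interface ActiveProcess {
        void stop(Runnable onStopped);
    }

    // Phase 1: ask every process to stop, then wait for all of them or time out.
    static boolean drain(List<ActiveProcess> processes, long timeoutMs) {
        // The latch is sized once from the actual process count and never reset.
        CountDownLatch latch = new CountDownLatch(processes.size());
        for (ActiveProcess p : processes) { p.stop(latch::countDown); }
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Phase 2: try to close; on "busy" (false), sleep and retry.
    // Returns the number of attempts used, or -1 if every attempt was busy.
    static int closeWithRetry(BooleanSupplier tryClose, int maxAttempts, long retryDelayMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryClose.getAsBoolean()) { return attempt; }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(retryDelayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ActiveProcess quick = onStopped -> onStopped.run(); // stops immediately
        boolean drained = drain(List.of(quick, quick), 10_000L);
        int[] calls = {0};
        // Simulated close(): busy twice, then succeeds (assumption for the demo).
        int attempts = closeWithRetry(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println("drained=" + drained + ", close attempts=" + attempts);
    }
}
```

Because the retry loop in Phase 2 never touches the latch from Phase 1, a BUSY result cannot leave the latch waiting for a count-down that will not arrive.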

Previously, the two phases were interleaved in a single loop: shut down processes → wait → close → on BUSY, reset latch to 2 and repeat from the top (max 5 retries). This coupling was the root cause of the latch never reaching zero.
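The latch problem comes down to a count that does not match the number of count-downs that will actually happen. The snippet below is a hypothetical demo, not the removed code: it creates a latch with a given count, applies a given number of count-downs, and reports whether await() completes before a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Couchbase Lite.
class LatchResetDemo {
    // Returns true if the latch reaches zero within the timeout.
    static boolean awaitAfterReset(int resetCount, int actualCountDowns, long timeoutMs) {
        CountDownLatch latch = new CountDownLatch(resetCount);
        for (int i = 0; i < actualCountDowns; i++) { latch.countDown(); }
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Reset to 2, but only one count-down ever arrives: the wait times out.
        System.out.println(awaitAfterReset(2, 1, 100L)); // prints "false"
        // Count matches the count-downs that arrive: the wait completes.
        System.out.println(awaitAfterReset(1, 1, 100L)); // prints "true"
    }
}
```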

…Replicator running (#470)


github-actions Bot commented May 15, 2026

This is a release branch and commits are restricted.

Please confirm this PR is one of the following:

  • A response to a customer ask
  • A change per our security policy
  • A non-functional change (i.e. changes needed for building an older version)
  • A change that has been granted an exception (please comment)

pasin (Collaborator, Author) commented May 15, 2026

Fix for a blocking-test ticket reported by QE.

pasin requested review from borrrden and jianminzhao May 15, 2026 22:25
pasin merged commit a45e701 into release/3.3 May 15, 2026
1 check passed
pasin deleted the CBL-8260 branch May 15, 2026 22:31