CBL-8161 : Timeout when closing the database with an active MultipeerReplicator running by pasin · Pull Request #482 · couchbase/couchbase-lite-java-common

pasin · 2026-05-15T21:53:43Z

Back ported from master branch (3fb6a7b)

—

Problems :

Closing the database while a MultipeerReplicator was still running failed with a timeout error. Three related issues contributed to this bug:

1. Incorrect CountDownLatch reset in AbstractDatabase.shutdown(): When close() returned a BUSY error, the latch was reset to 2 on the assumption that verifyActiveProcesses() would find a new process. That assumption does not always hold (e.g. a background thread may still be accessing the database during close). When it fails, the latch never reaches zero and close() is never retried correctly.
2. MultipeerReplicator.onSyncStatusChanged reports inactive early (fixed in EE repo): MultipeerReplicator.onSyncStatusChanged marked the process offline immediately but deferred unregisterProcess to an async callback. As a result, AbstractDatabase could treat the process as stopped, attempt to close the database too early, and receive BUSY from LiteCore.
3. Deadlock when close() is called from the Android main thread (fixed in EE repo): Waiting for all active processes to unregister blocks the main thread, while shutdownConflictResolverService is also scheduled on the main thread (the default executor), causing a deadlock.
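
The deadlock in the third issue can be reproduced in miniature with plain java.util.concurrent. The sketch below is only an illustration and not code from this PR or the EE repo: a single-thread executor stands in for the Android main thread, and the class and method names are hypothetical. A timeout is used so the demo returns instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class MainThreadDeadlockDemo {
    // Returns true if the queued task ran before the timeout, false otherwise.
    public static boolean waitOnSameThread(long timeoutMs) {
        // Hypothetical stand-in for the main thread: exactly one worker thread.
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> result = mainThread.submit(() -> {
                CountDownLatch done = new CountDownLatch(1);
                // This task is queued behind the one currently running, so it
                // cannot start until the current task returns.
                mainThread.execute(done::countDown);
                // Blocking here occupies the only thread that could count down.
                return done.await(timeoutMs, TimeUnit.MILLISECONDS);
            });
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            mainThread.shutdownNow();
        }
    }
}
```

Without the timeout, the await would block forever, which is the shape of the hang described above.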

Fixes

Simplified the shutdown logic by splitting it into two independent phases, which also eliminates the problematic CountDownLatch reset:

Phase 1 — Drain active processes: Shut down all active processes, then wait for them to finish or time out (10 secs).

Phase 2 — Close the database: Attempt close(); if BUSY is returned, wait briefly (2 secs) and retry (max 5 retries).
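
The two phases could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: ActiveProcess, drain, and closeWithRetry are hypothetical names, tryClose stands in for the real close() call (returning false where LiteCore would report BUSY), and reading "max 5 retries" as one initial attempt plus up to five retries is an assumption.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class TwoPhaseShutdown {
    /** Hypothetical stand-in for an active process such as a replicator. */
    public interface ActiveProcess {
        // Ask the process to stop; it runs onStopped once it has fully stopped.
        void stop(Runnable onStopped);
    }

    // Phase 1: stop every process, then wait until all have stopped or the timeout expires.
    public static boolean drain(List<ActiveProcess> processes, long timeoutMs) {
        // The latch is sized once from the known processes and never reset.
        CountDownLatch latch = new CountDownLatch(processes.size());
        for (ActiveProcess p : processes) { p.stop(latch::countDown); }
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Phase 2: try to close; on BUSY (tryClose returns false) wait and retry.
    public static boolean closeWithRetry(BooleanSupplier tryClose, int maxRetries, long waitMs) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (tryClose.getAsBoolean()) { return true; }
            if (attempt == maxRetries) { break; } // no point sleeping after the last attempt
            try {
                Thread.sleep(waitMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

With the values from the description, a caller would use something like drain(processes, 10_000) followed by closeWithRetry(tryClose, 5, 2_000). Because phase 2 no longer touches the latch, a BUSY result cannot leave phase 1 waiting on a count that never arrives.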

Previously, the two phases were interleaved in a single loop: shut down processes → wait → close → on BUSY, reset latch to 2 and repeat from the top (max 5 retries). This coupling was the root cause of the latch never reaching zero.
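
To make the latch problem concrete, here is a small sketch using only java.util.concurrent.CountDownLatch. It is not the AbstractDatabase code; the class and method names are hypothetical, and it simply assumes that the latch is reset to a count higher than the number of count-downs that actually arrive.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchResetDemo {
    // Returns true if the latch reached zero before the timeout expired.
    public static boolean awaitAfterReset(int resetCount, int actualCountDowns, long timeoutMs) {
        // "Resetting" a latch means creating a new one with the expected count.
        CountDownLatch latch = new CountDownLatch(resetCount);
        // Only the events that really happen count the latch down.
        for (int i = 0; i < actualCountDowns; i++) { latch.countDown(); }
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Calling awaitAfterReset(2, 1, timeout) always times out because the second count-down never comes, while awaitAfterReset(2, 2, timeout) returns true immediately.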

…Replicator running (#470)

github-actions · 2026-05-15T21:53:53Z

This is a release branch and commits are restricted.

Please confirm this PR is one of the following:

A response to a customer ask
A change per our security policy
A non-functional change (i.e. changes needed for building an older version)
A change that has been granted an exception (please comment)

pasin · 2026-05-15T21:54:56Z

Fixed for a blocking-test issue reported by QE.

pasin requested review from borrrden and jianminzhao and removed request for borrrden May 15, 2026 21:56

borrrden approved these changes May 15, 2026


pasin merged commit 50a52cd into release/4.0 May 15, 2026
1 check passed

pasin deleted the CBL-8261 branch May 15, 2026 22:31
