Skip to content

[fix] Fix ManagedLedgerImpl's pendingAddEntries leak.#25240

Open
zhaizhibo wants to merge 3 commits intoapache:masterfrom
zhaizhibo:fix_pendingAddEntries_leak
Open

[fix] Fix ManagedLedgerImpl's pendingAddEntries leak.#25240
zhaizhibo wants to merge 3 commits intoapache:masterfrom
zhaizhibo:fix_pendingAddEntries_leak

Conversation

@zhaizhibo
Copy link
Copy Markdown
Contributor

Motivation

There ard two scenarios where pendingAddEntries can leak in a ManagedLedger.

  1. Leak in Fenced state
    After an add operation, there may be a need to update the ledger's properties. If the ZooKeeper update fails, the callback sets the ManagedLedger state to fenced. Subsequent write operations that fail will trigger ManagedLedgerImpl#ledgerClosed. For a ManagedLedger in the fenced state, this can result in pendingAddEntries not being properly cleaned up, causing a leak.
  2. Leak in Closed state
    The ManagedLedger needs to rollover because the current ledger is full, switching its state to closedLedger or creatingLedger. New incoming add requests are appended to the pendingAddEntries queue but have not yet been initiated. If the ManagedLedger is closed (state set to closed) after the rollover is executed but before its callback returns, the requests in the pendingAddEntries queue will never be completed, resulting in a leak.

Modifications

  1. add a new function clearNotInitiatedPendingAddEntries(), which will be called after ManagedLedger state be set to closed or fenced. This method will immediately fail all OpAddEntry operations in the pendingAddEntries queue that have not yet been initiated and remove them from the queue.
  2. In the OpAddEntry#handleAddFail method, add a check for the fenced state. If the ManagedLedger is fenced, we should also clear all subsequent pending write requests in the queue to prevent leaks.

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 11, 2026
@zhaizhibo zhaizhibo changed the title [fix] fix ManagedLedgerImpl's pendingAddEntries leak. [fix] Fix ManagedLedgerImpl's pendingAddEntries leak. Feb 11, 2026
@lhotari lhotari requested a review from Copilot February 11, 2026 12:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses pendingAddEntries leaks in ManagedLedgerImpl when the managed ledger transitions to Closed or Fenced while there are queued add operations that were never initiated.

Changes:

  • Add OpAddEntry.closeIfNotInitiated() and use it to fail/remove queued-but-not-initiated add operations on close/fence.
  • Schedule the post-rollover metadata-update completion to run on the managed ledger executor thread.
  • Clear pending writes when ledgerClosed() is invoked while in Fenced state.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java Adds a helper to atomically close an OpAddEntry if it has not been initiated.
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java Adds selective clearing of not-initiated pending adds on close/fence and adjusts rollover callback execution/threading and fenced handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@zhaizhibo
Copy link
Copy Markdown
Contributor Author

Pull request overview

This PR addresses pendingAddEntries leaks in ManagedLedgerImpl when the managed ledger transitions to Closed or Fenced while there are queued add operations that were never initiated.

Changes:

  • Add OpAddEntry.closeIfNotInitiated() and use it to fail/remove queued-but-not-initiated add operations on close/fence.
  • Schedule the post-rollover metadata-update completion to run on the managed ledger executor thread.
  • Clear pending writes when ledgerClosed() is invoked while in Fenced state.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java Adds a helper to atomically close an OpAddEntry if it has not been initiated.
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java Adds selective clearing of not-initiated pending adds on close/fence and adjusts rollover callback execution/threading and fenced handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Fixed

@zhaizhibo
Copy link
Copy Markdown
Contributor Author

add a test to reproduce the scenario where the ledger is closed before rollover completes(maybe ZooKeeper operations are slow).

@zhaizhibo
Copy link
Copy Markdown
Contributor Author

add a test to reproduce the scenario where the ledger is fenced before OpAddEntry.addcomplete().

@zhaizhibo zhaizhibo force-pushed the fix_pendingAddEntries_leak branch from e5a8dbd to 5eda9de Compare February 13, 2026 02:38
Copy link
Copy Markdown
Contributor

@Denovo1998 Denovo1998 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a unit test can be added:
Close occurs after updateLedgersListAfterRollover() has acquired the metadataMutex, but before operationComplete() has executed or just about to execute.

@zhaizhibo
Copy link
Copy Markdown
Contributor Author

@Denovo1998 Thanks for you review, I will fix the commets soon.

Copy link
Copy Markdown
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. The two leak scenarios identified in the PR description are real and well-described. However, I found a few issues that should be addressed before merging. See inline comments.

@codelipenghui codelipenghui added this to the 4.2.0 milestone Mar 5, 2026
@zhaizhibo zhaizhibo force-pushed the fix_pendingAddEntries_leak branch from 660ba2a to 118b793 Compare April 7, 2026 12:19
@zhaizhibo
Copy link
Copy Markdown
Contributor Author

Thanks for the fix. The two leak scenarios identified in the PR description are real and well-described. However, I found a few issues that should be addressed before merging. See inline comments.

@codelipenghui Thanks for the review. All issues mentioned in the comments have been fixed. Please take another look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs ready-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants