HDDS-14800. Guard RocksDB iterator against closed DB during volume failure by priyeshkaratha · Pull Request #9904 · apache/ozone

priyeshkaratha · 2026-03-11T04:21:23Z

What changes were proposed in this pull request?

When StorageVolumeChecker detects a volume failure and calls failVolume(), it closes the underlying RocksDB instance while BackgroundContainerDataScanner or OnDemandContainerScanner may still hold an active iterator over that DB. Calling native RocksDB methods on a closed DB can cause a crash. This fix adds two complementary guards to RDBStoreAbstractIterator: (1) a fast-fail check, so that hasNext() returns false immediately once the DB is closed, stopping the scan loop without touching native code; and (2) reference counting, which acquires a slot on RocksDatabase.counter at iterator creation and releases it on close(), so that the existing waitAndClose() mechanism waits for all iterators to finish before physically closing the DB. Together these ensure the scan exits cleanly and the DB cannot be destroyed while an iterator is still in use.
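To illustrate how the two guards interact, here is a minimal sketch. It is not the patch itself: SketchDb, SketchIterator and SketchDemo are hypothetical stand-ins, an in-memory list replaces the native iterator, and a busy-wait replaces whatever waiting strategy waitAndClose() really uses.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the DB wrapper; not the real RocksDatabase. */
final class SketchDb {
  private final List<String> rows;
  private final AtomicBoolean closed = new AtomicBoolean();    // close has started
  private final AtomicBoolean destroyed = new AtomicBoolean(); // "native" DB freed
  private final AtomicLong counter = new AtomicLong();         // open iterators

  SketchDb(List<String> rows) { this.rows = rows; }

  boolean isClosed() { return closed.get(); }

  boolean isDestroyed() { return destroyed.get(); }

  /** Guard 2: an iterator holds a reference from creation until close(). */
  SketchIterator newIterator() {
    counter.incrementAndGet();
    if (closed.get()) { // close already started: give the reference back
      counter.decrementAndGet();
      throw new IllegalStateException("DB is closed");
    }
    return new SketchIterator(this, rows.iterator());
  }

  void release() { counter.decrementAndGet(); }

  /** Marks the DB closed, then waits for open iterators before destroying it. */
  void waitAndClose() {
    closed.set(true);
    while (counter.get() > 0) {
      Thread.yield();
    }
    destroyed.set(true);
  }
}

final class SketchIterator implements Iterator<String>, AutoCloseable {
  private final SketchDb db;
  private final Iterator<String> delegate;
  private boolean released;

  SketchIterator(SketchDb db, Iterator<String> delegate) {
    this.db = db;
    this.delegate = delegate;
  }

  /** Guard 1: report "no more rows" once the DB is closed, before using the delegate. */
  @Override
  public boolean hasNext() {
    return !db.isClosed() && delegate.hasNext();
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return delegate.next();
  }

  @Override
  public synchronized void close() {
    if (!released) { // release the reference exactly once
      released = true;
      db.release();
    }
  }
}

final class SketchDemo {
  /** Scans four rows, starts a close after n rows, and returns how many rows were seen. */
  static int rowsSeenWhenClosedAfter(int n) {
    SketchDb db = new SketchDb(Arrays.asList("a", "b", "c", "d"));
    Thread closer = new Thread(db::waitAndClose);
    int seen = 0;
    try (SketchIterator it = db.newIterator()) {
      while (it.hasNext()) {
        it.next();
        if (++seen == n) {
          closer.start(); // simulate failVolume() arriving mid-scan
          while (!db.isClosed()) {
            Thread.yield();
          }
        }
      }
      if (db.isDestroyed()) {
        throw new IllegalStateException("DB destroyed while iterator open");
      }
    }
    try {
      closer.join(); // completes once the iterator has released its reference
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return seen;
  }
}
```

In this sketch the fast-fail check is what lets the scan loop end early, and the counter is what keeps the destroy step from running until the iterator has been closed; neither guard is sufficient alone.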

What is the link to the Apache JIRA

HDDS-14800

How was this patch tested?

Tested using added unit test cases.

ChenSammi · 2026-03-11T04:55:18Z

@priyeshkaratha , could you add a unit test for BackgroundContainerDataScanner?

...ainer-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractRDBStore.java

...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java

...framework/src/test/java/org/apache/hadoop/hdds/utils/db/TestRDBStoreIteratorWithDBClose.java

...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java

Gargi-jais11 · 2026-03-13T06:39:30Z

Let's add some tests for the race condition scenarios.

devmadhuu

@priyeshkaratha, overall the changes LGTM, +1. Just a few nits; please check.

...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java

hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RocksDatabase.java

...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java

priyeshkaratha · 2026-03-16T12:24:32Z

Thanks @devmadhuu for the review. Addressed all your comments in latest patch.

devmadhuu

Thanks @priyeshkaratha for improving the patch. Just one minor comment; please check.

...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java

devmadhuu

Thanks @priyeshkaratha. Now patch LGTM +1

yandrey321

lgtm

Gargi-jais11

Thanks @priyeshkaratha for updating the patch. LGTM.

errose28

Overall LGTM but Cursor has a sharper eye. This bug it found looks legitimate, as does the recommended fix:

While the RDBStoreAbstractIterator holds a lock for its lifecycle, there is a Time-Of-Check to Time-Of-Use (TOCTOU) gap when instantiating it inside RDBTable.
In RDBTable.java, the iterator is constructed like this:

return new RDBStoreByteArrayIterator(db.newIterator(family, false), this, prefix, type);

Here is how the race condition manifests:

1. db.newIterator(...) is called. Inside RocksDatabase, this method opens a try-with-resources block, calls acquire(), creates the native RocksDB iterator, and drops the lock upon returning.
2. [RACE WINDOW] The counter is now 0. If a concurrent failVolume() triggers waitAndClose(), it succeeds because counter is 0, and the DB is immediately destroyed natively.
3. The constructor new RDBStoreByteArrayIterator(...) is now called with the created (but now orphaned) native iterator.
4. The constructor attempts table.acquireIterator(), which fails because the DB is closed.
5. In the catch block, it calls iterator.close() to avoid a leak. However, because the underlying database was already physically destroyed in Step 2, calling close() on the native iterator triggers a Use-After-Free condition, which will likely crash the JVM.

Fix Recommendation:
Move the acquire() logic out of the iterator's constructor and into RDBTable. RDBTable should acquire the dbRef lock before instantiating the native iterator, and pass the acquired lock into the RDBStoreAbstractIterator constructor.
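The recommended ordering can be sketched as follows. This is only an illustration under stated assumptions: LeasedDb, Lease, FakeNativeIterator, LeasedIterator and LeasedTable are hypothetical names, a synchronized counter stands in for the one in RocksDatabase, and the use-after-free is modeled as an exception rather than a native crash.

```java
/** Hypothetical DB wrapper; a synchronized counter stands in for RocksDatabase's. */
final class LeasedDb {
  private int leases;
  private boolean destroyed;

  /** Takes a reference that blocks destruction until it is closed. */
  synchronized Lease acquire() {
    if (destroyed) {
      throw new IllegalStateException("DB is closed");
    }
    leases++;
    return new Lease();
  }

  /** Destroys the DB only if nobody holds a lease; returns true if it is destroyed. */
  synchronized boolean closeIfUnused() {
    if (leases == 0) {
      destroyed = true;
    }
    return destroyed;
  }

  synchronized boolean isDestroyed() {
    return destroyed;
  }

  final class Lease implements AutoCloseable {
    private boolean released;

    @Override
    public void close() {
      synchronized (LeasedDb.this) {
        if (!released) { // give the reference back exactly once
          released = true;
          leases--;
        }
      }
    }
  }
}

/** Stand-in for the native iterator: touching it after destruction is the bug. */
final class FakeNativeIterator implements AutoCloseable {
  private final LeasedDb db;

  FakeNativeIterator(LeasedDb db) {
    this.db = db;
  }

  @Override
  public void close() {
    if (db.isDestroyed()) {
      throw new IllegalStateException("use-after-free: DB already destroyed");
    }
  }
}

/** Iterator wrapper that receives an already-acquired lease and owns it until close(). */
final class LeasedIterator implements AutoCloseable {
  private final FakeNativeIterator raw;
  private final LeasedDb.Lease lease;

  LeasedIterator(FakeNativeIterator raw, LeasedDb.Lease lease) {
    this.raw = raw;
    this.lease = lease;
  }

  @Override
  public void close() {
    try {
      raw.close();   // native resources first, while the DB is still pinned
    } finally {
      lease.close(); // then allow a pending close to proceed
    }
  }
}

final class LeasedTable {
  /** Acquire first, then create the native iterator: no window with a zero counter. */
  static LeasedIterator iterator(LeasedDb db) {
    LeasedDb.Lease lease = db.acquire();
    try {
      return new LeasedIterator(new FakeNativeIterator(db), lease);
    } catch (RuntimeException e) {
      lease.close(); // creation failed: do not leak the reference
      throw e;
    }
  }
}
```

Because the lease is taken before the native iterator exists and is released only after the native iterator is closed, a concurrent close in this sketch can never observe a zero count while a native iterator is alive.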

Additionally it identified unprotected uses in the OM. I don't think these are as critical since OM volume failure should be fatal, but it would be good to file a follow-up Jira to check these.

@ptlrs do you have any comments on this change?

ptlrs

Thanks for the PR @priyeshkaratha.

Iteration can possibly be a long running operation. In case of volume failure, we don't want to keep the RocksDb forced open just because somebody is iterating on that Db. There could be other operations running on the Db such as compaction and running those on a faulty disk can result in the Db entering an inconsistent state.
This requires a cooperation mechanism amongst the clients to check if there is a pending request to close the Db and if so to close the iterator early. A new PR is needed to address this.

Some of my earlier concerns around the refcounting have already been addressed by other comments.

While this change looks mostly good to me, I do believe that the refcounting should happen at the handle level instead of the operation level.

Refcounting at operation level can have subtle bugs and is at best only valid during the lifetime of operation. It may result in partial completion of methods which could result in inconsistent/partial state.

The usage of ReferenceCountedDB over RawDB could possibly be the generalized solution.
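A rough sketch of what handle-level refcounting could look like is below. It only borrows the idea from ReferenceCountedDB and does not reproduce that class's API; HandleCountedStore, Handle, open() and closeIfIdle() are hypothetical names, and a HashMap stands in for the DB.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical handle-counted store; the count follows handles, not operations. */
final class HandleCountedStore {
  private final Map<String, String> data = new HashMap<>();
  private int handles;
  private boolean closed;

  /** A client takes one handle and keeps it for a whole multi-step task. */
  synchronized Handle open() {
    if (closed) {
      throw new IllegalStateException("store is closed");
    }
    handles++;
    return new Handle();
  }

  /** Closes only when no handle is outstanding; returns true if the store is closed. */
  synchronized boolean closeIfIdle() {
    if (handles == 0) {
      closed = true;
    }
    return closed;
  }

  final class Handle implements AutoCloseable {
    private boolean released;

    void put(String key, String value) {
      synchronized (HandleCountedStore.this) {
        checkUsable();
        data.put(key, value);
      }
    }

    String get(String key) {
      synchronized (HandleCountedStore.this) {
        checkUsable();
        return data.get(key);
      }
    }

    private void checkUsable() {
      if (released) {
        throw new IllegalStateException("handle already released");
      }
    }

    @Override
    public void close() {
      synchronized (HandleCountedStore.this) {
        if (!released) { // drop the reference exactly once
          released = true;
          handles--;
        }
      }
    }
  }
}
```

With this shape, a task that performs several dependent operations under one handle cannot be interrupted by a close between two of them, which is the partial-completion risk described above for operation-level counting.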

...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java

sumitagrawl

@priyeshkaratha This will fix only TypedTable, and there is also the race condition mentioned by Ethan. We can add the fix to ManagedRocksIterator, which should then be used for every access:

Check isClosed() and isValid() on each access.
A reference that is acquired but not released is also caught implicitly by leak detection.

Something like the code below, so that the check runs whenever the iterator is accessed. This requires changes at all access sites and will have some code impact.

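public class ManagedRocksIterator extends ManagedObject<RocksIterator> {
  // Interface proposed in this comment (not existing code): exposes acquire() and isClosed().
  private final IRocksDBClosable dbClosable;
  // Reference held on the DB for the lifetime of this iterator.
  private final AutoCloseable acquire;

  public ManagedRocksIterator(RocksIterator original, IRocksDBClosable dbClosable) {
    super(original);
    this.dbClosable = dbClosable;
    this.acquire = dbClosable.acquire();
  }

  @Override
  public RocksIterator get() throws RocksDBException {
    // Refuse to hand out the native iterator once the DB is closed.
    if (dbClosable.isClosed()) {
      throw new RocksDBException("Rocksdb is closed");
    }
    // The original sketch had this condition inverted.
    if (!super.get().isValid()) {
      throw new RocksDBException("iterator is not valid");
    }
    return super.get();
  }

  @Override
  public void close() {
    // Close the native iterator first, then release the DB reference.
    super.close();
    acquire.close();
  }

  public static ManagedRocksIterator managed(RocksIterator iterator, IRocksDBClosable dbClosable) {
    return new ManagedRocksIterator(iterator, dbClosable);
  }
}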

priyeshkaratha · 2026-03-18T07:57:37Z

Thanks @errose28 @sumitagrawl @ptlrs for the reviews.

@errose28
I have refactored the code to move the acquire() logic out of the RDBStoreAbstractIterator. The reference is now safely acquired before the native iterator is passed in, preventing the Use-After-Free crash. I will also file a follow-up Jira to audit the unprotected uses in OM as you suggested - https://issues.apache.org/jira/browse/HDDS-14856

@sumitagrawl
I completely agree that managing this at the ManagedRocksIterator level is a more robust and generalized approach. The recent changes align with this by moving the reference counting (dbRef) out of the RDBStoreAbstractIterator and handling it at the handle/managed level, ensuring it applies to all accesses and not just TypedTable.

@ptlrs I have addressed your comments in individual sections.

ptlrs

LGTM. Thanks for the changes @priyeshkaratha.

errose28

Thanks for the updates @priyeshkaratha LGTM.

...ainer-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractRDBStore.java

...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java

priyeshkaratha · 2026-03-19T10:29:04Z

Thanks @ChenSammi for the review. I have updated the patch.

ChenSammi

Thanks @priyeshkaratha , the last patch LGTM.

priyeshkaratha added 2 commits March 10, 2026 16:08

HDDS-14800. Guard RocksDB iterator against closed DB during volume failure

e8330e7

adding unit testcases

e348b90

priyeshkaratha added 2 commits March 11, 2026 10:46

fix ci issues

be4f3b0

adding unit testcases

765500e

priyeshkaratha force-pushed the HDDS-14800 branch from 22bc2d7 to 765500e March 11, 2026 08:34

fixing findbug error is ci

321a771

priyeshkaratha marked this pull request as ready for review March 11, 2026 10:39

yandrey321 reviewed Mar 11, 2026

...ainer-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractRDBStore.java

yandrey321 reviewed Mar 11, 2026

...ainer-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractRDBStore.java Outdated

yandrey321 reviewed Mar 11, 2026

...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java

making close synchronized and store to null after close

6697649

priyeshkaratha requested a review from yandrey321 March 11, 2026 16:19

yandrey321 reviewed Mar 11, 2026

...framework/src/test/java/org/apache/hadoop/hdds/utils/db/TestRDBStoreIteratorWithDBClose.java

Test hasNext with 10 concurrent iterators on DB close

d9d7f12

priyeshkaratha requested a review from yandrey321 March 12, 2026 03:54

Gargi-jais11 reviewed Mar 12, 2026

...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java

addressing review changes

ca6f595

priyeshkaratha requested a review from Gargi-jais11 March 13, 2026 06:07

devmadhuu reviewed Mar 13, 2026

addressing review comments

aeebe7d

priyeshkaratha requested a review from devmadhuu March 16, 2026 12:25

fixing ci issues and adding more testcases.

15b7a47

devmadhuu reviewed Mar 16, 2026

...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java Outdated

throws exception instead of suppressing

db4e390

priyeshkaratha requested a review from devmadhuu March 17, 2026 03:54

devmadhuu approved these changes Mar 17, 2026

yandrey321 approved these changes Mar 17, 2026

Gargi-jais11 approved these changes Mar 17, 2026

errose28 reviewed Mar 18, 2026

ptlrs reviewed Mar 18, 2026

sumitagrawl reviewed Mar 18, 2026

priyeshkaratha added 2 commits March 18, 2026 12:30

addressing comments

7afdaec

adding changes for some missed reviews

048dc7f

priyeshkaratha requested review from errose28, ptlrs and sumitagrawl March 18, 2026 08:00

fixing ci issues

71a6141

ptlrs approved these changes Mar 18, 2026

errose28 approved these changes Mar 18, 2026

ChenSammi reviewed Mar 19, 2026

...ainer-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractRDBStore.java Outdated

ChenSammi reviewed Mar 19, 2026

...p-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStoreAbstractIterator.java

addressing review comments

0c3e906

priyeshkaratha requested a review from ChenSammi March 19, 2026 10:28

ChenSammi approved these changes Mar 19, 2026

adoroszlai marked this pull request as draft March 19, 2026 16:51

8 participants
