
HDDS-14859. Ignore transient errors while validating RocksDb on volumes. #9947

Draft

ptlrs wants to merge 1 commit into apache:master from
ptlrs:HDDS-14859-Ignore-transient-errors-while-validating-RocksDb-on-volumes


Conversation

@ptlrs (Contributor) commented Mar 18, 2026

What changes were proposed in this pull request?

In the volume scanner, we open the RocksDB instance that is present on each volume.
There can be transient errors when opening this RocksDB in read-only mode.

The volume scanner should therefore retry opening the RocksDB instance a second time after a short gap, to reduce false positives when declaring disk failures.
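The retry-once-after-a-gap idea described above can be sketched as follows. This is an illustrative sketch only, not the actual patch; the class name, method name, and the `Callable` shape are all hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/**
 * Illustrative sketch only (not the actual patch): run a volume DB check and,
 * on failure, retry exactly once after a short gap before declaring a failure.
 */
public class DbCheckRetrySketch {

  static <T> T checkWithOneRetry(Callable<T> check, Duration gap) throws Exception {
    try {
      return check.call();
    } catch (Exception firstFailure) {
      // Give a transient error (e.g. a short-lived I/O hiccup) time to clear.
      Thread.sleep(gap.toMillis());
      try {
        return check.call(); // second and final attempt
      } catch (Exception secondFailure) {
        secondFailure.addSuppressed(firstFailure);
        throw secondFailure; // only now treat the volume check as failed
      }
    }
  }
}
```

Keeping the first exception as a suppressed exception preserves the evidence of the transient error without masking the final failure.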

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14859

How was this patch tested?

https://github.com/ptlrs/ozone/actions/runs/23264628727

@errose28 (Contributor) commented:

Thanks for looking into this @ptlrs. I think we may want to use a secondary instance instead of a read-only instance for this check. It looks like it will meet the same goal of reading the CURRENT and MANIFEST files, but will only fail if the DB is truly in bad health. We will need to provide it a directory to write its own log files, but we can use the volume's tmp directory for this, which already holds other temporary files created during the disk check.

@ptlrs (Contributor, Author) commented Mar 18, 2026

Thanks for taking a look @errose28. Using a secondary instance had crossed my mind while writing this PR, but that page was the reason I didn't make the change.

In fact I saw this comment in a different discussion which says:

Read-only instances has only the state of the db when the read-only instance is opened. It's not meant to be used when the db is also opened by a normal DB instance. The normal db instance can cause file deletions unaware to the read-only instance. Future reads issued to the read-only instance may hit IOError due to trying to read from non-existing files.

So based on the documentation and the comment, it appears to me that unless you actually perform reads in RO mode, there won't be a problem.

We don't perform any reads during the volume check, but RocksDB does read the metadata from the footer of all SST files, as well as the MANIFEST files and the WAL, when it is opened in RO mode.

If that is where the problem lies, I am not sure using a secondary instance would solve it.

A secondary instance appears to be the same as an RO instance, with the extra capability of bringing the instance up to speed with the RW instance via a manual catch-up call.
This is nice to have, but useful only if you actually read the data.

I don't mind changing to a secondary instance, as at worst we would get the same behavior, but it would be good to brainstorm what the differences could be.

@errose28 (Contributor) commented:

@ptlrs the main documentation is unfortunately vague here, but there's also this excerpt from the FAQ:

Q: Can I write to RocksDB using multiple processes?

A: No. However, it can be opened using Secondary DB. If no write goes to the database, it can be opened in read-only mode from multiple processes.

Q: Does RocksDB support multi-process read access?

A: Yes, you can read it using secondary database using DB::OpenAsSecondary(). RocksDB can also support multi-process read only process without writing the database. This can be done by opening the database with DB::OpenForReadOnly() call.

In our case we are only using one process with multiple handles. However, since writes will be going to the DB while we are checking the volume, this seems to indicate that a secondary instance is still what we want to use here.
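For reference, the two open modes being compared can be sketched with the RocksJava API. This is a sketch under assumptions: the class and path names are illustrative, and the exact signatures and the note that secondary mode requires `max_open_files = -1` should be verified against the RocksDB version Ozone ships.

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

/** Sketch of the two open modes under discussion; names are illustrative. */
public class OpenModesSketch {
  static {
    RocksDB.loadLibrary();
  }

  /**
   * Read-only handle: sees the state of the DB at open time. If a primary
   * handle later deletes files this handle saw, reads through it can hit
   * IOError (the behavior quoted in the discussion above).
   */
  static RocksDB openReadOnly(Options options, String dbPath) throws RocksDBException {
    return RocksDB.openReadOnly(options, dbPath);
  }

  /**
   * Secondary handle: tails the primary's MANIFEST and WAL, and needs its own
   * directory for info logs (e.g. the volume's tmp directory, per the review).
   * The RocksDB docs state secondary mode requires max_open_files = -1.
   */
  static RocksDB openSecondary(Options options, String dbPath, String secondaryLogDir)
      throws RocksDBException {
    options.setMaxOpenFiles(-1);
    return RocksDB.openAsSecondary(options, dbPath, secondaryLogDir);
  }
}
```

A secondary handle can additionally call `tryCatchUpWithPrimary()` to replay the primary's newer updates, which matches the "manual command invocation" mentioned earlier in the thread.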

private Duration diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;

@Config(key = DISK_CHECK_RETRY_GAP_KEY,
    defaultValue = "1m",


should it be at least 15m?

@ptlrs (Contributor, Author):

No, we don't want a long gap between the two checks, as the goal here is to ignore transient errors while opening RocksDB.

If the gap between two consecutive attempts at opening RocksDB is too long, the failure is no longer a transient error.

Secondly, we can't have a 15-minute gap between the two open attempts, as that would exceed the disk check timeout of 10 minutes.

In case of errors when opening RocksDB, we ideally want the thread to sleep for a very short time, try to open RocksDB a second time, and only if we still fail, mark it as an actual failure to open RocksDB.


Should we also increase the 10-minute timeout then? In case of a 3rd retry we'll spend 2 out of 10 minutes sleeping, plus the time for actual processing...

} catch (Exception e) {
  if (Thread.currentThread().isInterrupted()) {
    throw new InterruptedException("Check of database for volume " + this + " interrupted.");

final int maxAttempts = 2;


should it be configurable?

@ptlrs (Contributor, Author):

My thought process for not making this configurable was to avoid problematic configurations where the total time across attempts could exceed the disk check timeout.

For example, if maxAttempts is 10 and we sleep for 1 minute before each retry, this is no longer about transient errors, and the total time of roughly 10 minutes will make the method flaky against the disk check timeout of 10 minutes.

I can change it if we think we may still want that control in production.
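The constraint being discussed, that the retry schedule must fit inside the disk check timeout, can be sketched as a simple sanity check. The class and method names are hypothetical, and the attempt runtime itself is deliberately ignored, so this is an optimistic lower bound on the time spent.

```java
import java.time.Duration;

/** Illustrative helper, not part of the patch: sanity-check a retry schedule. */
public class RetryBudgetSketch {

  /**
   * Returns true if {@code maxAttempts} attempts, separated by {@code gap}
   * sleeps, leave room inside {@code timeout}. Each attempt's own runtime is
   * ignored here, so real configurations need extra headroom on top of this.
   */
  static boolean fitsWithinTimeout(int maxAttempts, Duration gap, Duration timeout) {
    // (maxAttempts - 1) sleeps occur between consecutive attempts.
    Duration totalSleep = gap.multipliedBy(Math.max(0, maxAttempts - 1));
    return totalSleep.compareTo(timeout) < 0;
  }
}
```

With the values in this PR, 2 attempts and a 1-minute gap comfortably fit the 10-minute timeout, while a hypothetical schedule whose sleeps alone reach 10 minutes would not.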
