HDDS-14730. Update Recon container sync to use container IDs by jasonosullivan34 · Pull Request #9842 · apache/ozone

jasonosullivan34 · 2026-02-27T11:06:56Z

What changes were proposed in this pull request?

Use container IDs instead of full ContainerInfo objects to reduce the payload size of the RPCs between Recon and SCM during container sync.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14730

How was this patch tested?

Unit tests
Manual testing

@devmadhuu


devmadhuu

Thanks @jasonosullivan34 for the patch. I left a few inline comments on the code. Please check.

Also, please describe in the PR description how the patch was tested. It would also be good to add the following tests:

A unit test for ContainerStateMap.getContainerIDs(state, start, count) verifying pagination and state filtering
A unit or integration test for syncWithSCMContainerInfo() covering the "container missing from Recon, add it" path, and the "container already present, skip it" path
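As a starting point for the first requested test, the pagination and state-filtering checks could look roughly like the sketch below. It does not use the real ContainerStateMap: `InMemoryContainerIndex` and its `State` enum are hypothetical stand-ins, and the inclusive start is an assumption taken from the tailMap remark later in this review.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical stand-in for a state-indexed container map; not the Ozone class. */
class InMemoryContainerIndex {
  enum State { OPEN, CLOSED }

  private final Map<State, TreeSet<Long>> idsByState = new EnumMap<>(State.class);

  void add(State state, long id) {
    idsByState.computeIfAbsent(state, s -> new TreeSet<>()).add(id);
  }

  /** Returns up to count IDs in the given state, starting at start (inclusive). */
  List<Long> getContainerIDs(State state, long start, int count) {
    List<Long> page = new ArrayList<>();
    TreeSet<Long> ids = idsByState.getOrDefault(state, new TreeSet<>());
    for (Long id : ids.tailSet(start)) { // tailSet(start) includes start itself
      if (page.size() >= count) {
        break;
      }
      page.add(id);
    }
    return page;
  }

  /** Sample data: CLOSED containers 1, 2, 3, 5 and one OPEN container 4. */
  static InMemoryContainerIndex sample() {
    InMemoryContainerIndex index = new InMemoryContainerIndex();
    index.add(State.CLOSED, 1);
    index.add(State.CLOSED, 2);
    index.add(State.CLOSED, 3);
    index.add(State.OPEN, 4);
    index.add(State.CLOSED, 5);
    return index;
  }

  public static void main(String[] args) {
    InMemoryContainerIndex index = sample();
    System.out.println(index.getContainerIDs(State.CLOSED, 1, 2));
    System.out.println(index.getContainerIDs(State.CLOSED, 3, 10));
    System.out.println(index.getContainerIDs(State.OPEN, 1, 10));
  }
}
```

A real test would assert the same three properties against ContainerStateMap itself: a page never exceeds `count`, IDs in other states never appear, and a start past the last ID yields an empty page.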

...e/recon/src/main/java/org/apache/hadoop/ozone/recon/spi/StorageContainerServiceProvider.java

...rc/main/java/org/apache/hadoop/ozone/recon/spi/impl/StorageContainerServiceProviderImpl.java

...pache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java

...work/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java

.../apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocolServerSideTranslatorPB.java

...econ/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconStorageContainerManagerFacade.java

.../apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocolServerSideTranslatorPB.java

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java

devmadhuu · 2026-02-27T16:33:35Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerManager.java

+
+  /**
+   * Returns container IDs under certain conditions.
+   * Search container IDs from start ID(exclusive),


ContainerStateMap.getContainerIDs() uses tailMap(start), which is inclusive. Please correct the Javadoc.

I copied this Javadoc from ContainerManager.getContainers. Should I correct that one too?

Yes, that one should be corrected too.
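For reference, the inclusive behaviour is a property of the JDK sorted-map API rather than anything Ozone-specific. A minimal sketch with a plain `TreeMap` (the `TailMapBounds` class and its sample data are illustrative only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class TailMapBounds {
  /** Builds a small sorted map with keys 1..5. */
  static NavigableMap<Long, String> sample() {
    NavigableMap<Long, String> map = new TreeMap<>();
    for (long id = 1; id <= 5; id++) {
      map.put(id, "container-" + id);
    }
    return map;
  }

  /** Single-argument tailMap: keys greater than or equal to start. */
  static List<Long> inclusiveFrom(NavigableMap<Long, String> map, long start) {
    return new ArrayList<>(map.tailMap(start).keySet());
  }

  /** Two-argument tailMap with inclusive=false: keys strictly greater than start. */
  static List<Long> exclusiveFrom(NavigableMap<Long, String> map, long start) {
    return new ArrayList<>(map.tailMap(start, false).keySet());
  }

  public static void main(String[] args) {
    NavigableMap<Long, String> map = sample();
    System.out.println(inclusiveFrom(map, 3)); // [3, 4, 5]
    System.out.println(exclusiveFrom(map, 3)); // [4, 5]
  }
}
```

This matters for pagination: with an inclusive start, a caller that passes the last ID of the previous page as the next start will receive that ID twice unless it advances the cursor by one.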

devmadhuu

Thanks @jasonosullivan34 for improving the patch. However, a few points still need to be handled. Please check.

devmadhuu · 2026-03-04T15:13:45Z

...econ/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconStorageContainerManagerFacade.java

  public boolean syncWithSCMContainerInfo()
-      throws IOException {
+      throws Exception {
    if (isSyncDataFromSCMRunning.compareAndSet(false, true)) {


Suggested change

-    if (isSyncDataFromSCMRunning.compareAndSet(false, true)) {
+    if (isSyncDataFromSCMRunning.compareAndSet(false, true)) {
+      try {
+        return containerSyncHelper.syncWithSCMContainerInfo();
+      } finally {
+        isSyncDataFromSCMRunning.compareAndSet(true, false);
+      }
+    }
+    LOG.debug("SCM DB sync is already running.");
+    return false;
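The point of the try/finally in this suggestion is that the flag is released even when the sync throws. A self-contained sketch of the same guard pattern follows; `SingleFlightGuard` and `runIfIdle` are hypothetical names used for illustration, not part of the patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical illustration of the guard pattern; not code from the patch. */
class SingleFlightGuard {
  private final AtomicBoolean running = new AtomicBoolean(false);

  /** Runs the task only if no other run is in progress; otherwise returns false. */
  boolean runIfIdle(BooleanSupplier task) {
    if (running.compareAndSet(false, true)) {
      try {
        return task.getAsBoolean();
      } finally {
        // Always release the flag, even if the task throws.
        running.compareAndSet(true, false);
      }
    }
    return false;
  }

  boolean isRunning() {
    return running.get();
  }

  public static void main(String[] args) {
    SingleFlightGuard guard = new SingleFlightGuard();
    System.out.println(guard.runIfIdle(() -> true));
    // A nested attempt while the outer run is active is rejected.
    System.out.println(guard.runIfIdle(() -> guard.runIfIdle(() -> true)));
  }
}
```

Without the finally block, a single exception would leave the flag set to true and every later sync attempt would be skipped as "already running".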

devmadhuu · 2026-03-04T15:23:35Z

...e/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconStorageContainerSyncHelper.java

+    return true;
+  }
+
+  private long getContainerCountPerCall(long totalContainerCount) {


Earlier, CONTAINER_METADATA_SIZE was defined as 1MB to estimate the size of a ContainerInfo object. A ContainerID proto is ~8–12 bytes. With an IPC max of 128MB, the old code limited batches to floor(128MB / 1MB) = 128 containers per call. The new code does the same, 128 IDs per call, when it could safely fetch floor(128MB / 12 bytes) ≈ 11 million IDs per call. This means the change may actually increase the number of RPCs instead of reducing them. So I think we should test with a large set of container IDs, especially for a cluster with 4-5 million CLOSED containers, and record the SCM latency and impact, because that was the whole objective behind this JIRA. Based on the impact, we should think of a way to limit the load on SCM while keeping the RPC message length under the default 128 MB.

I propose limiting the number of container IDs fetched in a single message to 500K. This would equate to 16MB, which is well within the RPC message size limit and should be safe enough for SCM to handle.

I also want to introduce a config so that we can reduce the number of container IDs fetched if we need to fine-tune due to memory constraints.
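To make the trade-off concrete, the batch size and the resulting RPC count can be worked through as below. This is only a sketch: `SyncBatchSizing`, its method names, and the idea of passing the cap as a parameter are assumptions for illustration, and the 12 bytes per ID is the upper estimate quoted in the review, not a measured value.

```java
/** Hypothetical sizing helper; names and parameters are illustrative only. */
class SyncBatchSizing {
  /**
   * IDs per call: what fits in one message, bounded by a configurable cap,
   * and never less than one so the sync always makes progress.
   */
  static long idsPerCall(long maxMessageBytes, long bytesPerId, long configuredCap) {
    long fitsInMessage = maxMessageBytes / bytesPerId;
    return Math.max(1, Math.min(fitsInMessage, configuredCap));
  }

  /** Number of RPCs needed to page through totalIds (ceiling division). */
  static long rpcCount(long totalIds, long idsPerCall) {
    return (totalIds + idsPerCall - 1) / idsPerCall;
  }

  public static void main(String[] args) {
    long maxMessageBytes = 128L * 1024 * 1024; // 128MB IPC limit from the review
    long bytesPerId = 12;                      // assumed upper estimate per ID
    long cap = 500_000;                        // proposed per-call limit

    long perCall = idsPerCall(maxMessageBytes, bytesPerId, cap);
    System.out.println("IDs per call: " + perCall);
    System.out.println("RPCs for 5M IDs at 128 per call: " + rpcCount(5_000_000, 128));
    System.out.println("RPCs for 5M IDs at capped size: " + rpcCount(5_000_000, perCall));
  }
}
```

Under these assumptions, 5 million CLOSED containers need 39,063 calls at 128 IDs per call but only 10 calls at 500K IDs per call, which is the reduction the reviewer is asking to see measured.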

devmadhuu · 2026-03-04T15:26:23Z

...econ/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconStorageContainerManagerFacade.java


  public boolean syncWithSCMContainerInfo()
-      throws IOException {
+      throws Exception {


The helper already catches every Exception internally and returns false, so the throws Exception is misleading. The facade should declare throws IOException (or nothing if the helper swallows everything).
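A small sketch of why the broad throws clause is unnecessary: when the helper converts every failure into a false return value, the compiler accepts a caller with no throws clause at all. The class and method names below are hypothetical and only mirror the shape described in this comment.

```java
import java.io.IOException;

/** Hypothetical illustration; not the Recon classes from the patch. */
class SwallowingSyncExample {
  /** A unit of sync work that may fail with any checked exception. */
  interface SyncStep {
    void run() throws Exception;
  }

  /** Helper: catches everything and reports failure as false. */
  static boolean helperSync(SyncStep step) {
    try {
      step.run();
      return true;
    } catch (Exception e) {
      return false;
    }
  }

  /** Facade: compiles without a throws clause because nothing checked can escape. */
  static boolean facadeSync(SyncStep step) {
    return helperSync(step);
  }

  public static void main(String[] args) {
    System.out.println(facadeSync(() -> { }));
    System.out.println(facadeSync(() -> { throw new IOException("SCM unreachable"); }));
  }
}
```

Declaring throws Exception on such a facade forces every caller to handle an exception that can never arrive, which is the misleading part.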

...e/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconStorageContainerSyncHelper.java

...con/src/test/java/org/apache/hadoop/ozone/recon/scm/TestReconStorageContainerSyncHelper.java

jasonosullivan34 added 2 commits February 27, 2026 10:51

HDDS-14730. Using container ids for recon sync

1ed200e

Merge remote-tracking branch 'origin/master' into HDDS-14730-containe…

909cf24

…r-ids

devmadhuu self-requested a review February 27, 2026 11:43

HDDS-14730. Using container ids for recon sync

155d765

jasonosullivan34 closed this Feb 27, 2026

jasonosullivan34 reopened this Feb 27, 2026

devmadhuu reviewed Feb 27, 2026


jasonosullivan34 added 3 commits February 27, 2026 16:59

HDDS-14730. Using container ids for recon sync

f8c943f

HDDS-14730. Using container ids for recon sync

47191c5

HDDS-14730. Using container ids for recon sync

f825196

jasonosullivan34 requested a review from devmadhuu March 3, 2026 09:03

devmadhuu reviewed Mar 4, 2026


jasonosullivan34 added 5 commits March 4, 2026 17:24

HDDS-14730. Using container ids for recon sync

0d213a4

Merge branch 'master' into HDDS-14730-container-ids

d619e93

HDDS-14730. Update batch size for Recon to SCM sync using container ids

18fcbea

HDDS-14730. Updating recon container sync rpc message defaults

7d28268

HDDS-14730. Updating recon container sync rpc message defaults

8575141

jasonosullivan34 requested a review from devmadhuu March 19, 2026 14:41

