
fix(chainbase): defensive checks and resource leak fixes#6633

Closed
barbatos2011 wants to merge 3 commits into tronprotocol:develop from barbatos2011:fix/defensive-code-safety

Conversation

@barbatos2011

What

Replace assert statements with unconditional runtime checks on consensus-critical paths, fix division-by-zero risks, improve thread safety in Chainbase, and fix resource leaks across the entire database close() chain.

Why

Assert statements are disabled by default in production JVM (no -ea flag). On consensus-critical paths, this creates three failure modes depending on JVM configuration:

  • Without -ea (production default): assertions are silently skipped and code continues with invalid state (e.g., totalEnergyWeight = 0 causes (double) limit / 0L to evaluate to Infinity, and (long) Infinity saturates to Long.MAX_VALUE, giving an account effectively unlimited energy)
  • With -ea: AssertionError (extends Error, not Exception) is thrown, bypassing all catch (Exception) handlers in the transaction processing chain, crashing the block processing thread
  • Mixed network: nodes with different -ea flags produce different execution results for the same block → consensus split
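
The first failure mode can be reproduced in isolation. A minimal sketch of the saturation behavior, with illustrative names standing in for the actual tron-core code:

```java
class SaturationDemo {

  // With -ea absent, an `assert totalEnergyWeight > 0;` before this division
  // is skipped entirely, so it runs with a zero divisor. Double division by
  // zero yields Infinity, and the narrowing cast to long saturates.
  static long effectiveLimit(long limit, long totalEnergyWeight) {
    double perWeight = (double) limit / totalEnergyWeight; // Infinity when weight == 0
    return (long) perWeight; // (long) Double.POSITIVE_INFINITY == Long.MAX_VALUE
  }

  public static void main(String[] args) {
    System.out.println(effectiveLimit(1_000L, 0L)); // prints 9223372036854775807
  }
}
```

The saturation is specified by the Java Language Specification's narrowing primitive conversion rules, so this behavior is identical on every conforming JVM.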

Replacing with if-throw using RuntimeException subclasses ensures: (1) the check always executes regardless of JVM flags, (2) the exception is caught by existing error handling, failing the transaction without crashing the node, (3) these conditions cannot be triggered by valid transactions on a correctly functioning network — they guard against data corruption or independent bugs in other subsystems.
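
The replacement pattern can be sketched as follows (illustrative names, not the exact tron-core code): the check executes unconditionally, and the thrown RuntimeException subclass is caught by the existing catch (Exception) handlers in the transaction pipeline.

```java
class DefensiveCheck {

  static long calculateGlobalEnergyLimit(long limit, long totalEnergyWeight) {
    // before: assert totalEnergyWeight > 0;  (skipped without -ea)
    if (totalEnergyWeight <= 0) {
      throw new IllegalStateException(
          "totalEnergyWeight must be positive, got: " + totalEnergyWeight);
    }
    return (long) ((double) limit / totalEnergyWeight);
  }
}
```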

Resource leaks in close() methods follow the same root cause across 4 classes: writeOptions.close() and dbSource.closeDB() in a single try block, where the first failure prevents the second from executing. The SnapshotManager.checkV2() leak is a separate issue: close() is not called at all when recover() throws.
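
The fix for the first root cause can be sketched with a hypothetical helper (the real classes close a WriteOptions and a DB source; any two AutoCloseables show the shape):

```java
class CloseChainSketch {

  // Each close() gets its own try-catch, so a failure in the first close
  // cannot prevent the second from executing. Putting both in one try block
  // is the leak pattern this PR fixes.
  static void closeBoth(AutoCloseable writeOptions, AutoCloseable dbSource) {
    try {
      writeOptions.close();
    } catch (Exception e) {
      // log and continue (illustrative; real code logs the failure)
    }
    try {
      dbSource.close();
    } catch (Exception e) {
      // log and continue
    }
  }
}
```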

Changes

Commit 1: Assert → if-throw replacement

Replace 5 assert statements with explicit if-throw checks:

| Location | Condition | Exception |
| --- | --- | --- |
| RepositoryImpl.increase() | now < lastTime (time backwards) | IllegalArgumentException |
| RepositoryImpl.calculateGlobalEnergyLimit() | totalEnergyWeight <= 0 | IllegalStateException |
| EnergyProcessor.calculateGlobalEnergyLimit() | same, in else branch symmetric with allowNewReward() early return | IllegalStateException |
| ResourceProcessor.increase() | now < lastTime (same as RepositoryImpl) | IllegalArgumentException |
| MerklePath.encode() | authenticationPath.size() != index.size() | ZksnarkException |

Add MerklePathEncodeTest and DefensiveChecksTest covering all 5 exception paths.

Commit 2: Defensive numeric and structural fixes

  • TransactionTrace: add if (newSize == 0) guard before newArea / newSize division in resetAccountUsage() and resetAccountUsageV2() to prevent ArithmeticException. The zero condition can occur when the caller account has no prior energy consumption history (raw window size = 0)
  • ResilienceService: replace all new Random() with ThreadLocalRandom.current() to avoid per-call object allocation in a scheduled task; remove Random field from WeightedRandom inner class; add totalWeight <= 0 guard before nextInt(). SecureRandom is unnecessary here — random peer disconnection only needs statistical uniformity, not cryptographic unpredictability
  • DbConvert: merge two parallel List<byte[]> (keys/values) into a single List<Map.Entry<byte[], byte[]>>, eliminating index-mismatch risk at the type level. batchInsert() is a private method only called within convertLevelToRocks()

Add TransactionTraceTest cases for newSize == 0 on both V1 and V2 paths.
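
The division guard from the first bullet can be sketched as (illustrative method, not the actual TransactionTrace code):

```java
class UsageResetSketch {

  // Guard before the average computation: a zero window size means the
  // account has no prior energy consumption history, so there is nothing
  // to average and the division would throw ArithmeticException.
  static long averageUsage(long newArea, long newSize) {
    if (newSize == 0) {
      return 0;
    }
    return newArea / newSize;
  }
}
```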

Commit 3: Thread safety and resource leak fixes

  • Chainbase.head: declare as volatile for cross-thread visibility. Read methods (getUnchecked, prefixQuery, etc.) are not synchronized, so without volatile reader threads may see stale references under JMM
  • Chainbase.prefixQuery(): snapshot head into a local variable at method entry, pass it to both prefixQueryRoot() and prefixQuerySnapshot(). Previously, these two sub-methods read head independently — if setHead() is called between the two reads, the results are from different snapshots. volatile alone does not fix this; the local snapshot guarantees single-call consistency. Other read methods (get, getUnchecked, has) only read head once via head(), so volatile alone is sufficient for them
  • TronDatabase.close(): split writeOptions.close() and dbSource.closeDB() into separate try-catch blocks so one failure does not skip the other
  • LevelDB.close() / RocksDB.close(): same split try-catch pattern (discovered by tracing the full close() chain — these had the identical bug)
  • CheckPointV2Store.close(): close own writeOptions in try-catch, then call super.close() instead of dbSource.closeDB() to ensure the full parent cleanup chain executes. Note: subclass and parent each declare their own writeOptions field (Java fields are not polymorphic), both must be closed separately
  • SnapshotManager.checkV2(): wrap getCheckpointDB() in try-with-resources so close() runs even if recover() throws

Add ResourceCloseTest (mock writeOptions.close() to throw, verify dbSource.closeDB() is still called) and ChainbaseConcurrentTest (4 threads, 500 iterations of concurrent setHead + prefixQuery).
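
The head-snapshot pattern from the second bullet can be sketched with plain maps standing in for Snapshot (types and method names are illustrative, not the actual Chainbase API):

```java
import java.util.HashMap;
import java.util.Map;

class PrefixQuerySketch {

  // volatile guarantees reader threads see the latest head, but two separate
  // reads of the field could still observe different values if setHead()
  // runs in between -- hence the single read into a local below.
  private volatile Map<String, String> head = new HashMap<>();

  void setHead(Map<String, String> newHead) {
    head = newHead;
  }

  Map<String, String> prefixQuery(String prefix) {
    Map<String, String> localHead = head; // read once; both sub-queries share it
    Map<String, String> result = queryRoot(localHead, prefix);
    result.putAll(querySnapshot(localHead, prefix));
    return result;
  }

  private Map<String, String> queryRoot(Map<String, String> snapshot, String prefix) {
    Map<String, String> out = new HashMap<>();
    snapshot.forEach((k, v) -> {
      if (k.startsWith(prefix)) {
        out.put(k, v);
      }
    });
    return out;
  }

  private Map<String, String> querySnapshot(Map<String, String> snapshot, String prefix) {
    // same snapshot object as queryRoot, so the two results are consistent
    return queryRoot(snapshot, prefix);
  }
}
```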

Scope

  • Does NOT change consensus rules or transaction validation logic
  • The assert replacements guard conditions that cannot be triggered by valid transactions on a correctly functioning mainnet — no hard fork required
  • Does NOT modify any public API or protobuf definitions

Test

  • All existing framework tests pass (2332 tests, 8 pre-existing net module flaky failures unrelated to this PR)
  • Added 4 new test classes, 12 new test methods:
    • MerklePathEncodeTest: size mismatch → ZksnarkException; empty lists → no exception
    • DefensiveChecksTest: energy weight zero throws/returns-zero based on allowNewReward(); time-backwards throws; RepositoryImpl zero weight throws
    • TransactionTraceTest (2 new methods): resetAccountUsage V1/V2 handle newSize == 0
    • ResourceCloseTest: exception-safe close chain verification with Mockito
    • ChainbaseConcurrentTest: concurrent read/write safety
  • Checkstyle passes

Barbatos added 3 commits April 2, 2026 04:44
Replace 5 assert statements with unconditional runtime checks on
consensus-critical paths. assert is disabled by default in production
JVM (no -ea flag), silently skipping invariant checks. With -ea enabled,
AssertionError bypasses catch(Exception) handlers, crashing the node.

- RepositoryImpl.increase(): time-backwards check
- RepositoryImpl.calculateGlobalEnergyLimit(): zero energy weight check
- EnergyProcessor.calculateGlobalEnergyLimit(): same, symmetric with
  allowNewReward() early return
- ResourceProcessor.increase(): time-backwards check
- MerklePath.encode(): authentication path / index size mismatch
- TransactionTrace: add zero-check guard before newArea/newSize division
  in resetAccountUsage() and resetAccountUsageV2() to prevent
  ArithmeticException when window size is zero
- ResilienceService: replace all new Random() with
  ThreadLocalRandom.current() to avoid unnecessary object allocation;
  remove Random field from WeightedRandom inner class
- DbConvert: merge parallel List<byte[]> (keys/values) into single
  List<Map.Entry<byte[], byte[]>> to eliminate index-mismatch risk
- Chainbase.head: declare as volatile for cross-thread visibility
- Chainbase.prefixQuery(): snapshot head into local variable at entry
  and pass to prefixQueryRoot()/prefixQuerySnapshot() to guarantee
  consistency within a single call
- TronDatabase.close(): split writeOptions.close() and
  dbSource.closeDB() into separate try-catch blocks so one failure
  does not skip the other
- CheckPointV2Store.close(): close own writeOptions in try-catch then
  call super.close() for full parent cleanup chain
- LevelDB.close() / RocksDB.close(): same split try-catch pattern
- SnapshotManager.checkV2(): wrap getCheckpointDB() in
  try-with-resources so close() runs even if recover() throws
@barbatos2011 barbatos2011 force-pushed the fix/defensive-code-safety branch from c378136 to 4e171df on April 1, 2026 20:44
Comment on lines +376 to 380
Snapshot snapshot = localHead;
if (!snapshot.equals(localHead.getRoot())) {
Map<WrappedByteArray, WrappedByteArray> all = new HashMap<>();
((SnapshotImpl) snapshot).collect(all, key);
all.forEach((k, v) -> result.put(k, v.getBytes()));
Collaborator


Although it appears to bind the query to the caller-selected view via Snapshot snapshot = localHead;, the subsequent call to ((SnapshotImpl) snapshot).collect(all, key) does not actually collect from localHead. Since SnapshotImpl.collect(...) starts from getRoot().getNext() and walks forward to the latest tip, the effective scan range is the entire overlay chain after root, not the caller's resolved snapshot.

Author


Good catch. You're right that collect() always walks from getRoot().getNext() to the end of the chain, regardless of which SnapshotImpl instance it's called on. So even with the local snapshot, if head advances during prefixQuery, collect() will scan into the newly appended snapshots.

This can be fixed by adding a boundary parameter to collect():

// SnapshotImpl.java
synchronized void collect(Map<WrappedByteArray, WrappedByteArray> all,
                          byte[] prefix, Snapshot boundary) {
    Snapshot next = getRoot().getNext();
    while (next != null) {
        Streams.stream(((SnapshotImpl) next).db)
            .filter(e -> Bytes.indexOf(
                Objects.requireNonNull(e.getKey().getBytes()), prefix) == 0)
            .forEach(e -> all.put(WrappedByteArray.of(e.getKey().getBytes()),
                WrappedByteArray.of(e.getValue().getBytes())));
        if (next == boundary) {
            break;
        }
        next = next.getNext();
    }
}

Then prefixQuerySnapshot passes localHead as the boundary:

((SnapshotImpl) localHead).collect(all, key, localHead);

That said, two notes:

  1. This is a pre-existing issue, not introduced by this PR. The original code read head directly and collect() walked to the end in exactly the same way. The local snapshot fix in this PR still provides value — it ensures prefixQueryRoot and prefixQuerySnapshot use the same getRoot(), which was the original read-read inconsistency.

  2. Other callers have the same issue. Chainbase.getAll() (line 179) and getNext() (line 325) also call collect() without a boundary. If this is worth fixing, all call sites should be addressed together.

Recommend tracking this as a separate follow-up issue rather than expanding the scope of this PR.

public Map<WrappedByteArray, byte[]> prefixQuery(byte[] key) {
Map<WrappedByteArray, byte[]> result = prefixQueryRoot(key);
Map<WrappedByteArray, byte[]> snapshot = prefixQuerySnapshot(key);
Snapshot localHead = head;
Collaborator


prefixQuery() previously depended on cursor-resolved snapshot selection through head(), which supports HEAD, SOLIDITY, and PBFT. The current implementation now bypasses that logic by reading the raw head field directly. As a result, prefixQuery() can diverge from get()/getUnchecked()/has() under non-HEAD cursors.

Author


Valid issue. The original code used head() in prefixQuerySnapshot which routes through cursor logic (HEAD/SOLIDITY/PBFT), but this PR replaced it with the raw head field, bypassing cursor resolution entirely. Under SOLIDITY or PBFT cursors, prefixQuery would now return unconfirmed head-level data that get()/getUnchecked()/has() would not see.

Fix: change Snapshot localHead = head to Snapshot localHead = head(). This preserves both the cursor semantics and the single-read consistency that the local snapshot was intended to provide.

}

@Test
public void testConcurrentSetHeadAndPrefixQuery() throws Exception {
Collaborator


It needs content assertions, not just liveness assertions, to validate the advertised fix for prefixQuery() consistency.

Author


Agreed. The current test only verifies liveness (no exceptions, non-null results), which is insufficient to validate snapshot consistency. Will add content assertions — specifically, verifying that under SOLIDITY cursor, prefixQuery does not return head-only keys.

Note: there's also a pre-existing issue where SnapshotImpl.collect() always walks from getRoot().getNext() to the latest tip regardless of the caller's snapshot boundary. This means even with the cursor fix, prefixQuery overlay results may include data beyond the resolved snapshot. Will document this as a known limitation and track separately.


latch.countDown();
executor.shutdown();
executor.awaitTermination(30, TimeUnit.SECONDS);
Collaborator


Add a check on executor.awaitTermination(...)'s boolean result. Right now, if it returns false, the 30-second timeout expired and worker threads may still be running, yet the test continues into chainbase.reset() and chainbase.close() in the finally block.

Author


Good point. If awaitTermination returns false, worker threads may still be running when the finally block executes reset()/close(), causing a race. Will add shutdownNow() on timeout:

executor.shutdown();
if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
    executor.shutdownNow();
}

The @Rule Timeout.seconds(60) provides a backstop against hangs, but this ensures clean thread termination before resource cleanup.


if (lastTime != now) {
assert now > lastTime;
if (now < lastTime) {
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still needs to be confirmed that this branch could not have been taken historically.

@kuny0707 kuny0707 closed this Apr 2, 2026
@kuny0707 kuny0707 reopened this Apr 2, 2026
@kuny0707 kuny0707 closed this Apr 2, 2026