fix(chainbase): defensive checks and resource leak fixes #6633

barbatos2011 wants to merge 3 commits into tronprotocol:develop from
Conversation
Replace 5 assert statements with unconditional runtime checks on consensus-critical paths. `assert` is disabled by default in the production JVM (no `-ea` flag), silently skipping invariant checks. With `-ea` enabled, `AssertionError` bypasses `catch (Exception)` handlers, crashing the node.

- `RepositoryImpl.increase()`: time-backwards check
- `RepositoryImpl.calculateGlobalEnergyLimit()`: zero energy weight check
- `EnergyProcessor.calculateGlobalEnergyLimit()`: same, symmetric with the `allowNewReward()` early return
- `ResourceProcessor.increase()`: time-backwards check
- `MerklePath.encode()`: authentication path / index size mismatch
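A self-contained sketch of the failure mode and the replacement pattern (the class and method names below are hypothetical stand-ins echoing `calculateGlobalEnergyLimit`, not the actual java-tron code): with the assert skipped, a zero-weight division silently yields `Long.MAX_VALUE`, while the if-throw guard always executes and throws a catchable `RuntimeException` subclass.

```java
public class GuardDemo {
  // What happens when the assert is skipped (no -ea): division by zero on
  // doubles yields Infinity, and the long cast clamps to Long.MAX_VALUE.
  static long unguardedLimit(long limit, long totalEnergyWeight) {
    return (long) ((double) limit / totalEnergyWeight); // no check at all
  }

  // The if-throw replacement: executes regardless of JVM flags, and throws a
  // RuntimeException subclass that existing catch (Exception) handlers catch.
  static long guardedLimit(long limit, long totalEnergyWeight) {
    if (totalEnergyWeight <= 0) {
      throw new IllegalStateException(
          "totalEnergyWeight must be positive: " + totalEnergyWeight);
    }
    return (long) ((double) limit / totalEnergyWeight);
  }

  public static void main(String[] args) {
    System.out.println(unguardedLimit(100, 0)); // prints 9223372036854775807
    try {
      guardedLimit(100, 0);
    } catch (Exception e) { // an AssertionError would NOT be caught here
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```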
- `TransactionTrace`: add a zero-check guard before the `newArea / newSize` division in `resetAccountUsage()` and `resetAccountUsageV2()` to prevent `ArithmeticException` when the window size is zero
- `ResilienceService`: replace all `new Random()` with `ThreadLocalRandom.current()` to avoid unnecessary object allocation; remove the `Random` field from the `WeightedRandom` inner class
- `DbConvert`: merge the parallel `List<byte[]>` (keys/values) into a single `List<Map.Entry<byte[], byte[]>>` to eliminate index-mismatch risk
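A minimal sketch of the `WeightedRandom` change described above, with assumed names (`WeightedPick` and `pickIndex` are illustrative, not the actual `ResilienceService` code): no `Random` field is held, `ThreadLocalRandom.current()` is fetched per call, and a `totalWeight <= 0` guard runs before `nextInt()`.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class WeightedPick {
  // Picks an index with probability proportional to its weight.
  public static int pickIndex(List<Integer> weights) {
    int totalWeight = 0;
    for (int w : weights) {
      totalWeight += w;
    }
    if (totalWeight <= 0) {
      return -1; // guard: nextInt(bound) throws for bound <= 0
    }
    // No shared Random field: ThreadLocalRandom allocates nothing per call.
    int r = ThreadLocalRandom.current().nextInt(totalWeight);
    for (int i = 0; i < weights.size(); i++) {
      r -= weights.get(i);
      if (r < 0) {
        return i;
      }
    }
    return weights.size() - 1; // unreachable when weights sum to totalWeight
  }

  public static void main(String[] args) {
    int i = pickIndex(List.of(1, 3, 6));
    System.out.println(i >= 0 && i < 3);          // prints "true"
    System.out.println(pickIndex(List.of(0, 0))); // prints "-1"
  }
}
```

`SecureRandom` is intentionally not used here, matching the PR's rationale: statistical uniformity suffices for random peer disconnection.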
- `Chainbase.head`: declare as `volatile` for cross-thread visibility
- `Chainbase.prefixQuery()`: snapshot `head` into a local variable at entry and pass it to `prefixQueryRoot()`/`prefixQuerySnapshot()` to guarantee consistency within a single call
- `TronDatabase.close()`: split `writeOptions.close()` and `dbSource.closeDB()` into separate try-catch blocks so one failure does not skip the other
- `CheckPointV2Store.close()`: close its own `writeOptions` in a try-catch, then call `super.close()` for the full parent cleanup chain
- `LevelDB.close()` / `RocksDB.close()`: same split try-catch pattern
- `SnapshotManager.checkV2()`: wrap `getCheckpointDB()` in try-with-resources so `close()` runs even if `recover()` throws
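The split-close pattern applied to `TronDatabase`/`LevelDB`/`RocksDB` can be sketched as follows (the `closeWriteOptions`/`closeDbSource` helpers are stand-ins for the real resources, not java-tron code): each close gets its own try-catch, so a failure in the first cannot skip the second.

```java
public class SplitClose {
  static final StringBuilder log = new StringBuilder();

  // Stand-in for writeOptions.close(); simulates a close failure.
  static void closeWriteOptions() {
    log.append("writeOptions;");
    throw new RuntimeException("close failed");
  }

  // Stand-in for dbSource.closeDB(); must still run after the failure above.
  static void closeDbSource() {
    log.append("dbSource;");
  }

  // The fixed pattern: separate try-catch per resource. With a single try
  // block, the first exception would have skipped closeDbSource() entirely.
  static void close() {
    try {
      closeWriteOptions();
    } catch (Exception e) {
      // log and continue; do not let this skip dbSource cleanup
    }
    try {
      closeDbSource();
    } catch (Exception e) {
      // log and continue
    }
  }

  public static void main(String[] args) {
    close();
    System.out.println(log); // prints "writeOptions;dbSource;"
  }
}
```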
c378136 to 4e171df
```java
Snapshot snapshot = localHead;
if (!snapshot.equals(localHead.getRoot())) {
  Map<WrappedByteArray, WrappedByteArray> all = new HashMap<>();
  ((SnapshotImpl) snapshot).collect(all, key);
  all.forEach((k, v) -> result.put(k, v.getBytes()));
```
Although this appears to bind the query to the caller-selected view via `Snapshot snapshot = localHead;`, the subsequent call to `((SnapshotImpl) snapshot).collect(all, key)` does not actually collect from `localHead`. `SnapshotImpl.collect(...)` starts from `getRoot().getNext()` and walks forward to the latest tip, so the effective scan range is the entire overlay chain after root, not the caller's resolved snapshot.
Good catch. You're right that collect() always walks from getRoot().getNext() to the end of the chain, regardless of which SnapshotImpl instance it's called on. So even with the local snapshot, if head advances during prefixQuery, collect() will scan into the newly appended snapshots.
This can be fixed by adding a boundary parameter to `collect()`:

```java
// SnapshotImpl.java
synchronized void collect(Map<WrappedByteArray, WrappedByteArray> all,
    byte[] prefix, Snapshot boundary) {
  Snapshot next = getRoot().getNext();
  while (next != null) {
    Streams.stream(((SnapshotImpl) next).db)
        .filter(e -> Bytes.indexOf(
            Objects.requireNonNull(e.getKey().getBytes()), prefix) == 0)
        .forEach(e -> all.put(WrappedByteArray.of(e.getKey().getBytes()),
            WrappedByteArray.of(e.getValue().getBytes())));
    if (next == boundary) {
      break;
    }
    next = next.getNext();
  }
}
```

Then `prefixQuerySnapshot` passes `localHead` as the boundary:

```java
((SnapshotImpl) localHead).collect(all, key, localHead);
```

That said, two notes:
- This is a pre-existing issue, not introduced by this PR. The original code read `head` directly and `collect()` walked to the end in exactly the same way. The local snapshot fix in this PR still provides value — it ensures `prefixQueryRoot` and `prefixQuerySnapshot` use the same `getRoot()`, which was the original read-read inconsistency.
- Other callers have the same issue. `Chainbase.getAll()` (line 179) and `getNext()` (line 325) also call `collect()` without a boundary. If this is worth fixing, all call sites should be addressed together.
Recommend tracking this as a separate follow-up issue rather than expanding the scope of this PR.
```java
public Map<WrappedByteArray, byte[]> prefixQuery(byte[] key) {
  Map<WrappedByteArray, byte[]> result = prefixQueryRoot(key);
  Map<WrappedByteArray, byte[]> snapshot = prefixQuerySnapshot(key);
  Snapshot localHead = head;
```
`prefixQuery()` previously depended on cursor-resolved snapshot selection through `head()`, which supports HEAD, SOLIDITY, and PBFT. The current implementation bypasses that logic by reading the raw `head` field directly. As a result, `prefixQuery()` can diverge from `get()`/`getUnchecked()`/`has()` under non-HEAD cursors.
Valid issue. The original code used `head()` in `prefixQuerySnapshot`, which routes through cursor logic (HEAD/SOLIDITY/PBFT), but this PR replaced it with the raw `head` field, bypassing cursor resolution entirely. Under SOLIDITY or PBFT cursors, `prefixQuery` would now return unconfirmed head-level data that `get()`/`getUnchecked()`/`has()` would not see.

Fix: change `Snapshot localHead = head` to `Snapshot localHead = head()`. This preserves both the cursor semantics and the single-read consistency that the local snapshot was intended to provide.
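A minimal, self-contained analog of the proposed fix (`CursorStore` is hypothetical, not the real `Chainbase`): `head()` applies cursor resolution, and `prefixQuery` reads that resolved view exactly once into a local, so a SOLIDITY cursor never sees head-only data.

```java
import java.util.HashMap;
import java.util.Map;

public class CursorStore {
  enum Cursor { HEAD, SOLIDITY }

  // Raw `head` always points at the unconfirmed tip; `solidity` is the
  // confirmed view. Both are placeholders for real snapshot chains.
  private volatile Map<String, String> head = new HashMap<>();
  private volatile Map<String, String> solidity = new HashMap<>();
  volatile Cursor cursor = Cursor.HEAD;

  Map<String, String> head() { // cursor-resolved snapshot selection
    return cursor == Cursor.SOLIDITY ? solidity : head;
  }

  void put(String k, String v) { head.put(k, v); }
  void solidify(String k, String v) { solidity.put(k, v); }

  // Fixed prefixQuery: `localHead = head()` (not the raw field), read once
  // and reused, so the whole call answers from a single resolved view.
  Map<String, String> prefixQuery(String prefix) {
    Map<String, String> localHead = head();
    Map<String, String> result = new HashMap<>();
    for (Map.Entry<String, String> e : localHead.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        result.put(e.getKey(), e.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    CursorStore store = new CursorStore();
    store.put("acct-1", "unconfirmed");
    store.solidify("acct-2", "solid");
    store.cursor = Cursor.SOLIDITY;
    // Under a SOLIDITY cursor, head-only data must not leak into the result.
    System.out.println(store.prefixQuery("acct-").containsKey("acct-1")); // prints "false"
    System.out.println(store.prefixQuery("acct-").containsKey("acct-2")); // prints "true"
  }
}
```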
```java
}

@Test
public void testConcurrentSetHeadAndPrefixQuery() throws Exception {
```
It needs content assertions, not just liveness assertions, to validate the advertised fix for prefixQuery() consistency.
Agreed. The current test only verifies liveness (no exceptions, non-null results), which is insufficient to validate snapshot consistency. Will add content assertions — specifically, verifying that under SOLIDITY cursor, prefixQuery does not return head-only keys.
Note: there's also a pre-existing issue where SnapshotImpl.collect() always walks from getRoot().getNext() to the latest tip regardless of the caller's snapshot boundary. This means even with the cursor fix, prefixQuery overlay results may include data beyond the resolved snapshot. Will document this as a known limitation and track separately.
```java
latch.countDown();
executor.shutdown();
executor.awaitTermination(30, TimeUnit.SECONDS);
```
Add a check on `executor.awaitTermination(...)`'s boolean result. If it returns false, the 30-second timeout expired and worker threads may still be running, yet the test will still continue into `chainbase.reset()` and `chainbase.close()` in the finally block.
Good point. If awaitTermination returns false, worker threads may still be running when the finally block executes reset()/close(), causing a race. Will add shutdownNow() on timeout:
```java
executor.shutdown();
if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
  executor.shutdownNow();
}
```

The `@Rule` `Timeout.seconds(60)` provides a backstop against hangs, but this ensures clean thread termination before resource cleanup.
```java
if (lastTime != now) {
  assert now > lastTime;
  if (now < lastTime) {
```
It needs further confirmation that this branch could not have occurred historically.
What

Replace `assert` statements with unconditional runtime checks on consensus-critical paths, fix division-by-zero risks, improve thread safety in `Chainbase`, and fix resource leaks across the entire database `close()` chain.

Why
Assert statements are disabled by default in the production JVM (no `-ea` flag). On consensus-critical paths, this creates three failure modes depending on JVM configuration:

- No `-ea` (production default): assertions are silently skipped, code continues with invalid state (e.g., `totalEnergyWeight = 0` causes `(double) limit / 0L` → `Infinity` → `(long) Infinity` = `Long.MAX_VALUE`, giving an account effectively unlimited energy)
- With `-ea`: `AssertionError` (extends `Error`, not `Exception`) is thrown, bypassing all `catch (Exception)` handlers in the transaction processing chain, crashing the block processing thread
- Different `-ea` flags produce different execution results for the same block → consensus split

Replacing with if-throw using `RuntimeException` subclasses ensures: (1) the check always executes regardless of JVM flags, (2) the exception is caught by existing error handling, failing the transaction without crashing the node, (3) these conditions cannot be triggered by valid transactions on a correctly functioning network — they guard against data corruption or independent bugs in other subsystems.

Resource leaks in `close()` methods follow the same root cause across 4 classes: `writeOptions.close()` and `dbSource.closeDB()` sit in a single try block, where the first failure prevents the second from executing. The `SnapshotManager.checkV2()` leak is a separate issue: `close()` is not called at all when `recover()` throws.

Changes
Commit 1: Assert → if-throw replacement

Replace 5 `assert` statements with explicit if-throw checks:

| Location | Condition | Exception |
| --- | --- | --- |
| `RepositoryImpl.increase()` | `now < lastTime` (time backwards) | `IllegalArgumentException` |
| `RepositoryImpl.calculateGlobalEnergyLimit()` | `totalEnergyWeight <= 0` | `IllegalStateException` |
| `EnergyProcessor.calculateGlobalEnergyLimit()` | same, after the `allowNewReward()` early return | `IllegalStateException` |
| `ResourceProcessor.increase()` | `now < lastTime` (same as RepositoryImpl) | `IllegalArgumentException` |
| `MerklePath.encode()` | `authenticationPath.size() != index.size()` | `ZksnarkException` |

Add `MerklePathEncodeTest` and `DefensiveChecksTest` covering all 5 exception paths.

Commit 2: Defensive numeric and structural fixes
- Add an `if (newSize == 0)` guard before the `newArea / newSize` division in `resetAccountUsage()` and `resetAccountUsageV2()` to prevent `ArithmeticException`. The zero condition can occur when the caller account has no prior energy consumption history (raw window size = 0)
- Replace `new Random()` with `ThreadLocalRandom.current()` to avoid per-call object allocation in a scheduled task; remove the `Random` field from the `WeightedRandom` inner class; add a `totalWeight <= 0` guard before `nextInt()`. `SecureRandom` is unnecessary here — random peer disconnection only needs statistical uniformity, not cryptographic unpredictability
- Merge the parallel `List<byte[]>` (keys/values) into a single `List<Map.Entry<byte[], byte[]>>`, eliminating index-mismatch risk at the type level. `batchInsert()` is a private method only called within `convertLevelToRocks()`

Add `TransactionTraceTest` cases for `newSize == 0` on both V1 and V2 paths.

Commit 3: Thread safety and resource leak fixes
- Declare `Chainbase.head` as `volatile` for cross-thread visibility. Read methods (`getUnchecked`, `prefixQuery`, etc.) are not synchronized, so without `volatile` reader threads may see stale references under the JMM
- In `prefixQuery()`, snapshot `head` into a local variable at method entry and pass it to both `prefixQueryRoot()` and `prefixQuerySnapshot()`. Previously, these two sub-methods read `head` independently — if `setHead()` is called between the two reads, the results are from different snapshots. `volatile` alone does not fix this; the local snapshot guarantees single-call consistency. Other read methods (`get`, `getUnchecked`, `has`) only read `head` once via `head()`, so `volatile` alone is sufficient for them
- `TronDatabase.close()`: split `writeOptions.close()` and `dbSource.closeDB()` into separate try-catch blocks so one failure does not skip the other (`LevelDB.close()` and `RocksDB.close()` in the same `close()` chain — these had the identical bug)
- `CheckPointV2Store.close()`: close its own `writeOptions` in try-catch, then call `super.close()` instead of `dbSource.closeDB()` to ensure the full parent cleanup chain executes. Note: subclass and parent each declare their own `writeOptions` field (Java fields are not polymorphic), both must be closed separately
- `SnapshotManager.checkV2()`: wrap `getCheckpointDB()` in try-with-resources so `close()` runs even if `recover()` throws

Add `ResourceCloseTest` (mock `writeOptions.close()` to throw, verify `dbSource.closeDB()` is still called) and `ChainbaseConcurrentTest` (4 threads, 500 iterations of concurrent `setHead` + `prefixQuery`).

Scope
Test

- `MerklePathEncodeTest`: size mismatch → `ZksnarkException`; empty lists → no exception
- `DefensiveChecksTest`: energy weight zero throws/returns zero based on `allowNewReward()`; time-backwards throws; RepositoryImpl zero weight throws
- `TransactionTraceTest` (2 new methods): resetAccountUsageV1/V2 handle `newSize == 0`
- `ResourceCloseTest`: exception-safe close chain verification with Mockito
- `ChainbaseConcurrentTest`: concurrent read/write safety
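The `SnapshotManager.checkV2()` fix relies on a standard language guarantee: try-with-resources invokes `close()` even when the body throws. A self-contained sketch (`CheckpointDb` is a hypothetical stand-in for the real checkpoint DB, and the thrown exception stands in for `recover()` failing):

```java
public class TwrDemo {
  static class CheckpointDb implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  // Returns the db so callers can observe that close() ran despite the throw.
  static CheckpointDb checkV2(boolean failRecover) {
    CheckpointDb db = new CheckpointDb();
    try (CheckpointDb d = db) {
      if (failRecover) {
        // Analog of recover() throwing mid-way through the checkpoint scan.
        throw new RuntimeException("recover failed");
      }
    } catch (RuntimeException e) {
      // Swallowed for the demo; close() has already run by this point.
    }
    return db;
  }

  public static void main(String[] args) {
    System.out.println(checkV2(true).closed); // prints "true"
  }
}
```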