fix(chainbase): defensive checks and resource leak fixes#10
fix(chainbase): defensive checks and resource leak fixes#10barbatos2011 wants to merge 3 commits intodevelopfrom
Conversation
PR Review: fix/defensive-code-safetyOverall direction is good. Here are the issues found: 1.
|
Updated Review (after
|
| # | Issue | Status |
|---|---|---|
| 1 | Chainbase volatile fix incomplete |
Addressed — comment explains get()/getUnchecked() only read head once via head(), so volatile alone suffices. |
| 2 | TransactionTrace V2 windowSize reset |
Not addressed — see below. |
| 3 | CheckPointV2Store dual writeOptions confusion |
Addressed — Javadoc now clearly explains the two fields. |
| 4 | DbConvert AbstractMap.SimpleEntry overhead |
Not addressed — minor, acceptable as-is. |
| 5 | ResilienceService totalWeight==0 |
Addressed — guard added with IllegalStateException. |
| 6 | ChainbaseConcurrentTest quality |
Addressed — iterations bumped to 500, finally block added, @Rule Timeout added. |
| 7 | Commit granularity | Not addressed — acceptable since the fix commit is additive. |
Remaining concern
TransactionTrace.resetAccountUsageV2() newSize==0 guard — when newSize == 0, the V2 path resets windowSizeV2 via setNewWindowSizeV2(ENERGY, 0L) but does not reset the V1 windowSize. The V1 path does the opposite (setNewWindowSize). If V2 mode truly never reads V1 windowSize this is fine, but please confirm or add a comment explaining why only V2 needs resetting.
Other than that, LGTM.
🤖 Generated with Claude Code
|
Thanks for the thorough re-review. Regarding the remaining concern on // AccountCapsule.java:1457
public void setNewWindowSizeV2(ResourceCode resourceCode, long newWindowSize) {
this.setNewWindowSize(resourceCode, newWindowSize); // ← resets V1 windowSize
if (!this.getWindowOptimized(resourceCode)) {
...
}
}So the V2 path does reset the V1 windowSize — |
|
Verified — 🤖 Generated with Claude Code |
dad269a to
c378136
Compare
Suggestion: split this PR by risk levelThe changes in this PR are all "defensive improvements", but they vary significantly in risk and impact. Grouping them together makes review harder and increases revert blast radius.
Recommend splitting into 3 PRs:
Reasons:
🤖 Generated with Claude Code |
|
Thanks for the suggestion. I see the point about revert blast radius, but I'd push back on the split for this case: 1. These changes share the same audit origin and review context. All items come from the same security audit report and address the same category: "defensive code that doesn't actually defend in production." Splitting into 3 PRs means 3 rounds of CI, 3 review cycles, and 3 sets of merge conflicts to manage — for a total of ~170 lines of non-test changes. 2. The "high risk" items are lower risk than they appear.
3. The current 3-commit structure already provides granular revert. Each commit is independently revertable via
This gives the same revert granularity as separate PRs without the overhead. That said, if the team prefers separate PRs as a policy, I'm happy to split. Just want to make sure the tradeoff is explicit. |
Replace 5 assert statements with unconditional runtime checks on consensus-critical paths. assert is disabled by default in production JVM (no -ea flag), silently skipping invariant checks. With -ea enabled, AssertionError bypasses catch(Exception) handlers, crashing the node. - RepositoryImpl.increase(): time-backwards check - RepositoryImpl.calculateGlobalEnergyLimit(): zero energy weight check - EnergyProcessor.calculateGlobalEnergyLimit(): same, symmetric with allowNewReward() early return - ResourceProcessor.increase(): time-backwards check - MerklePath.encode(): authentication path / index size mismatch
- TransactionTrace: add zero-check guard before newArea/newSize division in resetAccountUsage() and resetAccountUsageV2() to prevent ArithmeticException when window size is zero - ResilienceService: replace all new Random() with ThreadLocalRandom.current() to avoid unnecessary object allocation; remove Random field from WeightedRandom inner class - DbConvert: merge parallel List<byte[]> (keys/values) into single List<Map.Entry<byte[], byte[]>> to eliminate index-mismatch risk
- Chainbase.head: declare as volatile for cross-thread visibility - Chainbase.prefixQuery(): snapshot head into local variable at entry and pass to prefixQueryRoot()/prefixQuerySnapshot() to guarantee consistency within a single call - TronDatabase.close(): split writeOptions.close() and dbSource.closeDB() into separate try-catch blocks so one failure does not skip the other - CheckPointV2Store.close(): close own writeOptions in try-catch then call super.close() for full parent cleanup chain - LevelDB.close() / RocksDB.close(): same split try-catch pattern - SnapshotManager.checkV2(): wrap getCheckpointDB() in try-with-resources so close() runs even if recover() throws
c378136 to
4e171df
Compare
What
Replace
assertstatements with unconditional runtime checks on consensus-critical paths, fix division-by-zero risks, improve thread safety inChainbase, and fix resource leaks across the entire databaseclose()chain.Why
Assert statements are disabled by default in production JVM (no
-eaflag). On consensus-critical paths, this creates three failure modes depending on JVM configuration:-ea(production default): assertions are silently skipped, code continues with invalid state (e.g.,totalEnergyWeight = 0causes(double) limit / 0L→Infinity→(long) Infinity=Long.MAX_VALUE, giving an account effectively unlimited energy)-ea:AssertionError(extendsError, notException) is thrown, bypassing allcatch (Exception)handlers in the transaction processing chain, crashing the block processing thread-eaflags produce different execution results for the same block → consensus splitReplacing with
if-throwusingRuntimeExceptionsubclasses ensures: (1) the check always executes regardless of JVM flags, (2) the exception is caught by existing error handling, failing the transaction without crashing the node, (3) these conditions cannot be triggered by valid transactions on a correctly functioning network — they guard against data corruption or independent bugs in other subsystems.Resource leaks in
close()methods follow the same root cause across 4 classes:writeOptions.close()anddbSource.closeDB()in a single try block, where the first failure prevents the second from executing. TheSnapshotManager.checkV2()leak is a separate issue:close()is not called at all whenrecover()throws.Changes
Commit 1: Assert → if-throw replacement
Replace 5
assertstatements with explicitif-throwchecks:RepositoryImpl.increase()now < lastTime(time backwards)IllegalArgumentExceptionRepositoryImpl.calculateGlobalEnergyLimit()totalEnergyWeight <= 0IllegalStateExceptionEnergyProcessor.calculateGlobalEnergyLimit()allowNewReward()early returnIllegalStateExceptionResourceProcessor.increase()now < lastTime(same as RepositoryImpl)IllegalArgumentExceptionMerklePath.encode()authenticationPath.size() != index.size()ZksnarkExceptionAdd
MerklePathEncodeTestandDefensiveChecksTestcovering all 5 exception paths.Commit 2: Defensive numeric and structural fixes
if (newSize == 0)guard beforenewArea / newSizedivision inresetAccountUsage()andresetAccountUsageV2()to preventArithmeticException. The zero condition can occur when the caller account has no prior energy consumption history (raw window size = 0)new Random()withThreadLocalRandom.current()to avoid per-call object allocation in a scheduled task; removeRandomfield fromWeightedRandominner class.SecureRandomis unnecessary here — random peer disconnection only needs statistical uniformity, not cryptographic unpredictabilityList<byte[]>(keys/values) into a singleList<Map.Entry<byte[], byte[]>>, eliminating index-mismatch risk at the type level.batchInsert()is a private method only called withinconvertLevelToRocks()Add
TransactionTraceTestcases fornewSize == 0on both V1 and V2 paths.Commit 3: Thread safety and resource leak fixes
volatilefor cross-thread visibility. Read methods (getUnchecked,prefixQuery, etc.) are not synchronized, so withoutvolatilereader threads may see stale references under JMMheadinto a local variable at method entry, pass it to bothprefixQueryRoot()andprefixQuerySnapshot(). Previously, these two sub-methods readheadindependently — ifsetHead()is called between the two reads, the results are from different snapshots.volatilealone does not fix this; the local snapshot guarantees single-call consistencywriteOptions.close()anddbSource.closeDB()into separate try-catch blocks so one failure does not skip the otherclose()chain — these had the identical bug)writeOptionsin try-catch, then callsuper.close()instead ofdbSource.closeDB()to ensure the full parent cleanup chain executesgetCheckpointDB()in try-with-resources soclose()runs even ifrecover()throwsAdd
ResourceCloseTest(mockwriteOptions.close()to throw, verifydbSource.closeDB()is still called) andChainbaseConcurrentTest(4 threads, 100 iterations of concurrentsetHead+prefixQuery).Scope
Test
MerklePathEncodeTest: size mismatch →ZksnarkException; empty lists → no exceptionDefensiveChecksTest: energy weight zero throws/returns-zero based onallowNewReward(); time-backwards throws; RepositoryImpl zero weight throwsTransactionTraceTest(2 new methods):resetAccountUsageV1/V2 handlenewSize == 0ResourceCloseTest: exception-safe close chain verification with MockitoChainbaseConcurrentTest: concurrent read/write safety