Skip to content

fix(chainbase): defensive checks and resource leak fixes#10

Open
barbatos2011 wants to merge 3 commits intodevelopfrom
fix/defensive-code-safety
Open

fix(chainbase): defensive checks and resource leak fixes#10
barbatos2011 wants to merge 3 commits intodevelopfrom
fix/defensive-code-safety

Conversation

@barbatos2011
Copy link
Copy Markdown
Owner

@barbatos2011 barbatos2011 commented Apr 1, 2026

What

Replace assert statements with unconditional runtime checks on consensus-critical paths, fix division-by-zero risks, improve thread safety in Chainbase, and fix resource leaks across the entire database close() chain.

Why

Assert statements are disabled by default in production JVM (no -ea flag). On consensus-critical paths, this creates three failure modes depending on JVM configuration:

  • Without -ea (production default): assertions are silently skipped, code continues with invalid state (e.g., totalEnergyWeight = 0 causes (double) limit / 0LInfinity(long) Infinity = Long.MAX_VALUE, giving an account effectively unlimited energy)
  • With -ea: AssertionError (extends Error, not Exception) is thrown, bypassing all catch (Exception) handlers in the transaction processing chain, crashing the block processing thread
  • Mixed network: nodes with different -ea flags produce different execution results for the same block → consensus split

Replacing with if-throw using RuntimeException subclasses ensures: (1) the check always executes regardless of JVM flags, (2) the exception is caught by existing error handling, failing the transaction without crashing the node, (3) these conditions cannot be triggered by valid transactions on a correctly functioning network — they guard against data corruption or independent bugs in other subsystems.

Resource leaks in close() methods follow the same root cause across 4 classes: writeOptions.close() and dbSource.closeDB() in a single try block, where the first failure prevents the second from executing. The SnapshotManager.checkV2() leak is a separate issue: close() is not called at all when recover() throws.

Changes

Commit 1: Assert → if-throw replacement

Replace 5 assert statements with explicit if-throw checks:

Location Condition Exception
RepositoryImpl.increase() now < lastTime (time backwards) IllegalArgumentException
RepositoryImpl.calculateGlobalEnergyLimit() totalEnergyWeight <= 0 IllegalStateException
EnergyProcessor.calculateGlobalEnergyLimit() same, in else branch symmetric with allowNewReward() early return IllegalStateException
ResourceProcessor.increase() now < lastTime (same as RepositoryImpl) IllegalArgumentException
MerklePath.encode() authenticationPath.size() != index.size() ZksnarkException

Add MerklePathEncodeTest and DefensiveChecksTest covering all 5 exception paths.

Commit 2: Defensive numeric and structural fixes

  • TransactionTrace: add if (newSize == 0) guard before newArea / newSize division in resetAccountUsage() and resetAccountUsageV2() to prevent ArithmeticException. The zero condition can occur when the caller account has no prior energy consumption history (raw window size = 0)
  • ResilienceService: replace all new Random() with ThreadLocalRandom.current() to avoid per-call object allocation in a scheduled task; remove Random field from WeightedRandom inner class. SecureRandom is unnecessary here — random peer disconnection only needs statistical uniformity, not cryptographic unpredictability
  • DbConvert: merge two parallel List<byte[]> (keys/values) into a single List<Map.Entry<byte[], byte[]>>, eliminating index-mismatch risk at the type level. batchInsert() is a private method only called within convertLevelToRocks()

Add TransactionTraceTest cases for newSize == 0 on both V1 and V2 paths.

Commit 3: Thread safety and resource leak fixes

  • Chainbase.head: declare as volatile for cross-thread visibility. Read methods (getUnchecked, prefixQuery, etc.) are not synchronized, so without volatile reader threads may see stale references under JMM
  • Chainbase.prefixQuery(): snapshot head into a local variable at method entry, pass it to both prefixQueryRoot() and prefixQuerySnapshot(). Previously, these two sub-methods read head independently — if setHead() is called between the two reads, the results are from different snapshots. volatile alone does not fix this; the local snapshot guarantees single-call consistency
  • TronDatabase.close(): split writeOptions.close() and dbSource.closeDB() into separate try-catch blocks so one failure does not skip the other
  • LevelDB.close() / RocksDB.close(): same split try-catch pattern (discovered by tracing the full close() chain — these had the identical bug)
  • CheckPointV2Store.close(): close own writeOptions in try-catch, then call super.close() instead of dbSource.closeDB() to ensure the full parent cleanup chain executes
  • SnapshotManager.checkV2(): wrap getCheckpointDB() in try-with-resources so close() runs even if recover() throws

Add ResourceCloseTest (mock writeOptions.close() to throw, verify dbSource.closeDB() is still called) and ChainbaseConcurrentTest (4 threads, 100 iterations of concurrent setHead + prefixQuery).

Scope

  • Does NOT change consensus rules or transaction validation logic
  • The assert replacements guard conditions that cannot be triggered by valid transactions on a correctly functioning mainnet — no hard fork required
  • Does NOT modify any public API or protobuf definitions

Test

  • All existing framework tests pass (2332 tests, 8 pre-existing net module flaky failures unrelated to this PR)
  • Added 4 new test classes, 12 new test methods:
    • MerklePathEncodeTest: size mismatch → ZksnarkException; empty lists → no exception
    • DefensiveChecksTest: energy weight zero throws/returns-zero based on allowNewReward(); time-backwards throws; RepositoryImpl zero weight throws
    • TransactionTraceTest (2 new methods): resetAccountUsage V1/V2 handle newSize == 0
    • ResourceCloseTest: exception-safe close chain verification with Mockito
    • ChainbaseConcurrentTest: concurrent read/write safety
  • Checkstyle passes

@barbatos2011 barbatos2011 changed the title fix(actuator,chainbase,net,plugins): replace assert with explicit checks and defensive fixes fix(chainbase): defensive checks and resource leak fixes Apr 1, 2026
@barbatos2011
Copy link
Copy Markdown
Owner Author

PR Review: fix/defensive-code-safety

Overall direction is good. Here are the issues found:


1. Chainbase volatile fix is incomplete

prefixQuery() now uses a local snapshot of head, but other methods that read head (e.g. get(), getUnchecked()) do not apply the same treatment. If those methods are also accessed concurrently, this fix only covers one entry point. Recommend auditing all paths that read head.

2. TransactionTrace.resetAccountUsageV2() — windowSize reset when newSize==0 needs verification

The V1 path calls setNewWindowSize(ENERGY, 0L) when newSize == 0, while the V2 path calls setNewWindowSizeV2(ENERGY, 0L). The V1 windowSize is not reset in the V2 path. Please confirm that V2 mode never uses the V1 windowSize, otherwise stale data may remain.

3. CheckPointV2Store.close() — subclass and parent each have their own writeOptions field

The subclass calls writeOptions.close() then super.close() which closes the parent's writeOptions. This is technically correct since Java fields are not polymorphic, but two identically-named fields are confusing. Consider adding a comment explaining why two separate writeOptions instances need to be closed.

4. DbConvertAbstractMap.SimpleEntry overhead

Merging two parallel lists into List<Map.Entry<byte[], byte[]>> eliminates key/value desync risk — good direction. However, AbstractMap.SimpleEntry adds an extra object allocation per iteration. For large database conversions this may have a performance impact. Consider using a simple inner record/class instead.

5. ResilienceService.WeightedRandomtotalWeight may be 0

int randomNum = ThreadLocalRandom.current().nextInt(totalWeight);

If weights is empty or all weights are 0, totalWeight == 0 and nextInt(0) throws IllegalArgumentException. The original code had the same issue, but since this PR focuses on defensive fixes, it's worth adding a guard here too.

6. Test issues

  • ChainbaseConcurrentTest: 100 iterations is too few to reliably trigger concurrency bugs; chainbase.reset() / chainbase.close() are not in a finally block so resources leak if the assertion fails; consider adding @Rule Timeout to prevent hangs.
  • ResourceCloseTest: Heavy use of reflection to replace fields — any field rename will break the test. Fragile.
  • DefensiveChecksTest: Multiple tests reuse the same hex address string; consider extracting it as a constant.

7. Commit granularity

Commit e7a95dd ("thread safety and resource leak fixes") bundles multiple unrelated change types (volatile, resource close, try-with-resources). Consider splitting into finer-grained commits.


🤖 Generated with Claude Code

@barbatos2011
Copy link
Copy Markdown
Owner Author

Updated Review (after dad269ac0)

Most of the previous feedback has been addressed. Here's the updated status:

# Issue Status
1 Chainbase volatile fix incomplete Addressed — comment explains get()/getUnchecked() only read head once via head(), so volatile alone suffices.
2 TransactionTrace V2 windowSize reset Not addressed — see below.
3 CheckPointV2Store dual writeOptions confusion Addressed — Javadoc now clearly explains the two fields.
4 DbConvert AbstractMap.SimpleEntry overhead Not addressed — minor, acceptable as-is.
5 ResilienceService totalWeight==0 Addressed — guard added with IllegalStateException.
6 ChainbaseConcurrentTest quality Addressed — iterations bumped to 500, finally block added, @Rule Timeout added.
7 Commit granularity Not addressed — acceptable since the fix commit is additive.

Remaining concern

TransactionTrace.resetAccountUsageV2() newSize==0 guard — when newSize == 0, the V2 path resets windowSizeV2 via setNewWindowSizeV2(ENERGY, 0L) but does not reset the V1 windowSize. The V1 path does the opposite (setNewWindowSize). If V2 mode truly never reads V1 windowSize this is fine, but please confirm or add a comment explaining why only V2 needs resetting.

Other than that, LGTM.

🤖 Generated with Claude Code

@barbatos2011
Copy link
Copy Markdown
Owner Author

Thanks for the thorough re-review.

Regarding the remaining concern on resetAccountUsageV2() — this is actually safe. setNewWindowSizeV2 internally delegates to setNewWindowSize:

// AccountCapsule.java:1457
public void setNewWindowSizeV2(ResourceCode resourceCode, long newWindowSize) {
    this.setNewWindowSize(resourceCode, newWindowSize);  // ← resets V1 windowSize
    if (!this.getWindowOptimized(resourceCode)) {
        ...
    }
}

So the V2 path does reset the V1 windowSize — setNewWindowSizeV2 is a superset of setNewWindowSize, not an alternative. Both V1 and V2 paths end up writing to the same underlying energyWindowSize protobuf field.

@barbatos2011
Copy link
Copy Markdown
Owner Author

Verified — setNewWindowSizeV2 delegates to setNewWindowSize at AccountCapsule.java:1458, so V1 windowSize is correctly reset in the V2 path. No remaining concerns. LGTM 👍

🤖 Generated with Claude Code

@barbatos2011 barbatos2011 force-pushed the fix/defensive-code-safety branch from dad269a to c378136 Compare April 1, 2026 20:32
@barbatos2011
Copy link
Copy Markdown
Owner Author

Suggestion: split this PR by risk level

The changes in this PR are all "defensive improvements", but they vary significantly in risk and impact. Grouping them together makes review harder and increases revert blast radius.

Change Risk Impact
assert→if-throw Low Behavior-equivalent (assert is disabled in production anyway)
ThreadLocalRandom Low Nearly no functional change
DbConvert parallel list merge Low Only affects offline tooling
Resource close ordering Medium Affects node shutdown flow
TransactionTrace divide-by-zero guard High Changes core transaction processing control flow with new early returns
Chainbase volatile + local snapshot High Concurrency semantic change affecting all prefixQuery calls

Recommend splitting into 3 PRs:

  1. assert→if-throw + ThreadLocalRandom — mechanical replacements, low risk, can be merged quickly
  2. Resource close safety + DbConvert refactor — resource management improvements, medium risk, independently testable
  3. TransactionTrace divide-by-zero guard + Chainbase thread safety — core logic changes, needs the most careful review

Reasons:

  • If the Chainbase concurrency fix introduces a regression, reverting it would also roll back unrelated assert replacements
  • The TransactionTrace early return changes actual business logic — different in nature from "defensive checks" and easy to overlook when mixed together
  • Low-risk changes should not be blocked by high-risk ones

🤖 Generated with Claude Code

@barbatos2011
Copy link
Copy Markdown
Owner Author

Thanks for the suggestion. I see the point about revert blast radius, but I'd push back on the split for this case:

1. These changes share the same audit origin and review context.

All items come from the same security audit report and address the same category: "defensive code that doesn't actually defend in production." Splitting into 3 PRs means 3 rounds of CI, 3 review cycles, and 3 sets of merge conflicts to manage — for a total of ~170 lines of non-test changes.

2. The "high risk" items are lower risk than they appear.

  • TransactionTrace divide-by-zero guard: the early return sets energyUsage = 0 and windowSize = 0, which is the mathematically correct result when the window size is zero (no consumption window → no usage). It doesn't change the result for any case where newSize > 0. The condition is unlikely to occur on mainnet (requires an account with no prior energy consumption history), and if it does occur today, the node crashes with ArithmeticException — the fix is strictly better.

  • Chainbase volatile: volatile on a reference field is a standard JMM fix with near-zero overhead on x86. The prefixQuery local snapshot is a 1-line variable assignment that eliminates a read-read inconsistency. Both are minimal, well-understood patterns.

3. The current 3-commit structure already provides granular revert.

Each commit is independently revertable via git revert <hash>:

  • Commit 1 (assert→if-throw) can be reverted without touching Commit 2 or 3
  • Commit 3 (Chainbase + resource leaks) can be reverted independently

This gives the same revert granularity as separate PRs without the overhead.

That said, if the team prefers separate PRs as a policy, I'm happy to split. Just want to make sure the tradeoff is explicit.

Barbatos added 3 commits April 2, 2026 04:44
Replace 5 assert statements with unconditional runtime checks on
consensus-critical paths. assert is disabled by default in production
JVM (no -ea flag), silently skipping invariant checks. With -ea enabled,
AssertionError bypasses catch(Exception) handlers, crashing the node.

- RepositoryImpl.increase(): time-backwards check
- RepositoryImpl.calculateGlobalEnergyLimit(): zero energy weight check
- EnergyProcessor.calculateGlobalEnergyLimit(): same, symmetric with
  allowNewReward() early return
- ResourceProcessor.increase(): time-backwards check
- MerklePath.encode(): authentication path / index size mismatch
- TransactionTrace: add zero-check guard before newArea/newSize division
  in resetAccountUsage() and resetAccountUsageV2() to prevent
  ArithmeticException when window size is zero
- ResilienceService: replace all new Random() with
  ThreadLocalRandom.current() to avoid unnecessary object allocation;
  remove Random field from WeightedRandom inner class
- DbConvert: merge parallel List<byte[]> (keys/values) into single
  List<Map.Entry<byte[], byte[]>> to eliminate index-mismatch risk
- Chainbase.head: declare as volatile for cross-thread visibility
- Chainbase.prefixQuery(): snapshot head into local variable at entry
  and pass to prefixQueryRoot()/prefixQuerySnapshot() to guarantee
  consistency within a single call
- TronDatabase.close(): split writeOptions.close() and
  dbSource.closeDB() into separate try-catch blocks so one failure
  does not skip the other
- CheckPointV2Store.close(): close own writeOptions in try-catch then
  call super.close() for full parent cleanup chain
- LevelDB.close() / RocksDB.close(): same split try-catch pattern
- SnapshotManager.checkV2(): wrap getCheckpointDB() in
  try-with-resources so close() runs even if recover() throws
@barbatos2011 barbatos2011 force-pushed the fix/defensive-code-safety branch from c378136 to 4e171df Compare April 1, 2026 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant