Skip to content

fix(platform-wallet): fix spv client deadlocking himself when trying to stop#3742

Open
ZocoLini wants to merge 1 commit into
v3.1-devfrom
fix/spv-client-deadlock
Open

fix(platform-wallet): fix spv client deadlocking himself when trying to stop#3742
ZocoLini wants to merge 1 commit into
v3.1-devfrom
fix/spv-client-deadlock

Conversation

@ZocoLini
Copy link
Copy Markdown
Collaborator

@ZocoLini ZocoLini commented May 25, 2026

The run method in the spa client wrapper was holding a read() lock over the inner spv client while the client was running, and stop tried to take a write() lock, what I do in this PR is clonning the Client, Client fields are Arc so we preserve reference to the same memory space.

Other way to solve the issue would be to take a read() lock on stop, and stop the client, then run() would continue and take a write() lock to replace de Some with a None, maybe it sounds like Chinese without having the code in front but yeah, two ways to solve it and I went with the one its the easiest to read and understand. Anyway, I am trying to refactor the SpvClient usage in platform wallet but first I need my integrations tests to get in so I can make sure I don't make any regression

Closes #3741

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have added "!" to the title and described breaking changes in the corresponding section if my code contains any
  • I have made corresponding changes to the documentation if needed

For repository code-owners and collaborators only

  • I have assigned this pull request to a milestone

Summary by CodeRabbit

  • Refactor
    • Improved SPV wallet runtime client handling to enhance resource management efficiency.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 25, 2026

📝 Walkthrough

Walkthrough

SpvRuntime::run restructures lock management to release the RwLock read guard before awaiting the SPV client's run loop. The client is cloned from the guarded state, the guard is dropped, and the async operation proceeds unblocked. Error handling remains consistent.

Changes

SPV Runtime Lock Guard Lifetime

Layer / File(s) Summary
Lock guard release before client run await
packages/rs-platform-wallet/src/spv/runtime.rs
The client is cloned from the locked Option<SpvClient>, the client_guard reference is explicitly dropped before awaiting, and then client.run().await executes without holding the read lock. Error mapping to PlatformWalletError::SpvError(...) and SpvNotRunning handling remain unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • dashpay/platform#3730: Both PRs modify SpvRuntime::run to avoid holding the RwLock guard across async execution by releasing it before awaiting.
  • dashpay/platform#3729: Both PRs modify SpvRuntime::run lock mechanics, but #3729 removes cancellation-token wiring and changes the API signature while this PR focuses on guard lifetime.

Suggested reviewers

  • QuantumExplorer
  • xdustinface
  • shumkov

Poem

🐰 A guard released before the long await,
No lock held while async tasks communicate,
The client runs free in the night,
Lock patterns corrected, the code feels right! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: fixing an SPV client deadlock during shutdown, which aligns with the code modification.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/spv-client-deadlock

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added this to the v3.1.0 milestone May 25, 2026
@ZocoLini ZocoLini changed the title fix spv client deadlocking himself when sending a tx fix(platform-wallet): fix spv client deadlocking himself when sending a tx May 25, 2026
@ZocoLini ZocoLini force-pushed the fix/spv-client-deadlock branch from 9833000 to c9dceb1 Compare May 25, 2026 19:41
@ZocoLini ZocoLini changed the title fix(platform-wallet): fix spv client deadlocking himself when sending a tx fix(platform-wallet): fix spv client deadlocking himself when trying to stop by refactoring spv usage in platform wallet May 25, 2026
@ZocoLini ZocoLini force-pushed the fix/spv-client-deadlock branch from c9dceb1 to 8b8ebe7 Compare May 25, 2026 19:51
@ZocoLini ZocoLini changed the title fix(platform-wallet): fix spv client deadlocking himself when trying to stop by refactoring spv usage in platform wallet fix(platform-wallet): fix spv client deadlocking himself when trying to stop May 25, 2026
@ZocoLini ZocoLini force-pushed the fix/spv-client-deadlock branch from 8b8ebe7 to da12a71 Compare May 25, 2026 20:38
@ZocoLini ZocoLini marked this pull request as ready for review May 25, 2026 20:38
@ZocoLini ZocoLini requested a review from QuantumExplorer as a code owner May 25, 2026 20:38
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The deadlock fix in SpvRuntime::run() is directionally correct, but the new clone-and-drop pattern introduces a real lifecycle race on stop-then-restart. SpvRuntime can now lose track of a newly started SPV client while that client keeps running, which makes the runtime report SpvNotRunning even though the background sync loop is still alive.

🔴 1 blocking

Review details

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-platform-wallet/src/spv/runtime.rs`:
- [BLOCKING] lines 138-151: Exiting `run()` can clear a newer client installed after `stop()`
  `run()` now clones the current `DashSpvClient`, drops the read lock, awaits `client.run()`, and then unconditionally executes `self.client.write().await.take()`. That cleanup is no longer tied to the generation that originally started the run loop. `stop()` removes the current client from `self.client` before the old `client.run()` future has necessarily returned, so a caller can immediately `start()` again and install a fresh client in the slot. When the old run loop finally exits, its unconditional `take()` removes that new client, leaving the restarted SPV sync running only through the cloned `Arc` state inside `dash-spv` while `SpvRuntime::is_started()` becomes false and later `stop()` calls cannot reach it. The upstream `DashSpvClient::clone()` is a shallow clone over shared `Arc` fields, and the same crate already documents this stale-cleanup pattern as a real bug in `identity_sync` and `shielded_sync`, so `SpvRuntime` needs the same generation/ownership check before clearing the slot.

Reviewed commit: da12a71a

GitHub rejected inline review comments for this update, so the findings are posted body-only.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.16%. Comparing base (31e8af2) to head (da12a71).

Additional details and impacted files
@@             Coverage Diff              @@
##           v3.1-dev    #3742      +/-   ##
============================================
- Coverage     87.16%   87.16%   -0.01%     
============================================
  Files          2607     2607              
  Lines        319420   319420              
============================================
- Hits         278413   278412       -1     
- Misses        41007    41008       +1     
Components Coverage Δ
dpp 87.67% <ø> (-0.01%) ⬇️
drive 85.95% <ø> (ø)
drive-abci 89.60% <ø> (ø)
sdk ∅ <ø> (∅)
dapi-client ∅ <ø> (∅)
platform-version ∅ <ø> (∅)
platform-value 92.17% <ø> (ø)
platform-wallet ∅ <ø> (∅)
drive-proof-verifier 49.16% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link
Copy Markdown
Contributor

✅ DashSDKFFI.xcframework built for this PR.

SwiftPM (host the zip at a stable URL, then use):

.binaryTarget(
  name: "DashSDKFFI",
  url: "https://your.cdn.example/DashSDKFFI.xcframework.zip",
  checksum: "1f02384a222aebf9b8068258d3c3980e592fa0608e6351b500938a3ca9c00859"
)

Xcode manual integration:

  • Download 'DashSDKFFI.xcframework' artifact from the run link above.
  • Drag it into your app target (Frameworks, Libraries & Embedded Content) and set Embed & Sign.
  • If using the Swift wrapper package, point its binaryTarget to the xcframework location or add the package and place the xcframework at the expected path.

@lklimek
Copy link
Copy Markdown
Contributor

lklimek commented May 27, 2026

@thepastaclaw please describe the race condition with an example flow.

@thepastaclaw
Copy link
Copy Markdown
Collaborator

Example flow I had in mind:

  1. Task A calls SpvRuntime::run(config_a). start() installs client A in self.client.
  2. run() clones client A, drops the read guard, then awaits client_a.run().
  3. Task B calls stop(). It takes self.client (so the slot becomes None) and calls client_a.stop(). The old client_a.run() future may not have returned yet.
  4. Before that old future returns, Task C calls run(config_b) / start(config_b), which installs client B in self.client.
  5. The old client_a.run() future finally returns. Task A resumes and unconditionally runs self.client.write().await.take().

At step 5, the slot can now contain client B, not client A, so the cleanup for the old run removes the newly-started client. After that, SpvRuntime::is_started() reports false and later stop() cannot reach client B, even though client B may still be alive/running through its own shared state.

So the cleanup needs to be ownership/generation-aware: only clear the slot if it still contains the same client/generation that this run() cloned. If concurrent restart is meant to be invalid instead, then start()/run() should probably enforce that explicitly rather than allowing the stale cleanup path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SpvRuntime::stop() deadlocks against running SpvRuntime::run()

3 participants