fix(platform-wallet): fix spv client deadlocking himself when trying to stop#3742
fix(platform-wallet): fix spv client deadlocking himself when trying to stop#3742ZocoLini wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthrough
ChangesSPV Runtime Lock Guard Lifetime
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
9833000 to
c9dceb1
Compare
c9dceb1 to
8b8ebe7
Compare
8b8ebe7 to
da12a71
Compare
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
The deadlock fix in SpvRuntime::run() is directionally correct, but the new clone-and-drop pattern introduces a real lifecycle race on stop-then-restart. SpvRuntime can now lose track of a newly started SPV client while that client keeps running, which makes the runtime report SpvNotRunning even though the background sync loop is still alive.
🔴 1 blocking
Review details
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-platform-wallet/src/spv/runtime.rs`:
- [BLOCKING] lines 138-151: Exiting `run()` can clear a newer client installed after `stop()`
`run()` now clones the current `DashSpvClient`, drops the read lock, awaits `client.run()`, and then unconditionally executes `self.client.write().await.take()`. That cleanup is no longer tied to the generation that originally started the run loop. `stop()` removes the current client from `self.client` before the old `client.run()` future has necessarily returned, so a caller can immediately `start()` again and install a fresh client in the slot. When the old run loop finally exits, its unconditional `take()` removes that new client, leaving the restarted SPV sync running only through the cloned `Arc` state inside `dash-spv` while `SpvRuntime::is_started()` becomes false and later `stop()` calls cannot reach it. The upstream `DashSpvClient::clone()` is a shallow clone over shared `Arc` fields, and the same crate already documents this stale-cleanup pattern as a real bug in `identity_sync` and `shielded_sync`, so `SpvRuntime` needs the same generation/ownership check before clearing the slot.
Reviewed commit: da12a71a
GitHub rejected inline review comments for this update, so the findings are posted body-only.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## v3.1-dev #3742 +/- ##
============================================
- Coverage 87.16% 87.16% -0.01%
============================================
Files 2607 2607
Lines 319420 319420
============================================
- Hits 278413 278412 -1
- Misses 41007 41008 +1
🚀 New features to boost your workflow:
|
|
✅ DashSDKFFI.xcframework built for this PR.
SwiftPM (host the zip at a stable URL, then use): .binaryTarget(
name: "DashSDKFFI",
url: "https://your.cdn.example/DashSDKFFI.xcframework.zip",
checksum: "1f02384a222aebf9b8068258d3c3980e592fa0608e6351b500938a3ca9c00859"
)Xcode manual integration:
|
|
@thepastaclaw please describe the race condition with an example flow. |
|
Example flow I had in mind:
At step 5, the slot can now contain client B, not client A, so the cleanup for the old run removes the newly-started client. After that, So the cleanup needs to be ownership/generation-aware: only clear the slot if it still contains the same client/generation that this |
The run method in the spa client wrapper was holding a read() lock over the inner spv client while the client was running, and stop tried to take a write() lock, what I do in this PR is clonning the Client, Client fields are Arc so we preserve reference to the same memory space.
Other way to solve the issue would be to take a read() lock on stop, and stop the client, then run() would continue and take a write() lock to replace de Some with a None, maybe it sounds like Chinese without having the code in front but yeah, two ways to solve it and I went with the one its the easiest to read and understand. Anyway, I am trying to refactor the SpvClient usage in platform wallet but first I need my integrations tests to get in so I can make sure I don't make any regression
Closes #3741
Checklist:
For repository code-owners and collaborators only
Summary by CodeRabbit