Re-send webhook on peer disconnect with orphaned HTLCs #8
Open
amackillop wants to merge 2 commits into lsp-0.2.0 from
When a serverless SDK peer reconnects in response to a webhook but disconnects again before channel_reestablish completes, the intercepted HTLC gets stuck: process_pending_htlcs only iterates connected peers, so a disconnected peer's HTLCs sit in the store until the 45s expiry timer cleans them up.

Fix this by firing another SendWebhook from peer_disconnected when orphaned HTLCs remain in the store. The peer wakes up and reconnects, giving peer_connected / process_pending_htlcs another chance to forward. If it disconnects again, the cycle repeats until either the forward succeeds or handle_expired_htlcs cleans up at 45s.
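A minimal sketch of the disconnect hook described above. Only peer_disconnected and SendWebhook come from the description; PeerId (a u64 stand-in for a node public key), HtlcStore, InterceptedHtlc, Service and LspEvent are hypothetical simplifications, not the LSP's actual types. The point is just the guard: a webhook is re-fired only when HTLCs for that peer are still stored.

```rust
// Hypothetical stand-ins; the real LSP types and event names may differ.
type PeerId = u64; // stand-in for a node public key

#[derive(Debug, Clone, PartialEq)]
enum LspEvent {
    /// Ask the webhook service to wake the serverless SDK peer.
    SendWebhook { peer: PeerId },
}

#[derive(Debug, Clone)]
struct InterceptedHtlc {
    peer: PeerId,
}

#[derive(Default)]
struct HtlcStore {
    htlcs: Vec<InterceptedHtlc>,
}

impl HtlcStore {
    /// True if any intercepted HTLC in the store is destined for `peer`.
    fn has_pending_for(&self, peer: PeerId) -> bool {
        self.htlcs.iter().any(|h| h.peer == peer)
    }
}

#[derive(Default)]
struct Service {
    store: HtlcStore,
    events: Vec<LspEvent>,
}

impl Service {
    /// process_pending_htlcs only walks connected peers, so HTLCs for a peer
    /// that drops mid-reestablish would otherwise wait for the expiry timer.
    /// Re-fire the webhook so the peer wakes up and reconnects for another try.
    fn peer_disconnected(&mut self, peer: PeerId) {
        if self.store.has_pending_for(peer) {
            self.events.push(LspEvent::SendWebhook { peer });
        }
    }
}
```

A peer with nothing left in the store produces no event, so the webhook/reconnect loop only continues while an HTLC is actually orphaned; the 45s expiry remains the backstop that ends it.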
The TOCTOU fix in eecf53e made process_pending_htlcs skip a peer whenever calculate_htlc_actions_for_peer returned new_channel_needed_msat, on the assumption that htlc_intercepted or peer_connected had already emitted OpenChannel. That assumption is wrong for the disconnected-peer path: htlc_intercepted only stores the HTLC and sends a webhook, peer_connected defers because the peer's channels aren't usable yet, and so nothing ever emits OpenChannel. The timer was the only remaining place that could, but the skip prevented it, so the HTLC sat there until expiry.

Replace the unconditional skip with a pending_channel_opens set that tracks which peers actually have an OpenChannel in flight: execute_htlc_actions inserts the peer, and channel_ready removes it. The timer now skips only if the peer is in the set.

Considered making the OpenChannel handler in ldk-node idempotent instead, but create_channel has real side effects (a funding transaction), and deduplicating at that layer would be more invasive for the same result.
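A minimal sketch of the pending_channel_opens bookkeeping, under stated assumptions: PeerId is a u64 stand-in for a node public key, Service and Event are hypothetical simplifications, and the timer's per-peer logic is collapsed into a hypothetical process_pending_for_peer helper that receives whatever calculate_htlc_actions_for_peer reported as new_channel_needed_msat. execute_open_channel is likewise a hypothetical stand-in for the OpenChannel branch of execute_htlc_actions.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins; the real LSP types and event names may differ.
type PeerId = u64; // stand-in for a node public key

#[derive(Debug, PartialEq)]
enum Event {
    OpenChannel { peer: PeerId, amount_msat: u64 },
}

#[derive(Default)]
struct Service {
    /// Peers with an OpenChannel currently in flight.
    pending_channel_opens: HashSet<PeerId>,
    events: Vec<Event>,
}

impl Service {
    /// Stand-in for the OpenChannel branch of execute_htlc_actions:
    /// emit the event and record that an open is now in flight.
    fn execute_open_channel(&mut self, peer: PeerId, amount_msat: u64) {
        self.pending_channel_opens.insert(peer);
        self.events.push(Event::OpenChannel { peer, amount_msat });
    }

    /// channel_ready: the open has completed, so the peer leaves the set.
    fn channel_ready(&mut self, peer: PeerId) {
        self.pending_channel_opens.remove(&peer);
    }

    /// Timer path. `new_channel_needed_msat` is None when no new channel
    /// is needed. Skip only if an open is already in flight for this peer.
    fn process_pending_for_peer(&mut self, peer: PeerId, new_channel_needed_msat: Option<u64>) {
        if let Some(amount_msat) = new_channel_needed_msat {
            if self.pending_channel_opens.contains(&peer) {
                return; // OpenChannel already emitted; wait for channel_ready
            }
            self.execute_open_channel(peer, amount_msat);
        }
    }
}
```

Keying the guard on an in-flight set, rather than on the mere presence of new_channel_needed_msat, lets the timer emit OpenChannel for a peer that was disconnected when htlc_intercepted ran, while still avoiding a duplicate open when one is already under way.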