Async client: fix silent data loss on the final frame of a server stream#312
Async client: fix silent data loss on the final frame of a server stream#312shvbsle wants to merge 1 commit into
Conversation
|
On CI Failures: The CI build failures are unrelated to the code changes. Cargo 1.81 cannot resolve the current crate ecosystem because new crate versions now require edition2024. Seems like infrastructure rot. |
You can update rust-toolchain.toml to 1.95 to fix this. |
I had a PR for this but its not as simple as updating the toolchain. Need to fix the new clippy issues that are introduced in new versions: I can cherry pick or rebase this PR after the above one is merged |
|
LGTM. At the time of implementation, it was simply designed to behave exactly like the Go version, even though the Go version might be incorrect... |
2dadc6b to
c908838
Compare
|
@tzneal @dims Added a regression test in the second commit. |
e6e7521 to
2afc15c
Compare
ClientReader::handle_msg spawns a new task per frame. For a server-streaming RPC, the final DATA frame and the subsequent FLAG_REMOTE_CLOSED frame then race: if the close-frame task grabs the req_map lock first, it removes the stream from the map, and the preceding data-frame task finds nothing and silently drops the payload. The stream consumer sees Ok(None) (EOF) without ever observing the payload the server sent. The connection read loop already awaits handle_msg per frame, so processing inline preserves per-stream wire order. It also gives the per-stream mpsc natural back-pressure in place of the unbounded per-frame spawning. handle_err gets the same treatment for consistency. Add two regression examples: - async-stream-close-order: verifies DATA before CLOSE is not lost - async-data-order: verifies multi-frame streams maintain sequence Signed-off-by: Shiv Bhosale <shvbsle@amazon.com>
Fixes #311
Process each frame inline in
ClientReader::handle_msginstead oftokio::spawn-ing per frame. The read loop inconnection.rsalready awaitshandle_msgper frame, so inline preserves per-stream wire order and restores natural back-pressure on the per-stream mpsc.handle_erradjusted the same way for consistency.Server dispatch also spawns per frame, but there the spawn is load-bearing (user handlers can take arbitrary time) and a oneshot already preserves ordering. Left alone.
Regression tests
Added two examples that exercise the race on a
multi_threadruntime:async-stream-close-order- server-streaming handler sends one payload and returns (DATA immediately followed by REMOTE_CLOSED). Repeats 2000 times in batches of 200 and asserts every iteration receives the payload before EOF. Targets data loss from the close-frame removing the stream before the data-frame task looks it up.async-data-order- server-streaming handler sends 50 sequential DATA frames per stream. Opens 100 concurrent streams and asserts every stream receives frames in sequence order. Targets reordering from spawned tasks executing out of submission order.Without the fix:
With the fix both pass cleanly across repeated runs.
Compatibility
Public API, wire format, and observable behavior for correctly-behaving consumers are unchanged.