Fix unix socket driver listener lifecycle races#2156
Draft
petermm wants to merge 4 commits intoatomvm:mainfrom
Draft
Fix unix socket driver listener lifecycle races#2156petermm wants to merge 4 commits intoatomvm:mainfrom
petermm wants to merge 4 commits intoatomvm:mainfrom
Conversation
101457a to
350a449
Compare
93f634e to
299c561
Compare
https://ampcode.com/threads/T-019cb8b8-9e4c-7316-9566-c7e3f5f2b6db Fix a use-after-free race condition in the generic_unix socket driver's close handler, detected by Valgrind during CI gen_tcp tests. The close handler in socket_consume_mailbox used a two-phase locking pattern: it acquired the glb->listeners lock to NULL-out the socket_data listener pointers, released it, then called sys_unregister_listener (which re-acquires the lock) to remove the listener from the linked list. Between the unlock and re-lock, the event loop thread could also unlink the same listener node via process_listener_handler after the callback returned NULL. The subsequent list_remove in sys_unregister_listener then operated on stale prev/next pointers, corrupting the list or writing to freed memory. The fix makes the pointer detach and list unlink atomic under a single lock hold by introducing sys_unregister_listener_nolock — a variant that assumes the caller already holds the glb->listeners write lock. The close handler now NULLs the pointers, unlinks the listeners, and releases the lock before freeing the memory. This pattern is specific to generic_unix; ESP32 and RP2 use a single global listener for the socket driver subsystem and are not affected. Signed-off-by: Peter M <petermm@gmail.com>
299c561 to
e7b8f0c
Compare
pguyot
reviewed
Mar 14, 2026
That commit fixed the close-time double-unlink/use-after-free race in
generic_unix listener teardown. This change addresses a separate race in
listener registration, where a listener could become visible to the
event loop before socket_data published the corresponding pointer. Both
fixes are needed; this patch complements the earlier teardown fix rather
than replacing it.
Fix a race in the generic_unix socket driver where newly created
listeners were registered in the global listener list before the
corresponding socket_data->{active,passive}_listener pointer was
published.
If the event loop processed the listener in that window, the callback
could consume, free, or replace the listener before the socket driver
stored the pointer. The later assignment then left socket_data
pointing at stale listener memory, which could surface as random hangs
or corruption in gen_tcp tests, including timeouts waiting for the
server helper process to start.
Publish the listener pointer before calling sys_register_listener
in all affected paths:
active UDP receive listener setup
active TCP receive listener setup
passive recv/recvfrom listener setup
accept listener setup
This complements the earlier close-path fix by removing another
generic_unix listener lifecycle race.
Signed-off-by: Peter M <petermm@gmail.com>
e7b8f0c to
bf13b72
Compare
Configure newly accepted generic_unix TCP sockets as nonblocking before publishing them to the socket driver machinery. If fcntl fails, close the accepted fd and reply with an error so callers never observe a partially initialized connection. Signed-off-by: Peter M <petermm@gmail.com>
If the owning process terminates between the accept() call and globalcontext_get_process_lock(), ctx will be NULL. The code immediately dereferences ctx->platform_data without checking, causing a segfault. Add a NULL check consistent with other callbacks (e.g. recv_callback), closing the accepted fd if needed and freeing the listener before returning. Amp-Thread-ID: https://ampcode.com/threads/T-019ced69-2d93-772b-b884-c90e4cbaefe5 Co-authored-by: Amp <amp@ampcode.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
https://ampcode.com/threads/T-019cb8b8-9e4c-7316-9566-c7e3f5f2b6db
Claims to fix #2155
1
Fix a use-after-free race condition in the generic_unix socket driver's close handler, detected by Valgrind during CI gen_tcp tests.
The close handler in socket_consume_mailbox used a two-phase locking pattern: it acquired the glb->listeners lock to NULL-out the socket_data listener pointers, released it, then called sys_unregister_listener (which re-acquires the lock) to remove the listener from the linked list. Between the unlock and re-lock, the event loop thread could also unlink the same listener node via process_listener_handler after the callback returned NULL. The subsequent list_remove in sys_unregister_listener then operated on stale prev/next pointers, corrupting the list or writing to freed memory.
The fix makes the pointer detach and list unlink atomic under a single lock hold by introducing sys_unregister_listener_nolock, a variant that assumes the caller already holds the glb->listeners write lock. The close handler now NULLs the pointers, unlinks the listeners, and releases the lock before freeing the memory.
2
Follow-up analysis also found a separate listener publication race in the same generic_unix socket driver path. Several active and passive listener setup paths registered a listener in the global listener list before publishing the corresponding socket_data->{active,passive}_listener pointer. If the event loop processed that listener in the small window before the pointer assignment, the callback could consume, free, or replace the listener first, and the later assignment would leave socket_data pointing at stale listener memory. That race could surface as random hangs or corruption in CI gen_tcp tests.
Together, these fixes make listener publication, teardown, and unlinking consistent with the event loop's locking model on generic_unix.
This pattern is specific to generic_unix; ESP32 and RP2 use a single global listener for the socket driver subsystem and are not affected.
These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).
SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later