Deterministic test proving AppRuntime destructor deadlock#145
Closed
bghgary wants to merge 3 commits intoBabylonJS:mainfrom
Closed
Deterministic test proving AppRuntime destructor deadlock#145bghgary wants to merge 3 commits intoBabylonJS:mainfrom
bghgary wants to merge 3 commits intoBabylonJS:mainfrom
Conversation
WorkQueue was a separate class with its own thread, dispatcher, and cancel source. AppRuntime captured 'this' in lambdas passed to WorkQueue, creating split-lifetime issues where the worker thread could access partially-destroyed AppRuntime members during shutdown. This change inlines WorkQueue's functionality directly into AppRuntime: - Thread, dispatcher, cancel source, and env are now AppRuntime members - Destruction order is deterministic (no cross-object lifetime issues) - Cancellation is dispatched as a work item instead of signaled externally, eliminating the race between cancel and blocking_tick's condition_variable - WorkQueue.h and WorkQueue.cpp are removed - AppRuntime_JSI.cpp updated to use AppRuntime::Dispatch instead of WorkQueue::Append Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tests that destroying AppRuntime does not deadlock when the worker thread is inside blocking_tick's condition_variable::wait. Uses a timeout to detect deadlock deterministically. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Uses arcana testing hooks to sleep while holding the queue mutex right before condition_variable::wait(). This guarantees the worker is between the condition check and wait() when the destructor fires cancel+notify. The old destructor's notify_all() fires without the mutex, so the signal is lost and the worker deadlocks in wait(). The test detects this via a 5-second timeout. This branch intentionally has the OLD broken destructor to demonstrate the failure. The fix branch (fix/merge-workqueue-into-appruntime) uses Append(cancel) which coordinates through push()/mutex and passes this test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
Author
|
Superseded — was based on the merged WorkQueue branch, not main. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Deterministic test proving the AppRuntime destructor has a deadlock race condition.
The Bug
The destructor cancels from the main thread via cancel() + notify_all(). The notify fires without holding the queue mutex, so if the worker thread hasn't entered condition_variable::wait() yet, the signal is lost and join() hangs forever.
How This Test Works
Uses an arcana testing hook (ARCANA_TESTING_HOOKS) that sleeps while holding the queue mutex, right before wait(). This guarantees the worker is between the condition check and wait() when the destructor fires.
Expected Result
This PR should FAIL CI -- it intentionally has the old broken destructor to demonstrate the deadlock. The test will timeout after 5 seconds on iteration 0.
Dependencies