Guard Scheduler deferred lambdas against delegate teardown (#56680)#56680
Guard Scheduler deferred lambdas against delegate teardown (#56680)#56680fkgozali wants to merge 2 commits into
Conversation
|
@fkgozali has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103727974. |
Summary: Introduces a new common React Native feature flag, `enableSchedulerDelegateInvalidation`, that gates a defensive guard around `Scheduler::uiManagerDidDispatchCommand` and `uiManagerDidFinishTransaction`. Those call sites enqueue a lambda via `runtimeScheduler_->scheduleRenderingUpdate` that captures the raw `SchedulerDelegate` pointer by value. With `RuntimeScheduler_Modern` (bridgeless) the lambda runs asynchronously, so if the `SchedulerDelegate` is destroyed between enqueue and execution the lambda dereferences dangling memory and the process crashes (`EXC_BAD_ACCESS` / `KERN_INVALID_ADDRESS`). The follow-up commit adds the actual guard wired to this flag. Defaults to `false` so behavior is unchanged on landing. Apps opt in by overriding the flag in their own `ReactNativeFeatureFlagsDefaults` subclass. Includes the regenerated feature-flag boilerplate (Kotlin/Java/Cxx accessors, defaults, providers, JS spec, native module) emitted by `yarn featureflags --update`. Changelog: [Internal] Reviewed By: shwanton Differential Revision: D103757271
…56680) Summary: Scheduler::uiManagerDidFinishTransaction and Scheduler::uiManagerDidDispatchCommand queue lambdas via runtimeScheduler_->scheduleRenderingUpdate that capture the raw delegate_ pointer by value. With RuntimeScheduler_Modern (the bridgeless implementation), the lambda runs asynchronously, so if the SchedulerDelegate is destroyed between enqueue and execution (surface teardown, Scheduler destruction, or setDelegate flip), the lambda dereferences dangling memory → EXC_BAD_ACCESS / KERN_INVALID_ADDRESS. Add a per-delegate-identity invalidation token (`shared_ptr<atomic<bool>>`) owned by Scheduler and captured by-value into the deferred lambdas. The destructor and setDelegate flip the flag to true; lambdas check it (acquire ordering) before dereferencing the captured raw delegate. The shared_ptr keeps the atomic alive for any outstanding lambdas. Public API is unchanged (private member only), so C++ API snapshots are unaffected. Changelog: [General][Fixed] - Prevent Scheduler use-after-free crash when surfaces tear down with pending rendering updates Reviewed By: mdvacca Differential Revision: D103727974
be32607 to
1441d7b
Compare
|
This pull request has been merged in aadbe96. |
|
Hi @fkgozali, Kamil from Software Mansion here. We’re seeing what looks like the same issue in one of our client's project. I really like this solution, but I was wondering about one edge case. Have you considered this issue? // Thread A: queued rendering update
if (guardEnabled && *invalidated) { // reads false
return;
}
// Thread B: Scheduler::setDelegate(...)
*delegateInvalidated_ = true;
delegate_ = newDelegate;
// old delegate is freed
// Thread A resumes
delegate->schedulerShouldRenderTransactions(...); // possible use-after-free?I like your solution because of how small and non-invasive it is. I was also wondering whether you considered more ownership/synchronization-heavy alternatives, like moving the delegate to |
|
@delekta Thanks for the info! Note that this fix is a "workaround" because it's hiding the actual problem: "why is the queue not flushing things during teardown?". I expect some more iteration on this, but for now, at least the app should not hard crash, though it might show some more unpredictable behavior after teardown. On moving the delegate to Do you happen to have solid repro for this btw? If so would be helpful to see if this featureflag/fix helps in your scenario. |
|
@fkgozali Thanks, that makes sense. Unfortunately we do not have a repro. That has been the biggest issue for us with this bug. Without a repro, the best signal we have is crash volume after enabling the fix. I can try your solution in our scenario and watch whether the crash rate changes. The closest mitigation we tried before was flushing the |
I see, I think there are a few scenarios overall that we need to identify:
I suspect we need a combo of approaches, not just purely flushing, or dropping them like what I did in this PR |
Summary:
Scheduler::uiManagerDidFinishTransaction and Scheduler::uiManagerDidDispatchCommand queue lambdas via runtimeScheduler_->scheduleRenderingUpdate that capture the raw delegate_ pointer by value. With RuntimeScheduler_Modern (the bridgeless implementation), the lambda runs asynchronously, so if the SchedulerDelegate is destroyed between enqueue and execution (surface teardown, Scheduler destruction, or setDelegate flip), the lambda dereferences dangling memory → EXC_BAD_ACCESS / KERN_INVALID_ADDRESS.
Add a per-delegate-identity invalidation token (
shared_ptr<atomic<bool>>) owned by Scheduler and captured by-value into the deferred lambdas. The destructor and setDelegate flip the flag to true; lambdas check it (acquire ordering) before dereferencing the captured raw delegate.The shared_ptr keeps the atomic alive for any outstanding lambdas. Public API is unchanged (private member only), so C++ API snapshots are unaffected.
Changelog:
[General][Fixed] - Prevent Scheduler use-after-free crash when surfaces tear down with pending rendering updates
Reviewed By: mdvacca
Differential Revision: D103727974