Skip to content

Guard Scheduler deferred lambdas against delegate teardown (#56680)#56680

Closed
fkgozali wants to merge 2 commits into
facebook:mainfrom
fkgozali:export-D103727974
Closed

Guard Scheduler deferred lambdas against delegate teardown (#56680)#56680
fkgozali wants to merge 2 commits into
facebook:mainfrom
fkgozali:export-D103727974

Conversation

@fkgozali
Copy link
Copy Markdown
Contributor

@fkgozali fkgozali commented May 4, 2026

Summary:

Scheduler::uiManagerDidFinishTransaction and Scheduler::uiManagerDidDispatchCommand queue lambdas via runtimeScheduler_->scheduleRenderingUpdate that capture the raw delegate_ pointer by value. With RuntimeScheduler_Modern (the bridgeless implementation), the lambda runs asynchronously, so if the SchedulerDelegate is destroyed between enqueue and execution (surface teardown, Scheduler destruction, or setDelegate flip), the lambda dereferences dangling memory → EXC_BAD_ACCESS / KERN_INVALID_ADDRESS.

Add a per-delegate-identity invalidation token (shared_ptr<atomic<bool>>) owned by Scheduler and captured by-value into the deferred lambdas. The destructor and setDelegate flip the flag to true; lambdas check it (acquire ordering) before dereferencing the captured raw delegate.

The shared_ptr keeps the atomic alive for any outstanding lambdas. Public API is unchanged (private member only), so C++ API snapshots are unaffected.

Changelog:
[General][Fixed] - Prevent Scheduler use-after-free crash when surfaces tear down with pending rendering updates

Reviewed By: mdvacca

Differential Revision: D103727974

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 4, 2026
@facebook-github-tools facebook-github-tools Bot added p: Facebook Partner: Facebook Partner labels May 4, 2026
@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented May 4, 2026

@fkgozali has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103727974.

fkgozali added 2 commits May 4, 2026 19:36
Summary:

Introduces a new common React Native feature flag, `enableSchedulerDelegateInvalidation`, that gates a defensive guard around `Scheduler::uiManagerDidDispatchCommand` and `uiManagerDidFinishTransaction`. Those call sites enqueue a lambda via `runtimeScheduler_->scheduleRenderingUpdate` that captures the raw `SchedulerDelegate` pointer by value. With `RuntimeScheduler_Modern` (bridgeless) the lambda runs asynchronously, so if the `SchedulerDelegate` is destroyed between enqueue and execution the lambda dereferences dangling memory and the process crashes (`EXC_BAD_ACCESS` / `KERN_INVALID_ADDRESS`). The follow-up commit adds the actual guard wired to this flag.

Defaults to `false` so behavior is unchanged on landing. Apps opt in by overriding the flag in their own `ReactNativeFeatureFlagsDefaults` subclass.

Includes the regenerated feature-flag boilerplate (Kotlin/Java/Cxx accessors, defaults, providers, JS spec, native module) emitted by `yarn featureflags --update`.

Changelog:
[Internal]

Reviewed By: shwanton

Differential Revision: D103757271
…56680)

Summary:

Scheduler::uiManagerDidFinishTransaction and Scheduler::uiManagerDidDispatchCommand queue lambdas via runtimeScheduler_->scheduleRenderingUpdate that capture the raw delegate_ pointer by value. With RuntimeScheduler_Modern (the bridgeless implementation), the lambda runs asynchronously, so if the SchedulerDelegate is destroyed between enqueue and execution (surface teardown, Scheduler destruction, or setDelegate flip), the lambda dereferences dangling memory → EXC_BAD_ACCESS / KERN_INVALID_ADDRESS.

Add a per-delegate-identity invalidation token (`shared_ptr<atomic<bool>>`) owned by Scheduler and captured by-value into the deferred lambdas. The destructor and setDelegate flip the flag to true; lambdas check it (acquire ordering) before dereferencing the captured raw delegate.

The shared_ptr keeps the atomic alive for any outstanding lambdas. Public API is unchanged (private member only), so C++ API snapshots are unaffected.

Changelog:
[General][Fixed] - Prevent Scheduler use-after-free crash when surfaces tear down with pending rendering updates

Reviewed By: mdvacca

Differential Revision: D103727974
@meta-codesync meta-codesync Bot changed the title Guard Scheduler deferred lambdas against delegate teardown Guard Scheduler deferred lambdas against delegate teardown (#56680) May 5, 2026
@fkgozali fkgozali force-pushed the export-D103727974 branch from be32607 to 1441d7b Compare May 5, 2026 02:37
@meta-codesync meta-codesync Bot closed this in aadbe96 May 5, 2026
@facebook-github-tools facebook-github-tools Bot added the Merged This PR has been merged. label May 5, 2026
@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented May 5, 2026

This pull request has been merged in aadbe96.

@delekta
Copy link
Copy Markdown
Contributor

delekta commented May 5, 2026

Hi @fkgozali, Kamil from Software Mansion here. We’re seeing what looks like the same issue in one of our client's project. I really like this solution, but I was wondering about one edge case.

Have you considered this issue?

// Thread A: queued rendering update
if (guardEnabled && *invalidated) { // reads false
return;
}

// Thread B: Scheduler::setDelegate(...)
*delegateInvalidated_ = true;
delegate_ = newDelegate;

// old delegate is freed

// Thread A resumes
delegate->schedulerShouldRenderTransactions(...); // possible use-after-free?

I like your solution because of how small and non-invasive it is. I was also wondering whether you considered more ownership/synchronization-heavy alternatives, like moving the delegate to shared_ptr / weak_ptr, or using a mutex / in-flight tracking around delegate calls. Those seem like they would close the lifetime race more completely, but they also look much more invasive because they would change the delegate ownership model or add locking around this path.

@fkgozali
Copy link
Copy Markdown
Contributor Author

fkgozali commented May 5, 2026

@delekta Thanks for the info! Note that this fix is a "workaround" because it's hiding the actual problem: "why is the queue not flushing things during teardown?". I expect some more iteration on this, but for now, at least the app should not hard crash, though it might show some more unpredictable behavior after teardown.

On moving the delegate to shared_ptr, yes that's one of the idea we're considering, but didn't pursue it yet because of the API change. The current focus is to isolate the root cause and find a robust fix, before cleaning up the API ergonomics etc. Hope that helps.

Do you happen to have solid repro for this btw? If so would be helpful to see if this featureflag/fix helps in your scenario.

@delekta
Copy link
Copy Markdown
Contributor

delekta commented May 6, 2026

@fkgozali Thanks, that makes sense. Unfortunately we do not have a repro. That has been the biggest issue for us with this bug. Without a repro, the best signal we have is crash volume after enabling the fix. I can try your solution in our scenario and watch whether the crash rate changes.

The closest mitigation we tried before was flushing the runtimeScheduler queue during Scheduler destructor here: discord#169. That drastically reduced the number of crashes for us, but did not eliminate them completely.

@fkgozali
Copy link
Copy Markdown
Contributor Author

fkgozali commented May 6, 2026

The closest mitigation we tried before was flushing the runtimeScheduler queue during Scheduler destructor

I see, I think there are a few scenarios overall that we need to identify:

  • pending calls during teardown --> should be flushed
  • some early failures that cause the runtime to be in a "bad state", hence attempting to flushing may cause other undefined behavior
  • dependencies (e.g. delegate) that somehow got dealloc earlier, maybe due to (2), causing hard crash due to invoking a freed memory block

I suspect we need a combo of approaches, not just purely flushing, or dropping them like what I did in this PR

@fkgozali
Copy link
Copy Markdown
Contributor Author

@delekta FYI, we're adding a test to better simulate the crash scenario: #56800. With that, it should be easier to explore proper fixes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged This PR has been merged. meta-exported p: Facebook Partner: Facebook Partner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants