SDK-478 keychain migration off main thread, one-shot gated, race-safe reads#1066
SDK-478 keychain migration off main thread, one-shot gated, race-safe reads#1066joaodordio wants to merge 4 commits into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1066 +/- ##
==========================================
+ Coverage 71.22% 71.27% +0.05%
==========================================
Files 112 112
Lines 9365 9389 +24
==========================================
+ Hits 6670 6692 +22
- Misses 2695 2697 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| // observe the migrated identity. See SDK-478. | ||
| if !localStorage.keychainMigrationCompleted { | ||
| localStorage.migrateKeychainToIsolatedStorage() | ||
| localStorage.keychainMigrationCompleted = true |
There was a problem hiding this comment.
can we check if the boolean of the migrateKeychainToIsolatedStorage is true before flipping the keychainMigrationCompleted flag to true?
sumeruchat
left a comment
There was a problem hiding this comment.
Overall
CHANGES_REQUESTED — fix-before-ship. Reviewed end-to-end with a Codex pass. The race-safety claim (coordinatorQueue barrier + concurrent reads) and the LocalStorage 3-layer wiring of the new flag both check out cleanly. But the one-shot completion gate isn't safe yet:
- Franco's concern verified — at
InternalIterableAPI.swift:1113the flag flips unconditionally regardless of whethermigrateKeychainToIsolatedStorage()succeeded. The protocol method also returnsVoidtoday, so the API can't even express success/failure yet. - Data-loss risk in
migrateFromLegacy()itself —rawWriteisVoidand doesn't surface failures, but the next line in each per-key block (legacyWrapper.removeValue(...)) deletes the legacy entry unconditionally. If the isolatedwrapper.set(...)fails silently (entitlement, disk full, cert), legacy is gone before isolated is confirmed → identity lost. Inline comments below. - Open question for you: is downgrade/re-upgrade with a pre-isolated SDK something we need to support? If yes, the
Boolgate alone isn't enough — needs to also checksdkVersion, otherwise the flag persists across the downgrade and re-upgrade silently skips migration of any legacy identity the older SDK wrote. If no, ack and we can move on.
Tests worth adding (none currently cover these):
- First cold start flips flag only on success.
- Second cold start skips migration.
- Migration failure does NOT flip the flag.
- Migration with a failing wrapper write → legacy data NOT deleted (the data-loss case).
Bridge note (no action required): first-launch migration still runs synchronously on the calling thread, so RN/Flutter start() invoked from DispatchQueue.main.async will still see a one-time main-thread block on the first launch after upgrade. The correctness tradeoff (retrieveIdentifierData needs to observe migrated identity) is defensible — just flagging that the PR description shouldn't claim it eliminates the main-thread migration entirely.
Inline comments at the exact lines below.
| // observe the migrated identity. See SDK-478. | ||
| if !localStorage.keychainMigrationCompleted { | ||
| localStorage.migrateKeychainToIsolatedStorage() | ||
| localStorage.keychainMigrationCompleted = true |
There was a problem hiding this comment.
Confirming Franco's concern is real — the flag flips unconditionally. And LocalStorageProtocol.migrateKeychainToIsolatedStorage() currently returns Void, so this fix needs to start by changing the protocol method to return Bool (and the implementation chain in LocalStorage + IterableKeychain.migrateFromLegacy).
Proposed:
if !localStorage.keychainMigrationCompleted {
if localStorage.migrateKeychainToIsolatedStorage() {
localStorage.keychainMigrationCompleted = true
} else {
ITBError("keychain migration did not complete; will retry on next cold start")
}
}| /// completion at least once on this install. Gates the migration so it | ||
| /// is a true one-shot - subsequent `start()` calls do not pay the | ||
| /// SecItem syscall cost. See SDK-478. | ||
| var keychainMigrationCompleted: Bool { get set } |
There was a problem hiding this comment.
The new flag is wired correctly here. But per the data-loss issue on IterableKeychain (see comment below), func migrateKeychainToIsolatedStorage() just below (line 54) and its extension default (~line 61) need to return Bool (or a small result enum) so callers can gate the completion flag:
func migrateKeychainToIsolatedStorage() -> BoolAnd the extension default → return true (a no-op default IS a successful no-op).
| } | ||
|
|
||
|
|
||
| var keychainMigrationCompleted: Bool { |
There was a problem hiding this comment.
The new keychainMigrationCompleted wiring through iterableUserDefaults is clean.
Separately, propagate the Bool through this layer's existing migrateKeychainToIsolatedStorage() at line 182 (outside this hunk, but a coupled change):
func migrateKeychainToIsolatedStorage() -> Bool {
keychain.migrateFromLegacy()
}(after IterableKeychain.migrateFromLegacy() is changed to return the aggregate Bool — see comment on IterableKeychain.swift.)
| let legacyEmail = rawString(forKey: Const.Keychain.Key.email, from: legacyWrapper) { | ||
| rawWrite(string: legacyEmail, forKey: Const.Keychain.Key.email, to: wrapper) | ||
| legacyWrapper.removeValue(forKey: Const.Keychain.Key.email) | ||
| ITBInfo("UPDATED: migrated email from legacy keychain to isolated keychain") |
There was a problem hiding this comment.
🔴 Data-loss risk. rawWrite is Void and doesn't surface failures, but this next line (legacyWrapper.removeValue(...)) deletes the legacy entry unconditionally. If wrapper.set(data, forKey: key) inside rawWrite fails silently (entitlement / disk full / cert issue), legacy is gone before the isolated write is confirmed → identity lost.
Guard each per-key delete on a successful isolated write:
if rawWrite(string: legacyEmail, forKey: Const.Keychain.Key.email, to: wrapper) {
legacyWrapper.removeValue(forKey: Const.Keychain.Key.email)
ITBInfo("UPDATED: migrated email from legacy keychain to isolated keychain")
migrated = true
} else {
// isolated write failed — keep legacy intact so the next cold start retries
allSucceeded = false
}Same pattern for userId (~line 48), userIdUnknownUser (~line 56), and authToken (~line 64). Then migrateFromLegacy() should return the aggregate Bool — true only if every attempted write either succeeded or had no legacy data to migrate.
|
|
||
| /// Raw write - bypasses the coordinator queue. Only safe to call from | ||
| /// inside a `coordinatorQueue.sync(flags: .barrier) { }` block. | ||
| private func rawWrite(string: String?, forKey key: String, to wrapper: KeychainWrapper) { |
There was a problem hiding this comment.
Make rawWrite return Bool so the migration loop can detect failure and skip the legacy delete:
@discardableResult
private func rawWrite(string: String?, forKey key: String, to wrapper: KeychainWrapper) -> Bool {
guard let value = string, let data = value.data(using: .utf8) else {
return wrapper.removeValue(forKey: key)
}
return wrapper.set(data, forKey: key)
}This pairs with the failure-logging you already added to KeychainWrapper.set/update — small additional step to surface that failure to the caller, not just to logs.
🔹 Jira Ticket(s)
✏️ Description
What this changes
Three improvements to the keychain migration introduced in 6.6.6 (PR #997):
keychainMigrationCompletedflag inLocalStorageProtocol(backed by UserDefaults) skips the migration on every cold start after the first successful run. The flag is read on the calling thread, but it is a singleUserDefaultslookup, no keychain syscalls, so the hot path is effectively free.retrieveIdentifierData()runs immediately afterupdateSDKVersion()and must observe the migrated identity. Subsequent cold starts (the common case for every customer) pay zero migration cost.IterableKeychainnow serializes the migration (as a barrier write) against property reads/writes through acoordinatorQueue(DispatchQueue.concurrent). Reads run fully in parallel after migration completes; reads kicked off while migration is in flight block briefly behind the barrier and observe a consistent state. An internalmigrationAttemptedguard prevents the migration body from running twice within a single process lifetime ifstart()is invoked more than once before the UserDefaults flag is read back.This PR also picks up a defensive crash guard from the unmerged commit
b521d533(guard !key.isEmptyon data/set/remove plus tighterSecItemCopyMatchingstatus handling), adds non-success logging toKeychainWrapper.set/updateso silentSecItemAdd/SecItemUpdatefailures (e.g. entitlement issues) become observable in dev logs, and adds duration logging toIterableKeychain.migrateFromLegacyso future regressions of this shape surface immediately.Why
The migration was introduced in commit
136ff643(PR #997, Jan 2026) to isolate the keychain service name per bundle identifier and prevent cross-app keychain collisions. Functionally that change is correct and should stay. The runtime properties around how it is invoked surfaced during SDK-478 investigation:InternalIterableAPI.updateSDKVersion(), which runs synchronously insidestart()on the calling thread.start()fromDispatchQueue.main.async, so for those callers all of the migration's SecItem syscalls landed on the main thread.The fix is to pay the cost exactly once per install rather than to push it off the main thread asynchronously: an async migration would race with
retrieveIdentifierData()readingemail/userIdon the line immediately after, and would surface as an apparent sign-out on the first launch after upgrade.The defensive crash guard from
b521d533has been sitting unmerged since April. It is adjacent to the same surface and worth picking up here.Compatibility
The
keychainMigrationCompletedUserDefaults flag defaults to false. Customers upgrading from any earlier version will trigger the migration once on first launch post-upgrade, then never again. No data path changes; the existing "no overwrite if isolated already has data" logic is preserved. Mock implementations ofLocalStorageProtocolin tests get the new property asBool = falseso existing tests continue to invoke the migration body.Tests
New SDK-478 tests in
tests/unit-tests/KeychainMigrationTests.swift:testMigrationRunsOffMainThreadasserts the migration body executes on the queue that invoked it, including a background queue.testReadDuringMigrationReturnsConsistentDatakicks off the migration concurrently with 10 reads and asserts every read observes either nil or the migrated value, never an intermediate state.testPublicSettersWriteThroughCoordinatorQueueandtestPublicSetterNilRemovesValuecover the public setter path through the barrier.New defensive-guard tests in
tests/unit-tests/KeychainWrapperTests.swift:testEmptyKeyReturnsNilImmediately,testEmptyKeySetReturnsFalse,testEmptyKeyRemoveReturnsFalse.Linked tickets