Skip to content

SDK-478 keychain migration off main thread, one-shot gated, race-safe reads#1066

Open
joaodordio wants to merge 4 commits into
masterfrom
fix/SDK-478-keychain-off-main-thread
Open

SDK-478 keychain migration off main thread, one-shot gated, race-safe reads#1066
joaodordio wants to merge 4 commits into
masterfrom
fix/SDK-478-keychain-off-main-thread

Conversation

@joaodordio
Copy link
Copy Markdown
Member

@joaodordio joaodordio commented May 21, 2026

🔹 Jira Ticket(s)

✏️ Description

What this changes

Three improvements to the keychain migration introduced in 6.6.6 (PR #997):

  1. One-shot gated. A new keychainMigrationCompleted flag in LocalStorageProtocol (backed by UserDefaults) skips the migration on every cold start after the first successful run. The flag is read on the calling thread, but it is a single UserDefaults lookup, no keychain syscalls, so the hot path is effectively free.
  2. First-launch cost stays on the calling thread, but only once. On the rare first launch after upgrading from a pre-isolated-keychain build the migration runs synchronously, because retrieveIdentifierData() runs immediately after updateSDKVersion() and must observe the migrated identity. Subsequent cold starts (the common case for every customer) pay zero migration cost.
  3. Race-safe reads. IterableKeychain now serializes the migration (as a barrier write) against property reads/writes through a coordinatorQueue (DispatchQueue.concurrent). Reads run fully in parallel after migration completes; reads kicked off while migration is in flight block briefly behind the barrier and observe a consistent state. An internal migrationAttempted guard prevents the migration body from running twice within a single process lifetime if start() is invoked more than once before the UserDefaults flag is read back.

This PR also picks up a defensive crash guard from the unmerged commit b521d533 (guard !key.isEmpty on data/set/remove plus tighter SecItemCopyMatching status handling), adds non-success logging to KeychainWrapper.set / update so silent SecItemAdd/SecItemUpdate failures (e.g. entitlement issues) become observable in dev logs, and adds duration logging to IterableKeychain.migrateFromLegacy so future regressions of this shape surface immediately.

Why

The migration was introduced in commit 136ff643 (PR #997, Jan 2026) to isolate the keychain service name per bundle identifier and prevent cross-app keychain collisions. Functionally that change is correct and should stay. The runtime properties around how it is invoked surfaced during SDK-478 investigation:

  • It is called from InternalIterableAPI.updateSDKVersion(), which runs synchronously inside start() on the calling thread.
  • Cross-platform bridges (React Native and Flutter) invoke start() from DispatchQueue.main.async, so for those callers all of the migration's SecItem syscalls landed on the main thread.
  • The migration ran on every cold start regardless of whether there was anything to migrate, so even native iOS customers with no legacy data paid the cost every single launch.

The fix is to pay the cost exactly once per install rather than to push it off the main thread asynchronously: an async migration would race with retrieveIdentifierData() reading email / userId on the line immediately after, and would surface as an apparent sign-out on the first launch after upgrade.

The defensive crash guard from b521d533 has been sitting unmerged since April. It is adjacent to the same surface and worth picking up here.

Compatibility

The keychainMigrationCompleted UserDefaults flag defaults to false. Customers upgrading from any earlier version will trigger the migration once on first launch post-upgrade, then never again. No data path changes; the existing "no overwrite if isolated already has data" logic is preserved. Mock implementations of LocalStorageProtocol in tests get the new property as Bool = false so existing tests continue to invoke the migration body.

Tests

New SDK-478 tests in tests/unit-tests/KeychainMigrationTests.swift:

  • testMigrationRunsOffMainThread asserts the migration body executes on the queue that invoked it, including a background queue.
  • testReadDuringMigrationReturnsConsistentData kicks off the migration concurrently with 10 reads and asserts every read observes either nil or the migrated value, never an intermediate state.
  • testPublicSettersWriteThroughCoordinatorQueue and testPublicSetterNilRemovesValue cover the public setter path through the barrier.

New defensive-guard tests in tests/unit-tests/KeychainWrapperTests.swift: testEmptyKeyReturnsNilImmediately, testEmptyKeySetReturnsFalse, testEmptyKeyRemoveReturnsFalse.

Linked tickets

  • SDK-392
  • SDK-478
  • SDK-294 (original keychain isolation work that introduced the migration)

@joaodordio joaodordio self-assigned this May 21, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 55.05618% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.27%. Comparing base (40f2de1) to head (c35b927).

Files with missing lines Patch % Lines
...Internal/Utilities/Keychain/IterableKeychain.swift 38.18% 34 Missing ⚠️
.../Internal/Utilities/Keychain/KeychainWrapper.swift 62.50% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1066      +/-   ##
==========================================
+ Coverage   71.22%   71.27%   +0.05%     
==========================================
  Files         112      112              
  Lines        9365     9389      +24     
==========================================
+ Hits         6670     6692      +22     
- Misses       2695     2697       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@joaodordio joaodordio added the BCIT Run Business Critical Integration Tests label May 21, 2026
@joaodordio joaodordio marked this pull request as ready for review May 21, 2026 17:11
// observe the migrated identity. See SDK-478.
if !localStorage.keychainMigrationCompleted {
localStorage.migrateKeychainToIsolatedStorage()
localStorage.keychainMigrationCompleted = true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we check if the boolean of the migrateKeychainToIsolatedStorage is true before flipping the keychainMigrationCompleted flag to true?

Copy link
Copy Markdown
Contributor

@sumeruchat sumeruchat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall

CHANGES_REQUESTED — fix-before-ship. Reviewed end-to-end with a Codex pass. The race-safety claim (coordinatorQueue barrier + concurrent reads) and the LocalStorage 3-layer wiring of the new flag both check out cleanly. But the one-shot completion gate isn't safe yet:

  1. Franco's concern verified — at InternalIterableAPI.swift:1113 the flag flips unconditionally regardless of whether migrateKeychainToIsolatedStorage() succeeded. The protocol method also returns Void today, so the API can't even express success/failure yet.
  2. Data-loss risk in migrateFromLegacy() itselfrawWrite is Void and doesn't surface failures, but the next line in each per-key block (legacyWrapper.removeValue(...)) deletes the legacy entry unconditionally. If the isolated wrapper.set(...) fails silently (entitlement, disk full, cert), legacy is gone before isolated is confirmed → identity lost. Inline comments below.
  3. Open question for you: is downgrade/re-upgrade with a pre-isolated SDK something we need to support? If yes, the Bool gate alone isn't enough — needs to also check sdkVersion, otherwise the flag persists across the downgrade and re-upgrade silently skips migration of any legacy identity the older SDK wrote. If no, ack and we can move on.

Tests worth adding (none currently cover these):

  • First cold start flips flag only on success.
  • Second cold start skips migration.
  • Migration failure does NOT flip the flag.
  • Migration with a failing wrapper write → legacy data NOT deleted (the data-loss case).

Bridge note (no action required): first-launch migration still runs synchronously on the calling thread, so RN/Flutter start() invoked from DispatchQueue.main.async will still see a one-time main-thread block on the first launch after upgrade. The correctness tradeoff (retrieveIdentifierData needs to observe migrated identity) is defensible — just flagging that the PR description shouldn't claim it eliminates the main-thread migration entirely.

Inline comments at the exact lines below.

// observe the migrated identity. See SDK-478.
if !localStorage.keychainMigrationCompleted {
localStorage.migrateKeychainToIsolatedStorage()
localStorage.keychainMigrationCompleted = true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming Franco's concern is real — the flag flips unconditionally. And LocalStorageProtocol.migrateKeychainToIsolatedStorage() currently returns Void, so this fix needs to start by changing the protocol method to return Bool (and the implementation chain in LocalStorage + IterableKeychain.migrateFromLegacy).

Proposed:

if !localStorage.keychainMigrationCompleted {
    if localStorage.migrateKeychainToIsolatedStorage() {
        localStorage.keychainMigrationCompleted = true
    } else {
        ITBError("keychain migration did not complete; will retry on next cold start")
    }
}

⚠️ Important semantics: success should mean "the attempt completed safely", NOT "moved at least one item". A no-op / no-legacy-data run still counts as successful — otherwise the hot path retries forever on fresh installs that never had legacy data.

/// completion at least once on this install. Gates the migration so it
/// is a true one-shot - subsequent `start()` calls do not pay the
/// SecItem syscall cost. See SDK-478.
var keychainMigrationCompleted: Bool { get set }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new flag is wired correctly here. But per the data-loss issue on IterableKeychain (see comment below), func migrateKeychainToIsolatedStorage() just below (line 54) and its extension default (~line 61) need to return Bool (or a small result enum) so callers can gate the completion flag:

func migrateKeychainToIsolatedStorage() -> Bool

And the extension default → return true (a no-op default IS a successful no-op).

}


var keychainMigrationCompleted: Bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new keychainMigrationCompleted wiring through iterableUserDefaults is clean.

Separately, propagate the Bool through this layer's existing migrateKeychainToIsolatedStorage() at line 182 (outside this hunk, but a coupled change):

func migrateKeychainToIsolatedStorage() -> Bool {
    keychain.migrateFromLegacy()
}

(after IterableKeychain.migrateFromLegacy() is changed to return the aggregate Bool — see comment on IterableKeychain.swift.)

let legacyEmail = rawString(forKey: Const.Keychain.Key.email, from: legacyWrapper) {
rawWrite(string: legacyEmail, forKey: Const.Keychain.Key.email, to: wrapper)
legacyWrapper.removeValue(forKey: Const.Keychain.Key.email)
ITBInfo("UPDATED: migrated email from legacy keychain to isolated keychain")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Data-loss risk. rawWrite is Void and doesn't surface failures, but this next line (legacyWrapper.removeValue(...)) deletes the legacy entry unconditionally. If wrapper.set(data, forKey: key) inside rawWrite fails silently (entitlement / disk full / cert issue), legacy is gone before the isolated write is confirmed → identity lost.

Guard each per-key delete on a successful isolated write:

if rawWrite(string: legacyEmail, forKey: Const.Keychain.Key.email, to: wrapper) {
    legacyWrapper.removeValue(forKey: Const.Keychain.Key.email)
    ITBInfo("UPDATED: migrated email from legacy keychain to isolated keychain")
    migrated = true
} else {
    // isolated write failed — keep legacy intact so the next cold start retries
    allSucceeded = false
}

Same pattern for userId (~line 48), userIdUnknownUser (~line 56), and authToken (~line 64). Then migrateFromLegacy() should return the aggregate Booltrue only if every attempted write either succeeded or had no legacy data to migrate.


/// Raw write - bypasses the coordinator queue. Only safe to call from
/// inside a `coordinatorQueue.sync(flags: .barrier) { }` block.
private func rawWrite(string: String?, forKey key: String, to wrapper: KeychainWrapper) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make rawWrite return Bool so the migration loop can detect failure and skip the legacy delete:

@discardableResult
private func rawWrite(string: String?, forKey key: String, to wrapper: KeychainWrapper) -> Bool {
    guard let value = string, let data = value.data(using: .utf8) else {
        return wrapper.removeValue(forKey: key)
    }
    return wrapper.set(data, forKey: key)
}

This pairs with the failure-logging you already added to KeychainWrapper.set/update — small additional step to surface that failure to the caller, not just to logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

BCIT Run Business Critical Integration Tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants