SDK-478 keychain migration off main thread, one-shot gated, race-safe reads by joaodordio · Pull Request #1066 · Iterable/iterable-swift-sdk

joaodordio · 2026-05-21T15:54:48Z

🔹 Jira Ticket(s)

SDK-478

✏️ Description

What this changes

Three improvements to the keychain migration introduced in 6.6.6 (PR #997):

One-shot gated. A new keychainMigrationCompleted flag in LocalStorageProtocol (backed by UserDefaults) skips the migration on every cold start after the first successful run. The flag is read on the calling thread, but it is a single UserDefaults lookup, no keychain syscalls, so the hot path is effectively free.
First-launch cost stays on the calling thread, but only once. On the rare first launch after upgrading from a pre-isolated-keychain build the migration runs synchronously, because retrieveIdentifierData() runs immediately after updateSDKVersion() and must observe the migrated identity. Subsequent cold starts (the common case for every customer) pay zero migration cost.
Race-safe reads. IterableKeychain now serializes the migration (as a barrier write) against property reads/writes through a coordinatorQueue (DispatchQueue.concurrent). Reads run fully in parallel after migration completes; reads kicked off while migration is in flight block briefly behind the barrier and observe a consistent state. An internal migrationAttempted guard prevents the migration body from running twice within a single process lifetime if start() is invoked more than once before the UserDefaults flag is read back.

This PR also picks up a defensive crash guard from the unmerged commit b521d533 (guard !key.isEmpty on data/set/remove plus tighter SecItemCopyMatching status handling), adds non-success logging to KeychainWrapper.set / update so silent SecItemAdd/SecItemUpdate failures (e.g. entitlement issues) become observable in dev logs, and adds duration logging to IterableKeychain.migrateFromLegacy so future regressions of this shape surface immediately.

Why

The migration was introduced in commit 136ff643 (PR #997, Jan 2026) to isolate the keychain service name per bundle identifier and prevent cross-app keychain collisions. Functionally that change is correct and should stay. The runtime properties around how it is invoked surfaced during SDK-478 investigation:

It is called from InternalIterableAPI.updateSDKVersion(), which runs synchronously inside start() on the calling thread.
Cross-platform bridges (React Native and Flutter) invoke start() from DispatchQueue.main.async, so for those callers all of the migration's SecItem syscalls landed on the main thread.
The migration ran on every cold start regardless of whether there was anything to migrate, so even native iOS customers with no legacy data paid the cost every single launch.

The fix is to pay the cost exactly once per install rather than to push it off the main thread asynchronously: an async migration would race with retrieveIdentifierData() reading email / userId on the line immediately after, and would surface as an apparent sign-out on the first launch after upgrade.

The defensive crash guard from b521d533 has been sitting unmerged since April. It is adjacent to the same surface and worth picking up here.

Compatibility

The keychainMigrationCompleted UserDefaults flag defaults to false. Customers upgrading from any earlier version will trigger the migration once on first launch post-upgrade, then never again. No data path changes; the existing "no overwrite if isolated already has data" logic is preserved. Mock implementations of LocalStorageProtocol in tests get the new property as Bool = false so existing tests continue to invoke the migration body.

Tests

New SDK-478 tests in tests/unit-tests/KeychainMigrationTests.swift:

testMigrationRunsOffMainThread asserts the migration body executes on the queue that invoked it, including a background queue.
testReadDuringMigrationReturnsConsistentData kicks off the migration concurrently with 10 reads and asserts every read observes either nil or the migrated value, never an intermediate state.
testPublicSettersWriteThroughCoordinatorQueue and testPublicSetterNilRemovesValue cover the public setter path through the barrier.

New defensive-guard tests in tests/unit-tests/KeychainWrapperTests.swift: testEmptyKeyReturnsNilImmediately, testEmptyKeySetReturnsFalse, testEmptyKeyRemoveReturnsFalse.

Linked tickets

SDK-392
SDK-478
SDK-294 (original keychain isolation work that introduced the migration)

codecov · 2026-05-21T16:04:05Z

Codecov Report

❌ Patch coverage is 55.05618% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.27%. Comparing base (40f2de1) to head (c35b927).

Files with missing lines	Patch %	Lines
...Internal/Utilities/Keychain/IterableKeychain.swift	38.18%	34 Missing ⚠️
.../Internal/Utilities/Keychain/KeychainWrapper.swift	62.50%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1066      +/-   ##
==========================================
+ Coverage   71.22%   71.27%   +0.05%     
==========================================
  Files         112      112              
  Lines        9365     9389      +24     
==========================================
+ Hits         6670     6692      +22     
- Misses       2695     2697       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

franco-zalamena-iterable · 2026-05-26T09:21:12Z

+        // observe the migrated identity. See SDK-478.
+        if !localStorage.keychainMigrationCompleted {
+            localStorage.migrateKeychainToIsolatedStorage()
+            localStorage.keychainMigrationCompleted = true


can we check if the boolean of the migrateKeychainToIsolatedStorage is true before flipping the keychainMigrationCompleted flag to true?

sumeruchat

Overall

CHANGES_REQUESTED — fix-before-ship. Reviewed end-to-end with a Codex pass. The race-safety claim (coordinatorQueue barrier + concurrent reads) and the LocalStorage 3-layer wiring of the new flag both check out cleanly. But the one-shot completion gate isn't safe yet:

Franco's concern verified — at InternalIterableAPI.swift:1113 the flag flips unconditionally regardless of whether migrateKeychainToIsolatedStorage() succeeded. The protocol method also returns Void today, so the API can't even express success/failure yet.
Data-loss risk in migrateFromLegacy() itself — rawWrite is Void and doesn't surface failures, but the next line in each per-key block (legacyWrapper.removeValue(...)) deletes the legacy entry unconditionally. If the isolated wrapper.set(...) fails silently (entitlement, disk full, cert), legacy is gone before isolated is confirmed → identity lost. Inline comments below.
Open question for you: is downgrade/re-upgrade with a pre-isolated SDK something we need to support? If yes, the Bool gate alone isn't enough — needs to also check sdkVersion, otherwise the flag persists across the downgrade and re-upgrade silently skips migration of any legacy identity the older SDK wrote. If no, ack and we can move on.

Tests worth adding (none currently cover these):

First cold start flips flag only on success.
Second cold start skips migration.
Migration failure does NOT flip the flag.
Migration with a failing wrapper write → legacy data NOT deleted (the data-loss case).

Bridge note (no action required): first-launch migration still runs synchronously on the calling thread, so RN/Flutter start() invoked from DispatchQueue.main.async will still see a one-time main-thread block on the first launch after upgrade. The correctness tradeoff (retrieveIdentifierData needs to observe migrated identity) is defensible — just flagging that the PR description shouldn't claim it eliminates the main-thread migration entirely.

Inline comments at the exact lines below.

sumeruchat · 2026-05-26T10:36:55Z

+        // observe the migrated identity. See SDK-478.
+        if !localStorage.keychainMigrationCompleted {
+            localStorage.migrateKeychainToIsolatedStorage()
+            localStorage.keychainMigrationCompleted = true


Confirming Franco's concern is real — the flag flips unconditionally. And LocalStorageProtocol.migrateKeychainToIsolatedStorage() currently returns Void, so this fix needs to start by changing the protocol method to return Bool (and the implementation chain in LocalStorage + IterableKeychain.migrateFromLegacy).

Proposed:

if !localStorage.keychainMigrationCompleted { if localStorage.migrateKeychainToIsolatedStorage() { localStorage.keychainMigrationCompleted = true } else { ITBError("keychain migration did not complete; will retry on next cold start") } }

⚠️ Important semantics: success should mean "the attempt completed safely", NOT "moved at least one item". A no-op / no-legacy-data run still counts as successful — otherwise the hot path retries forever on fresh installs that never had legacy data.

sumeruchat · 2026-05-26T10:36:55Z

+    /// completion at least once on this install. Gates the migration so it
+    /// is a true one-shot - subsequent `start()` calls do not pay the
+    /// SecItem syscall cost. See SDK-478.
+    var keychainMigrationCompleted: Bool { get set }


The new flag is wired correctly here. But per the data-loss issue on IterableKeychain (see comment below), func migrateKeychainToIsolatedStorage() just below (line 54) and its extension default (~line 61) need to return Bool (or a small result enum) so callers can gate the completion flag:

func migrateKeychainToIsolatedStorage() -> Bool

And the extension default → return true (a no-op default IS a successful no-op).

sumeruchat · 2026-05-26T10:36:55Z

    }
-
+
+    var keychainMigrationCompleted: Bool {


The new keychainMigrationCompleted wiring through iterableUserDefaults is clean.

Separately, propagate the Bool through this layer's existing migrateKeychainToIsolatedStorage() at line 182 (outside this hunk, but a coupled change):

func migrateKeychainToIsolatedStorage() -> Bool { keychain.migrateFromLegacy() }

(after IterableKeychain.migrateFromLegacy() is changed to return the aggregate Bool — see comment on IterableKeychain.swift.)

sumeruchat · 2026-05-26T10:36:55Z

+               let legacyEmail = rawString(forKey: Const.Keychain.Key.email, from: legacyWrapper) {
+                rawWrite(string: legacyEmail, forKey: Const.Keychain.Key.email, to: wrapper)
+                legacyWrapper.removeValue(forKey: Const.Keychain.Key.email)
+                ITBInfo("UPDATED: migrated email from legacy keychain to isolated keychain")


🔴 Data-loss risk. rawWrite is Void and doesn't surface failures, but this next line (legacyWrapper.removeValue(...)) deletes the legacy entry unconditionally. If wrapper.set(data, forKey: key) inside rawWrite fails silently (entitlement / disk full / cert issue), legacy is gone before the isolated write is confirmed → identity lost.

Guard each per-key delete on a successful isolated write:

if rawWrite(string: legacyEmail, forKey: Const.Keychain.Key.email, to: wrapper) { legacyWrapper.removeValue(forKey: Const.Keychain.Key.email) ITBInfo("UPDATED: migrated email from legacy keychain to isolated keychain") migrated = true } else { // isolated write failed — keep legacy intact so the next cold start retries allSucceeded = false }

Same pattern for userId (~line 48), userIdUnknownUser (~line 56), and authToken (~line 64). Then migrateFromLegacy() should return the aggregate Bool — true only if every attempted write either succeeded or had no legacy data to migrate.

sumeruchat · 2026-05-26T10:36:55Z

+
+    /// Raw write - bypasses the coordinator queue. Only safe to call from
+    /// inside a `coordinatorQueue.sync(flags: .barrier) { }` block.
+    private func rawWrite(string: String?, forKey key: String, to wrapper: KeychainWrapper) {


Make rawWrite return Bool so the migration loop can detect failure and skip the legacy delete:

@discardableResult private func rawWrite(string: String?, forKey key: String, to wrapper: KeychainWrapper) -> Bool { guard let value = string, let data = value.data(using: .utf8) else { return wrapper.removeValue(forKey: key) } return wrapper.set(data, forKey: key) }

This pairs with the failure-logging you already added to KeychainWrapper.set/update — small additional step to surface that failure to the caller, not just to logs.

joaodordio added 2 commits May 21, 2026 16:52

✨ move keychain migration off main thread

5c4d021

🐛 KeychainWrapper crash fix

6215042

joaodordio self-assigned this May 21, 2026

joaodordio added the BCIT Run Business Critical Integration Tests label May 21, 2026

🧪 Added more tests

29fa5ed

joaodordio marked this pull request as ready for review May 21, 2026 17:11

🔧 Added migrationAttempted gate

c35b927

franco-zalamena-iterable self-requested a review May 25, 2026 15:59

franco-zalamena-iterable requested changes May 26, 2026

View reviewed changes

sumeruchat requested changes May 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SDK-478 keychain migration off main thread, one-shot gated, race-safe reads#1066

SDK-478 keychain migration off main thread, one-shot gated, race-safe reads#1066
joaodordio wants to merge 4 commits into
masterfrom
fix/SDK-478-keychain-off-main-thread

joaodordio commented May 21, 2026 •

edited by atlassian Bot

Loading

Uh oh!

codecov Bot commented May 21, 2026 •

edited

Loading

Uh oh!

franco-zalamena-iterable May 26, 2026

Uh oh!

sumeruchat left a comment

Uh oh!

sumeruchat May 26, 2026

Uh oh!

sumeruchat May 26, 2026

Uh oh!

sumeruchat May 26, 2026

Uh oh!

sumeruchat May 26, 2026

Uh oh!

sumeruchat May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

joaodordio commented May 21, 2026 • edited by atlassian Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔹 Jira Ticket(s)

✏️ Description

What this changes

Why

Compatibility

Tests

Linked tickets

Uh oh!

codecov Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

franco-zalamena-iterable May 26, 2026

Choose a reason for hiding this comment

Uh oh!

sumeruchat left a comment

Choose a reason for hiding this comment

Overall

Uh oh!

sumeruchat May 26, 2026

Choose a reason for hiding this comment

Uh oh!

sumeruchat May 26, 2026

Choose a reason for hiding this comment

Uh oh!

sumeruchat May 26, 2026

Choose a reason for hiding this comment

Uh oh!

sumeruchat May 26, 2026

Choose a reason for hiding this comment

Uh oh!

sumeruchat May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

joaodordio commented May 21, 2026 •

edited by atlassian Bot

Loading

codecov Bot commented May 21, 2026 •

edited

Loading