Skip to content

fix: produce() snapshots mutable values when recording patches#34

Merged
berendkleinhaneveld merged 6 commits into
masterfrom
fix/produce-snapshot-values
May 26, 2026
Merged

fix: produce() snapshots mutable values when recording patches#34
berendkleinhaneveld merged 6 commits into
masterfrom
fix/produce-snapshot-values

Conversation

@berendkleinhaneveld
Copy link
Copy Markdown
Collaborator

@berendkleinhaneveld berendkleinhaneveld commented May 26, 2026

Summary

PatchRecorder.record_add / record_replace / record_remove stored the raw value reference. If the recipe later mutated that container through the proxy, the earlier patch's stored value mutated with it, so replaying the patches diverged from the recipe's own result.

The fix is one line per call site: copy.deepcopy(value) before storing it in the patch dict, for both the forward value and the reverse old_value.

Reproducer (failing on master)

uv run python -c "
from patchdiff import produce, apply
def recipe(draft):
    draft['a'] = [1, 2, 3]
    draft['a'].append(4)
result, patches, _ = produce({}, recipe)
print('result:  ', result)
print('replayed:', apply({}, patches))
assert result == apply({}, patches), 'patches do not reproduce result'
"

Output on master:

result:   {'a': [1, 2, 3, 4]}
replayed: {'a': [1, 2, 3, 4, 4]}
AssertionError: patches do not reproduce result

On this branch the assertion passes.

Regression tests added

In tests/test_produce_core.py::TestPatchSnapshotting:

  • test_dict_value_mutated_after_assignment_is_snapshotted — assigns a list to a dict key then .appends to it; verifies both forward and reverse patches replay.
  • test_list_element_mutated_after_append_is_snapshotted — appends a dict to a list then mutates the appended dict's field via draft[-1]['x'] = 2.
  • test_nested_dict_in_list_in_dict_mutation_is_snapshotted — dict containing a list of dicts; mutates the nested tags lists and dict fields after the parent list was assigned.

All three assert apply(base, patches) == result and apply(result, reverse) == base.

Snapshot strategy: B (unconditional deepcopy)

The plan suggested starting with A (deepcopy only containers, decided by duck-typed hasattr checks). I tried both and measured. copy.deepcopy already has a fast-path for atomic types (registered in _deepcopy_dispatch) that just returns the same object — cheaper than three hasattr MRO walks. Microbench on mixed primitive/container input: unconditional is ~0.92× the time of gated. The full produce benchmark suite agrees: unconditional matches or beats the gated version on every test (worst regression vs master dropped from +15.6% with gating to +8.9% without). So option B is both simpler and faster here.

(to_raw is a different tool — it unwraps an observ reactive proxy to the underlying raw container, but that container is still a shared mutable reference. The bug is about subsequent mutation, which needs an independent copy. produce() already calls to_raw on the base at entry, so reactive values shouldn't reach the recorder in normal use.)

Performance: produce-vs-diff-* groups

Baseline = master @ a9ac89e (saved as 0001). NOW = this branch.

Benchmark Baseline (µs) NOW (µs) Δ
produce_deep_nested_mutation[in_place] 10.03 10.08 +0.5%
produce_deep_nested_mutation[copy] 33.03 32.95 −0.3%
produce_dict_small_mutations[in_place] 12.01 12.45 +3.6%
produce_dict_small_mutations[copy] 39.12 40.00 +2.2%
produce_dict_many_mutations[in_place] 306.43 333.69 +8.9%
produce_dict_many_mutations[copy] 570.89 599.24 +5.0%
produce_list_small_mutations[in_place] 7.24 7.48 +3.3%
produce_list_small_mutations[copy] 27.01 26.99 −0.0%
produce_list_many_appends[in_place] 54.59 57.14 +4.7%
produce_list_many_appends[copy] 73.27 76.91 +5.0%
produce_nested_structure[in_place] 154.81 156.14 +0.9%
produce_nested_structure[copy] 386.66 386.23 −0.1%
produce_set_mutations[in_place] 71.73 75.61 +5.4%
produce_set_mutations[copy] 175.02 178.25 +1.8%
produce_sparse_mutations_large_object[in_place] 509.46 507.73 −0.3%
produce_sparse_mutations_large_object[copy] 2462.73 2450.46 −0.5%

Worst single regression is +8.9% on dict_many_mutations[in_place], well under the 30% threshold from the plan. Most benchmarks are within run-to-run noise.

Test plan

  • uv run pytest tests/test_produce_core.py -v — 50 passed (incl. 3 new)
  • uv run pytest -x -q — 271 passed
  • uv run ruff check — clean
  • uv run ruff format --check — clean
  • Reproducer from the plan now succeeds
  • produce-vs-diff-* benchmarks compared against master baseline (table above)

🤖 Generated with Claude Code

berendkleinhaneveld and others added 2 commits May 26, 2026 09:58
PatchRecorder stored raw value references, so a recipe that mutated a
value after assigning it through the proxy would silently mutate the
earlier patch's stored value. Replaying the patches on the base produced
a different result than the recipe itself.

Snapshot mutable containers (dict/list/set-like, by duck typing) at
record time for both the forward value and the reverse old_value.
Primitives skip the deepcopy since they're already immutable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
copy.deepcopy already has a fast-path for atomic types — three hasattr
calls per record were slower than just dispatching to the deepcopy
atomic handler. Removing the gating speeds up every produce benchmark
slightly and removes a helper that duplicated the duck-typing logic
already in _wrap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Korijn
Copy link
Copy Markdown
Collaborator

Korijn commented May 26, 2026

I just want to recommend creating a local reference to copy.deepcopy to avoid the repeated global lookups.

Otherwise - jeez, expensive code! But I guess it is necessary :/ I don't recall immer.js doing this though...

berendkleinhaneveld and others added 2 commits May 26, 2026 13:11
PatchRecorder calls deepcopy on every recorded op. Aliasing once at
module scope replaces the per-call `copy` global lookup + `.deepcopy`
attribute lookup with a single global lookup. Microbench on the atomic
fast-path: ~18% faster per call.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cleaner than `import copy` + `_deepcopy = copy.deepcopy`, same effect:
deepcopy is bound directly in the module namespace, so each call is a
single global lookup with no attribute access.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@berendkleinhaneveld
Copy link
Copy Markdown
Collaborator Author

Yeah, it ain't great, but I expect most changes made in the recipe to be relatively small changes.

@berendkleinhaneveld berendkleinhaneveld marked this pull request as ready for review May 26, 2026 11:16
@berendkleinhaneveld berendkleinhaneveld merged commit 19e4015 into master May 26, 2026
@berendkleinhaneveld berendkleinhaneveld deleted the fix/produce-snapshot-values branch May 26, 2026 11:50
berendkleinhaneveld added a commit that referenced this pull request May 26, 2026
This version includes the following:

* ci: Update versions of steps and let setup-uv manage the python version (#40)
* perf: trim common prefix/suffix in diff_lists (#36)
* perf: use str.replace in Pointer escape/unescape (#37)
* fix: produce() snapshots mutable values when recording patches (#34)
* perf: linear reverse-op construction in diff_dicts/diff_sets (#38)
* fix: iapply raises on invalid patch paths (#35)
* fix: to_json mutating caller's ops (#33)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants