perf(python-dotenv): replace dict merge with ChainMap in resolve_variables #628

Open
matheusvir wants to merge 11 commits into theskumar:main from matheusvir:optimization/chainmap-resolve-variables

What was done

This PR changes resolve_variables() in src/dotenv/main.py.
Previously, python-dotenv built the resolved environment by repeatedly merging dictionaries, copying every key-value pair at each interpolation pass.
The merge is replaced with a ChainMap, which exposes the same logical view without copying: lookups traverse the chain at read time, so the merge step disappears.

No new test files were added. The change is covered by the existing suite — tests/test_main.py exercises resolve_variables() through dotenv_values() with both interpolate=True and interpolate=False, and tests/test_variables.py covers parse_variables() directly. All existing tests pass with no regressions.


Performance

All benchmarks were executed inside Docker containers to isolate the runtime environment and reduce host-specific variance from CPU scheduling, OS caching, and library versions.

Methodology

  • 50 total runs per scale point; first 10 (warmup) and last 10 (cooldown) discarded; 30 effective runs measured.
  • Two independent scenarios measured:
    • Without interpolation: dotenv_values() called with interpolate=False.
    • With interpolation: dotenv_values() called with interpolate=True (default behavior).
  • Variable counts follow a logarithmic progression from 10 to 25,000.
  • Timing: time.perf_counter_ns() with GC disabled during measurement.
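
The measurement protocol above can be sketched roughly as follows. This is a minimal illustration, not the harness used for the reported numbers: the names `measure` and `summarize` are hypothetical, and the use of the sample standard deviation is an assumption, since the description does not say which estimator was used.

```python
import gc
import statistics
import time


def measure(fn, total_runs=50, warmup=10, cooldown=10):
    """Time fn() total_runs times and return the middle runs in milliseconds.

    Hypothetical helper: the first `warmup` and last `cooldown` samples are
    discarded, leaving 30 effective runs with the default arguments.
    """
    samples_ns = []
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep collector pauses out of the timed region
    try:
        for _ in range(total_runs):
            start = time.perf_counter_ns()
            fn()
            samples_ns.append(time.perf_counter_ns() - start)
    finally:
        if gc_was_enabled:
            gc.enable()
    kept = samples_ns[warmup:total_runs - cooldown]
    return [ns / 1e6 for ns in kept]


def summarize(samples_ms):
    """Return (mean, sample standard deviation) in milliseconds."""
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)
```

In a real run, `fn` would wrap a call to dotenv_values() on a generated .env file of the chosen size.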

Rationale

The previous implementation of resolve_variables() builds the resolved environment by repeatedly merging dictionaries, creating full copies of all key-value pairs at each interpolation pass. For a file with N variables, each pass costs O(N) in memory allocation and copying, so the total work grows faster than linearly with the number of variables.

ChainMap composes the same logical view without copying: lookups traverse the chain at read time, so the merge step is eliminated entirely.
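
The difference between the two approaches can be illustrated with a small standalone sketch. The helper names `merged_env` and `chained_env` are hypothetical, and the precedence rule shown (file values win when `override` is true, otherwise the process environment wins) is an assumption made for illustration; it is not copied from the PR diff.

```python
from collections import ChainMap


def merged_env(resolved, environ, override):
    """Copy-based lookup table: copies every pair from both mappings."""
    env = {}
    if override:
        env.update(environ)
        env.update(resolved)  # later update wins, so file values take priority
    else:
        env.update(resolved)
        env.update(environ)   # process environment takes priority
    return env


def chained_env(resolved, environ, override):
    """View-based lookup table: no key-value pairs are copied."""
    # ChainMap searches its maps left to right, so the first map wins.
    if override:
        return ChainMap(resolved, environ)
    return ChainMap(environ, resolved)


resolved = {"A": "from-file", "B": "b"}
environ = {"A": "from-os", "C": "c"}

for override in (True, False):
    view = chained_env(resolved, environ, override)
    # Both approaches expose the same keys and values.
    assert dict(view) == merged_env(resolved, environ, override)

# The ChainMap is a live view: a value added to `resolved` afterwards is
# visible without rebuilding anything.
view = chained_env(resolved, environ, override=True)
resolved["D"] = "d"
assert view["D"] == "d"
```

Because the view is live, it can be built once and reused while variables are resolved one by one, whereas the copy-based table has to be rebuilt to see new values.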

Results — without interpolation

| Variables | Baseline mean ± std (ms) | Optimized mean ± std (ms) | Improvement |
|---|---|---|---|
| 10 | 0.97 ± 0.31 | 0.87 ± 0.15 | 10.27% |
| 982 | 80.71 ± 5.06 | 25.37 ± 1.56 | 68.57% |
| 8,497 | 1,906.04 ± 312.44 | 579.83 ± 86.53 | 69.58% |
| 25,000 | 23,514.52 ± 2,911.14 | 5,651.43 ± 668.88 | 75.97% |

[Figure: ChainMap benchmark, no interpolation]

Results — with interpolation

| Variables | Baseline mean ± std (ms) | Optimized mean ± std (ms) | Improvement |
|---|---|---|---|
| 10 | 0.76 ± 0.08 | 0.57 ± 0.20 | 25.16% |
| 982 | 76.97 ± 8.77 | 45.23 ± 4.76 | 41.23% |
| 8,497 | 1,806.18 ± 240.60 | 759.82 ± 52.20 | 57.93% |
| 25,000 | 23,366.69 ± 3,725.48 | 6,510.14 ± 822.94 | 72.14% |

[Figure: ChainMap benchmark, with interpolation]

Analysis

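At 10 variables the difference is under 0.2 ms in absolute terms and close to the run-to-run spread, so the benefit is negligible at small sizes (below ~50 variables). It grows consistently from there. At 25,000 variables, the optimization reduces load time by ~76% without interpolation and ~72% with interpolation. The absolute standard deviation also decreases in optimized runs at every scale point except the 10-variable case with interpolation (0.08 ms to 0.20 ms), indicating more predictable latency at larger sizes.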
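
For reference, the improvement column can be recomputed from the reported means. The sketch below assumes the usual definition, (baseline - optimized) / baseline, which reproduces the larger rows of the no-interpolation table; `improvement_pct` and `speedup` are hypothetical helper names. Because the inputs are the rounded means, recomputed values can differ from the reported ones in the last digit, most noticeably on the 10-variable rows.

```python
def improvement_pct(baseline_ms, optimized_ms):
    """Relative reduction in mean load time, as a percentage."""
    return (baseline_ms - optimized_ms) / baseline_ms * 100.0


def speedup(baseline_ms, optimized_ms):
    """How many times faster the optimized run is."""
    return baseline_ms / optimized_ms


# (variables, baseline mean ms, optimized mean ms), no-interpolation scenario
rows = [
    (982, 80.71, 25.37),
    (8497, 1906.04, 579.83),
    (25000, 23514.52, 5651.43),
]

for n, base, opt in rows:
    print(f"{n:>6} vars: {improvement_pct(base, opt):.2f}% faster, "
          f"{speedup(base, opt):.2f}x speedup")
```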

Both interpolation modes show the same trend, which is consistent with the gain coming from eliminating the merge overhead in resolve_variables() rather than from any interpolation-specific code path.

Reproducing the benchmark

The full benchmark infrastructure is available in the research repository at matheusvir/eda-oss-performance.

Relevant files:

To run inside Docker:

```shell
# From the root of eda-oss-performance
docker build -t dotenv-perf ./setup/python-dotenv/
docker run --rm -e EXPERIMENT=chainmap_load_test dotenv-perf
```

Results are written to results/python-dotenv/result_no_interpolation_dotenv_chainmap.json and results/python-dotenv/result_interpolation_dotenv_chainmap.json.


Feedback on the ChainMap integration, edge cases with nested interpolation, and compatibility with existing override semantics is welcome.


Relates to #504.

RailtonDantas and others added 2 commits March 11, 2026 22:28
Co-authored-by: Matheus Virgolino <matheus.virgolino.abilio.da.silva@ccc.ufcg.edu.br>
Co-authored-by: Manoel Netto <manoel.da.nobrega.eustaqueo.netto@ccc.ufcg.edu.br>
Co-authored-by: Pedro <pedroalmeida1896@gmail.com>
Co-authored-by: Lucaslg7 <lucasmoizinholg7@gmail.com>
Co-authored-by: RailtonDantas <railtondantas.code@gmail.com>
Co-authored-by: João Pereira <joao.pereira.de.oliveira@ccc.ufcg.edu.br>
matheusvir force-pushed the optimization/chainmap-resolve-variables branch from 8a3d287 to 9343989 on March 12, 2026 01:38