
Performance: significant speedup and memory usage improvement for PCA().run() #5352

Open
jberg5 wants to merge 4 commits into MDAnalysis:develop from jberg5:pca-perf

Conversation


@jberg5 jberg5 commented Mar 29, 2026

Fixes #5351

Changes made in this Pull Request:

  • Build the covariance matrix in batches of 500 frames using vanilla NumPy matmul, rather than accumulating a dot product on every frame. The batched multiply goes through a much better optimized OpenBLAS GEMM routine, and this is where the bulk of the performance improvement comes from.
  • Switch from np.linalg.eig to np.linalg.eigh to take advantage of the covariance matrix being symmetric and real-valued by construction. This is where the bulk of the memory usage improvement comes from: the eigh LAPACK routine allocates a smaller workspace.
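The two changes can be sketched as follows. This is a toy illustration, not the PR's actual diff: the array sizes and variable names (`coords`, `batch`, `cov`) are invented, with `coords` standing in for the mean-centered, flattened atom positions.

```python
import numpy as np

# Toy stand-in for mean-centered, flattened positions (frames x 3*n_atoms).
rng = np.random.default_rng(0)
n_frames, n_dims = 1200, 30
coords = rng.standard_normal((n_frames, n_dims))
coords -= coords.mean(axis=0)

# Change 1: accumulate the covariance in batches with one GEMM per batch,
# instead of one outer product + full-matrix add per frame.
batch = 500
cov = np.zeros((n_dims, n_dims))
for start in range(0, n_frames, batch):
    X = coords[start:start + batch]
    cov += X.T @ X
cov /= n_frames - 1

# Change 2: eigh exploits the symmetry of cov. It is cheaper than eig,
# allocates a smaller LAPACK workspace, and returns real eigenvalues in
# ascending order (eig gives no ordering guarantee).
eigvals, eigvecs = np.linalg.eigh(cov)

# Sanity check against the naive per-frame accumulation:
naive = sum(np.outer(x, x) for x in coords) / (n_frames - 1)
assert np.allclose(cov, naive)
```

The batched and per-frame accumulations are mathematically identical; only the order of floating-point summation differs.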

Results from running a PCA analysis on the AdK equilibrium test data on a GCloud c2d-standard-4 (note: these results are out of date! I switched from a single covariance matrix multiplication to a batched approach; see the table after this one):


  ┌──────────┬───────┬────────┬───────┬─────────┬───────────────┬──────────────┐
  │  Select  │ Atoms │ Before │ After │ Speedup │ Memory Before │ Memory After │
  ├──────────┼───────┼────────┼───────┼─────────┼───────────────┼──────────────┤
  │ CA       │ 214   │ 5.2s   │ 3.9s  │ 1.3x    │ 13 MB         │ 27 MB        │
  ├──────────┼───────┼────────┼───────┼─────────┼───────────────┼──────────────┤
  │ Backbone │ 855   │ 39.0s  │ 6.4s  │ 6.1x    │ 201 MB        │ 183 MB       │
  ├──────────┼───────┼────────┼───────┼─────────┼───────────────┼──────────────┤
  │ All-atom │ 3341  │ 880s   │ 84s   │ 10.5x   │ 3067 MB       │ 1854 MB      │
  └──────────┴───────┴────────┴───────┴─────────┴───────────────┴──────────────┘

Here's the gist I used (heads up: Claude wrote it): https://gist.github.com/jberg5/522fd1892f02453145e59e90e5a41e8b

After adding batching to the covariance matrix computation and running https://gist.github.com/jberg5/ea8fd54d7ed63894527ea812e9aa98ca to compare performance at different numbers of frames, these are the new results:

  All-atom (3,341 atoms) on x86 c2d-standard-4:

  ┌──────────────┬─────────┬───────┬─────────┬────────────┬───────────┐
  │    Frames    │ Before  │ After │ Speedup │ Mem Before │ Mem After │
  ├──────────────┼─────────┼───────┼─────────┼────────────┼───────────┤
  │ 4,187 (1x)   │ 849s    │ 78s   │ 10.9x   │ 3,067 MB   │ 1,572 MB  │
  ├──────────────┼─────────┼───────┼─────────┼────────────┼───────────┤
  │ 8,374 (2x)   │ 1,437s  │ 85s   │ 16.9x   │ 3,067 MB   │ 1,572 MB  │
  ├──────────────┼─────────┼───────┼─────────┼────────────┼───────────┤
  │ 20,935 (5x)  │ 3,190s  │ 105s  │ 30.4x   │ 3,068 MB   │ 1,573 MB  │
  ├──────────────┼─────────┼───────┼─────────┼────────────┼───────────┤
  │ 41,870 (10x) │ skipped │ 137s  │ —       │ —          │ 1,574 MB  │
  └──────────────┴─────────┴───────┴─────────┴────────────┴───────────┘

  Backbone (855 atoms):

  ┌──────────────┬────────┬───────┬─────────┬────────────┬───────────┐
  │    Frames    │ Before │ After │ Speedup │ Mem Before │ Mem After │
  ├──────────────┼────────┼───────┼─────────┼────────────┼───────────┤
  │ 4,187 (1x)   │ 32s    │ 2.1s  │ 15.2x   │ 201 MB     │ 110 MB    │
  ├──────────────┼────────┼───────┼─────────┼────────────┼───────────┤
  │ 8,374 (2x)   │ 53s    │ 2.9s  │ 18.3x   │ 201 MB     │ 111 MB    │
  ├──────────────┼────────┼───────┼─────────┼────────────┼───────────┤
  │ 20,935 (5x)  │ 125s   │ 5.2s  │ 24.0x   │ 202 MB     │ 112 MB    │
  ├──────────────┼────────┼───────┼─────────┼────────────┼───────────┤
  │ 41,870 (10x) │ 243s   │ 9.0s  │ 27.0x   │ 204 MB     │ 113 MB    │
  └──────────────┴────────┴───────┴─────────┴────────────┴───────────┘

LLM / AI generated code disclosure

LLMs or other AI-powered tools (beyond simple IDE use cases) were used in this contribution: yes

First pass of this was done by Claude (with opus 4.6). I've modified it a little bit to remove some unnecessary changes, improve variable names, etc. I also used Claude extensively to write benchmarking scripts.

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added? No - I think existing coverage is pretty good already, and this is a performance improvement
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)
  • LLM/AI disclosure was updated.

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5352.org.readthedocs.build/en/5352/

@codecov

codecov bot commented Mar 29, 2026

Codecov Report

❌ Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.83%. Comparing base (c32fe45) to head (44f3340).
⚠️ Report is 4 commits behind head on develop.

Files with missing lines Patch % Lines
package/MDAnalysis/analysis/pca.py 85.71% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5352      +/-   ##
===========================================
- Coverage    93.84%   93.83%   -0.02%     
===========================================
  Files          182      182              
  Lines        22492    22502      +10     
  Branches      3199     3201       +2     
===========================================
+ Hits         21107    21114       +7     
- Misses         923      925       +2     
- Partials       462      463       +1     


Member

@BradyAJohnston BradyAJohnston left a comment


I'm going to block this until we have some core dev discussion about it.

Our AI Policy is currently quite restrictive on AI usage. It's this way to help us deal with the wave of slop PRs that we get, and would currently exclude this PR.

Given the performance improvements, limited scope, and demonstrated understanding, I would argue this is worthy of an exception, but that is a different discussion that we need to have.

@jberg5
Author

jberg5 commented Mar 30, 2026

@BradyAJohnston thanks for the context, totally understand. Just to give a little more background on me and how I have been using Claude: I work in quant finance, where I've picked up some practical experience with using the scientific computing stack to try to pull signal out of noise. My thesis with Claude is that if I use it as a translator, it can help me map my experience onto different domains (perhaps with a little more benefit to society than my day job lol). Turns out PCA is good for more than just looking for factors driving equity prices!

The workflow I'm using looks something like 0) clone a cool project like MDAnalysis 1) find or write suitable benchmarks of realistic workflows 2) tell Claude "here are some patterns I want you to look for" (python loops that can be vectorized, linear algebra operations that don't fully exploit the problem structure, etc), and then 3) sift through the slop to find legit improvements.

I try really hard to stick to small diffs that I can completely stand behind. For this particular PR I ended up rewriting most of what Claude proposed anyway, and I think the end result is pretty much identical to what I would have done by hand, the only difference is that AI helped me find the opportunity and iterate much more quickly than I otherwise would have.

Anyway, I know that there's a lot of garbage out there these days, and it's gotta be a lot of work to wade through it, so I'll respect whatever decision you make. Really appreciate your time!

else:
    x = self._atoms.positions.ravel()
    x -= self._xmean
    self.cov += np.dot(x[:, np.newaxis], x[:, np.newaxis].T)
Member


One reason for building the covariance matrix incrementally is that it ensures bounded memory usage. In your approach, memory consumption grows linearly with trajectory length because you copy all coordinates into memory, so a long enough trajectory will eventually exhaust it. The test trajectory is very short; real trajectories are easily 1,000 to 100,000 times longer.

If there's really an advantage to multiplying huge matrices then perhaps a batched approach is a good middle ground.
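One shape such a batched middle ground could take is sketched below. This is a hypothetical illustration, not MDAnalysis API: the class, its attributes, and the toy sizes are all invented. Frames arrive one at a time (as in a per-frame analysis loop), get buffered, and are flushed into the covariance with a single GEMM once the buffer fills, so peak memory is bounded by the batch size rather than the trajectory length.

```python
import numpy as np

class BatchedCov:
    """Accumulate a covariance matrix from streamed frames in fixed-size batches."""

    def __init__(self, n_dims, batch=500):
        self._buf = np.empty((batch, n_dims))  # memory bounded by batch size
        self._n = 0
        self.cov = np.zeros((n_dims, n_dims))

    def add_frame(self, x):
        """Buffer one mean-centered frame; flush when the buffer fills."""
        self._buf[self._n] = x
        self._n += 1
        if self._n == len(self._buf):
            self.flush()

    def flush(self):
        """Fold the buffered frames into cov with one GEMM."""
        if self._n:
            X = self._buf[:self._n]
            self.cov += X.T @ X  # fused multiply-accumulate via BLAS
            self._n = 0

# Toy usage: 1234 frames of 9 coordinates (3 atoms * xyz), mean-centered.
rng = np.random.default_rng(1)
frames = rng.standard_normal((1234, 9))
frames -= frames.mean(axis=0)

acc = BatchedCov(n_dims=9, batch=500)
for x in frames:
    acc.add_frame(x)
acc.flush()  # drain the final partial batch

assert np.allclose(acc.cov, frames.T @ frames)
```

The explicit final `flush()` is what makes this compatible with streaming: nothing about the accumulator needs to know the trajectory length in advance.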

Member


Some manual benchmarking:

import numpy as np
import MDAnalysis as mda; from MDAnalysis.tests import datafiles as data
u = mda.Universe(data.PSF, data.DCD)
x = u.atoms.positions.ravel()

shows that simply replacing the np.dot call on transposed views with np.outer already helps:

>>> %timeit np.dot(x[:, np.newaxis], x[:, np.newaxis].T)
61.4 ms ± 466 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit np.outer(x, x)
35.4 ms ± 249 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

However, the in-place addition update is itself slow:

%%timeit cov = np.zeros((len(x), len(x))); xx = np.outer(x, x)
    cov += xx
97.5 ms ± 3.08 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

so it makes sense to do the addition as part of the matrix multiplication.

Batching is probably the way to go (but that makes streaming more difficult).
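As a toy check of the equivalence the timings above rely on (sizes here are made up for illustration): a per-frame outer-product loop and one GEMM over the stacked frames produce the same accumulator, but the GEMM form fuses the multiply and the accumulation inside optimized BLAS code.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 30))  # 200 frames, 30 coordinates

# Per-frame form: one outer product plus a full-matrix add per frame.
per_frame = np.zeros((30, 30))
for x in X:
    per_frame += np.outer(x, x)

# Fused form: a single GEMM over the whole batch.
fused = X.T @ X

assert np.allclose(per_frame, fused)
```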

Author

@jberg5 jberg5 Mar 31, 2026


Good call about batching! And thanks for the context on realistic problem sizes, this is where my complete inexperience in this field really shows 🥲

I did some parameter tuning and settled on a batch size of 500 frames; this seems to capture most of the original speedup while keeping peak memory below what it used to be.

To benchmark additional frames, I basically just tiled the adk data together (not going for realism here, just size haha). You can see the script I used here: https://gist.github.com/jberg5/ea8fd54d7ed63894527ea812e9aa98ca

Results comparing before (what's on main) with my current branch:

  All-atom (3,341 atoms) on x86 c2d-standard-4:

  ┌──────────────┬─────────┬───────┬─────────┬────────────┬───────────┐
  │    Frames    │ Before  │ After │ Speedup │ Mem Before │ Mem After │
  ├──────────────┼─────────┼───────┼─────────┼────────────┼───────────┤
  │ 4,187 (1x)   │ 849s    │ 78s   │ 10.9x   │ 3,067 MB   │ 1,572 MB  │
  ├──────────────┼─────────┼───────┼─────────┼────────────┼───────────┤
  │ 8,374 (2x)   │ 1,437s  │ 85s   │ 16.9x   │ 3,067 MB   │ 1,572 MB  │
  ├──────────────┼─────────┼───────┼─────────┼────────────┼───────────┤
  │ 20,935 (5x)  │ 3,190s  │ 105s  │ 30.4x   │ 3,068 MB   │ 1,573 MB  │
  ├──────────────┼─────────┼───────┼─────────┼────────────┼───────────┤
  │ 41,870 (10x) │ skipped │ 137s  │ —       │ —          │ 1,574 MB  │
  └──────────────┴─────────┴───────┴─────────┴────────────┴───────────┘

  Backbone (855 atoms):

  ┌──────────────┬────────┬───────┬─────────┬────────────┬───────────┐
  │    Frames    │ Before │ After │ Speedup │ Mem Before │ Mem After │
  ├──────────────┼────────┼───────┼─────────┼────────────┼───────────┤
  │ 4,187 (1x)   │ 32s    │ 2.1s  │ 15.2x   │ 201 MB     │ 110 MB    │
  ├──────────────┼────────┼───────┼─────────┼────────────┼───────────┤
  │ 8,374 (2x)   │ 53s    │ 2.9s  │ 18.3x   │ 201 MB     │ 111 MB    │
  ├──────────────┼────────┼───────┼─────────┼────────────┼───────────┤
  │ 20,935 (5x)  │ 125s   │ 5.2s  │ 24.0x   │ 202 MB     │ 112 MB    │
  ├──────────────┼────────┼───────┼─────────┼────────────┼───────────┤
  │ 41,870 (10x) │ 243s   │ 9.0s  │ 27.0x   │ 204 MB     │ 113 MB    │
  └──────────────┴────────┴───────┴─────────┴────────────┴───────────┘

I'm pretty excited by how the magnitude of the speedup scales with more frames.

Author

@jberg5 jberg5 Mar 31, 2026


And for completeness, here's the story if you restrict the selection to just the Cα (alpha-carbon) atoms:

  ┌──────────────┬──────────┬─────────┬─────────┬────────────┬───────────┐
  │    Frames    │ Baseline │ Patched │ Speedup │ Mem Before │ Mem After │
  ├──────────────┼──────────┼─────────┼─────────┼────────────┼───────────┤
  │ 4,187 (1x)   │ 1.4s     │ 0.4s    │ 3.5x    │ 13 MB      │ 9 MB      │
  ├──────────────┼──────────┼─────────┼─────────┼────────────┼───────────┤
  │ 41,870 (10x) │ 12.6s    │ 3.2s    │ 3.9x    │ 15 MB      │ 11 MB     │
  └──────────────┴──────────┴─────────┴─────────┴────────────┴───────────┘

Not quite as much juice to squeeze here but still an improvement.



Development

Successfully merging this pull request may close these issues.

Performance improvement: PCA could be much faster
