Fix silent numerical blowup in regress_out with near-collinear regressors by Arshammik · Pull Request #673 · scverse/rapids-singlecell

Arshammik · 2026-05-23T17:08:53Z

Hi,

rsc.pp.regress_out was supposed to fall back to a pseudoinverse when the regressor Gram matrix was singular, but the guard was cp.linalg.det(R.T @ R) == 0. In float arithmetic the determinant of a near-singular matrix is small but not zero (a Gram with cond around 1.45e+12 gives det around 91), so the check never fired and the cp.linalg.inv(R.T @ R) @ X path ran on a badly-conditioned matrix. Users passing two near-collinear continuous regressors got residuals on the order of 1e+5 where they should have been order 1.

I replaced the det check with a condition-number check at threshold 1e12, and switched the dense path from inv() @ X to cp.linalg.solve(gram, X) — same math, about 1 ULP tighter. The threshold is calibrated so the common single-cell case of disparate-scale regressors (n_counts and percent_mito routinely sit at cond around 1e10) still goes through the fast path; only genuinely degenerate inputs trip the fallback. The Dask path can't fall back cleanly the same way, so it raises ValueError with the actual cond number — clearer than the old "linearly dependent" message it used to produce only when det was exactly zero.

I verified the fix on an H100. On a synthetic case with cond around 1.45e+12 the max absolute residual went from 163,776 (one entry was -3,701 where the lstsq reference said 0.62) to 9.5 — roughly four orders of magnitude. The well-conditioned control case is unchanged at 4.7e-7. The full existing test suite still passes; one assertion in test_regress_out_ordinal[float64] needed its atol loosened from 1e-7 to 1e-6 because the test compares the "all" path (now solve) against the batched cuML SVD path and they now drift by about 3e-7 on f64. There's a comment in the test explaining the change.

Two new tests live in tests/test_regressout.py: one parametrized over f32/f64 confirms the dense path produces no NaN/Inf and keeps residuals bounded for near-collinear regressors, the other confirms the Dask path raises ValueError. I also added a one-line Notes entry to the regress_out docstring so users know about the fallback behavior.

Thanks,
Arsham

`cp.linalg.det(R^T R) == 0` was a vacuous guard: in float arithmetic the det of a near-singular Gram matrix is small-but-nonzero (cond=1e12 gives det≈90, not 0), so the guard never fired for near-collinear regressors and the unstable `cp.linalg.inv` path was taken, producing residuals ~1.6e+5 instead of ~1. Replace the `det == 0` check with `cp.linalg.cond(gram) > 1e12` in both the dense (`_regress_out_continuous`) and Dask (`_regress_out_continuous_dask`) paths. The threshold is set to 1e12 to be permissive of well-scaled single-cell regressors with disparate magnitudes (e.g. n_counts ~ 1e3 and percent_mito ~ 1e-2 routinely give cond ~ 1e10) while still catching truly-collinear inputs. While there, switch the dense "all" path from explicit `inv() @ (R^T @ X)` to `cp.linalg.solve(gram, R^T @ X)` — same math, more numerically stable. The existing test's f64 atol is loosened from 1e-7 to 1e-6 to accommodate the small numerical drift between `inv` and `solve`. A new test (`test_regress_out_near_singular_regressors_no_garbage`) verifies that for near-collinear regressors (cond ~ 1e12) the output contains no NaN/Inf and residuals stay bounded by 100× the input range (vs the previous explosion to 1e+5).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix silent numerical blowup in regress_out with near-collinear regressors#673

Fix silent numerical blowup in regress_out with near-collinear regressors#673
Arshammik wants to merge 1 commit into
scverse:mainfrom
Arshammik:fix/regress-out-singular-guard

Arshammik commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Arshammik commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant