Skip to content

Fix silent numerical blowup in regress_out with near-collinear regressors#673

Open
Arshammik wants to merge 1 commit into
scverse:mainfrom
Arshammik:fix/regress-out-singular-guard
Open

Fix silent numerical blowup in regress_out with near-collinear regressors#673
Arshammik wants to merge 1 commit into
scverse:mainfrom
Arshammik:fix/regress-out-singular-guard

Conversation

@Arshammik
Copy link
Copy Markdown
Contributor

Hi,

rsc.pp.regress_out was supposed to fall back to a pseudoinverse when the regressor Gram matrix was singular, but the guard was cp.linalg.det(R.T @ R) == 0. In float arithmetic the determinant of a near-singular matrix is small but not zero (a Gram with cond around 1.45e+12 gives det around 91), so the check never fired and the cp.linalg.inv(R.T @ R) @ X path ran on a badly-conditioned matrix. Users passing two near-collinear continuous regressors got residuals on the order of 1e+5 where they should have been order 1.

I replaced the det check with a condition-number check at threshold 1e12, and switched the dense path from inv() @ X to cp.linalg.solve(gram, X) — same math, about 1 ULP tighter. The threshold is calibrated so the common single-cell case of disparate-scale regressors (n_counts and percent_mito routinely sit at cond around 1e10) still goes through the fast path; only genuinely degenerate inputs trip the fallback. The Dask path can't fall back cleanly the same way, so it raises ValueError with the actual cond number — clearer than the old "linearly dependent" message it used to produce only when det was exactly zero.

I verified the fix on an H100. On a synthetic case with cond around 1.45e+12 the max absolute residual went from 163,776 (one entry was -3,701 where the lstsq reference said 0.62) to 9.5 — roughly four orders of magnitude. The well-conditioned control case is unchanged at 4.7e-7. The full existing test suite still passes; one assertion in test_regress_out_ordinal[float64] needed its atol loosened from 1e-7 to 1e-6 because the test compares the "all" path (now solve) against the batched cuML SVD path and they now drift by about 3e-7 on f64. There's a comment in the test explaining the change.

Two new tests live in tests/test_regressout.py: one parametrized over f32/f64 confirms the dense path produces no NaN/Inf and keeps residuals bounded for near-collinear regressors, the other confirms the Dask path raises ValueError. I also added a one-line Notes entry to the regress_out docstring so users know about the fallback behavior.

Thanks,
Arsham

`cp.linalg.det(R^T R) == 0` was a vacuous guard: in float arithmetic the
det of a near-singular Gram matrix is small-but-nonzero (cond=1e12 gives
det≈90, not 0), so the guard never fired for near-collinear regressors
and the unstable `cp.linalg.inv` path was taken, producing residuals
~1.6e+5 instead of ~1.

Replace the `det == 0` check with `cp.linalg.cond(gram) > 1e12` in both
the dense (`_regress_out_continuous`) and Dask (`_regress_out_continuous_dask`)
paths. The threshold is set to 1e12 to be permissive of well-scaled
single-cell regressors with disparate magnitudes (e.g. n_counts ~ 1e3 and
percent_mito ~ 1e-2 routinely give cond ~ 1e10) while still catching
truly-collinear inputs.

While there, switch the dense "all" path from explicit `inv() @ (R^T @ X)`
to `cp.linalg.solve(gram, R^T @ X)` — same math, more numerically stable.
The existing test's f64 atol is loosened from 1e-7 to 1e-6 to accommodate
the small numerical drift between `inv` and `solve`.

A new test (`test_regress_out_near_singular_regressors_no_garbage`)
verifies that for near-collinear regressors (cond ~ 1e12) the output
contains no NaN/Inf and residuals stay bounded by 100× the input range
(vs the previous explosion to 1e+5).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant