
Fixed #1112 Migrate to New NumPy Random Number Generator (RNG) API #1131

Open
seanlaw wants to merge 7 commits into stumpy-dev:main from seanlaw:migrate_new_numpy_rng

Conversation

seanlaw (Contributor) commented Feb 14, 2026

Pull Request Checklist

Below is a simple checklist but please do not hesitate to ask for assistance!

  • Fork, clone, and checkout the newest version of the code
  • Create a new branch
  • Make necessary code changes
  • Install black (i.e., python -m pip install black or conda install -c conda-forge black)
  • Install flake8 (i.e., python -m pip install flake8 or conda install -c conda-forge flake8)
  • Install pytest-cov (i.e., python -m pip install pytest-cov or conda install -c conda-forge pytest-cov)
  • Run black --exclude=".*\.ipynb" --extend-exclude=".venv" --diff ./ in the root stumpy directory
  • Run flake8 --extend-exclude=.venv ./ in the root stumpy directory
  • Run ./setup.sh dev && ./test.sh in the root stumpy directory
  • Reference a Github issue (and create one if one doesn't already exist)

To Do List

  1. Search for np.random
  2. Replace np.random.rand(x) with pytest.RNG.random(x)
  3. Replace np.random.rand(x*y).reshape(x,y) with pytest.RNG.random(size=(x, y))
  4. Replace np.random.uniform(x, y, [z]) with pytest.RNG.uniform(x, y, size=z)
  5. Replace np.random.permutation([x, y, z]) with pytest.RNG.permutation([x, y, z])
  6. Replace np.random.randint(x, y, z) with pytest.RNG.integers(x, y, z)
  7. Replace np.random.choice([x, y], l, replace=True) with pytest.RNG.choice([x, y], l, replace=True)
  8. Replace np.random.normal(size=x) with pytest.RNG.normal(size=x)
  9. Replace all np.random from docstrings, README, tutorials, etc
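The mappings in steps 2–8 above can be sketched side by side. This is a minimal illustration using a local `np.random.default_rng()` Generator in place of the `pytest.RNG` attribute that this PR introduces:

```python
import numpy as np

# The PR stores a Generator like this on the pytest module as pytest.RNG;
# here we use a local variable for illustration
rng = np.random.default_rng(42)

x, y, z = 3, 4, 5

a = rng.random(x)                        # was: np.random.rand(x)
b = rng.random(size=(x, y))              # was: np.random.rand(x * y).reshape(x, y)
c = rng.uniform(0.0, 1.0, size=z)        # was: np.random.uniform(0.0, 1.0, [z])
d = rng.permutation([x, y, z])           # was: np.random.permutation([x, y, z])
e = rng.integers(0, 10, z)               # was: np.random.randint(0, 10, z)
f = rng.choice([x, y], z, replace=True)  # was: np.random.choice([x, y], z, replace=True)
g = rng.normal(size=x)                   # was: np.random.normal(size=x)
```

Note that `Generator.integers` matches `randint`'s half-open `[low, high)` interval by default (`endpoint=False`), so the bounds carry over unchanged.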

gitnotebooks bot commented Feb 14, 2026

Review these changes at https://app.gitnotebooks.com/stumpy-dev/stumpy/pull/1131

seanlaw (Contributor, Author) commented Feb 14, 2026

@NimaSarajpoor I've tested the migration on test_core.py as an initial trial run. Would you mind taking a look and providing any feedback before I do the same thing on the other tests?

NimaSarajpoor (Collaborator) left a comment

@seanlaw
I've left just one comment for your consideration.

seanlaw (Contributor, Author) commented Feb 15, 2026

@NimaSarajpoor If it's okay with you, I'm going to add ~5 files at a time for you to incrementally review (there are 46 test files in total and I'll need to work on tutorials and other docs too).

seanlaw (Contributor, Author) commented Feb 16, 2026

Things are starting to get tricky. Previously, np.random.seed(seed) would set the seed for the global random number generator. With the new approach, however, we'll need to synchronize the RNG state across naive.py (i.e., pass the RNG state along). Extra care must be taken; otherwise, things will fall out of sync and cause assertions to fail!
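To illustrate the synchronization concern, here is a minimal sketch (the `naive_impl`/`fast_impl` functions are hypothetical, not STUMPY code): with the legacy API one `np.random.seed` call governed every consumer, whereas with the new API each side must be handed an identically seeded Generator:

```python
import numpy as np

def naive_impl(rng, n):
    # hypothetical naive reference implementation consuming random draws
    return np.sort(rng.random(n))

def fast_impl(rng, n):
    # hypothetical performant implementation that must see the same draws
    return np.sort(rng.random(n))

seed = 0
# Unlike the legacy np.random.seed(seed), which set a single global state,
# np.random.default_rng(seed) creates an independent Generator each time.
# Two Generators built from the same seed start from identical states:
ref = naive_impl(np.random.default_rng(seed), 5)
cmp = fast_impl(np.random.default_rng(seed), 5)
np.testing.assert_allclose(ref, cmp)  # in sync only because the seeds match
```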

seanlaw (Contributor, Author) commented Feb 26, 2026

@NimaSarajpoor I added three new files for you to review.

    # naive.distance(naive.z_norm(T[10:30]), naive.z_norm(T[110:130])) = 0.47
    # naive.distance(naive.z_norm(T[10:30]), naive.z_norm(T[210:230])) = 0.24
    # naive.distance(naive.z_norm(T[110:130]), naive.z_norm(T[210:230])) = 0.72
    # Hence T[10:30] is the motif representative for this motif
NimaSarajpoor (Collaborator):

I think these comments should be revised. They show the calculation of z-normalized distances; however, aamp_motifs is for the non-normalized case.

seanlaw (Contributor, Author) commented Feb 27, 2026

Hmm, what should it be changed to? I don't remember adding this comment :(

This?

    # naive.distance(T[10:30], T[110:130]) = 0.47
    # naive.distance(T[10:30], T[210:230]) = 0.24
    # naive.distance(T[110:130], T[210:230]) = 0.72
    # Hence T[10:30] is the motif representative for this motif

    )
    motif_indices_ref = np.array([[19, 77, 63, 52, 71]])
    motif_subspaces_ref = [np.array([2])]
    motif_distances_ref = np.array([[0.0, 0.10660435, 0.27927693]])
NimaSarajpoor (Collaborator):

I cannot figure out how these numbers are obtained. Since the tests are passing, I guess all is good here, and I assume these numbers were revised to match the actual output. So... an off-topic question: are we cheating here by using ref values that are based on the outputs of the performant function?

seanlaw (Contributor, Author):

So... an off-topic question: are we cheating here by using ref values that are based on the outputs of the performant function?

Yes, but the answer is nuanced:

  1. In traditional unit testing, nobody implements a "naive" version of the algorithm; instead, they hardcode a reference answer
  2. The reason we didn't implement a "naive" version here is that the naive implementation was either too slow OR too complicated/not worth it
  3. While we may be cheating (this is true), if somebody breaks aamp_mmotifs (so that it produces incorrect results), then this test should still fail

Of course, it is entirely possible that aamp_mmotifs is currently broken and our hardcoded answer leads to a bad test... though I have no reason to believe that it is.

    # * (almost) identical step functions at indices 10, 110 and 210
    # * identical linear slopes at indices 70 and 170
    - T = np.random.normal(size=300)
    + T = pytest.RNG.normal(size=300)
NimaSarajpoor (Collaborator):

To be consistent with the other test functions, I was wondering if we should go with random (i.e., a uniform draw) rather than the normal distribution.

seanlaw (Contributor, Author) commented Feb 27, 2026

Yeah, I wasn't sure if there was a reason to draw from a normal distribution. If not, then I can change it.

    def test_random_ostinatoed(seed, dask_cluster):
        with Client(dask_cluster) as dask_client:

    def test_deterministic_ostinato():
        pytest.fix_rng_state()
NimaSarajpoor (Collaborator):

Do we need to fix the RNG state here? I can see that we do not fix the state for test_random_ostinato, and that seems to be OK.

seanlaw (Contributor, Author):

Since the test says "deterministic", I was inclined to ensure that the inputs would be identical over time. If that weren't the case, then it seems like the deterministic tests could be removed.
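The `fix_rng_state` helper referenced in this diff isn't shown here; a minimal sketch of what such a helper could look like (the seed value and the pytest-attribute convention are assumptions on my part) is:

```python
import numpy as np
import pytest

_SEED = 0  # assumed fixed seed; the PR's actual value is not shown in this diff

def fix_rng_state():
    # Rebuild the shared Generator from a fixed seed so that a
    # "deterministic" test always sees the same sequence of draws
    pytest.RNG = np.random.default_rng(_SEED)

pytest.fix_rng_state = fix_rng_state  # mirror the PR's pytest-attribute style

pytest.fix_rng_state()
first = pytest.RNG.random(3)
pytest.fix_rng_state()
second = pytest.RNG.random(3)  # identical to `first` after resetting the state
```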

seanlaw (Contributor, Author):

The "deterministic" tests were likely put in place to resemble the stump tests where we have some hardcoded (deterministic) test_data and then, later, we allow random data.


    @pytest.mark.parametrize("seed", [41, 88, 290, 292, 310, 328, 538, 556, 563, 570])
    def test_deterministic_ostinatoed(seed, dask_cluster):
        pytest.fix_rng_state()
NimaSarajpoor (Collaborator):

Do we need to fix the RNG state here? I can see that we do not fix the state for test_random_ostinato.

seanlaw (Contributor, Author):

Please see comments above:
#1131 (comment)
#1131 (comment)

seanlaw (Contributor, Author) commented Feb 27, 2026

I discovered yesterday that this migration will require a lot more work since, unlike the traditional np.random that shares a global state, the new random number generator is only local to, say, a module. In order to share that random state across naive.py, conftest.py, all of the unit tests, and all of the stumpy modules, I'll have to add a global RNG = np.random.default_rng() variable to core.py and then share it across everything.
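A sketch of the shared-RNG idea described above (using a `SimpleNamespace` to stand in for `core.py`; none of the names here are the final design):

```python
import types

import numpy as np

# Stand-in for core.py: a single module-level Generator shared by everything
core = types.SimpleNamespace(RNG=np.random.default_rng())

def reseed(seed):
    # Rebinding core.RNG reseeds every consumer that looks it up at call time
    core.RNG = np.random.default_rng(seed)

def naive_draw(n):
    # Stand-in for naive.py: reads the shared instance instead of owning one
    return core.RNG.random(n)

reseed(0)
a = naive_draw(3)
reseed(0)
b = naive_draw(3)  # identical draws: both calls saw the same shared state
```

One caveat with this pattern: consumers must reference `core.RNG` at call time; a `from core import RNG` done at import time would bind the old Generator object and silently fall out of sync after a reseed.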

As a result, I think I'm gonna need to start all over with a new PR and rethink what needs to be done
