
Commit 15e7048

stefanv, seberg, tupui, lagru, and mdhaber authored
Add SPEC7: Seeding pseudo-random number generation (#180)
Under discussion at scipy/scipy#14322

---------

Co-authored-by: Sebastian Berg <[email protected]>
Co-authored-by: Sebastian Berg <[email protected]>
Co-authored-by: Pamphile Roy <[email protected]>
Co-authored-by: Lars Grüter <[email protected]>
Co-authored-by: Matt Haberland <[email protected]>
1 parent cf09fa0 commit 15e7048


3 files changed: +483 / -0 lines


spec-0007/index.md

Lines changed: 155 additions & 0 deletions
---
title: "SPEC 7 — Seeding pseudo-random number generation"
number: 7
date: 2023-04-19
author:
- "Stéfan van der Walt <[email protected]>"
- "Sebastian Berg <[email protected]>"
- "Pamphile Roy <[email protected]>"
- "Matt Haberland <[email protected]>"
- "Other participants in the discussion <[email protected]>"
discussion: https://github.com/scipy/scipy/issues/14322
endorsed-by:
---

## Description

Currently, libraries across the ecosystem provide various APIs for seeding pseudo-random number generation.
This SPEC suggests a unified, pragmatic API, taking into account technical and historical factors.
Adopting such a uniform API will simplify the user experience, especially for those who rely on multiple projects.

We recommend:

- standardizing the usage and interpretation of an `rng` keyword for seeding, and
- avoiding the use of global state and legacy bitstream generators.

We suggest implementing these principles by:

- deprecating uses of an existing seed argument (commonly `random_state` or `seed`) in favor of a consistent `rng` argument,
- using `numpy.random.default_rng` to validate the `rng` argument and instantiate a `Generator`[^no-RandomState], and
- deprecating the use of `numpy.random.seed` to control the random state.

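For illustration only, here is a minimal sketch of what the second bullet relies on: `numpy.random.default_rng` accepts `None`, integers, sequences of integers, `SeedSequence`, and `BitGenerator` instances, always returns a `Generator`, and returns an existing `Generator` unchanged. The seed values are arbitrary.

```python
import numpy as np

# Each of these seed-like inputs is turned into a Generator by default_rng.
seed_like_inputs = [
    None,                        # fresh entropy from the operating system
    12345,                       # an integer seed
    [1, 2, 3],                   # a sequence of integers
    np.random.SeedSequence(42),  # an explicit SeedSequence
    np.random.PCG64(7),          # a BitGenerator, wrapped in a new Generator
]
for seed in seed_like_inputs:
    assert isinstance(np.random.default_rng(seed), np.random.Generator)

# An existing Generator is returned as-is, so the caller's stream is shared,
# not copied: draws made inside a library function advance the caller's Generator.
caller_rng = np.random.default_rng(2023)
assert np.random.default_rng(caller_rng) is caller_rng

# Equal integer seeds produce identical streams.
assert np.random.default_rng(1).random() == np.random.default_rng(1).random()
```

Because validation and conversion happen in a single call, library code does not need its own type dispatch on `rng`.
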
We are primarily concerned with API uniformity, but also encourage libraries to move towards using [NumPy pseudo-random `Generator`s](https://numpy.org/doc/stable/reference/random/generator.html) because:

1. `Generator`s avoid problems associated with naïve seeding (e.g., using successive integers) via their [SeedSequence](https://numpy.org/doc/stable/reference/random/parallel.html#seedsequence-spawning) mechanism; and
2. their use avoids relying on global state, which can make code execution harder to track and may cause problems in parallel processing scenarios.

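A short sketch of both points, with an arbitrary seed: one `SeedSequence` can `spawn` child sequences intended to give independent streams for parallel workers, so workers need neither successive-integer seeds nor shared global state.

```python
import numpy as np

# One root SeedSequence, e.g. built from a single user-supplied seed.
root = np.random.SeedSequence(20230419)

# spawn() derives child SeedSequences; this replaces the naive pattern of
# seeding worker i with the integer i.
children = root.spawn(4)
rngs = [np.random.default_rng(child) for child in children]

# Each worker owns its Generator; no global state is read or modified.
draws = [rng.random(3) for rng in rngs]

# The whole setup is reproducible from the root seed alone.
again = [
    np.random.default_rng(child).random(3)
    for child in np.random.SeedSequence(20230419).spawn(4)
]
assert all(np.array_equal(x, y) for x, y in zip(draws, again))
```
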
[^no-RandomState]:
    Note that `numpy.random.default_rng` does not accept instances of `RandomState`, so use of `RandomState` to control the seed is effectively deprecated, too.
    That said, neither `np.random.seed` nor `np.random.RandomState` _itself_ is deprecated, so both may still be used in some contexts (e.g., by developers generating unit test data).

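The behavior described in the footnote can be checked directly (the seed is arbitrary): passing a `RandomState` to `numpy.random.default_rng` raises a `TypeError`, while the `RandomState` object itself keeps working, e.g., for generating test data.

```python
import numpy as np

legacy = np.random.RandomState(123)

try:
    np.random.default_rng(legacy)
except TypeError as exc:
    # default_rng tries to build a SeedSequence from the RandomState and fails.
    rejected = True
    print(f"rejected: {exc}")
else:
    rejected = False

# The legacy object is still usable on its own.
test_data = legacy.random_sample(5)
```
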
### Scope

This is intended as a recommendation to all libraries that allow users to control the state of a NumPy random number generator.
It is specifically targeted toward functions that currently accept `RandomState` instances via an argument other than `rng`, or that allow `numpy.random.seed` to control the random state, but the ideas are more broadly applicable.
Random number generators other than those provided by NumPy could also be accommodated by an `rng` keyword, but that is beyond the scope of this SPEC.

### Concepts

- `BitGenerator`: Generates a stream of pseudo-random bits. The default bit generator used by `numpy.random.default_rng` is `PCG64`.
- `Generator`: Derives pseudo-random numbers from the bits produced by a `BitGenerator`.
- `RandomState`: A [legacy object in NumPy](https://numpy.org/doc/stable/reference/random/index.html), similar to `Generator`, that produces random numbers based on the Mersenne Twister.

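A small sketch of how these pieces relate (seeds arbitrary): `default_rng` wraps a `PCG64` bit generator in a `Generator`, which is equivalent to building the two explicitly. Other `BitGenerator`s, such as `MT19937` (the algorithm behind `RandomState`), can be wrapped the same way.

```python
import numpy as np

# default_rng(seed) is a Generator backed by a PCG64 BitGenerator.
rng = np.random.default_rng(0)
print(type(rng.bit_generator).__name__)  # PCG64

# The same construction spelled out: a BitGenerator wrapped by a Generator.
explicit = np.random.Generator(np.random.PCG64(0))
assert explicit.random() == rng.random()  # identical streams

# A Generator can also wrap the Mersenne Twister. Its seeding goes through
# SeedSequence, so its stream is not the same as RandomState(0)'s legacy stream.
mt_rng = np.random.Generator(np.random.MT19937(0))
x = mt_rng.random()
```
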
### Constraints

NumPy, SciPy, scikit-learn, scikit-image, and NetworkX all implement pseudo-random seeding in slightly different ways.
Common keyword arguments include `random_state` and `seed`.
In practice, the seed is also often controllable using `numpy.random.seed`.

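The last point matters because `numpy.random.seed` only reaches legacy code paths. A short sketch (seed arbitrary): it reseeds the hidden global `RandomState` behind module-level functions such as `numpy.random.random`, but has no effect on a `Generator` created by `numpy.random.default_rng()`.

```python
import numpy as np

# numpy.random.seed reseeds the hidden global RandomState used by
# module-level functions such as numpy.random.random ...
np.random.seed(0)
a = np.random.random(3)
np.random.seed(0)
b = np.random.random(3)
assert np.array_equal(a, b)  # reproducible via global state

# ... but it does not affect a Generator from default_rng(), which draws
# fresh operating-system entropy whenever no seed is given.
np.random.seed(0)
c = np.random.default_rng().random(3)
np.random.seed(0)
d = np.random.default_rng().random(3)
```

Here `c` and `d` differ, so code that relies on `numpy.random.seed` silently stops being reproducible once a library switches to unseeded `Generator`s.
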
## Implementation

Legacy seeding utilities, such as scikit-learn's `sklearn.utils.check_random_state`, typically accept `None` (use the global `RandomState`), an int (used to create a new `RandomState`), or a `RandomState` instance (used as-is).

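A simplified sketch of that legacy pattern, modeled on the behavior described above. The name `legacy_check_random_state` is hypothetical, and the lookup of NumPy's global instance uses the private attribute `np.random.mtrand._rand`, the same one the accompanying test file uses.

```python
import numbers

import numpy as np


def legacy_check_random_state(seed):
    """Hypothetical sketch of a legacy seed-handling helper."""
    if seed is None or seed is np.random:
        # Shared global RandomState, i.e. the one np.random.seed controls
        # (private NumPy attribute, used here for illustration).
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState instance")
```

Because `None` resolves to a shared global object, the result of such a call depends on whether `numpy.random.seed` was called elsewhere, which is the hidden coupling this SPEC aims to remove.
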
Our recommendation is a deprecation strategy that does not adhere to the Hinsen principle[^hinsen] in _all_ cases,
although it could very nearly do so by enforcing the use of `rng` as a keyword argument.

[^hinsen]: The Hinsen principle states, loosely, that code should, whether executed now or in the future, return the same result, or raise an error.

The [deprecation strategy](https://github.com/scientific-python/specs/pull/180#issuecomment-1515248009) is as follows.

**Initially**, accept both `rng` and the existing `random_state`/`seed`/`...` keyword arguments.

- If both are specified by the user, raise an error.
- If `rng` is passed by keyword, validate it with `np.random.default_rng()` and use it to generate random numbers as needed.
- If `random_state`/`seed`/`...` is specified (by keyword or position, if allowed), preserve existing behavior.

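A minimal sketch of this initial phase for a hypothetical function `sample_values` with a legacy `random_state` parameter. A private sentinel (also hypothetical) distinguishes an omitted argument from an explicit `None`, so passing both arguments is detected even when one of them is `None`. The legacy branch is simplified to `None`, integers, and `RandomState` instances.

```python
import numpy as np

_NOT_PASSED = object()  # hypothetical sentinel meaning "argument not supplied"


def sample_values(n, random_state=_NOT_PASSED, *, rng=_NOT_PASSED):
    """Hypothetical library function during the initial transition phase."""
    if random_state is not _NOT_PASSED and rng is not _NOT_PASSED:
        raise TypeError("Pass either `rng` or `random_state`, not both.")

    if rng is not _NOT_PASSED:
        # New behavior: validate with default_rng and draw from a Generator.
        return np.random.default_rng(rng).random(n)

    # Existing behavior, preserved unchanged (simplified here).
    if random_state is _NOT_PASSED or random_state is None:
        legacy = np.random.mtrand._rand  # NumPy's global RandomState (private)
    elif isinstance(random_state, np.random.RandomState):
        legacy = random_state
    else:
        legacy = np.random.RandomState(random_state)
    return legacy.random_sample(n)
```
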
**After `rng` becomes available** in all releases within the support window suggested by SPEC 0, emit warnings as follows:

- If neither `rng` nor `random_state`/`seed`/`...` is specified and `np.random.seed` has been used to set the seed, emit a `FutureWarning` about the upcoming change in behavior.
- If `random_state`/`seed`/`...` is passed by keyword or by position, treat it as before, but:

  - Emit a `DeprecationWarning` if passed by keyword, warning about the deprecation of keyword `random_state` in favor of `rng`.
  - Emit a `FutureWarning` if passed by position, warning that the positional argument will be interpreted as `rng` in the future.

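A sketch of this warning phase, again for the hypothetical `sample_values`. Here `*args` stands in for the old positional seed slot so that positional and keyword use can be told apart. Detecting a prior `np.random.seed` call uses the heuristic from the accompanying test file, which reads the *private* `_seed_seq` attribute of the global bit generator; that relies on NumPy internals and may change.

```python
import warnings

import numpy as np

_NOT_PASSED = object()  # hypothetical sentinel meaning "argument not supplied"


def _global_seed_was_set():
    # Legacy seeding via np.random.seed clears the private `_seed_seq`
    # attribute of the global RandomState's bit generator.
    return np.random.mtrand._rand._bit_generator._seed_seq is None


def _as_random_state(seed):
    # Simplified legacy conversion: None, int, or RandomState.
    if seed is None:
        return np.random.mtrand._rand
    if isinstance(seed, np.random.RandomState):
        return seed
    return np.random.RandomState(seed)


def sample_values(n, *args, rng=_NOT_PASSED, random_state=_NOT_PASSED):
    """Hypothetical library function during the warning phase."""
    if len(args) > 1:
        raise TypeError("too many positional arguments")
    positional = args[0] if args else _NOT_PASSED
    supplied = [x is not _NOT_PASSED for x in (positional, rng, random_state)]
    if sum(supplied) > 1:
        raise TypeError("Pass only one of `rng` or `random_state`.")

    if rng is not _NOT_PASSED:
        # New behavior, no warning.
        return np.random.default_rng(rng).random(n)

    if random_state is not _NOT_PASSED:
        warnings.warn(
            "The keyword argument `random_state` is deprecated; use `rng` instead.",
            DeprecationWarning, stacklevel=2)
        seed = random_state
    elif positional is not _NOT_PASSED:
        warnings.warn(
            "Positional use of the seed argument will be interpreted as `rng` "
            "in a future release.", FutureWarning, stacklevel=2)
        seed = positional
    else:
        if _global_seed_was_set():
            warnings.warn(
                "The NumPy global RNG was seeded; this function will soon "
                "ignore that seed. Pass `rng` explicitly.",
                FutureWarning, stacklevel=2)
        seed = None
    # Legacy behavior is preserved in every warning branch.
    return _as_random_state(seed).random_sample(n)
```
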
**After the deprecation period**, accept only `rng`, raising an error if `random_state`/`seed`/`...` is provided.

At that point, the function signature, with type annotations, could look like this:

```python
from collections.abc import Sequence
import numpy as np


# The `X | Y` union syntax below requires Python 3.10 or newer.
SeedLike = int | np.integer | Sequence[int] | np.random.SeedSequence
RNGLike = np.random.Generator | np.random.BitGenerator


def my_func(*, rng: RNGLike | SeedLike | None = None):
    """My function summary.

    Parameters
    ----------
    rng : `numpy.random.Generator`, optional
        Pseudorandom number generator state. When `rng` is None, a new
        `numpy.random.Generator` is created using entropy from the
        operating system. Types other than `numpy.random.Generator` are
        passed to `numpy.random.default_rng` to instantiate a `Generator`.
    """
    rng = np.random.default_rng(rng)

    ...

```

Also note the suggested language for the `rng` parameter docstring, which encourages the user to pass a `Generator` or `None`, but allows for other types accepted by `numpy.random.default_rng` (captured by the type annotation).

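Once `rng` is keyword-only and the old parameter has been removed, the error required after the deprecation period comes from Python's own argument handling. A minimal sketch reusing the name `my_func` with a hypothetical placeholder body; annotations are omitted so the sketch also runs on Python versions older than 3.10.

```python
import numpy as np


def my_func(*, rng=None):
    """Hypothetical post-deprecation function (annotations omitted)."""
    rng = np.random.default_rng(rng)
    return rng.standard_normal(2)  # placeholder body


# The same integer seed reproduces results.
assert np.array_equal(my_func(rng=7), my_func(rng=7))

# A caller-owned Generator is advanced by each call.
gen = np.random.default_rng(42)
first, second = my_func(rng=gen), my_func(rng=gen)

# With `rng` keyword-only and `random_state` removed, Python raises
# TypeError both for the old keyword and for a positional seed.
errors = []
for call in (lambda: my_func(random_state=7), lambda: my_func(7)):
    try:
        call()
    except TypeError as exc:
        errors.append(exc)
assert len(errors) == 2
```
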
### Impact

Three classes of users will be affected, to varying degrees.

1. Those who do not attempt to control the random state.
   Their code will switch from using the unseeded global `RandomState` to using an unseeded `Generator`.
   Since the underlying _distributions_ of pseudo-random numbers will not change, these users should be largely unaffected.
   While _technically_ this change does not adhere to the Hinsen principle, its impact should be minimal.

2. Users of `random_state`/`seed` arguments.
   Support for these arguments will eventually be dropped, but during the deprecation period, we can provide clear guidance, via warnings and documentation, on how to migrate to the new `rng` keyword.

3. Those who use `numpy.random.seed`.
   The proposal will do away with that global seeding mechanism, meaning that code that relies on it would, after the deprecation period, go from being seeded to being unseeded.
   To ensure that this does not go unnoticed, libraries that allowed for control of the random state via `numpy.random.seed` should emit a `FutureWarning` if `np.random.seed` has been called. (See [Code](#code) below for an example.)
   To fully adhere to the Hinsen principle, these warnings should instead be raised as errors.
   In response, users will have to switch from using `numpy.random.seed` to passing the `rng` argument explicitly to all functions that accept it.

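For users in the third group, migration could look like the following sketch; `simulate` and `old_simulate` are hypothetical names and the seed is arbitrary. Passing a single `Generator` to successive calls reproduces the "one seed, many calls" pattern that `numpy.random.seed` used to provide.

```python
import numpy as np


def simulate(n, *, rng=None):
    """Hypothetical library function that follows this SPEC."""
    rng = np.random.default_rng(rng)
    return rng.normal(size=n)


# Before (relied on hidden global state; `old_simulate` is hypothetical):
#     np.random.seed(1234)
#     a = old_simulate(100)
#     b = old_simulate(100)

# After: create one Generator and pass it explicitly to every call.
rng = np.random.default_rng(1234)
a = simulate(100, rng=rng)
b = simulate(100, rng=rng)  # continues the same stream, so b differs from a

# Re-creating the Generator from the same seed reproduces the whole sequence.
rng_again = np.random.default_rng(1234)
assert np.array_equal(a, simulate(100, rng=rng_again))
assert np.array_equal(b, simulate(100, rng=rng_again))
```
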
### Code

As an example, consider how a SciPy function would transition from a `random_state` parameter to an `rng` parameter using a decorator.

{{< include-code "transition_to_rng.py" "python" >}}

### Core Project Endorsement

Endorsement of this SPEC means that a project intends to:

- standardize the usage and interpretation of an `rng` keyword for seeding, and
- avoid the use of global state and legacy bitstream generators.

### Ecosystem Adoption

To adopt this SPEC, a project should:

- deprecate the use of `random_state`/`seed` arguments in favor of an `rng` argument in all functions where users need to control pseudo-random number generation,
- use `numpy.random.default_rng` to validate the `rng` argument and instantiate a `Generator`, and
- deprecate the use of `numpy.random.seed` to control the random state.

## Notes

spec-0007/test_transition_to_rng.py

Lines changed: 128 additions & 0 deletions
import contextlib

import numpy as np
import pytest

from transition_to_rng import _transition_to_rng

from scipy._lib._util import check_random_state


@_transition_to_rng("random_state", position_num=1, end_version="1.17.0")
def library_function(arg1, rng=None, arg2=0):
    rng = check_random_state(rng)
    return arg1, rng.random(), arg2


@contextlib.contextmanager
def np_random_seed(seed=0):
    # Save the global RandomState
    rs = np.random.mtrand._rand

    # Install a temporary, seeded RandomState
    np.random.mtrand._rand = np.random.RandomState(seed)

    try:
        yield
    finally:
        # Restore the original RandomState, even if the block raised
        np.random.mtrand._rand = rs


def test_positional_random_state():
    # doesn't need to warn
    library_function(1, np.random.default_rng(2384924))  # Generators still accepted

    message = "Positional use of"
    if np.random.mtrand._rand._bit_generator._seed_seq is not None:
        library_function(1, None)  # seed not set
    else:
        with pytest.warns(FutureWarning, match=message):
            library_function(1, None)  # seed set

    with pytest.warns(FutureWarning, match=message):
        library_function(1, 1)  # behavior will change

    with pytest.warns(FutureWarning, match=message):
        library_function(1, np.random.RandomState(1))  # will error

    with pytest.warns(FutureWarning, match=message):
        library_function(1, np.random)  # will error


def test_random_state_deprecated():
    message = "keyword argument `random_state` is deprecated"

    with pytest.warns(DeprecationWarning, match=message):
        library_function(1, random_state=None)

    with pytest.warns(DeprecationWarning, match=message):
        library_function(1, random_state=1)


def test_rng_correct_usage():
    library_function(1, rng=None)

    rng = np.random.default_rng(1)
    ref_random = rng.random()

    res = library_function(1, rng=1)
    assert res == (1, ref_random, 0)

    rng = np.random.default_rng(1)
    res = library_function(1, rng, arg2=2)
    assert res == (1, ref_random, 2)


def test_rng_incorrect_usage():
    with pytest.raises(TypeError, match="SeedSequence expects"):
        library_function(1, rng=np.random.RandomState(123))

    with pytest.raises(TypeError, match="multiple values"):
        library_function(1, rng=1, random_state=1)


def test_seeded_vs_unseeded():
    with np_random_seed():
        with pytest.warns(FutureWarning, match="NumPy global RNG"):
            library_function(1)

        # These tests should still pass when the global seed is set,
        # since they provide explicit `random_state` or `rng`
        test_positional_random_state()
        test_random_state_deprecated()
        test_rng_correct_usage()

    # Entirely unseeded, should proceed without warning
    library_function(1)


def test_decorator_no_positional():
    @_transition_to_rng("random_state", end_version="1.17.0")
    def library_function(arg1, *, rng=None, arg2=None):
        rng = check_random_state(rng)
        return arg1, rng.random(), arg2

    message = "keyword argument `random_state` is deprecated"
    with pytest.warns(DeprecationWarning, match=message):
        library_function(1, random_state=3)

    library_function(1, rng=123)


def test_decorator_no_end_version():
    @_transition_to_rng("random_state")
    def library_function(arg1, rng=None, *, arg2=None):
        rng = check_random_state(rng)
        return arg1, rng.random(), arg2

    # no warnings emitted
    library_function(1, rng=np.random.default_rng(235498235))
    library_function(1, random_state=np.random.default_rng(235498235))
    library_function(1, 235498235)
    with np_random_seed():
        library_function(1, None)


if __name__ == "__main__":
    pytest.main(["-W", "error"])
