| 1 | +--- |
| 2 | +title: "SPEC 7 — Seeding pseudo-random number generation" |
| 3 | +number: 7 |
| 4 | +date: 2023-04-19 |
| 5 | +author: |
| 6 | + - "Stéfan van der Walt <[email protected]>" |
| 7 | + - "Sebastian Berg <[email protected]>" |
| 8 | + - "Pamphile Roy <[email protected]>" |
| 9 | + - "Matt Haberland <[email protected]>" |
| 10 | + - "Other participants in the discussion <[email protected]>" |
| 11 | +discussion: https://github.com/scipy/scipy/issues/14322 |
| 12 | +endorsed-by: |
| 13 | +--- |
| 14 | + |
| 15 | +## Description |
| 16 | + |
| 17 | +Currently, libraries across the ecosystem provide various APIs for seeding pseudo-random number generation. |
| 18 | +This SPEC suggests a unified, pragmatic API, taking into account technical and historical factors. |
| 19 | +Adopting such a uniform API will simplify the user experience, especially for those who rely on multiple projects. |
| 20 | + |
| 21 | +We recommend: |
| 22 | + |
| 23 | +- standardizing the usage and interpretation of an `rng` keyword for seeding, and |
| 24 | +- avoiding the use of global state and legacy bitstream generators. |
| 25 | + |
| 26 | +We suggest implementing these principles by: |
| 27 | + |
| 28 | +- deprecating uses of an existing seed argument (commonly `random_state` or `seed`) in favor of a consistent `rng` argument, |
| 29 | +- using `numpy.random.default_rng` to validate the `rng` argument and instantiate a `Generator`[^no-RandomState], and |
| 30 | +- deprecating the use of `numpy.random.seed` to control the random state. |
| 31 | + |
| 32 | +We are primarily concerned with API uniformity, but also encourage libraries to move towards using [NumPy pseudo-random `Generator`s](https://numpy.org/doc/stable/reference/random/generator.html) because: |
| 33 | + |
| 34 | +1. `Generator`s avoid problems associated with naïve seeding (e.g., using successive integers) via NumPy's [SeedSequence](https://numpy.org/doc/stable/reference/random/parallel.html#seedsequence-spawning) mechanism; |
| 35 | +2. their use avoids relying on global state, which can make code execution harder to track and may cause problems in parallel processing scenarios. |
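As a rough illustration of the two points above, the sketch below derives independent streams from a single root `SeedSequence` instead of seeding workers with successive integers. The root seed `12345` and the number of children are arbitrary choices made for this example, not values prescribed by the SPEC.

```python
import numpy as np

# One root seed for the whole computation (arbitrary example value).
root = np.random.SeedSequence(12345)

# Derive three child sequences rather than using seeds 0, 1, 2 directly.
children = root.spawn(3)

# Each worker gets its own Generator; no global state is read or written.
rngs = [np.random.default_rng(child) for child in children]
samples = [rng.standard_normal(4) for rng in rngs]
```

Rebuilding the root `SeedSequence` from the same seed and spawning again reproduces the same child streams, so the whole computation stays reproducible from one recorded value.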
| 36 | + |
| 37 | +[^no-RandomState]: |
| 38 | + Note that `numpy.random.default_rng` does not accept instances of `RandomState`, so use of `RandomState` to control the seed is effectively deprecated, too. |
| 39 | + That said, neither `np.random.seed` nor `np.random.RandomState` is _itself_ deprecated, so both may still be used in some contexts (e.g., by developers for generating unit test data). |
| 40 | + |
| 41 | +### Scope |
| 42 | + |
| 43 | +This is intended as a recommendation to all libraries that allow users to control the state of a NumPy random number generator. |
| 44 | +It is specifically targeted toward functions that currently accept `RandomState` instances via an argument other than `rng`, or allow `numpy.random.seed` to control the random state, but the ideas are more broadly applicable. |
| 45 | +Random number generators other than those provided by NumPy could also be accommodated by an `rng` keyword, but that is beyond the scope of this SPEC. |
| 46 | + |
| 47 | +### Concepts |
| 48 | + |
| 49 | +- `BitGenerator`: Generates a stream of pseudo-random bits. The default generator in NumPy (`numpy.random.default_rng`) uses PCG64. |
| 50 | +- `Generator`: Derives pseudo-random numbers from the bits produced by a `BitGenerator`. |
| 51 | +- `RandomState`: a [legacy object in NumPy](https://numpy.org/doc/stable/reference/random/index.html), similar to `Generator`, that produces random numbers based on the Mersenne Twister. |
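The relationship between these three objects can be sketched as follows. The seed `42` is an arbitrary example value, and the variable names are illustrative only.

```python
import numpy as np

# BitGenerator: produces the raw stream of pseudo-random bits.
bit_gen = np.random.PCG64(42)

# Generator: derives numbers (floats, integers, distributions) from those bits.
rng = np.random.Generator(bit_gen)

# default_rng builds a Generator on top of PCG64 in one step.
default = np.random.default_rng(42)
same_stream = rng.random(3).tolist() == default.random(3).tolist()

# RandomState: the legacy interface, based on the Mersenne Twister.
legacy = np.random.RandomState(42)
legacy_draw = legacy.random_sample(3)
```

Here `same_stream` compares the first draws of the hand-built and the default generator, which share a bit generator type and seed. The legacy `RandomState` uses a different bit generator, so its values are not expected to match either of them.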
| 52 | + |
| 53 | +### Constraints |
| 54 | + |
| 55 | +NumPy, SciPy, scikit-learn, scikit-image, and NetworkX all implement pseudo-random seeding in slightly different ways. |
| 56 | +Common keyword arguments include `random_state` and `seed`. |
| 57 | +In practice, the seed is also often controllable using `numpy.random.seed`. |
| 58 | + |
| 59 | +## Implementation |
| 60 | + |
| 61 | +Legacy behavior in packages such as scikit-learn (`sklearn.utils.check_random_state`) typically handles `None` (use the global seed state), an int (convert to `RandomState`), or a `RandomState` object. |
| 62 | + |
| 63 | +Our recommendation here is a deprecation strategy which does not in _all_ cases adhere to the Hinsen principle[^hinsen], |
| 64 | +although it could very nearly do so by enforcing the use of `rng` as a keyword argument. |
| 65 | + |
| 66 | +[^hinsen]: The Hinsen principle states, loosely, that code should, whether executed now or in the future, return the same result, or raise an error. |
| 67 | + |
| 68 | +The [deprecation strategy](https://github.com/scientific-python/specs/pull/180#issuecomment-1515248009) is as follows. |
| 69 | + |
| 70 | +**Initially**, accept both `rng` and the existing `random_state`/`seed`/`...` keyword arguments. |
| 71 | + |
| 72 | +- If both are specified by the user, raise an error. |
| 73 | +- If `rng` is passed by keyword, validate it with `np.random.default_rng()` and use it to generate random numbers as needed. |
| 74 | +- If `random_state`/`seed`/`...` is specified (by keyword or position, if allowed), preserve existing behavior. |
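A minimal sketch of this initial phase is shown below. The function `sample_mean` and the helper `_legacy_draw` are hypothetical names invented for this example. As a simplification, an explicit `rng=None` is treated the same as omitting `rng`, so the existing behavior is preserved in that case.

```python
import numpy as np


def _legacy_draw(random_state, n):
    """Hypothetical helper that mimics typical pre-existing behavior."""
    if random_state is None:
        # Legacy default: draw from NumPy's global random state.
        return np.random.standard_normal(n)
    if isinstance(random_state, np.random.RandomState):
        return random_state.standard_normal(n)
    # Anything else is treated as a seed for a new RandomState.
    return np.random.RandomState(random_state).standard_normal(n)


def sample_mean(n, *, rng=None, random_state=None):
    """Hypothetical function that accepts both keywords during the transition."""
    if rng is not None and random_state is not None:
        raise TypeError("Pass either `rng` or `random_state`, not both.")
    if rng is not None:
        # New path: validate with default_rng and draw from a Generator.
        return np.random.default_rng(rng).standard_normal(n).mean()
    # Old path: behavior is unchanged for existing callers.
    return _legacy_draw(random_state, n).mean()
```

Calling `sample_mean(5, rng=1)` uses a `Generator`, while `sample_mean(5, random_state=1)` keeps using `RandomState`, so the two calls are not expected to return the same value.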
| 75 | + |
| 76 | +**After `rng` becomes available** in all releases within the support window suggested by SPEC 0, emit warnings as follows: |
| 77 | + |
| 78 | +- If neither `rng` nor `random_state`/`seed`/`...` is specified and `np.random.seed` has been used to set the seed, emit a `FutureWarning` about the upcoming change in behavior. |
| 79 | +- If `random_state`/`seed`/`...` is passed by keyword or by position, treat it as before, but: |
| 80 | + |
| 81 | + - Emit a `DeprecationWarning` if passed by keyword, warning about the deprecation of keyword `random_state` in favor of `rng`. |
| 82 | + - Emit a `FutureWarning` if passed by position, warning about the change in behavior of the positional argument. |
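The keyword and positional warnings in this phase could be sketched with a small decorator, as below. The names `warn_on_random_state` and `permute` are hypothetical, and the sketch assumes that `random_state` is the second positional parameter of the wrapped function. Detecting a prior call to `np.random.seed` is deliberately left out of this sketch.

```python
import functools
import warnings

import numpy as np


def warn_on_random_state(func):
    """Hypothetical decorator for the warning phase of the transition."""

    @functools.wraps(func)
    def wrapper(x, *args, **kwargs):
        if "random_state" in kwargs:
            warnings.warn(
                "`random_state` is deprecated; use `rng` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
        elif args:
            warnings.warn(
                "Positional use of `random_state` will change behavior; "
                "pass `rng` by keyword instead.",
                FutureWarning,
                stacklevel=2,
            )
        # The arguments themselves are treated exactly as before.
        return func(x, *args, **kwargs)

    return wrapper


@warn_on_random_state
def permute(x, random_state=None, *, rng=None):
    """Hypothetical function that permutes `x`."""
    if rng is not None:
        return np.random.default_rng(rng).permutation(x)
    if random_state is None:
        return np.random.permutation(x)  # legacy default: global state
    if not isinstance(random_state, np.random.RandomState):
        random_state = np.random.RandomState(random_state)
    return random_state.permutation(x)
```

With this sketch, `permute(x, random_state=0)` emits a `DeprecationWarning`, `permute(x, 0)` emits a `FutureWarning`, and `permute(x, rng=0)` emits no warning.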
| 83 | + |
| 84 | +**After the deprecation period**, accept only `rng`, raising an error if `random_state`/`seed`/`...` is provided. |
| 85 | + |
| 86 | +At this point, the function signature, with type annotations, could look like this: |
| 87 | + |
| 88 | +```python |
| 89 | +from collections.abc import Sequence |
| 90 | +import numpy as np |
| 91 | + |
| 92 | + |
| 93 | +SeedLike = int | np.integer | Sequence[int] | np.random.SeedSequence |
| 94 | +RNGLike = np.random.Generator | np.random.BitGenerator |
| 95 | + |
| 96 | + |
| 97 | +def my_func(*, rng: RNGLike | SeedLike | None = None): |
| 98 | + """My function summary. |
| 99 | + |
| 100 | + Parameters |
| 101 | + ---------- |
| 102 | + rng : `numpy.random.Generator`, optional |
| 103 | + Pseudorandom number generator state. When `rng` is None, a new |
| 104 | + `numpy.random.Generator` is created using entropy from the |
| 105 | + operating system. Types other than `numpy.random.Generator` are |
| 106 | + passed to `numpy.random.default_rng` to instantiate a `Generator`. |
| 107 | + """ |
| 108 | + rng = np.random.default_rng(rng) |
| 109 | + |
| 110 | + ... |
| 111 | + |
| 112 | +``` |
| 113 | + |
| 114 | +Also note the suggested language for the `rng` parameter docstring, which encourages the user to pass a `Generator` or `None`, but allows for other types accepted by `numpy.random.default_rng` (captured by the type annotation). |
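To make the accepted types concrete, the sketch below passes one value of each kind named in the annotation to `numpy.random.default_rng`. The seed values and dictionary keys are arbitrary and chosen only for this example.

```python
import numpy as np

gen = np.random.default_rng(2024)

# One example value for each kind of input covered by the annotation.
inputs = {
    "None": None,
    "int": 7,
    "numpy integer": np.int64(7),
    "sequence of ints": [1, 2, 3],
    "SeedSequence": np.random.SeedSequence(7),
    "BitGenerator": np.random.PCG64(7),
    "Generator": gen,
}

# Every input is normalized to a Generator.
results = {name: np.random.default_rng(value) for name, value in inputs.items()}

# An existing Generator is passed through unchanged, not copied or reseeded.
passes_through = results["Generator"] is gen
```

The pass-through matters when several functions share one `Generator`, because each call continues the same stream. Passing the same integer seed to every call would instead restart the stream each time.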
| 115 | + |
| 116 | +### Impact |
| 117 | + |
| 118 | +There are three classes of users, who will be affected to varying degrees. |
| 119 | + |
| 120 | +1. Those who do not attempt to control the random state. |
| 121 | + Their code will switch from using the unseeded global `RandomState` to using an unseeded `Generator`. |
| 122 | + Since the underlying _distributions_ of pseudo-random numbers will not change, these users should be largely unaffected. |
| 123 | + While _technically_ this change does not adhere to the Hinsen principle, its impact should be minimal. |
| 124 | + |
| 125 | +2. Users of `random_state`/`seed` arguments. |
| 126 | + Support for these arguments will be dropped eventually, but during the deprecation period, we can provide clear guidance, via warnings and documentation, on how to migrate to the new `rng` keyword. |
| 127 | + |
| 128 | +3. Those who use `numpy.random.seed`. |
| 129 | + The proposal will do away with that global seeding mechanism, meaning that code that relies on it would, after the deprecation period, go from being seeded to being unseeded. |
| 130 | + To ensure that this does not go unnoticed, libraries that allowed for control of the random state via `numpy.random.seed` should raise a `FutureWarning` if `np.random.seed` has been called. (See [Code](#code) below for an example.) |
| 131 | + To fully adhere to the Hinsen principle, these warnings should instead be raised as errors. |
| 132 | + In response, users will have to switch from using `numpy.random.seed` to passing the `rng` argument explicitly to all functions that accept it. |
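For the third group, the migration could look roughly like the sketch below. The function `noisy` is a hypothetical stand-in for any library function that accepts `rng`, and the seed `0` is arbitrary.

```python
import numpy as np

# Before: implicit control through the global seed (discouraged by this SPEC).
np.random.seed(0)
legacy_values = np.random.uniform(size=3)


def noisy(x, *, rng=None):
    """Hypothetical library function that follows the `rng` convention."""
    rng = np.random.default_rng(rng)
    return x + rng.normal(scale=0.1, size=np.shape(x))


# After: create one Generator and pass it explicitly to every call.
rng = np.random.default_rng(0)
a = noisy(np.zeros(3), rng=rng)
b = noisy(np.zeros(3), rng=rng)  # continues the same stream, so it differs from `a`
```

Because the `Generator` is passed explicitly, the sequence of results depends only on the seed and the order of calls, not on anything another library did to the global state.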
| 133 | + |
| 134 | +### Code |
| 135 | + |
| 136 | +As an example, consider how a SciPy function would transition from a `random_state` parameter to an `rng` parameter using a decorator. |
| 137 | + |
| 138 | +{{< include-code "transition_to_rng.py" "python" >}} |
| 139 | + |
| 140 | +### Core Project Endorsement |
| 141 | + |
| 142 | +Endorsement of this SPEC means that a project intends to: |
| 143 | + |
| 144 | +- standardize the usage and interpretation of an `rng` keyword for seeding, and |
| 145 | +- avoid the use of global state and legacy bitstream generators. |
| 146 | + |
| 147 | +### Ecosystem Adoption |
| 148 | + |
| 149 | +To adopt this SPEC, a project should: |
| 150 | + |
| 151 | +- deprecate the use of `random_state`/`seed` arguments in favor of an `rng` argument in all functions where users need to control pseudo-random number generation, |
| 152 | +- use `numpy.random.default_rng` to validate the `rng` argument and instantiate a `Generator`, and |
| 153 | +- deprecate the use of `numpy.random.seed` to control the random state. |
| 154 | + |
| 155 | +## Notes |