
Adding Rayon support to Rand #566

@pitdicker


I have made a PR to Rand to add support for Rayon. It is very much in the 'just exploring' phase, so it would be great if someone more familiar with Rayon could weigh in.

We have one basic problem: every PRNG needs to store its state somewhere and mutate it to produce the next random number. Just cloning the state would produce two identical streams of random numbers.
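To make the problem concrete, here is a small self-contained sketch. It uses a toy SplitMix64 generator written out by hand as a stand-in for any Rand PRNG (it is not Rand's API). Cloning it copies the state, so the clone and the original emit the same numbers.

```rust
/// Toy PRNG (SplitMix64), a stand-in for any Rand PRNG; not Rand's API.
#[derive(Clone, Debug)]
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    /// Advance the state and return the next 64-bit output.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Draw `n` numbers from `rng`, mutating its state.
fn draw(rng: &mut SplitMix64, n: usize) -> Vec<u64> {
    (0..n).map(|_| rng.next_u64()).collect()
}

fn main() {
    let mut original = SplitMix64::new(42);
    // Cloning copies the internal state verbatim...
    let mut copy = original.clone();
    // ...so both generators emit exactly the same "random" numbers.
    let a = draw(&mut original, 5);
    let b = draw(&mut copy, 5);
    println!("identical streams: {}", a == b); // prints "identical streams: true"
}
```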

One solution that works right now is to use thread_rng() (rust-random/rand#398). From our perspective it does nothing special: it just sets up one RNG in thread-local storage for every thread in Rayon's thread pool. This may be a costly operation, but it only has to be done once during the lifetime of the application, at least as long as the thread pool also stays alive that whole time.
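A minimal sketch of the thread-local pattern using only the standard library: a thread_local! generator that each thread lazily creates on first use. The toy SplitMix64 generator, the global seed counter, and the local_next_u64 helper are illustrative assumptions. rand's real thread_rng() seeds from OS entropy and uses a different generator, and plain std::thread workers stand in for Rayon's pool here.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Toy PRNG (SplitMix64), a stand-in for whatever thread_rng() really uses.
#[derive(Clone, Debug)]
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

// Hypothetical seed source: a global counter, so every thread's RNG gets a
// different seed. The real thread_rng() seeds from OS entropy instead.
static NEXT_SEED: AtomicU64 = AtomicU64::new(1);

thread_local! {
    // Initialized lazily, once per thread, the first time it is used.
    static LOCAL_RNG: RefCell<SplitMix64> =
        RefCell::new(SplitMix64::new(NEXT_SEED.fetch_add(1, Ordering::Relaxed)));
}

/// Hypothetical helper: draw one number from the current thread's RNG.
fn local_next_u64() -> u64 {
    LOCAL_RNG.with(|rng| rng.borrow_mut().next_u64())
}

fn main() {
    // Each worker thread sets up its own RNG and only ever mutates that one.
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(|| (0..3).map(|_| local_next_u64()).collect::<Vec<u64>>()))
        .collect();
    for (i, handle) in handles.into_iter().enumerate() {
        println!("thread {i}: {:?}", handle.join().unwrap());
    }
}
```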

What I am exploring is splitting or reseeding an RNG every time Rayon splits a job. Whether this is a good idea at all, or no better than using RNGs in thread-local storage, remains to be explored. So do the details of splitting RNGs and keeping things correct, but that seems like a problem for us in Rand. It would have the advantage that users can choose their favourite, or most appropriate, RNG.
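A rough sketch of "split the RNG whenever the job splits", written as a recursive divide-and-conquer fill. The split method here (seed the right half's generator from one output of the left half's generator) is just one simple, hypothetical scheme chosen for illustration; whether it preserves statistical quality is exactly the open question. std::thread::scope stands in for rayon::join so the example has no dependencies, and SplitMix64 is again a toy stand-in.

```rust
use std::thread;

/// Toy PRNG (SplitMix64), a stand-in for any user-chosen RNG.
#[derive(Clone, Debug)]
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Hypothetical split: seed a child generator from the parent's output.
    /// The parent advances, so parent and child no longer share state.
    fn split(&mut self) -> SplitMix64 {
        SplitMix64::new(self.next_u64())
    }
}

/// Fill `out` with random numbers, splitting the job (and the RNG) in half
/// until chunks hold at most `chunk` items; halves run on scoped threads.
fn par_fill(out: &mut [u64], mut rng: SplitMix64, chunk: usize) {
    if out.len() <= chunk.max(1) {
        for x in out.iter_mut() {
            *x = rng.next_u64();
        }
        return;
    }
    // Every split of the job costs one reseed.
    let right_rng = rng.split();
    let mid = out.len() / 2;
    let (left, right) = out.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(move || par_fill(right, right_rng, chunk));
        par_fill(left, rng, chunk);
    });
}

fn main() {
    let mut out = vec![0u64; 16];
    par_fill(&mut out, SplitMix64::new(2024), 4);
    println!("{out:?}");
}
```

Because each reseed happens at a fixed point in the split tree, this sketch's output depends only on the seed and on where the job was split, not on thread timing. As far as I can tell, in Rayon the split points themselves can depend on work stealing, so the same seed would not necessarily reproduce the same numbers there.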

Can someone review the approach, and maybe answer some questions?

Is implementing IndexedParallelIterator, and setting the number of values to generate up front, the right choice? I tried working with UnindexedProducer, but it expects to eventually finish by itself, which is something an RNG never does.
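Roughly what "indexed" buys us: because the length is fixed up front, a job can be split at any index into two halves of known size. The sketch below mirrors the shape of the split_at and into_iter methods of Rayon's Producer trait, but RngProducer is a standalone hypothetical type, not an implementation of that trait. The right half's generator is reseeded from the left half's, as one possible splitting scheme.

```rust
/// Toy PRNG (SplitMix64), a stand-in for any user-chosen RNG.
#[derive(Clone, Debug)]
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical fixed-length producer of random numbers: it knows up front
/// how many values it will yield, which is what makes it "indexed".
#[derive(Debug)]
struct RngProducer {
    rng: SplitMix64,
    len: usize,
}

impl RngProducer {
    /// Split into [0, index) and [index, len). The left half keeps the RNG;
    /// the right half gets a fresh RNG seeded from the left one's output.
    fn split_at(mut self, index: usize) -> (RngProducer, RngProducer) {
        assert!(index <= self.len, "split index out of range");
        let right = RngProducer {
            rng: SplitMix64::new(self.rng.next_u64()),
            len: self.len - index,
        };
        let left = RngProducer { rng: self.rng, len: index };
        (left, right)
    }

    /// Sequentially generate exactly `len` values.
    fn into_values(mut self) -> Vec<u64> {
        (0..self.len).map(|_| self.rng.next_u64()).collect()
    }
}

fn main() {
    let producer = RngProducer { rng: SplitMix64::new(7), len: 10 };
    let (left, right) = producer.split_at(4);
    println!("left: {} values, right: {} values", left.len, right.len);
    let all: Vec<u64> = left.into_values().into_iter().chain(right.into_values()).collect();
    println!("total: {}", all.len());
}
```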

Is my abuse of DoubleEndedIterator to implement the necessary traits okay? It will just produce the next random number, like Iterator; it can't produce anything like the previous random number. For normal PRNGs that is a computationally expensive operation (and not supported by Rand), and for cryptographic PRNGs it is impossible.
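Rayon's Producer trait requires its IntoIter type to implement both DoubleEndedIterator and ExactSizeIterator, which is why some back-end behaviour has to be provided at all. A minimal sketch of what the "abuse" amounts to, using a hypothetical RngIter type and the toy SplitMix64 generator: next_back simply forwards to next.

```rust
/// Toy PRNG (SplitMix64), a stand-in for any user-chosen RNG.
#[derive(Clone, Debug)]
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical iterator yielding a fixed number of random values.
struct RngIter {
    rng: SplitMix64,
    remaining: usize,
}

impl RngIter {
    fn new(seed: u64, count: usize) -> Self {
        RngIter { rng: SplitMix64::new(seed), remaining: count }
    }
}

impl Iterator for RngIter {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(self.rng.next_u64())
    }

    // An exact size hint is what ExactSizeIterator::len relies on.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.remaining, Some(self.remaining))
    }
}

impl ExactSizeIterator for RngIter {}

impl DoubleEndedIterator for RngIter {
    // There is no cheap "previous" random number, so the back end just
    // yields the next value of the same forward stream.
    fn next_back(&mut self) -> Option<u64> {
        self.next()
    }
}

fn main() {
    let forwards: Vec<u64> = RngIter::new(1, 3).collect();
    let backwards: Vec<u64> = RngIter::new(1, 3).rev().collect();
    println!("same order either way: {}", forwards == backwards);
}
```

One visible consequence is that .rev() yields exactly the same sequence as iterating forwards. For random numbers no particular order was promised, so this seems harmless, but any code that relies on genuine back-to-front order would be misled.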

It is hard to predict what happens to the statistical quality of PRNGs when they are split off many times and only very short runs of each are used. So it seems sensible to set a minimum number of items to take from each PRNG, hopefully keeping statistical properties similar to those of one PRNG used continuously. Is setting a hard limit with min_len, which users can then raise through Rayon, the best approach?
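To get a feel for how much a minimum length limits the number of RNG splits, here is a worst-case model: it keeps halving a job of len items at len / 2, but only while both halves would still hold at least min_len items (my understanding of how Rayon's length-based splitting treats min_len). Every chunk beyond the first costs one RNG split. chunk_lengths is a hypothetical model, not Rayon code.

```rust
/// Hypothetical worst-case model: split `len` at `len / 2` while both halves
/// keep at least `min_len` items, and collect the resulting chunk lengths.
fn chunk_lengths(len: usize, min_len: usize) -> Vec<usize> {
    fn go(len: usize, min_len: usize, out: &mut Vec<usize>) {
        let mid = len / 2;
        // The right half (len - mid) is never smaller than the left (mid).
        if mid >= min_len {
            go(mid, min_len, out);
            go(len - mid, min_len, out);
        } else {
            out.push(len);
        }
    }
    let mut out = Vec::new();
    // Treat 0 like 1 so every chunk holds at least one item.
    go(len, min_len.max(1), &mut out);
    out
}

fn main() {
    for min_len in [1, 100] {
        let chunks = chunk_lengths(1000, min_len);
        // Every chunk beyond the first needs one RNG split.
        println!(
            "min_len = {min_len}: {} chunks, {} RNG splits",
            chunks.len(),
            chunks.len() - 1
        );
    }
}
```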

And from the final comment (currently): I wonder whether Rayon is designed to deal with situations like ours, where splitting a job is not an essentially free operation. Ideally it would split the job as little as possible, instead of splitting it into many manageable chunks. I don't know what drives Rayon's decision here...
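As far as I can tell from reading Rayon's plumbing, the split decision is driven by an adaptive budget rather than by producing as many chunks as possible. The function below is a simplified, hypothetical model of that heuristic, not Rayon's actual code: the budget starts at the number of threads and halves on every split, and a split also needs len / 2 >= min_len. It ignores work stealing (as I understand it, a stolen job gets its budget reset, so a busy pool splits more) and max_len.

```rust
/// Hypothetical, simplified model of Rayon's adaptive splitting when no job
/// is ever stolen (not Rayon's actual code). `budget` starts at the number
/// of threads and is halved on each split; a split also requires that both
/// halves keep at least `min_len` items.
fn count_chunks(len: usize, min_len: usize, budget: usize) -> usize {
    let mid = len / 2;
    if budget > 0 && mid >= min_len.max(1) {
        count_chunks(mid, min_len, budget / 2) + count_chunks(len - mid, min_len, budget / 2)
    } else {
        1
    }
}

fn main() {
    for threads in [1, 4, 8] {
        let chunks = count_chunks(1_000_000, 1, threads);
        // Each chunk beyond the first corresponds to one RNG split.
        println!("{threads} threads: {chunks} chunks");
    }
}
```

Under this model, a large job with no stealing ends up with only a couple of chunks per thread, so the number of RNG splits would stay small; stealing and small min_len values are what would push it up.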
