RNG state synchronization issue across worker processes #509
Open
Labels: help wanted, kind/bug, needs-triage
What happened:
When benchmarking with the `random` data generator and `num_workers > 0`, the generated requests contain identical text content. With multiple worker processes, each worker generates the same sequence of "random" tokens, so requests produced by different workers duplicate one another. In the specific case where all input lengths are the same (e.g., `std_dev: 0.0`), every request across all workers becomes a bitwise duplicate of the others.
What you expected to happen:
Each request should contain unique random text content. Using multiple workers should parallelize the generation of different random sequences rather than duplicate the same sequence in parallel.
How to reproduce it (as minimally and precisely as possible):
Run a benchmark with the following data configuration and `num_workers` set to a value greater than 1:
Observe the logs or captured prompts to see that every request has the exact same content.
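The following minimal sketch reproduces the underlying behaviour without inference_perf. It assumes that the pickled copy each worker receives carries the parent's RNG state unchanged. `ToyRandomGenerator` and `simulate_workers` are hypothetical stand-ins rather than inference_perf code, and the vocabulary size of 32000 is an arbitrary assumption. `pickle.dumps`/`pickle.loads` simulates the copy made for each worker when `num_workers > 0`.

```python
import pickle

import numpy as np


class ToyRandomGenerator:
    """Hypothetical stand-in for RandomDataGenerator: one unseeded RNG created in __init__."""

    def __init__(self, vocab_size: int = 32000) -> None:
        self.vocab_size = vocab_size
        # Same pattern as described in the issue: the RNG is created once, unseeded.
        self.rng: np.random.Generator = np.random.default_rng()

    def generate_tokens(self, length: int) -> list[int]:
        # Draw `length` random token ids from [0, vocab_size).
        return self.rng.integers(0, self.vocab_size, size=length).tolist()


def simulate_workers(num_workers: int, length: int = 8) -> list[list[int]]:
    """Copy the generator via pickle, as a worker pool would, then sample once per copy."""
    parent = ToyRandomGenerator()
    payload = pickle.dumps(parent)  # the same serialized RNG state goes to every worker
    workers = [pickle.loads(payload) for _ in range(num_workers)]
    return [w.generate_tokens(length) for w in workers]


if __name__ == "__main__":
    samples = simulate_workers(num_workers=4)
    for worker_id, tokens in enumerate(samples):
        print(f"worker {worker_id}: {tokens}")
    print("all identical:", all(s == samples[0] for s in samples))
```

Each copy starts from the same bit-generator state, so every simulated worker draws exactly the same token ids. This matches the duplication described above.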
Anything else we need to know?:
The issue stems from how `RandomDataGenerator` initializes its random number generator (RNG) in `inference_perf/datagen/random_datagen.py`. The RNG is created once, as `self.rng: np.random.Generator = np.random.default_rng()`. When `num_workers > 0`, the `RandomDataGenerator` object is pickled and copied to each worker process, so every worker starts from an identical copy of the RNG state. As a result, calls to `_generate_exact_length_text` (triggered via `load_lazy_data`) produce the same token sequences across different processes.
Fixing this requires re-seeding the RNG within each worker process after the fork/spawn, so that the workers' random streams diverge.
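One possible shape for the re-seeding fix is sketched below. It is not the project's actual patch. The sketch assumes a hypothetical `reseed_for_worker(worker_id)` hook that runs once inside each worker after fork/spawn. Each worker derives an independent child stream from a shared root entropy using NumPy's `SeedSequence` with a per-worker `spawn_key`. `ToyRandomGenerator`, `reseed_for_worker`, and `simulate_reseeded_workers` are all illustrative names.

```python
import pickle
from typing import Optional

import numpy as np


class ToyRandomGenerator:
    """Hypothetical stand-in for RandomDataGenerator with a per-worker re-seed hook."""

    def __init__(self, vocab_size: int = 32000, seed: Optional[int] = None) -> None:
        self.vocab_size = vocab_size
        # Root entropy: equal to `seed` if given, otherwise drawn from the OS once.
        self.base_entropy = np.random.SeedSequence(seed).entropy
        self.rng: np.random.Generator = np.random.default_rng(
            np.random.SeedSequence(self.base_entropy)
        )

    def reseed_for_worker(self, worker_id: int) -> None:
        # Hypothetical hook: call once inside each worker after fork/spawn.
        # A distinct spawn_key gives each worker an independent stream, and a
        # fixed `seed` keeps the per-worker streams reproducible across runs.
        child = np.random.SeedSequence(self.base_entropy, spawn_key=(worker_id,))
        self.rng = np.random.default_rng(child)

    def generate_tokens(self, length: int) -> list[int]:
        # Draw `length` random token ids from [0, vocab_size).
        return self.rng.integers(0, self.vocab_size, size=length).tolist()


def simulate_reseeded_workers(
    num_workers: int, seed: Optional[int] = None, length: int = 8
) -> list[list[int]]:
    """Pickle-copy the generator per worker, re-seed each copy, then sample."""
    payload = pickle.dumps(ToyRandomGenerator(seed=seed))
    samples = []
    for worker_id in range(num_workers):
        worker = pickle.loads(payload)        # identical copy, as in the bug
        worker.reseed_for_worker(worker_id)   # diverge the streams per worker
        samples.append(worker.generate_tokens(length))
    return samples


if __name__ == "__main__":
    for worker_id, tokens in enumerate(simulate_reseeded_workers(4, seed=1234)):
        print(f"worker {worker_id}: {tokens}")
```

A simpler option is to call `np.random.default_rng()` again inside each worker, which also makes the streams diverge. However, it draws fresh OS entropy per worker, so a benchmark run cannot be replayed. The `SeedSequence` + `spawn_key` approach keeps the workers independent while preserving reproducibility when a seed is configured. The right place to call the hook depends on how inference_perf starts its workers, which is not shown here.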
Environment:
num_workers > 0.