RNG state synchronization issue across worker processes #509

@yangligt2

Description

What happened:
When benchmarking with the random data generator and num_workers > 0, all generated requests contain identical text content. Adding worker processes does not help, because each worker generates the same sequence of "random" tokens. In the specific case where all input lengths are the same (e.g., std_dev: 0.0), every request across all workers is a bitwise duplicate of the others.

What you expected to happen:
Each request should contain unique random text content. Using multiple workers should parallelize the generation of different random sequences, rather than duplicating the same sequence in parallel.

How to reproduce it (as minimally and precisely as possible):
Run a benchmark with the following data configuration and num_workers set to a value greater than 1:

data:
  type: random
  path: null
  input_distribution:
    min: 8192
    max: 8192
    mean: 8192.0
    std_dev: 0.0
    total_count: 601
    type: normal

Observe the logs or captured prompts to see that every request has the exact same content.
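
To make the check less manual, captured prompts can be hashed and counted. The following is a minimal sketch; duplicate_report is a hypothetical helper, and it assumes the prompts have already been collected as a list of strings (how they are captured from inference-perf is not shown here).

```python
import hashlib
from collections import Counter
from typing import Dict, List


def duplicate_report(prompts: List[str]) -> Dict[str, int]:
    """Summarize how many distinct prompts exist (hypothetical helper)."""
    # Hash each prompt so that long 8192-token texts compare cheaply.
    digests = Counter(
        hashlib.sha256(p.encode("utf-8")).hexdigest() for p in prompts
    )
    return {
        "total": len(prompts),
        "unique": len(digests),
        "max_repeats": max(digests.values(), default=0),
    }


# With the bug present, "unique" is expected to be far below "total".
report = duplicate_report(["a b c", "a b c", "a b c"])
print(report)
```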

Anything else we need to know?:
The issue stems from how RandomDataGenerator initializes its random number generator (RNG) in inference_perf/datagen/random_datagen.py.

  1. RNG Initialization: The RNG is initialized in the constructor without a specific seed:
    self.rng: np.random.Generator = np.random.default_rng()
  2. Multiprocessing Copy: When num_workers > 0, the RandomDataGenerator object is pickled and copied to each worker process.
  3. Deterministic Generation: Since each worker starts with an identical RNG state snapshot, calls to _generate_exact_length_text (triggered via load_lazy_data) produce the same token sequences across different processes.
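
The three steps above can be illustrated in a single process. This is a minimal sketch, not the project's code: the pickle round trip stands in for the copy each worker receives, and the vocabulary size of 32000 is an arbitrary assumption.

```python
import pickle

import numpy as np

# Stand-in for the generator's RNG, created once in the parent process.
parent_rng = np.random.default_rng()

# Pickling preserves the full bit-generator state, so every copy
# starts from the same snapshot.
worker_a = pickle.loads(pickle.dumps(parent_rng))
worker_b = pickle.loads(pickle.dumps(parent_rng))

# Each "worker" draws token ids independently (32000 is an assumed vocab size).
tokens_a = worker_a.integers(0, 32000, size=16)
tokens_b = worker_b.integers(0, 32000, size=16)

# The two sequences match element for element.
identical = bool(np.array_equal(tokens_a, tokens_b))
print(identical)
```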

The fix is to re-seed the RNG inside each worker process, after the fork/spawn has occurred, so that the workers' random streams diverge.
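
One possible shape for per-worker re-seeding, as a hedged sketch. SketchGenerator, reseed_for_worker, worker_id, and base_seed are hypothetical names for illustration and are not part of inference-perf. Where the hook would be called during worker startup is not shown, and copy.deepcopy stands in for the copy each worker receives.

```python
import copy
from typing import Optional

import numpy as np


class SketchGenerator:
    """Hypothetical stand-in for RandomDataGenerator (not the project's class)."""

    def __init__(self) -> None:
        self.rng: np.random.Generator = np.random.default_rng()

    def reseed_for_worker(self, worker_id: int, base_seed: Optional[int] = None) -> None:
        if base_seed is None:
            # Fresh OS entropy, drawn inside the worker rather than the parent.
            self.rng = np.random.default_rng()
        else:
            # Reproducible, but a distinct stream for each worker id.
            self.rng = np.random.default_rng([worker_id, base_seed])

    def tokens(self, n: int) -> np.ndarray:
        # 32000 is an assumed vocabulary size.
        return self.rng.integers(0, 32000, size=n)


parent = SketchGenerator()

# Simulate the copies handed to two workers, then re-seed each one
# as a worker would on startup.
workers = [copy.deepcopy(parent) for _ in range(2)]
for worker_id, worker in enumerate(workers):
    worker.reseed_for_worker(worker_id, base_seed=1234)

diverged = not np.array_equal(workers[0].tokens(16), workers[1].tokens(16))
print(diverged)
```

Passing a base seed keeps runs reproducible while still separating the workers; omitting it gives each worker fresh entropy at the cost of reproducibility.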

Environment:

  • inference-perf version: [Please provide version]
  • config.yml: [Include the relevant snippet provided above]
  • others: Multiprocessing enabled with num_workers > 0.

Labels

  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
