Multimodal datagen: configurable distinct-media pool + cross-request reuse policy

## What

`SyntheticMultimodalDatagenConfig` has no way to express:

1. A bounded pool of N distinct items per modality (image / video / audio).
2. A sampling / reuse policy over that pool across requests.

Today each request renders fresh random bytes for every image and every audio item (see `MultimodalDataGenerator._build_spec` in `inference_perf/datagen/multimodal_datagen.py` and the `deterministic=False` materialization path in `ChatCompletionAPIData._materialize_multimodal_content` in `inference_perf/apis/chat.py`). Video is the exception: a hidden `VideoBytesPool` in `inference_perf/mediagen/pool.py` holds 4 entries per `(w, h, frames)` profile, but that size is not user-configurable.

The only cross-request reuse path is `SharedPrefixDataGenerator`, which is prefix-side only and binary at the group level (`prefix_cache_key`).

## Why this matters

Real VLM workloads draw from a finite media corpus and reuse items at non-uniform rates (a recently uploaded asset hit by many users, a hot product image, etc.). Without a pool and a reuse policy we can't measure server-side multimodal cache behavior, dedup, or encoder-cache reuse under realistic distributions.

A stakeholder hit this directly: there is currently no way to express

```
request 1: <text 1> <image1> <text 2> <image2>
request 2: <text 3> <image2>
```

i.e. `image2` reused across two requests.

## Proposed config shape (sketch, for discussion)

Reuse the existing `Distribution` type for the sampling policy rather than inventing a parallel one. `Distribution` already covers `uniform` / `fixed`; adding `zipf` to `DistributionType` covers the heavy-tail case that real content reuse follows. Explicit weights mirror the existing `WeightedResolution` / `WeightedVideoProfile` pattern.

```yaml
multimodal:
  image:
    count: { ... }              # existing
    resolutions: [ ... ]        # existing
    pool:
      size: 100                 # total distinct images materialized once
      sampling:                 # Distribution over indices [0, size-1]
        type: zipf
        min: 0
        max: 99
        skew: 1.1               # Zipf exponent (reuse skew field)
      # or, for explicit weights:
      # weights: [ { index: 0, weight: 10 }, { index: 1, weight: 1 }, ... ]
  video:
    pool:
      size: 16
      sampling: { type: uniform, min: 0, max: 15 }
  audio:
    pool:
      size: 50
      sampling: { type: zipf, min: 0, max: 49, skew: 0.9 }
```

Omitting `pool` keeps today's behavior: fresh bytes per request.
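
To make the `zipf` sampling semantics concrete, here is a minimal sketch of a bounded (truncated) Zipf draw over pool indices `[0, size-1]`, using only the standard library. The function names `zipf_weights` and `sample_pool_indices` are hypothetical and not part of the codebase, and the sketch assumes index `i` gets weight `1 / (i + 1) ** skew`. Because the range is finite, exponents at or below 1 (such as the `skew: 0.9` audio example) stay well defined, unlike the unbounded Zipf in `numpy.random`, which requires an exponent greater than 1.

```python
import random
from collections import Counter


def zipf_weights(size: int, skew: float) -> list[float]:
    """Unnormalized bounded-Zipf weights: index i gets 1 / (i + 1) ** skew."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [1.0 / (rank ** skew) for rank in range(1, size + 1)]


def sample_pool_indices(size: int, skew: float, n: int, seed: int = 0) -> list[int]:
    """Draw n pool indices in [0, size - 1]; lower indices are picked more often."""
    rng = random.Random(seed)  # seeded so a run is reproducible
    weights = zipf_weights(size, skew)
    return rng.choices(range(size), weights=weights, k=n)


# Mirrors the image example: size=100, skew=1.1.
indices = sample_pool_indices(size=100, skew=1.1, n=10_000)
counts = Counter(indices)
print(counts.most_common(3))  # the hottest items dominate the draws
```

Whether index 0 should always be the hottest item, or whether ranks should be shuffled per run, is a design choice this sketch leaves open.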

## Acceptance

- Per-modality `pool.size` honored: at most N distinct rendered blobs.
- Per-modality `pool.sampling` honored across requests.
- Pool is per-process (consistent with the `VideoBytesPool` model), so multi-worker loadgen doesn't need IPC.
- `pool` and `prefix_multimodal` interact cleanly (prefix bytes still deterministic per group; payload bytes drawn from the pool).
- Existing `VideoBytesPool` subsumed by the new mechanism (image / audio / video unified) so the hardcoded `pool_size=4` is replaced by `pool.size`.
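
As a rough illustration of the first and third criteria, the sketch below shows one way a per-process pool could cap the number of distinct blobs. `MediaPool` and its `render` callback are hypothetical names, not existing code, and lazy rendering on first access is an assumption (the pool could equally be materialized eagerly at startup).

```python
import os
from typing import Callable


class MediaPool:
    """Hypothetical per-process pool holding at most `size` distinct blobs."""

    def __init__(self, size: int, render: Callable[[int], bytes]) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self._size = size
        self._render = render
        self._blobs: dict[int, bytes] = {}

    def get(self, index: int) -> bytes:
        """Return the blob for `index`, rendering it only on first access."""
        if not 0 <= index < self._size:
            raise IndexError(f"pool index {index} outside [0, {self._size - 1}]")
        if index not in self._blobs:
            self._blobs[index] = self._render(index)
        return self._blobs[index]

    @property
    def distinct_rendered(self) -> int:
        """Number of blobs rendered so far (never exceeds `size`)."""
        return len(self._blobs)


# os.urandom is a stand-in for real image / audio / video rendering.
pool = MediaPool(size=4, render=lambda i: os.urandom(16))
payloads = [pool.get(i) for i in (0, 1, 1, 3, 1)]
print(pool.distinct_rendered)
```

Each worker process would hold its own instance, so no state is shared across workers and no IPC is needed; the trade-off is that distinct workers render distinct bytes for the same index unless rendering is seeded by index.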

## Open questions

- `Distribution`'s `mean` / `std_dev` defaults (512 / 200) are meaningless for an index range. Should we add a validator that requires `min` / `max` when `Distribution` is used as a pool sampler, or introduce a thin `IndexDistribution` alias that overrides the defaults?
- Add `ZIPF` as a new `DistributionType` value, or store the exponent on `skew` only when `type=zipf`?
- Subsume the existing `VideoBytesPool` (cleaner, default `pool.size=4` to preserve behavior) vs. leave it alone and only add new knobs for image / audio (smaller blast radius)?
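
For the first open question, the validator option might look roughly like the sketch below. It uses a plain dataclass so it stays self-contained; the real `Distribution` type, its field names, and its validation hooks may differ, and `IndexDistribution` / `validate_for_pool` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexDistribution:
    """Hypothetical pool sampler config with no size-style defaults."""

    type: str
    min: Optional[int] = None
    max: Optional[int] = None
    skew: Optional[float] = None

    def validate_for_pool(self, size: int) -> None:
        """Reject configs that do not describe a valid index range for the pool."""
        if self.min is None or self.max is None:
            raise ValueError("pool sampling requires explicit min and max")
        if not (0 <= self.min <= self.max <= size - 1):
            raise ValueError(f"min/max must lie within [0, {size - 1}]")
        if self.type == "zipf" and (self.skew is None or self.skew <= 0):
            raise ValueError("zipf sampling requires a positive skew")


# Matches the image example: indices 0..99 for a pool of 100.
IndexDistribution(type="zipf", min=0, max=99, skew=1.1).validate_for_pool(size=100)
```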
