Improve efficiency of max-model-len truncation in conversation replay datagen

**What would you like to be added**:

Currently, the max-model-len truncation logic in the conversation replay datagen tokenizes each request again and truncates specific portions to check whether max_model_len is exceeded. This can lead to higher CPU utilization when the context is large. Instead, the logic should avoid retokenization as much as possible and be simplified so that it is both easier to understand and more efficient.
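One possible direction, as a rough sketch only: count the tokens of each message once, cache the count, and decide what to drop from the cached counts instead of tokenizing the whole request again. Everything named here (`make_cached_counter`, `truncate_to_budget`, the drop-oldest-first policy, and representing a conversation as a list of strings) is hypothetical and not taken from the existing datagen code. Summing per-message counts also ignores chat-template overhead and token merges across message boundaries, so a real implementation would likely need a safety margin.

```python
from typing import Callable, Dict, List


def make_cached_counter(count_tokens: Callable[[str], int]) -> Callable[[str], int]:
    """Wrap a token-counting function so each distinct text is tokenized once.

    `count_tokens` is a stand-in for whatever tokenizer call the datagen uses.
    """
    cache: Dict[str, int] = {}

    def cached(text: str) -> int:
        if text not in cache:
            cache[text] = count_tokens(text)
        return cache[text]

    return cached


def truncate_to_budget(
    messages: List[str],
    max_model_len: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Keep the most recent messages whose summed token counts fit the budget.

    Assumed policy (hypothetical): drop the oldest messages first.
    """
    kept: List[str] = []
    total = 0
    # Walk from newest to oldest and stop once the budget would be exceeded.
    for msg in reversed(messages):
        n = count_tokens(msg)
        if total + n > max_model_len:
            break
        kept.append(msg)
        total += n
    kept.reverse()  # Restore chronological order.
    return kept
```

With a cached counter, replaying a conversation that grows by one turn per request only tokenizes the new turn; earlier turns are served from the cache.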

**Why is this needed**:

To improve the efficiency and readability of the truncation logic.

Improve efficiency of max-model-len truncation in conversation replay datagen #511

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Improve efficiency of max-model-len truncation in conversation replay datagen #511

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions