Improve efficiency of max-model-len truncation in conversation replay datagen #511

@achandrasekar

What would you like to be added:

Currently, the max-model-len truncation logic in the conversation replay datagen tokenizes each request a second time and truncates specific portions to check whether max_model_len is exceeded. This can drive up CPU utilization when the context is large. Instead, the logic should avoid retokenization wherever possible and be simplified so that it is both easier to understand and more efficient.
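
One possible direction, as a rough sketch and not the project's actual code: cache a token count per message, then decide what to drop by summing cached counts, so a message that reappears in later replayed requests is never tokenized again. Every name here (`CountingCache`, `fit_to_max_model_len`, `reserved_output_tokens`) is hypothetical, and the tokenizer is assumed to be any callable that maps a string to a list of token ids. Summing per-message counts ignores chat-template tokens and merges across message boundaries, so a real implementation would likely need a small safety margin.

```python
from typing import Callable, Dict, List

# Assumption: any callable mapping text to token ids can stand in for the real tokenizer.
Tokenizer = Callable[[str], List[int]]


class CountingCache:
    """Memoizes token counts so each distinct message text is tokenized once."""

    def __init__(self, tokenize: Tokenizer) -> None:
        self._tokenize = tokenize
        self._counts: Dict[str, int] = {}
        self.tokenizer_calls = 0  # exposed only to make the saving observable

    def count(self, text: str) -> int:
        if text not in self._counts:
            self.tokenizer_calls += 1
            self._counts[text] = len(self._tokenize(text))
        return self._counts[text]


def fit_to_max_model_len(
    messages: List[str],
    cache: CountingCache,
    max_model_len: int,
    reserved_output_tokens: int = 0,
) -> List[str]:
    """Drop the oldest messages until the summed cached counts fit the budget.

    The newest message is always kept, even if it alone exceeds the budget;
    truncating inside a single message is out of scope for this sketch.
    """
    budget = max_model_len - reserved_output_tokens
    counts = [cache.count(m) for m in messages]
    total = sum(counts)
    start = 0
    while start < len(messages) - 1 and total > budget:
        total -= counts[start]
        start += 1
    return messages[start:]
```

With this shape, replaying a conversation whose requests grow by one turn each costs one tokenizer call per new turn, instead of one full retokenization per request.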

Why is this needed:

To improve efficiency and readability.
