Improve request-level data collection when only the metrics are needed. #522

@azamikram

What would you like to be added:

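Add a new flag, such as metrics_only, that works in conjunction with per_request. When enabled, it would restrict per-request data collection to the computed performance metrics: time to first token (TTFT), time per output token (TPOT), and inter-token latencies (ITLs). The raw request and response content would not be stored.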
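
A rough sketch of the behaviour I have in mind, not how inference-perf is actually structured. The record layout, the field names, and the timing_metrics and serialize helpers are all hypothetical and only illustrate how a metrics_only switch could drop the payloads while keeping the timings:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class RequestRecord:
    """Hypothetical per-request record; field names are illustrative only."""
    request_id: str
    start_time: float          # seconds, when the request was sent
    token_times: list[float]   # arrival time of each output token, in seconds
    request_body: str          # raw request payload
    response_body: str         # raw response payload


def timing_metrics(rec: RequestRecord) -> dict[str, Any]:
    """Derive TTFT, ITLs, and TPOT from the token arrival timestamps."""
    if not rec.token_times:
        return {"request_id": rec.request_id, "ttft": None, "tpot": None, "itls": []}
    ttft = rec.token_times[0] - rec.start_time
    # ITL: gap between each pair of consecutive output tokens.
    itls = [later - earlier for earlier, later in zip(rec.token_times, rec.token_times[1:])]
    # TPOT: mean gap after the first token; undefined for single-token outputs.
    tpot = sum(itls) / len(itls) if itls else None
    return {"request_id": rec.request_id, "ttft": ttft, "tpot": tpot, "itls": itls}


def serialize(rec: RequestRecord, metrics_only: bool) -> dict[str, Any]:
    """Build the entry written to the per-request report (assumed shape)."""
    entry = timing_metrics(rec)
    if not metrics_only:
        # Only the full mode keeps the raw payloads, which dominate file size.
        entry["request"] = rec.request_body
        entry["response"] = rec.response_body
    return entry
```

With metrics_only enabled, each entry carries only a handful of floats, so its size no longer depends on prompt or response length.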

Why is this needed:

When running inference-perf at high QPS with the per_request flag enabled, generating and storing the final per_request_lifecycle_metrics.json file takes a significant amount of time. The log captures the raw request and response content for every individual request, which makes disk I/O a major bottleneck.

In high-QPS runs, this file can grow to several gigabytes and push report generation time to tens of minutes. This overhead is unnecessary for users who do not need the raw request/response payloads and only want to analyze the per-request timing metrics (TTFT, TPOT, and ITL).

Metadata

Assignees

No one assigned

Labels

good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
