Improve request-level data collection when only the metrics are needed. #522
Open
Labels
- good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
- help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
- priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
What would you like to be added:
A new flag, such as `metrics_only`, designed to work in conjunction with `per_request`. When enabled, this flag would restrict data collection exclusively to computable performance metrics, such as TTFT, TPOT, and ITLs.

Why is this needed:
When running inference-perf at high QPS with the `per_request` flag enabled, generating and storing the final `per_request_lifecycle_metrics.json` file takes a significant amount of time. This delay occurs because the log captures the raw request and response content for every individual request, which makes disk I/O a major bottleneck.

In high-QPS cases, this file can grow to several gigabytes, stretching report generation to tens of minutes. This overhead is unnecessary for users who do not need the raw request/response payloads and are only interested in analyzing the per-request timing metrics (TTFT, TPOT, and ITL).
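
To make the request concrete, here is a rough sketch of what a metrics-only reduction of one per-request record could look like. It is not based on inference-perf's actual data model: the record fields (`start_time`, `token_times`, `request`, `response`) and the helper name `to_metrics_only` are hypothetical, and TPOT is approximated here as the mean inter-token latency, which may differ from how the tool defines it.

```python
from typing import Any, Optional


def to_metrics_only(record: dict[str, Any]) -> dict[str, Any]:
    """Reduce a per-request record to timing metrics, dropping raw payloads.

    Assumed (hypothetical) record shape:
      start_time:  float, time the request was sent (seconds)
      token_times: list[float], arrival time of each output token (seconds)
      request / response: raw payloads, intentionally not copied to the output
    """
    start: float = record["start_time"]
    token_times: list[float] = record["token_times"]

    # Inter-token latencies: gaps between consecutive output tokens.
    itls = [later - earlier for earlier, later in zip(token_times, token_times[1:])]

    # Time to first token: delay between sending the request and the first token.
    ttft: Optional[float] = token_times[0] - start if token_times else None

    # Time per output token, approximated as the mean inter-token latency.
    tpot: Optional[float] = sum(itls) / len(itls) if itls else None

    return {"ttft": ttft, "tpot": tpot, "itls": itls}


example_record = {
    "start_time": 0.0,
    "token_times": [0.5, 0.75, 1.0],
    "request": {"prompt": "a long prompt that would otherwise be written to disk"},
    "response": {"text": "a long completion that would otherwise be written to disk"},
}
metrics = to_metrics_only(example_record)
```

With a reduction like this applied before serialization, the output size would scale with the number of output tokens (one ITL value each) rather than with the size of the request and response payloads.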