Per-request report drops the response body when a streaming request fails on HTTP 200

### What happened

When per-request reporting is enabled (`report.request_lifecycle.per_request: true`), requests that fail while streaming an HTTP **200** response are recorded with an **empty `response` field**, so there is no way to tell *why* they failed from the per-request report. This defeats the main purpose of the per-request escape hatch.

We hit this with a model server that returned 200s but whose responses were marked as failed by inference-perf. Inspecting `per_request_lifecycle_metrics.json` gave no insight into the cause.

### Root cause

In the streaming branch of `OpenAIModelServerClient.process_request` (`inference_perf/client/modelserver/openai_client.py`):

```python
if self.client.api_config.streaming and response.status == 200:
    info = await data.process_response(...)                      # raises mid-stream
    response_content = info.extra_info.get("raw_response", "") if info else ""   # never runs
```

`response_content` is only assigned **after** `process_response` returns. If the stream breaks partway (truncated SSE, dropped connection, or a proxy that returns 200 and then sends an error page), `parse_sse_stream` raises and the raw bytes it had already accumulated are discarded, so `response_content` stays `""`. The non-streaming path does not have this problem because it reads the whole body (`await response.text()`) before parsing.
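One possible direction, shown here as a minimal standalone sketch rather than the actual inference-perf code: keep each chunk as it arrives, so that whatever was received survives an exception raised mid-stream. The names `StreamBroken`, `broken_stream`, and `read_keeping_partial` are hypothetical and exist only for this illustration; `StreamBroken` stands in for an error such as `aiohttp.ClientPayloadError`.

```python
import asyncio
from typing import AsyncIterator, List, Optional, Tuple


class StreamBroken(Exception):
    """Hypothetical stand-in for a mid-stream failure (e.g. aiohttp.ClientPayloadError)."""


async def broken_stream() -> AsyncIterator[bytes]:
    # Two SSE events arrive, then the connection drops.
    yield b'data: {"choices":[{"text":"Hello "}]}\n\n'
    yield b'data: {"choices":[{"text":"world "}]}\n\n'
    raise StreamBroken("Response payload is not completed")


async def read_keeping_partial(
    chunks: AsyncIterator[bytes],
) -> Tuple[str, Optional[Exception]]:
    """Collect chunks as they arrive; on failure, return what was received so far."""
    received: List[bytes] = []
    error: Optional[Exception] = None
    try:
        async for chunk in chunks:
            received.append(chunk)  # stored before any parsing can fail
    except Exception as exc:
        error = exc  # remember the failure instead of losing the bytes
    text = b"".join(received).decode("utf-8", errors="replace")
    return text, error


response_content, error = asyncio.run(read_keeping_partial(broken_stream()))
print(repr(response_content))
print(type(error).__name__)
```

In the real client the same idea would mean assigning `response_content` from the accumulated bytes in an exception handler (or `finally`) around `process_response`, instead of only after it returns.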

### Reproduction

Point inference-perf (streaming completion) at a server that returns `200 OK` with a `Content-Length` larger than the number of bytes it actually sends and then closes the connection, which triggers `aiohttp.ClientPayloadError` mid-stream. The per-request entry looks like:

```json
{
  "response": "",
  "error": {
    "error_type": "ClientPayloadError",
    "error_msg": "Response payload is not completed: ..."
  }
}
```

The bytes the server actually sent are gone.
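A server with the behavior described above can be approximated with the standard library alone. This is a rough sketch under assumptions: the body contents, the default port, and the names `PARTIAL_BODY`, `DECLARED_LENGTH`, `handle`, and `serve` are all made up for illustration, and the request parsing only handles a plain `Content-Length` request body (no chunked uploads).

```python
import asyncio

# Hypothetical partial SSE body; the declared Content-Length is deliberately too large.
PARTIAL_BODY = b'data: {"choices":[{"text":"Hello "}]}\n\n'
DECLARED_LENGTH = len(PARTIAL_BODY) + 1000


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Consume the request headers and body so unread data is less likely
    # to turn the close below into a connection reset.
    head = await reader.readuntil(b"\r\n\r\n")
    for line in head.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            await reader.readexactly(int(value.strip()))
    writer.write(
        (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/event-stream\r\n"
            f"Content-Length: {DECLARED_LENGTH}\r\n"
            "Connection: close\r\n\r\n"
        ).encode()
        + PARTIAL_BODY
    )
    await writer.drain()
    writer.close()  # close before the declared length has been sent
    await writer.wait_closed()


async def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    # Runs until cancelled; every request gets the truncated 200 response.
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(serve())` and point the benchmark at `http://127.0.0.1:8000`; any path should produce the truncated response.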

### Expected

The per-request report should retain whatever bytes were received before the failure so the failure is diagnosable, e.g.:

```json
{
  "response": "data: {\"choices\":[{\"text\":\"Hello \"}]}\n\ndata: {\"choices\":[{\"text\":\"world \"}]}\n\n",
  "error": { "error_type": "ClientPayloadError", "error_msg": "Response payload is not completed: ..." }
}
```

### Related follow-ups (out of scope for the initial fix)

- A 200 whose SSE body is an in-band error payload (`{"error": ...}` with no `choices`) is silently parsed to empty output and marked **success**, not failed.
- Per-request entries omit `stage_id` and per-request latency (TTFT/ITL/TPOT), which are available at emit time.
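
For the first follow-up, a check along the following lines could flag in-band errors. It is only a sketch: `find_in_band_error` is a hypothetical helper, not part of inference-perf, and it assumes OpenAI-style SSE framing (`data:` lines, with `[DONE]` as the end-of-stream sentinel).

```python
import json
from typing import Optional


def find_in_band_error(sse_body: str) -> Optional[dict]:
    """Return the first SSE event that has an "error" key and no "choices", if any."""
    for line in sse_body.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not JSON; leave it to the regular parser
        if isinstance(event, dict) and "error" in event and "choices" not in event:
            return event
    return None


print(find_in_band_error('data: {"error": {"message": "model overloaded"}}\n\n'))
print(find_in_band_error('data: {"choices":[{"text":"Hello "}]}\n\ndata: [DONE]\n\n'))
```

A request for which this returns a non-`None` value could then be recorded as failed rather than as a success with empty output.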

Per-request report drops the response body when a streaming request fails on HTTP 200 #531

What happened

Root cause

Reproduction

Expected

Related follow-ups (out of scope for the initial fix)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Per-request report drops the response body when a streaming request fails on HTTP 200 #531

Description

What happened

Root cause

Reproduction

Expected

Related follow-ups (out of scope for the initial fix)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions