Ⅰ. Issue Description
Quota deduction logic is registered only in `ProcessStreamingResponseBody`. For normal (non-streaming) responses, whether the response body arrives in chunks depends on network conditions and Envoy's internal buffering, so the callback is not guaranteed to fire.
Ⅱ. Describe what happened
When testing non-streaming requests, quota is deducted only occasionally; in most cases nothing is deducted. ai-quota prints no logs at all during the response phase.
Ⅲ. Describe what you expected to happen
By comparison, ai-statistics also registers `ProcessResponseBody` as a fallback. Would you consider adding the same logic to ai-quota?
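To illustrate the suggestion, here is a rough, self-contained sketch of the idea: one deduction function shared by a streaming handler and a buffered fallback. It does not use the real plugin SDK. The `quota`, `handlers`, `register`, and `deliverBuffered` names are hypothetical, and reading `usage.total_tokens` from the response body is an assumption made only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// quota is a hypothetical in-memory ledger standing in for the real quota store.
type quota struct{ remaining int }

// deduct is the single code path shared by both handlers.
// It assumes the body is JSON carrying usage.total_tokens.
func (q *quota) deduct(body []byte) {
	var resp struct {
		Usage struct {
			TotalTokens int `json:"total_tokens"`
		} `json:"usage"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return // not a complete JSON body; nothing to deduct
	}
	q.remaining -= resp.Usage.TotalTokens
}

// handlers mimics a plugin's callback registration (illustrative only).
type handlers struct {
	streaming func(chunk []byte, last bool)
	buffered  func(body []byte)
}

// register always wires the streaming handler and, when withFallback
// is true, also wires the buffered handler to the same deduction logic.
func register(q *quota, withFallback bool) handlers {
	var buf []byte
	h := handlers{
		streaming: func(chunk []byte, last bool) {
			buf = append(buf, chunk...)
			if last {
				q.deduct(buf)
				buf = nil
			}
		},
	}
	if withFallback {
		h.buffered = q.deduct
	}
	return h
}

// deliverBuffered simulates a host handing over a fully buffered,
// non-streaming body: only the buffered handler is invoked.
func deliverBuffered(h handlers, body []byte) {
	if h.buffered != nil {
		h.buffered(body)
	}
}

func main() {
	body := []byte(`{"usage":{"total_tokens":30}}`)

	without := &quota{remaining: 100}
	deliverBuffered(register(without, false), body)

	with := &quota{remaining: 100}
	deliverBuffered(register(with, true), body)

	fmt.Println(without.remaining, with.remaining) // prints "100 70"
}
```

In this toy model, the quota registered without the fallback is never touched when the body arrives buffered, which matches the symptom described above.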
Ⅳ. How to reproduce it (as minimally and precisely as possible)
- Enable ai-quota
- Send several non-streaming requests