Commit c55e645

Generalize BaseStatistics code a bit + document it

Signed-off-by: Eero Tamminen <[email protected]>

1 parent 6419ace, commit c55e645

2 files changed (+38, -46 lines)

comps/cores/mega/base_statistics.py

Lines changed: 18 additions & 44 deletions
```diff
@@ -21,47 +21,23 @@ def append_latency(self, latency, first_token_latency=None):
         if first_token_latency:
             self.first_token_latencies.append(first_token_latency)
 
-    def calculate_statistics(self):
-        if not self.response_times:
-            return {
-                "p50_latency": None,
-                "p99_latency": None,
-                "average_latency": None,
-            }
-        # Calculate the P50 (median)
-        p50 = np.percentile(self.response_times, 50)
-
-        # Calculate the P99
-        p99 = np.percentile(self.response_times, 99)
-
-        avg = np.average(self.response_times)
-
-        return {
-            "p50_latency": p50,
-            "p99_latency": p99,
-            "average_latency": avg,
-        }
-
-    def calculate_first_token_statistics(self):
-        if not self.first_token_latencies:
-            return {
-                "p50_latency_first_token": None,
-                "p99_latency_first_token": None,
-                "average_latency_first_token": None,
-            }
-        # Calculate the P50 (median)
-        p50 = np.percentile(self.first_token_latencies, 50)
-
-        # Calculate the P99
-        p99 = np.percentile(self.first_token_latencies, 99)
-
-        avg = np.average(self.first_token_latencies)
-
-        return {
-            "p50_latency_first_token": p50,
-            "p99_latency_first_token": p99,
-            "average_latency_first_token": avg,
-        }
+    def _add_statistics(self, result, stats, suffix):
+        "add P50 (median), P99 and average values for 'stats' array to 'result' dict"
+        if stats:
+            result[f"p50_{suffix}"] = np.percentile(stats, 50)
+            result[f"p99_{suffix}"] = np.percentile(stats, 99)
+            result[f"average_{suffix}"] = np.average(stats)
+        else:
+            result[f"p50_{suffix}"] = None
+            result[f"p99_{suffix}"] = None
+            result[f"average_{suffix}"] = None
+
+    def get_statistics(self):
+        "return stats dict with P50, P99 and average values for first token and response timings"
+        result = {}
+        self._add_statistics(result, self.response_times, "latency")
+        self._add_statistics(result, self.first_token_latencies, "latency_first_token")
+        return result
 
 
 def register_statistics(
@@ -79,7 +55,5 @@ def collect_all_statistics():
     results = {}
     if statistics_dict:
         for name, statistic in statistics_dict.items():
-            tmp_dict = statistic.calculate_statistics()
-            tmp_dict.update(statistic.calculate_first_token_statistics())
-            results.update({name: tmp_dict})
+            results[name] = statistic.get_statistics()
     return results
```
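For reference, below is a minimal self-contained sketch of how the refactored statistics code behaves. The `BaseStatisticsSketch` class name, its constructor, and the example latency values are assumptions made for illustration; `append_latency()`, `_add_statistics()` and `get_statistics()` follow the diff above.

```python
# Minimal sketch (not the actual OPEA class); the constructor below is assumed,
# the statistics methods mirror the refactored code in this commit.
import numpy as np


class BaseStatisticsSketch:
    def __init__(self):
        self.response_times = []         # end-to-end response latencies
        self.first_token_latencies = []  # first-token latencies (streaming requests)

    def append_latency(self, latency, first_token_latency=None):
        self.response_times.append(latency)
        if first_token_latency:
            self.first_token_latencies.append(first_token_latency)

    def _add_statistics(self, result, stats, suffix):
        "add P50 (median), P99 and average values for 'stats' array to 'result' dict"
        if stats:
            result[f"p50_{suffix}"] = np.percentile(stats, 50)
            result[f"p99_{suffix}"] = np.percentile(stats, 99)
            result[f"average_{suffix}"] = np.average(stats)
        else:
            result[f"p50_{suffix}"] = None
            result[f"p99_{suffix}"] = None
            result[f"average_{suffix}"] = None

    def get_statistics(self):
        "return stats dict with P50, P99 and average values for first token and response timings"
        result = {}
        self._add_statistics(result, self.response_times, "latency")
        self._add_statistics(result, self.first_token_latencies, "latency_first_token")
        return result


if __name__ == "__main__":
    stats = BaseStatisticsSketch()
    stats.append_latency(0.9, 0.2)  # streaming request: response + first-token latency
    stats.append_latency(1.1, 0.3)
    stats.append_latency(1.0)       # non-streaming request: no first-token latency
    # -> keys: p50_latency, p99_latency, average_latency,
    #          p50_latency_first_token, p99_latency_first_token, average_latency_first_token
    print(stats.get_statistics())
```

The `suffix` parameter is what lets a single helper serve both metric families: `"latency"` for response times and `"latency_first_token"` for first-token latencies.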

comps/cores/telemetry/README.md

Lines changed: 20 additions & 2 deletions
````diff
@@ -4,6 +4,19 @@ OPEA Comps currently provides telemetry functionalities for metrics and tracing
 
 ![opea telemetry](https://raw.githubusercontent.com/Spycsh/assets/main/OPEA%20Telemetry.jpg)
 
+Contents:
+
+- [Metrics](#metrics)
+  - [HTTP metrics](#http-metrics)
+  - [Megaservice E2E metrics](#megaservice-e2e-metrics)
+  - [Inferencing metrics](#inferencing-metrics)
+  - [Metrics collection](#metrics-collection)
+- [Statistics](#statistics)
+- [Tracing](#tracing)
+- [Visualization](#visualization)
+  - [Visualize metrics](#visualize-metrics)
+  - [Visualize tracing](#visualize-tracing)
+
 ## Metrics
 
 OPEA microservice metrics are exported in Prometheus format under `/metrics` endpoint.
@@ -20,7 +33,7 @@ They can be fetched e.g. with `curl`:
 curl localhost:{port of your service}/metrics
 ```
 
-### HTTP Metrics
+### HTTP metrics
 
 Metrics output looks following:
 
@@ -54,7 +67,7 @@ Latency ones are histogram metrics i.e. include count, total value and set of va
 
 They are available only for _streaming_ requests using LLM. Pending count accounts for all requests.
 
-### Inferencing Metrics
+### Inferencing metrics
 
 For example, you can `curl localhost:6006/metrics` to retrieve the TEI embedding metrics, and the output should look like follows:
 
@@ -95,6 +108,11 @@ Below are some default metrics endpoints for specific microservices:
 | TEI embedding | 6006 | /metrics | [link](https://huggingface.github.io/text-embeddings-inference/#/Text%20Embeddings%20Inference/metrics) |
 | TEI reranking | 8808 | /metrics | [link](https://huggingface.github.io/text-embeddings-inference/#/Text%20Embeddings%20Inference/metrics) |
 
+## Statistics
+
+Additionally, GenAIComps microservices provide separate `/v1/statistics` endpoint, which outputs P50, P99 and average metrics
+for response times, and first token latencies, if microservice processes them.
+
 ## Tracing
 
 OPEA use OpenTelemetry to trace function call stacks. To trace a function, add the `@opea_telemetry` decorator to either an async or sync function. The call stacks and time span data will be exported by OpenTelemetry. You can use Jaeger UI to visualize this tracing data.
````
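To illustrate the newly documented `/v1/statistics` endpoint, here is a small hedged sketch of querying it. The port value is a placeholder, and the exact JSON envelope of the response is not shown in this commit; the per-service keys listed in the comment follow `get_statistics()` above.

```python
# Hypothetical query of a microservice's /v1/statistics endpoint (stdlib only).
import json
import urllib.request

PORT = 8000  # placeholder: use the port of your OPEA microservice

with urllib.request.urlopen(f"http://localhost:{PORT}/v1/statistics") as resp:
    stats = json.load(resp)

# Per the base_statistics.py change, each registered service is expected to report:
#   p50_latency, p99_latency, average_latency,
#   p50_latency_first_token, p99_latency_first_token, average_latency_first_token
print(json.dumps(stats, indent=2))
```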
