synapse_storage_transaction_time_bucket prometheus metric has too high cardinality due to desc label #11081
Description
The `desc` label on `synapse_storage_transaction_time_bucket` results in a very high cardinality metric. For a single instance, there are 248 variants of `desc`, multiplied by 15 buckets. This results in over 3k series (248 × 15 = 3,720) for a single host.
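The arithmetic above can be sketched as follows. This is a rough estimate only: it assumes every instance exposes the same 248 `desc` values and that no other label multiplies the count. The function name `bucket_series` and the "instances needed" figure are illustrative and say nothing about the actual size of any cluster.

```python
# Back-of-the-envelope series count, using the figures quoted in this issue.
DESC_VARIANTS = 248  # distinct `desc` values seen on one instance
BUCKETS = 15         # histogram buckets per `desc` value


def bucket_series(instances: int) -> int:
    """Number of `_bucket` series, assuming identical label sets per instance."""
    return instances * DESC_VARIANTS * BUCKETS


per_host = bucket_series(1)
print(per_host)  # 3720

# Instances needed to pass one million series (ceiling division).
instances_for_million = -(-1_000_000 // per_host)
print(instances_for_million)  # 269
```

Under these assumptions, a few hundred instances are enough to reach seven-figure series counts from this one metric.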
That may be acceptable if you're only ingesting metrics for a single instance, but it quickly becomes a problem for a Prometheus instance scraping many Synapses.
On one of our internal clusters this results in over a million time series, about 5x the series count of the next most problematic metric, `synapse_http_server_response_time_seconds_bucket`. Though this is not causing storage issues, it causes unnecessarily high CPU load and ballooning memory usage on the ingesters (so we have to run much bigger instances), and querying these series becomes problematic.
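To check how much a single label contributes on a given host, one option is to count series per label value in the text a Synapse exposes to Prometheus. The snippet below is a minimal sketch, not a full exposition-format parser: the helper `series_per_label_value`, the regexes, and the sample `desc` values (`txn_a`, `txn_b`) are all hypothetical and only for illustration.

```python
import re
from collections import Counter

# Simplified patterns for `name{labels} value` lines; not a complete parser.
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+\S+')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def series_per_label_value(text: str, metric: str, label: str) -> Counter:
    """Count the series of `metric` for each distinct value of `label`."""
    counts = Counter()
    for line in text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if label in labels:
            counts[labels[label]] += 1
    return counts


# Hypothetical scrape output; the `desc` values here are made up.
sample = '''\
# TYPE synapse_storage_transaction_time histogram
synapse_storage_transaction_time_bucket{desc="txn_a",le="0.1"} 3
synapse_storage_transaction_time_bucket{desc="txn_a",le="+Inf"} 5
synapse_storage_transaction_time_bucket{desc="txn_b",le="0.1"} 1
synapse_storage_transaction_time_bucket{desc="txn_b",le="+Inf"} 2
synapse_storage_transaction_time_count{desc="txn_a"} 5
'''

counts = series_per_label_value(
    sample, "synapse_storage_transaction_time_bucket", "desc"
)
print(len(counts), sum(counts.values()))  # distinct desc values, total series
```

Running the same function against a real scrape would give the number of distinct `desc` values and the total `_bucket` series for that host.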
Steps to reproduce
- Run lots of Synapse instances
- Scrape them all with a single Prometheus instance
- Prometheus gets sad
Version information
- Homeserver: any
- Version: any since this metric got introduced
- Install method: unrelated
- Platform: unrelated