Request
It would be great if Alloy provided an option (perhaps enabled by default) for logging prometheus.scrape errors per target.
This could produce a lot of logs, so some level of rate limiting may be necessary. But users often see scrapes failing without seeing the reason for the failures.
Use case
Finding the reason a scrape failed.
The failure reason can currently be discovered in the UI, but that experience is poor, especially in a large cluster. The UI can be improved, but logging still seems like a good idea even if the UI gained a good tool for finding target status across the cluster.
Please correct me if I am wrong, but after a brief look into what it would take to add this kind of logging, this is what I have found so far:
Every scrape job is handled by a scrape.Manager that comes from the Prometheus project. It runs the scrape loops and internally records scrape errors. That suggests two options:
1. Run a job that periodically checks for recorded errors on scrape targets and logs them. This may not be the best solution because it would add more read locks on targets.
2. Reuse the mechanism that is already there to log scrape errors. We don't have to supply a logger that logs to a file here, but we do need to set ScrapeFailureLogFile to something for it to work. It is not clear how to combine both settings, i.e. log scrape errors to stdout and also to a file when one is configured.
I think this should also be possible to set up without any code changes:
prometheus.scrape "scrape" {
// ... other config
scrape_failure_log_file = "some-file.log"
}
local.file_match "local_files" {
path_targets = [{"__path__" = "some-file.log"}]
sync_period = "5s"
}
// other config to relabel / push logs
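The tail of that pipeline might look roughly like the following. This is an unverified sketch: the component labels and the Loki endpoint URL are placeholders, not tested configuration.

```alloy
loki.source.file "scrape_failures" {
  targets    = local.file_match.local_files.targets
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    // placeholder URL; point this at your Loki instance
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```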