Closed
Description
`server` is one of the protagonists of the outages in the past few days. One thing that we're not satisfied with is that we don't have enough observability into its communication with messagebus.
Following up on today's post-mortem meeting, we agreed on a couple of improvements in that area:
- Log an error message when `server` tries to create a topic and fails
- Create a new metric `gitpod_server_topic_read_total` (Counter) that increases every time `server` tries to read from a topic.
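
A minimal sketch of what the two improvements could look like, for discussion only. Everything except the metric name `gitpod_server_topic_read_total` is an assumption: the `Counter` class is a hand-rolled stand-in for whatever metrics library `server` uses, and `MessageBusClient`, `ErrorLogger`, `createTopicLogged` and `readTopicCounted` are hypothetical names, not the actual messagebus integration.

```typescript
// Hypothetical stand-in for a Prometheus-style counter. The real service
// would presumably register the metric with its metrics library instead.
class Counter {
  private value = 0;
  constructor(readonly name: string, readonly help: string) {}
  inc(): void {
    this.value += 1;
  }
  get(): number {
    return this.value;
  }
}

// Metric name taken from the issue; the help text is an assumption.
const topicReadTotal = new Counter(
  "gitpod_server_topic_read_total",
  "Number of times server tried to read from a messagebus topic",
);

// Hypothetical, minimal view of the messagebus client (not the real API).
interface MessageBusClient {
  createTopic(topic: string): Promise<void>;
  read(topic: string): Promise<string | undefined>;
}

// Hypothetical logger signature: a message plus structured context.
type ErrorLogger = (message: string, context: Record<string, unknown>) => void;

// Improvement 1: log an error when creating a topic fails, then rethrow so
// callers still see the failure.
async function createTopicLogged(
  bus: MessageBusClient,
  topic: string,
  logError: ErrorLogger,
): Promise<void> {
  try {
    await bus.createTopic(topic);
  } catch (err) {
    logError("failed to create messagebus topic", { topic, err });
    throw err;
  }
}

// Improvement 2: count every read attempt. The counter is incremented
// before the read starts, so failed reads are counted as well.
function readTopicCounted(
  bus: MessageBusClient,
  topic: string,
): Promise<string | undefined> {
  topicReadTotal.inc();
  return bus.read(topic);
}
```

Counting attempts rather than successful reads matches the wording above ("tries to read"). If failures need to be told apart later, a label or a second counter would be one option.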