Description
What happened:
We are using an instance of the ClickHouse plugin in Grafana to periodically query a database. The memory footprint of the Grafana process is strictly increasing over time. The growth can be attributed to the ClickHouse plugin, whose memory usage keeps rising as it opens connections and queries data. This eventually results in the process being OOM-killed after several days.
Heap profiles collected over time indicate that the memory used for database connections keeps growing:
```
(pprof) top
Showing nodes accounting for 518.67MB, 97.88% of 529.91MB total
Dropped 109 nodes (cum <= 2.65MB)
Showing top 10 nodes out of 29
      flat  flat%   sum%        cum   cum%
  255.07MB 48.13% 48.13%   255.57MB 48.23%  github.com/ClickHouse/ch-go/compress.NewWriter
  247.47MB 46.70% 94.83%   247.47MB 46.70%  bufio.NewReaderSize (inline)
   16.14MB  3.05% 97.88%    16.14MB  3.05%  github.com/ClickHouse/ch-go/proto.(*Buffer).PutString (inline)
         0     0% 97.88%   230.87MB 43.57%  database/sql.(*DB).PingContext
         0     0% 97.88%   230.87MB 43.57%  database/sql.(*DB).PingContext.func1
         0     0% 97.88%   289.29MB 54.59%  database/sql.(*DB).QueryContext
         0     0% 97.88%   289.29MB 54.59%  database/sql.(*DB).QueryContext.func1
         0     0% 97.88%   502.51MB 94.83%  database/sql.(*DB).conn
         0     0% 97.88%   289.29MB 54.59%  database/sql.(*DB).query
         0     0% 97.88%    17.64MB  3.33%  database/sql.(*DB).queryDC
```
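This allocation pattern is consistent with a fresh *sql.DB handle (and therefore a fresh ClickHouse connection, with its bufio reader and compression writer) being created repeatedly and never closed. The following is a minimal sketch, outside the plugin, that reproduces a strictly increasing heap through the same call paths; the DSN is a placeholder, and this is an assumption about the mechanism rather than the plugin's actual code:

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"runtime"
	"time"

	_ "github.com/ClickHouse/clickhouse-go/v2" // registers the "clickhouse" driver
)

func main() {
	ctx := context.Background()
	for i := 0; ; i++ {
		// Simulates a datasource being re-created on every config update.
		// db.Close() is intentionally never called, so each handle keeps its
		// connectionOpener goroutine, bufio reader, and compress writer alive.
		db, err := sql.Open("clickhouse", "clickhouse://localhost:9000/default")
		if err != nil {
			log.Fatal(err)
		}
		if err := db.PingContext(ctx); err != nil {
			log.Printf("ping %d: %v", i, err)
		}
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		log.Printf("iteration %d: goroutines=%d heap=%d MiB",
			i, runtime.NumGoroutine(), m.HeapAlloc/1024/1024)
		time.Sleep(time.Second)
	}
}
```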
Consecutive goroutine profiles show that the number of connectionOpener goroutines is also strictly increasing. Each *sql.DB handle owns exactly one connectionOpener goroutine, which exits only when Close is called on the handle, so a growing count indicates that handles are being created but never closed:
```
(pprof) top
Showing nodes accounting for 941, 99.79% of 943 total
Dropped 104 nodes (cum <= 4)
      flat  flat%   sum%        cum   cum%
       941 99.79% 99.79%        941 99.79%  runtime.gopark
         0     0% 99.79%          4  0.42%  bufio.(*Reader).Read
         0     0% 99.79%        923 97.88%  database/sql.(*DB).connectionOpener
         0     0% 99.79%          5  0.53%  internal/poll.(*FD).Read
         0     0% 99.79%          7  0.74%  internal/poll.(*pollDesc).wait
         0     0% 99.79%          7  0.74%  internal/poll.(*pollDesc).waitRead (inline)
         0     0% 99.79%          7  0.74%  internal/poll.runtime_pollWait
         0     0% 99.79%          7  0.74%  runtime.netpollblock
         0     0% 99.79%        932 98.83%  runtime.selectgo
```
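The 923 parked openers above therefore imply roughly 923 *sql.DB handles that were opened and never closed. The one-opener-per-handle behaviour is easy to verify standalone (the DSN is again a placeholder):

```go
package main

import (
	"database/sql"
	"fmt"
	"runtime"
	"time"

	_ "github.com/ClickHouse/clickhouse-go/v2" // registers the "clickhouse" driver
)

func main() {
	before := runtime.NumGoroutine()

	// Open starts the connectionOpener goroutine immediately, even before
	// any query is run or any TCP connection is established.
	db, err := sql.Open("clickhouse", "clickhouse://localhost:9000/default")
	if err != nil {
		panic(err)
	}
	fmt.Println("extra goroutines after Open:", runtime.NumGoroutine()-before) // 1

	// Close cancels the DB's internal context, which lets the opener exit.
	db.Close()
	time.Sleep(100 * time.Millisecond) // give the opener a moment to exit
	fmt.Println("extra goroutines after Close:", runtime.NumGoroutine()-before) // 0
}
```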
What you expected to happen:
Memory usage remains stable over time.
Anything else we need to know?:
Some code references are given below:
- New datasources are created here.
- Connect opens a SQL DB connection here. Within this:
  - the DB is opened via clickhouse-go,
  - which is ultimately opened by database/sql's connection opener (see the sketch below).
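For reference, that path boils down to clickhouse-go building a driver.Connector and handing it to database/sql, which is where the opener goroutine comes from. A hedged sketch of the equivalent standalone code (the options shown are illustrative, not the plugin's actual settings):

```go
package main

import (
	"database/sql"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func openClickHouse(addr string) *sql.DB {
	// clickhouse.Connector returns a driver.Connector; sql.OpenDB wraps it
	// and spawns the connectionOpener goroutine seen in the profiles above.
	connector := clickhouse.Connector(&clickhouse.Options{
		Addr: []string{addr},
	})
	return sql.OpenDB(connector)
}

func main() {
	db := openClickHouse("localhost:9000")
	defer db.Close() // without this, the opener goroutine leaks
	_ = db.Ping()
}
```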
PingContext also shows continuously increasing memory.
Additionally, it looks like connections are created when a new datasource instance is created, and a new datasource instance is created every time the Grafana config is updated.
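If that is the case, the usual fix on the plugin side would be to close the old handle when an instance is replaced. With the Grafana plugin SDK this can be done by implementing instancemgmt.InstanceDisposer, so Dispose is called on the outgoing instance whenever settings change. A minimal sketch, where Datasource is a hypothetical stand-in for the plugin's instance struct:

```go
package plugin

import (
	"database/sql"

	"github.com/grafana/grafana-plugin-sdk-go/backend/instancemgmt"
)

// Datasource is a hypothetical stand-in for the plugin's per-instance state.
type Datasource struct {
	db *sql.DB
}

// Dispose is called by the SDK's instance manager on the old instance when
// settings change; closing the handle here releases the connectionOpener
// goroutine and any pooled connections.
func (d *Datasource) Dispose() {
	if d.db != nil {
		_ = d.db.Close()
	}
}

// Compile-time check that the interface is satisfied.
var _ instancemgmt.InstanceDisposer = (*Datasource)(nil)
```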
Environment:
- Grafana version: Grafana v11.3.0
- Plugin version: 4.0.3
- OS Grafana is installed on: Kubernetes (Grafana helm chart)
We also noticed this in plugin version 4.3.2.