Description
Hi all,
I know there's been quite a bit of discussion on this topic already, and a lot of work has gone into improving performance around the TxPool in recent releases. However, we are still seeing the issue, just over a longer time period. Whereas previously we were running OOM after ~3 hours, our Geth node(s) running the recent 1.8.10 release show the same memory growth over a longer period (~15 hours).
This graph shows memory usage. The first downward spike is when we restarted the server after updating to 1.8.10. The most recent downward spike is the process falling over due to OOM.
It might be worth noting that we are running Geth as a data provider for contract data, transaction confirmations, wallet balances, etc. for a DApp communicating with the Ethereum network. This means we (and our users) are constantly querying the JSON-RPC API. Not sure if this helps at all, but it's definitely receiving generous throughput on that side of things. Also worth noting: our private Geth nodes in an isolated environment, which receive no such traffic, are not afflicted by the memory leak!
I've attached a dump of our geth logs. As you can see, there appears to be a ton of sleeping/dead goroutines.
geth-logs.txt
Let me know if there's any other information I can provide. Thanks for all the awesome work you guys do!