Description
Version: Spring Boot 1.4.0.M2 and Spring Boot 1.3.3.RELEASE
There is a memory leak in the STOMP broker relay. After client STOMP WebSocket connections have disconnected, the broker still retains the Reactor TCP clients.
To replicate the issue, the sample application used for our previous Reactor memory leak investigation was run against both Spring Boot 1.4.0.M2 and Spring Boot 1.3.3. The WebSocket broker was backed by a RabbitMQ instance.
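For readers without the sample app to hand, a broker relay setup of this kind typically looks like the sketch below. This is not the sample app's actual configuration: the class name, the `/ws` endpoint path, the destination prefixes, and the relay host and port (61613 is the RabbitMQ STOMP plugin default) are all assumptions.

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.AbstractWebSocketMessageBrokerConfigurer;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;

// Hypothetical configuration; names and values are illustrative only.
@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig extends AbstractWebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        // Assumed endpoint path that the client test harness connects to.
        registry.addEndpoint("/ws");
    }

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        // Relays these destinations to an external STOMP broker (here RabbitMQ).
        // This is the path that creates StompBrokerRelayMessageHandler and its
        // Reactor TCP client connections.
        registry.enableStompBrokerRelay("/topic", "/queue")
                .setRelayHost("localhost") // assumed RabbitMQ host
                .setRelayPort(61613);      // RabbitMQ STOMP plugin default port
    }
}
```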
The problem manifests when clients disconnect: references to the TCP clients are retained. The following steps were used to replicate it:
1. Run server with web socket broker (sample app)
2. Run VisualVM and connect to server
3. Run client test harness (sample app)
4. Wait until 2000 connections in RabbitMQ (shown via RabbitMQ Management)
5. Stop client test harness
6. Watch all client connections disappear from RabbitMQ (via RabbitMQ Management)
7. In VisualVM : Perform GC then Heap Dump
8. In Heap dump click Find (20 Biggest objects)
9. Repeat from step 3
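The "Perform GC then Heap Dump" check in step 7 can also be approximated in code, which makes it easier to log the post-GC baseline after each run. The following is a minimal sketch using only the standard `java.lang.management` API; the `HeapBaseline` class is hypothetical, is not part of the sample app, and would have to run inside the server JVM to be meaningful. Note that `gc()` is only a request to the JVM, so the figures are approximate and do not replace the heap dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/** Hypothetical helper for sampling the post-GC heap baseline. */
public class HeapBaseline {

    /** Requests a garbage collection, then returns the used heap in bytes. */
    public static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // effectively equivalent to System.gc(); a request, not a guarantee
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Growth of the post-GC baseline between two samples, in bytes. */
    public static long growth(long before, long after) {
        return after - before;
    }

    public static void main(String[] args) {
        long baseline = usedHeapAfterGc();
        System.out.printf("post-GC heap: %.1f MB%n", baseline / (1024.0 * 1024.0));
    }
}
```

Sampling `usedHeapAfterGc()` once before the first client run and once after each run (after step 6) and passing the values to `growth` gives the same baseline trend that VisualVM shows, without the manual step.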
The following shows the VisualVM monitor for the above test. The arrows mark the points at which step 7 above was performed. The upward-slanting red line was drawn to illustrate the leak, i.e. the baseline heap usage after GC keeps increasing.
The following shows the top retained objects in each heap dump (taken at the arrows). The first heap dump was taken before any clients had connected, so it represents a newly started WebSocket broker. This is what we expect every subsequent heap dump to look like after a client run and GC.
The following is a heap dump taken after 2000 client connections have terminated. No clients are visible in RabbitMQ and the Java client VM itself has exited, so there cannot be any remaining WebSocket client connection to the broker. There is a leak of about 8 MB in StompBrokerRelayMessageHandler and/or the Reactor TCP client.
The following shows the result after another test run. We keep leaking about 8 MB for every 2000 WebSocket connect/disconnect cycles, i.e. roughly 4 KB per connection.
It looks like some sort of pool (an ArrayList in StompBrokerRelayMessageHandler) is growing, although this could be related to the issue @smaldini previously looked at.