Skip to content

No reconnection with SSL #265

@calvin-nl

Description

@calvin-nl

We've been using Redix with AWS ElastiCache with ssl for qute awhile. However, we recently hit an issue where Redix would not reconnect after disconnecting. In particular, we started getting %Redix.ConnectionError{reason: :closed} on all commands. This continued to happened for around 12 hours while other applications were able to connect to ElastiCache just fine. The issue was resolved after restarting the application. There would have been plenty of opportunities for a reconnect to happen over the 12 hour span.

Digging into Redix a bit:
If it receives an error on the send, it'll close the socket and go into disconnected state: https://github.com/whatyouhide/redix/blob/main/lib/redix/connection.ex#L273-L277

Once in disconnected state, Redix will respond to all commands with :closed, which was what we were seeing: https://github.com/whatyouhide/redix/blob/main/lib/redix/connection.ex#L198-L200

The reconnect logic will happen when disconnect is called: https://github.com/whatyouhide/redix/blob/main/lib/redix/connection.ex#L316-L329

And disconnect is called when the socket_owner sends a stopped message to the connection while the connection is in disconnected state: https://github.com/whatyouhide/redix/blob/main/lib/redix/connection.ex#L206-L216

The socket_owner sends stopped when it receives one of ssl_closed/ssl_error:
https://github.com/whatyouhide/redix/blob/main/lib/redix/socket_owner.ex#L68-L74
https://github.com/whatyouhide/redix/blob/main/lib/redix/socket_owner.ex#L136-L139

So if the socket_owner never receives ssl_closed/ssl_error, it'll never attempt a reconnect.

We created a toy example to see if the client would get a :ssl_closed/:ssl_error message if the server closed the connection, and indeed, the client does not receive such a message. Running the same example with gen_tcp, the client will receive a :tcp_closed message.

Server

defmodule TlsExampleServer do
  use GenServer

  @ip {127, 0, 0, 1}
  @port 54343

  def start_link(_) do
    GenServer.start_link(__MODULE__, %{})
  end

  def init(state) do
    Logger.warning("Started server")
    {:ok, state, {:continue, :listen}}
  end

  def handle_continue(:listen, state) do
    :ssl.start()

    # Create an SSL server socket
    {:ok, listen_socket} = :ssl.listen(@port, [
      certs_keys: [%{
        :keyfile => "key.pem",
        :certfile => "certificate.pem",
        :password => "12345"
      }]
    ])

    # Accept incoming connections
    {:ok, accept_socket} = :ssl.transport_accept(listen_socket)
    {:ok, accept_socket} = :ssl.handshake(accept_socket)

    Logger.warning("Server sending message 1")
    res = :ssl.send(accept_socket, "Hello World")
    Logger.warning("Server send res #{inspect(res)}")

    Logger.warning("Server sleeping")
    Process.sleep(1000)

    Logger.warning("Closing server socket")
    :ssl.close(accept_socket)
    :ssl.close(listen_socket)
    {:stop, :normal, %{}}
  end
end

Client

defmodule TlsExampleClient do
  use GenServer

  @ip {127, 0, 0, 1}
  @port 54343

  def start_link(_) do
    GenServer.start_link(__MODULE__, %{})
  end

  def init(state) do
    Logger.warning("Started client")
    {:ok, state, {:continue, :connect_send}}
  end

  def handle_continue(:connect_send, state) do
    {:ok, connect_socket} = :ssl.connect(@ip, @port, [ verify: :verify_none, active: :once ], :infinity)

    Logger.warning("Client sleeping")
    Process.sleep(3000)
    Logger.warning("Client done sleeping")
    Process.send_after(self(), :close_socket, 1000)

    {:noreply, %{socket: connect_socket}}
  end

  def handle_info(:close_socket, state) do
    Logger.warning("Client closing socket")
    :ssl.close(state.socket)
    {:noreply, state}
  end

  def handle_info({:tcp_error, socket, reason}, state) do
    Logger.warning("Client got tcp error #{inspect(reason)} #{inspect(socket)}")
    {:stop, :normal, %{}}
  end

  def handle_info({:tcp_closed, socket}, _state) do
    Logger.warning("Client got tcp closed #{inspect(socket)}")
    {:stop, :normal, %{}}
  end

  def handle_info({:ssl_closed, socket}, _state) do
    Logger.warning("Client got ssl closed #{inspect(socket)}")
    {:stop, :normal, %{}}
  end

  def handle_info({:ssl_error, socket, reason}, _state) do
    Logger.warning("Client got ssl error #{inspect(reason)} #{inspect(socket)}")
    {:stop, :normal, %{}}
  end

  def handle_info({:ssl, socket, data}, state) do
    Logger.warning("Client got ssl message #{inspect(data)} from socket #{inspect(socket)}")
    {:noreply, state}
  end
end

The messages outputted for ssl:

[warning] Client sleeping {"details":{}}
[warning] Server sending message 1 {"details":{}}
[warning] Server send res :ok {"details":{}}
[warning] Server sleeping {"details":{}}
[warning] Closing server socket {"details":{}}
[warning] Client done sleeping {"details":{}}
[warning] Client got ssl message 'Hello World' from socket {:sslsocket, {:gen_tcp, #Port<0.48>, :tls_connection, :undefined},[#PID<0.1187.0>, #PID<0.1185.0>]} {"details":{}}
[warning] Client closing socket {"details":{}}

We would have expected the client to receive an :ssl_closed message.

The messages outputted when using gen_tcp:

[warning] Client sleeping {"details":{}}
[warning] Server sending message 1 {"details":{}}
[warning] Server send res :ok {"details":{}}
[warning] Server sleeping {"details":{}}
[warning] Closing server socket {"details":{}}
[warning] Client done sleeping {"details":{}}
[warning] Client got tcp message 'Hello World' from socket #Port<0.35> {"details":{}}
[warning] Client got tcp closed #Port<0.35> {"details":{}}

This is on:

elixir 1.14.5-otp-25
erlang 25.3.2.9

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions