Client breaks after user-code based timeout #3168

@MilheiroSantos

Description

Our caching strategy is as follows: try to get the value from the cache; if it doesn't respond within a prescribed timeout, query the database directly. We've noticed in prod that after a single key times out, all subsequent keys time out too.
I managed to reproduce the issue with a poorly behaved proxy that silently drops requests for one specific key but forwards all others. I'm not claiming the root cause in prod is a proxy misbehaving in exactly this way, only that this setup reproduces what we see in prod.

Here's the proxy: proxy.js
And here's the example that replicates our issue: client.js
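
For readers without the attachment, here is a rough sketch of what a key-dropping proxy of this kind could look like. It is not the attached proxy.js: `parseCommands`, `shouldDrop`, and `createDroppingProxy` are hypothetical names, the blocked key and upstream port are assumptions taken from the logs and the docker command, and the parser assumes every TCP chunk contains only whole RESP commands.

```javascript
const net = require('node:net');

// Parse every RESP array-of-bulk-strings command found in `buf`.
// Assumption: each chunk holds whole commands only (fine for a local repro).
// Anything that is not a RESP array (e.g. inline commands) is ignored.
function parseCommands(buf) {
  const commands = [];
  let pos = 0;
  while (pos < buf.length && buf[pos] === 0x2a /* '*' */) {
    const start = pos;
    let eol = buf.indexOf('\r\n', pos);
    if (eol === -1) break; // incomplete header, give up on this chunk
    const argc = Number(buf.toString('latin1', pos + 1, eol));
    pos = eol + 2;
    const args = [];
    for (let i = 0; i < argc; i++) {
      eol = buf.indexOf('\r\n', pos); // end of the "$<len>" line
      const len = Number(buf.toString('latin1', pos + 1, eol));
      pos = eol + 2;
      args.push(buf.toString('utf8', pos, pos + len));
      pos += len + 2; // skip the payload and its trailing CRLF
    }
    commands.push({ args, raw: buf.subarray(start, pos) });
  }
  return commands;
}

// Decide whether a parsed command should be silently swallowed.
function shouldDrop(args, blockedKey) {
  return args[0].toUpperCase() === 'GET' && args[1] === blockedKey;
}

// Build (but do not start) a proxy that forwards everything except
// GET <blockedKey>. Replies from the server are piped back untouched.
function createDroppingProxy({ upstreamPort = 6379, blockedKey = 'key:10' } = {}) {
  return net.createServer((clientSocket) => {
    const upstream = net.connect(upstreamPort, '127.0.0.1');
    upstream.pipe(clientSocket);
    clientSocket.on('data', (chunk) => {
      for (const { args, raw } of parseCommands(chunk)) {
        if (shouldDrop(args, blockedKey)) {
          console.log(`Blocking GET ${args[1]}`);
          continue; // never forwarded, so the server never replies
        }
        console.log(args);
        upstream.write(raw);
      }
    });
    clientSocket.on('close', () => upstream.destroy());
    upstream.on('close', () => clientSocket.destroy());
    clientSocket.on('error', () => {});
    upstream.on('error', () => {});
  });
}
```

To try it, something like `createDroppingProxy().listen(6380)` would start the proxy; the port 6380 is an arbitrary choice, and the client would then need to connect to that port instead of 6379.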

How to run the test-case:

docker run -p 6379:6379 redis:8.4.0-bookworm
# in a new shell
node proxy.js
# in a new shell
node client.js

The weird behavior:
After the first timeout, every subsequent request times out as well. The Redis server receives and replies to those requests, but the client doesn't seem to return the responses.
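
One possible mechanism, sketched as a toy simulation rather than node-redis's actual internals: RESP replies carry no request ID, so a pipelining client has to pair each reply with the oldest command still waiting. If the reply for key:10 never arrives, the reply for key:11 would be handed to the abandoned key:10 entry, leaving every later request one reply short. `FifoClient` and its methods are hypothetical names used only for this illustration.

```javascript
// Toy model of a pipelined client: each reply goes to the oldest pending command.
class FifoClient {
  constructor() {
    this.waiting = []; // commands sent but not yet answered, oldest first
  }

  // "Send" a command; the promise settles when a reply is assigned to it.
  send(key) {
    return new Promise((resolve) => this.waiting.push({ key, resolve }));
  }

  // A reply arrived. With no ID to match on, the head of the queue gets it.
  onReply(value) {
    const head = this.waiting.shift();
    if (head) head.resolve({ askedFor: head.key, got: value });
  }
}

const client = new FifoClient();
const p10 = client.send('key:10'); // dropped by the proxy, so no reply is produced
const p11 = client.send('key:11'); // forwarded and answered by the server
client.onReply('value for key:11');

// The reply lands on the stale key:10 entry.
p10.then((r) => console.log(r)); // prints { askedFor: 'key:10', got: 'value for key:11' }
// p11 is still pending: its reply was consumed by the key:10 entry.
```

If this is what happens, a user-level timeout only stops the caller from waiting; the timed-out command still occupies its slot in the queue.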

What didn't work:
This is the initial timeout logic that breaks after the first timeout.

async function getWithTimeout(key, defaultValue, timeoutMs = 2000) {
  try {
    const timeoutPromise = new Promise((_, reject) =>
      setTimeout(() => reject(new Error('Redis operation timeout')), timeoutMs)
    );
    const getPromise = client.get(key);
    const result = await Promise.race([getPromise, timeoutPromise]);
    return result !== null ? result : defaultValue;
  } catch (err) {
    console.error(`Error fetching key "${key}":`, err.message);
    return "timeout"
  }
}
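
As a side note, the timer in the snippet above keeps running after the race settles. A minimal sketch of a reusable helper that clears it follows; `withTimeout` is a hypothetical name, and because it only stops waiting without cancelling the underlying command, it is not expected to fix the behavior described in this issue.

```javascript
// Race `promise` against a timer and always clear the timer afterwards.
// This only stops the caller from waiting; the underlying operation keeps going.
function withTimeout(promise, timeoutMs, message = 'Redis operation timeout') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), timeoutMs);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage would look like `const result = await withTimeout(client.get(key), 2000);`.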

We also tried the AbortController approach, but it just hangs on the dropped key:

async function getWithTimeout(key, defaultValue, timeoutMs = 2000) {
  const ac = new AbortController();
  const t = setTimeout(() => ac.abort(), timeoutMs);
  try {
    const result = await client.withCommandOptions({abortSignal: ac.signal}).get(key);
    return result !== null ? result : defaultValue;
  } catch (err) {
    console.error(`Error fetching key "${key}":`, err.message);
    return "timeout"
  } finally {
    clearTimeout(t);
  }
}

What kinda worked:
Setting clientConfig.socket.socketTimeout: 2000 force-disconnects the socket after 2s, but all subsequent requests then fail with "The client is closed" errors. Since the client has to be reconnected anyway, the next method handles that better.

What works, but doesn't feel right:
If there's a timeout, the error handler disconnects from the server and opens a new connection.

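async function getWithTimeout(key, defaultValue, timeoutMs = 2000) {
  let timer;
  try {
    // Race the GET against a timer, as in the first snippet.
    const timeoutPromise = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('Redis operation timeout')), timeoutMs);
    });
    const result = await Promise.race([client.get(key), timeoutPromise]);
    return result !== null ? result : defaultValue;
  } catch (err) {
    console.error(`Error fetching key "${key}":`, err.message);
    try {
      // HERE: drop the connection and open a fresh one
      client.destroy();
      await client.connect();
    } catch (e) {
      // Reconnect failures are deliberately ignored here.
    }
    return "timeout";
  } finally {
    clearTimeout(timer);
  }
}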
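
If several commands are in flight when a timeout hits, each catch block above would try to destroy and reconnect the client on its own. A minimal sketch of guarding the reconnect so that concurrent callers share one cycle follows; `makeReconnector` is a hypothetical helper, and it assumes only that the client exposes `destroy()` and `connect()` as used above.

```javascript
// Wrap a client so concurrent callers share one destroy-and-reconnect cycle.
function makeReconnector(client) {
  let inFlight = null; // promise for the reconnect currently running, if any

  return function reconnect() {
    if (!inFlight) {
      inFlight = (async () => {
        client.destroy();
        await client.connect();
      })().finally(() => {
        inFlight = null; // let a later timeout trigger a fresh reconnect
      });
    }
    return inFlight;
  };
}
```

With `const reconnect = makeReconnector(client);`, the catch block would call `await reconnect()` instead of calling `destroy()` and `connect()` directly.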

While this solves our immediate issue, it feels like we're throwing the baby out with the bathwater.

This feels like something that should be handled at the library level, but it may well be expected behavior. In that case, perhaps the documentation could be updated to call out this gotcha.

Node.js Version

v22.17.1

Redis Server Version

8.4.0

Node Redis Version

5.10.0

Platform

Linux

Logs

In client.js:

User data: 0: default value 0
User data: 1: default value 1
User data: 2: default value 2
User data: 3: default value 3
User data: 4: default value 4
User data: 5: default value 5
User data: 6: default value 6
User data: 7: default value 7
User data: 8: default value 8
User data: 9: default value 9
Error fetching key "key:10": Redis operation timeout
User data: 10: timeout
Error fetching key "key:11": Redis operation timeout
User data: 11: timeout
Error fetching key "key:12": Redis operation timeout
User data: 12: timeout
Error fetching key "key:13": Redis operation timeout
User data: 13: timeout
Error fetching key "key:14": Redis operation timeout
User data: 14: timeout
Error fetching key "key:15": Redis operation timeout
User data: 15: timeout
Error fetching key "key:16": Redis operation timeout
User data: 16: timeout
Error fetching key "key:17": Redis operation timeout
User data: 17: timeout
Error fetching key "key:18": Redis operation timeout
User data: 18: timeout
Error fetching key "key:19": Redis operation timeout
User data: 19: timeout


In the proxy.js:
[ 'CLIENT', 'SETINFO', 'LIB-VER', '5.10.0' ]
[ 'GET', 'key:0' ]
[ 'GET', 'key:1' ]
[ 'GET', 'key:2' ]
[ 'GET', 'key:3' ]
[ 'GET', 'key:4' ]
[ 'GET', 'key:5' ]
[ 'GET', 'key:6' ]
[ 'GET', 'key:7' ]
[ 'GET', 'key:8' ]
[ 'GET', 'key:9' ]
Blocking GET key:10
[ 'GET', 'key:11' ]
[ 'GET', 'key:12' ]
[ 'GET', 'key:13' ]
[ 'GET', 'key:14' ]
[ 'GET', 'key:15' ]
[ 'GET', 'key:16' ]
[ 'GET', 'key:17' ]
[ 'GET', 'key:18' ]
[ 'GET', 'key:19' ]
[ 'QUIT' ]
