Skip to content

[backport] Backport sweep for 9.0#3735

Open
valkeyrie-ops[bot] wants to merge 28 commits into
9.0from
agent/backport/sweep/9.0
Open

[backport] Backport sweep for 9.0#3735
valkeyrie-ops[bot] wants to merge 28 commits into
9.0from
agent/backport/sweep/9.0

Conversation

@valkeyrie-ops
Copy link
Copy Markdown

@valkeyrie-ops valkeyrie-ops Bot commented May 17, 2026

Backport sweep for 9.0

Automated cherry-picks from PRs marked "To be backported".

Applied

Source PR Title Detail
#2846 Dual-channel-replication announces itself at replica-announce-ip if configured already on backport branch
#3283 cluster: fix use-after-free in generateSyncSlotsEstablishCommand already on backport branch
#3257 Fix TSAN compatibility for module loading already on backport branch
#2811 Enhance cluster stale packet detection to prevent sub-replica and empty primary already on backport branch
#3342 Fix ZDIFF algorithm 2 memory leak on early exit already on backport branch
#2872 Strictly check CRLF when parsing querybuf already on backport branch
#3359 Fix incorrect memory overhead calculation for watched keys already on backport branch
#3281 Fix use zfree in cli --eval path already on backport branch
#3209 Fix valkey-cli --cluster del-node for unreachable nodes already on backport branch
#3443 Fix slot-migration-max-failover-repl-bytes unable to accept -1 already on backport branch
#3452 Fix some flaky tests using CLIENT REPLY OFF already on backport branch
#3461 Attempt to deflake 'diskless no replicas drop during rdb pipe' already on backport branch
#3440 Fix config rewrite producing negative values for unsigned memory configs already on backport branch
#3458 Fix race condition during async client freeing with IO threading enabled already on backport branch
#3498 Fix double free in stream consumer PEL loading with corrupt RDB data already on backport branch
#3448 Fixes server crash when RDMA benchmark clients disconnect already on backport branch
#3516 Fix HPERSIST RESP protocol violation on wrong-type key already on backport branch
#3541 Fix FD leak in connSocketBlockingConnect on timeout already on backport branch
#3554 Ensure client slot migration pointer is cleared during reset already on backport branch
#3503 Fix remove cached eval scripts on engine unregister already on backport branch
#3568 Fix GEOSEARCH BYPOLYGON leak on invalid COUNT already on backport branch
#3580 fix(syncio): Set errno on EOF in syncRead and propagate to conn->last… already on backport branch
#3586 fix(cluster): Remove per-call srand in clusterManagerNodePrimaryRandom already on backport branch
#3591 Handle NULL pointer in streamTrim listpack delta calculation already on backport branch
#3596 Fix: prevent NULL dereference crash in connectSlotExportJob when target node disappears already on backport branch
#3578 Fix Deferred Reply Placeholders in Active Deferred Buffers
#3610 Fix SIGSEGV in VM_GetLRU/SetLRU/GetLFU/SetLFU on NULL key conflicts resolved by Claude Code
#3641 Add null check in updateSSLPendingFlag

Needs attention

These candidates could not be applied automatically and need a maintainer to follow up.

4 candidate(s)
Source PR Title Outcome Reason
#3306 Improve COB memory tracking with copy avoidance skipped-conflict target branch lacks conflicted file(s): src/unit/test_networking.cpp
#3548 Fix lua-enable-insecure-api default value cannot be changed to yes skipped-conflict target branch lacks conflicted file(s): src/modules/lua/engine_lua.c
#3743 Fix buffered_reply assert in HFE commands with module keyspace notifications skipped-conflict unresolved — src/t_hash.c: Claude Code modified files outside the conflict set: tests/unit/moduleapi/block_keyspace_notification.tcl
#3800 Fix heap-use-after-free in ACL LOAD when client free is deferred skipped-conflict unresolved — src/t_hash.c: file unchanged after Claude Code (no resolution attempted)

Generated by valkey-ci-agent using Claude Code.

jdheyburn and others added 5 commits May 17, 2026 09:33
…onfigured (#2846)

When dual-channel-replication is enabled, and replica-announce-ip is
set, the RDB/AOF channel does not announce itself at this endpoint. This
defaults to the IP address behind the NAT, or the Kubernetes Pod IP in
our case.

This means that if Sentinel is polling the primary for connected
replicas, it will first see the ephemeral pod IP, then revert to the
announce-ip - leaving behind the pod IP as a down replica.

This PR configures the RDB/AOF channel to also announce itself at the
announce-ip to prevent the stale replica.

## Testing

I evaluated writing unit tests for this, but I am not sure of a way we
can test an IP address different to localhost (127.0.0.1) that would
fail without the fix. I did test on Kubernetes against 9.0 tag and
verified the fix there too.

### Status quo

On 9.0 image tag:

```
$ kubectl get pods -n valkey-baseline -o custom-columns=NAME:.metadata.name,POD-IP:.status.podIP
NAME                              POD-IP
valkey-primary-5bd78c8566-llb6k   10.244.0.25
valkey-replica-0                  10.244.0.17
valkey-replica-1                  10.244.0.13

$ kubectl get services -n valkey-baseline -o custom-columns=NAME:.metadata.name,CLUSTER-IP:.spec.clusterIP
NAME               CLUSTER-IP
valkey-primary     10.96.147.28
valkey-replica-0   10.96.66.233
valkey-replica-1   10.96.57.230
```

Logs below show that pod IP for valkey-primary-5bd78c8566-llb6k
`10.244.0.25:6379` is being used for dual-channel replication. This
should be its cluster IP `10.96.147.28` as this is what is set in
replica-announce-ip.

```
1:M 14 Nov 2025 17:57:51.750 * Replica 10.96.147.28:6379 asks for synchronization
1:M 14 Nov 2025 17:57:51.751 * Replica 10.244.0.25:6379 asks for synchronization
1:M 14 Nov 2025 17:57:56.135 * Dual channel replication: Sending to replica 10.244.0.25:6379 RDB end offset 1763269 and client-id 35
1:M 14 Nov 2025 17:57:56.140 * Replica 10.96.147.28:6379 asks for synchronization
```

### This fix

```
$ kubectl get pods -n valkey-test -o custom-columns=NAME:.metadata.name,CLUSTER-IP:.status.podIP  
NAME                              POD-IP
valkey-primary-594c9597b5-qqvdk   10.244.0.26
valkey-replica-0                  10.244.0.10
valkey-replica-1                  10.244.0.18

$ kubectl get services -n valkey-test -o custom-columns=NAME:.metadata.name,CLUSTER-IP:.spec.clusterIP
NAME               CLUSTER-IP
valkey-primary     10.96.125.142
valkey-replica     None
valkey-replica-0   10.96.155.74
valkey-replica-1   10.96.64.111
valkey-sentinel    None
```

Logs show that the Cluster IP is now being used for dual-channel
replication.

```
1:M 14 Nov 2025 17:57:49.923 * Replica 10.96.125.142:6379 asks for synchronization
1:M 14 Nov 2025 17:57:49.924 * Replica 10.96.125.142:6379 asks for synchronization
1:M 14 Nov 2025 17:57:54.913 * Dual channel replication: Sending to replica 10.96.125.142:6379 RDB end offset 1771247 and client-id 36
1:M 14 Nov 2025 17:57:54.916 * Replica 10.96.125.142:6379 asks for synchronization
```

Fixes #2338

Signed-off-by: Joseph Heyburn <jdheyburn@gmail.com>
Noticed a use-after-free/double-free bug while load-testing manual slot
migrations. Was able to reliably trigger a process crash.

Original trace:

```
Backtrace:
0   libsystem_platform.dylib            0x00000001917496a4 _sigtramp + 56
1   libsystem_pthread.dylib             0x000000019170f848 pthread_kill + 296
2   libsystem_c.dylib                   0x00000001916189e4 abort + 124
3   libsystem_malloc.dylib              0x000000019151c174 malloc_vreport + 892
4   libsystem_malloc.dylib              0x000000019151fc90 malloc_report + 64
5   libsystem_malloc.dylib              0x000000019152421c ___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
6   valkey-server                       0x0000000104fc2174 _sdsMakeRoomFor + 780
7   valkey-server                       0x0000000104fc30a0 sdscatfmt + 168
8   valkey-server                       0x00000001050318b8 generateSyncSlotsEstablishCommand + 176
9   valkey-server                       0x000000010503156c createSlotImportJob + 324
10  valkey-server                       0x0000000105034fd4 clusterCommandSyncSlots + 1572
11  valkey-server                       0x000000010502f370 clusterCommandSpecial + 2432
12  valkey-server                       0x0000000105021e80 clusterCommand + 2992
13  valkey-server                       0x0000000104fb59a8 call + 320
14  valkey-server                       0x0000000104fcc708 processCommandAndResetClient + 3088
15  valkey-server                       0x0000000104fc9c3c processInputBuffer + 440
16  valkey-server                       0x0000000104fc9198 readQueryFromClient + 108
17  valkey-server                       0x000000010508073c connSocketEventHandler + 136
18  valkey-server                       0x0000000104fa49b4 aeProcessEvents + 316
19  valkey-server                       0x0000000104fc081c main + 18408
20  dyld                                0x000000019136eb98 start + 6076
```

The bug seems to be that the result of `sdscatfmt` is being ignored.
Since `sdscatfmt` potentially reallocs a new buffer when it needs to,
and deallocs the old one, this can cause a use-after-free/double-free if
the loop triggers a buffer-size increase.

This PR fixes the bug.

Signed-off-by: Tony Wooster <twooster@gmail.coM>
Made few changes to support TSAN build in valkey-search repository.
Related: valkey-io/valkey-search#703

Signed-off-by: Baswanth Vegunta <baswanth@amazon.com>
Co-authored-by: Baswanth Vegunta <baswanth@amazon.com>
…ty primary (#2811)

## Summary
This PR handles a network race condition that would cause a replica to read stale cluster
packet and then incorrectly promote itself to an empty primary within an existing shard.

## Issue
"Migrated replica reports zero repl offset and rank, and fails to win election - sigstop"
the test case in `replica-migration.tcl` is a known example where the network race condition
can occur.

Here's the timeline:
- T1: Slot migration — R7 and R3 both try to replicate from R0. Only R3 succeeds, triggering
  a BGSAVE on R0, while R7 blocks in `receiveSynchronousResponse()`.
- T2: While R7 is blocked, R0 is SIGSTOP'd by the test. R4 wins the election and becomes the
  new primary. R4 sends PINGs to R7 - These PINGs land in R7's kernel TCP receive buffer but
  are not read by R7 yet.
- T3 (5s after T1, due to receiveSynchronousResponse): R7 wakes up:
  - It reads from inbound links and finds out the remote nodes have already closed their end
    (they detected R7 as FAIL), getting "I/O error: connection closed" on each. R7 calls `accept()`
    for the new connections that were established during the block. These connections carry data
    that was sent seconds ago while R7 was dead.
  - R7's outbound links are still valid, so it sends PING to R4.
- T4: R7 receives and processes fresh PONG packet from R4 on outbound link, reconfiguring itself
  to follow R4.
- T5: R7 reads stale PING packet of R4 via inbound link. This packet was generated R4 was following
  R0, so R7 incorrectly believes R4 is still following R0 now. And Reconfiguring itself as a replica
  of R0 from R4.
- T6: R7 finds R0 is FAIL, so it starts an election and wins, and becomes an empty-primary.

## Analyze
So in T4, R4 is the new primary, and R7 is reconfiguring itself as a replica. So in R7's view,
R4 is the primary, and myself (R7) is a replica.

And in T5, there is a stale packet from sender (R4), the stale packet is saying: sender (R4) is
a replica and R0 is the primary.

We originally had a logic for stale packet, meaning we would try to ignore stale packet that
would cause exceptions. So in T5:
- sender_claims_to_be_primary is false since R4 is saying it is a replica.
- sender_last_reported_as_primary is true since in R7's view, R4 is a primary.
- sender_claimed_primary is R0, and sender (R4) and sender_claimed_primary (R0) is in the same shard.
- nodeEpoch(sender_claimed_primary) is R0's epoch. R0 is an old and dead (not yet) primary.
- sender_claimed_config_epoch is R0's epoch since R4 is a replica, and R0 is R4's primary.
- nodeEpoch(sender_claimed_primary) == sender_claimed_config_epoch, so the logic fail and we process
  a stale packet. So in this point, the packet is not a stale packet in R4 and R0's view.
- But it is a stale packet in myself (R7) view. In R7's local view, R4 is the new primary and it
  should have a bigger epoch, that is nodeEpoch(sender) should > sender_claimed_config_epoch.

## Fix
The PR fixes the issue by enhancing the existing guardrail logic against stale packet. Previously
that logic only detects `nodeEpoch(sender_claimed_primary) > sender_claimed_config_epoch` as stale
packet, now it also checks `nodeEpoch(sender) > sender_claimed_config_epoch` to make sure we have
up-to-date primary-replica chain.

Signed-off-by: Zhijun <dszhijun@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
`zdiffAlgorithm2()` can break out early once the destination cardinality
reaches zero. In that path, a temporary SDS created by `zuiSdsFromValue()`
was left dirty and never released, because that cleanup normally happens on
the next iterator step.

This patch explicitly discards the dirty value before the early `break`.
```
==11724== 11 bytes in 1 blocks are definitely lost in loss record 36 of 1,442
==11724==    at 0x4846828: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==11724==    by 0x2FA974: ztrymalloc_usable_internal (zmalloc.c:156)
==11724==    by 0x2FAAEA: zmalloc_usable (zmalloc.c:200)
==11724==    by 0x282D25: _sdsnewlen (sds.c:102)
==11724==    by 0x2830B2: sdsnewlen (sds.c:169)
==11724==    by 0x2DB537: zuiSdsFromValue (t_zset.c:2290)
==11724==    by 0x2DBE00: zdiffAlgorithm2 (t_zset.c:2502)
==11724==    by 0x2DC085: zdiff (t_zset.c:2568)
==11724==    by 0x2DD01F: zunionInterDiffGenericCommand (t_zset.c:2817)
==11724==    by 0x2DD574: zdiffCommand (t_zset.c:2898)
==11724==    by 0x29F5AE: call (server.c:3883)
==11724==    by 0x2A1598: processCommand (server.c:4569)
```

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthakaggarwal97@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 17, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8a517a3a-1863-4ca6-ad96-b4793616c711

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

❌ Provenance Check Alert

Potential code similarities detected with upstream repository.

  • 2026-05-17 09:35:30 [INFO] - matches redis/redis PR #9815 (similarity: 0.913, method: file_simhash+deep); file pairs: tests/unit/type/zset.tcl <- tests/unit/type/zset.tcl

This check was performed automatically by the Provenance Guard Action.

enjoy-binbin and others added 5 commits May 18, 2026 10:28
Currently, when parsing querybuf, we are not checking for CRLF,
instead we assume the last two characters are CRLF by default,
as shown in the following example:
```
telnet 127.0.0.1 6379
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
*3
$3
set
$3
key
$5
value12
+OK
get key
$5
value

*3
$3
set
$3
key
$5
value12345
+OK
-ERR unknown command '345', with args beginning with:
```

This should actually be considered a protocol error. When a bug
occurs in the client-side implementation, we may execute incorrect
requests (writing incorrect data is the most serious of these).

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in #1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
In `evalMode()`, `argv2` is allocated with `zmalloc()` but freed with `free()`.
On builds using jemalloc/tcmalloc, this can cause `free(): invalid pointer`
Closes #3273

Signed-off-by: Su Ko <rhtn1128@gmail.com>
The `valkey-cli --cluster del-node` command fails when attempting to
delete unreachable or failed nodes, reporting `No such node ID` even
though the node exists in the cluster topology.
The root cause is the command only loads information about reachable
nodes, causing the lookup to fail. This PR added a new function for
loading all nodes information to solve this.

1. Loading all nodes from gossip:
- Added `clusterManagerLoadAllInfoFromNode()` that loads both reachable
and unreachable nodes from cluster gossip
- Extracts common logic into `clusterManagerLoadInfoCommon()` with an
`include_unreachable` flag
- Keeps the original `clusterManagerLoadInfoFromNode()` unchanged to
avoid affecting existing callers
2. Added success message to be consistent with other cluster commands:
`[OK] Node <id> removed from the cluster.`
3. Added test coverage for `del-node` which previously had none.
4. Load slot information from gossip for unreachable nodes in
`clusterManagerNodeLoadInfo()`
5. Skip unreachable primaries in `clusterManagerNodeWithLeastReplicas()`

```
./runtest --single unit/cluster/cli

[ok]: del-node: Cannot delete node with slots (9 ms)
[ok]: del-node: Delete reachable node without slots (23 ms)
[ok]: del-node: Delete unreachable node without slots (1333 ms)
[ok]: del-node: Cannot delete unreachable primary with slots (3368 ms)
```
```
valkey-cli --cluster del-node 127.0.0.1:7000 eb837ea7c48908e5304eafd8b1b3ced57147c448

Could not connect to Valkey at 127.0.0.1:7002: Connection refused
>>> Removing node eb837ea7c48908e5304eafd8b1b3ced57147c448 from cluster 127.0.0.1:7000
>>> Sending CLUSTER FORGET messages to the cluster...
>>> WARNING: Could not connect to node 127.0.0.1:7002, unable to send CLUSTER RESET.
[OK] Node eb837ea7c48908e5304eafd8b1b3ced57147c448 removed from the cluster.
```

Before
```
$ valkey-cli --cluster del-node <entry-node-ip>:<entry-node-port> <failed-node-id>
Could not connect to Valkey at <target-node-ip>:<target-node-port>: Connection refused
>>> Removing node <id> from cluster <entry-node-ip>:<entry-node-port>
[ERR] No such node ID <id>
```
After
```
$ valkey-cli --cluster del-node <entry-node-ip>:<entry-node-port> <failed-node-id>
Could not connect to Valkey at <target-node-ip>:<target-node-port>: Connection refused
>>> Removing node <id> from cluster <entry-node-ip>:<entry-node-port>
>>> Sending CLUSTER FORGET messages to the cluster...
>>> WARNING: Could not connect to node <target-node-ip>:<target-node-port>, unable to send CLUSTER RESET.
[OK] Node <id> removed from the cluster.
```

Fixes #3208

---------

Signed-off-by: Yang Zhao <zymy701@gmail.com>
In valkey.conf, slot-migration-max-failover-repl-bytes allows setting
to -1 to disable the limit.
```
Setting this to -1 will disable this limit
```

But slot-migration-max-failover-repl-bytes is defined as MEMORY_CONFIG
and memtoull() rejects negative inputs, making it impossible to set the
value to -1 via config file or CONFIG SET.
```
>>> 'slot-migration-max-failover-repl-bytes "-1"'
argument must be a memory value
```

Introduce SIGNED_MEMORY_CONFIG flag for memory configs that also accept
plain negative number. When memtoull() fails and this flag is set, fall back to string2ll()
for parsing. Use ll2string() for CONFIG GET and rewriteConfigNumericalOption() for
CONFIG REWRITE when the value is negative.

Add a serverAssert in initConfigValues() to enforce that PERCENT_CONFIG
and SIGNED_MEMORY_CONFIG are never combined on the same config, since
both use negative values with different semantics.

This means we have had this issue since it was introduced in #1949.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
@github-actions
Copy link
Copy Markdown

❌ Provenance Check Alert

Potential code similarities detected with upstream repository.

  • 2026-05-18 11:17:01 [INFO] - matches redis/redis PR #9815 (similarity: 0.913, method: file_simhash+deep); file pairs: tests/unit/type/zset.tcl <- tests/unit/type/zset.tcl

This check was performed automatically by the Provenance Guard Action.

zuiderkwast and others added 5 commits May 18, 2026 23:14
Some test cases write thoughsands of commands in a pipeline and
afterwards read the replies. This can lead to TCP ACK being dropped and
the connection broken. CLIENT REPLY OFF prevents this.

"Main db not affected when fail to diskless load" in
cluster/diskless-load-swapdb has been observed to be flaky.

The others are just defensive but they follow the same pattern.

Similar fixes in the past: #3430, #2483.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Increase time to wait for bgsave to start. This wait has been seen
failing in these test cases.

The test case loops with 'no', 'slow', 'fast', 'all' and 'timeout'
replicas, in tests/integration/replication.tcl

Example:

*** [err]: diskless fast replicas drop during rdb pipe in
tests/integration/replication.tcl
    rdb child didn't terminate

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
…igs (#3440)

rewriteConfigFormatMemory() and rewriteConfigBytesOption() used long long
parameters to represent memory bytes, but configs like maxmemory are stored
as unsigned long long and allow values up to ULLONG_MAX.

When the value exceeds LLONG_MAX (e.g. 9223372036854775808), it overflows to
negative when passed as long long, causing config rewrite to produce incorrect
output like "maxmemory -8589934592gb".

Change both functions to use unsigned long long, matching the actual semantics
of memory byte values. This is consistent with how numericConfigGet() already
handles MEMORY_CONFIG using ull2string().

There are some MEMORY_CONFIG are signed and can be negative:
1. maxmemory-clients is a PERCENT_CONFIG
2. slot-migration-max-failover-repl-bytes is a SIGNED_MEMORY_CONFIG (after #3443) 

None of them were affected:
```
static void numericConfigRewrite(standardConfig *config, const char *name, struct rewriteConfigState *state) {
    ...
    if (config->data.numeric.flags & PERCENT_CONFIG && value < 0) {
        rewriteConfigPercentOption(state, name, -value, config->data.numeric.default_value);
    } else if (config->data.numeric.flags & SIGNED_MEMORY_CONFIG && value < 0) {
        rewriteConfigNumericalOption(state, name, value, config->data.numeric.default_value);
    } else if (config->data.numeric.flags & MEMORY_CONFIG) {
        rewriteConfigBytesOption(state, name, value, config->data.numeric.default_value);
    ...
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
…led (#3458)

When close_asap flag is set, set bytes read to 0
    
In the readToQueryBuf, the c->nread represents the number of bytes read.
When close_asap flag is set, there is a bug where the c->nread isn't
reset to 0 and this breaks the invariant. IOThreads then incorrectly
think there is data to read and results in a crash. This change fixes
this bug.

To elaborate on the race possible:

1. Let's say that a IO thread job for reading query from a client got
enqueued as part of a epoll -
https://github.com/valkey-io/valkey/blob/unstable/src/io_threads.c#L417.
2. Later the client gets freed async and is marked as close_asap -
https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L2175
3. While processing the io_thread job for the client, it invokes
iothreadReadQueryFromClient. Here,
[`readToQueryBuf`](https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L6497)
returns as a no-op since the client is marked close-asap. Also, the
c->nread is not reset to 0 and count contain the value from a previous
read.
4. Later parseInputBuffer [gets
invoked](https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L6514).
5. The parseInputBuffer then [accesses the
query_buf](https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L3864).
The query_buf here would be null in resetSharedQueryBuf as part of
beforeNextClient.

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>
…3498)

There is a double free issue in the code. The error handling path called
both decrRefCount(o) and streamFreeNACK(nack), but the nack was obtained
from cgroup->pel via raxFind and is still referenced there. decrRefCount(o)
frees it through freeStream -> streamFreeCG -> raxFreeWithCallback(cg->pel, zfree),
so the explicit streamFreeNACK(nack) causes a double free.

Remove the redundant streamFreeNACK(nack) call and add a regression
test with a crafted corrupt payload that triggers the duplicate consumer
PEL entry path.

This was introduced in 492d8d0.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@valkeyrie-ops valkeyrie-ops Bot marked this pull request as ready for review May 18, 2026 23:15
@github-actions
Copy link
Copy Markdown

❌ Provenance Check Alert

Potential code similarities detected with upstream repository.

  • 2026-05-18 23:20:15 [INFO] - matches redis/redis PR #9815 (similarity: 0.913, method: file_simhash+deep); file pairs: tests/unit/type/zset.tcl <- tests/unit/type/zset.tcl

This check was performed automatically by the Provenance Guard Action.

@github-actions
Copy link
Copy Markdown

❌ Provenance Check Alert

Potential code similarities detected with upstream repository.

  • 2026-05-18 23:24:38 [INFO] - matches redis/redis PR #9815 (similarity: 0.913, method: file_simhash+deep); file pairs: tests/unit/type/zset.tcl <- tests/unit/type/zset.tcl

This check was performed automatically by the Provenance Guard Action.

quanyeyang and others added 9 commits May 19, 2026 10:20
Fixes server crash when RDMA benchmark clients disconnect (part of
#3345).

This PR focuses on the server-side crash fix. The benchmark
client-side fix will be submitted in a separate PR.

## Server Crash on Client Disconnect 

​ When interrupting valkey-benchmark with Ctrl+C, the server would crash
with a segmentation fault in handleReadResult, accessing address 0x8
(NULL pointer dereference).

### Root Cause

The same RDMA connection could be added to pending_list multiple times,
leading to use-after-free:

1. When a client disconnects, rdmaHandleDisconnect() adds the connection
to pending_list and sets conn->state = CONN_STATE_CLOSED
2. Before rdmaProcessPendingData() processes it, event-driven callbacks
(e.g., connRdmaSetWriteHandler) could add the same connection again to
pending_list
3. In rdmaProcessPendingData(), the first iteration processes and may
free the connection, but the second iteration accesses the already freed
connection → SIGSEGV at address 0x8 (NULL + offset)

The problem was exacerbated because:

  - valkey-benchmark creates multiple concurrent connections
  - On Ctrl+C, these connections disconnect nearly simultaneously
  - Event callbacks can re-add connections before processing completes

 **Stack trace from crash:**

  	handleReadResult+0x100
  	readQueryFromClient+0x63
  	rdmaProcessPendingData (0x1b1eb9)
  	beforeSleep+0x82

### Solution

 Four targeted fixes in `src/rdma.c`:

1. `rdmaHandleDisconnect (line 535)`: Check if pending_list_node is NULL
before adding to pending_list to prevent duplicates

   ```C
/* we can't close connection now, let's mark this connection as closed
state */
   if (rdma_conn->pending_list_node == NULL) {
       listAddNodeTail(pending_list, conn);
       rdma_conn->pending_list_node = listLast(pending_list);
   }
   ```

2. `connRdmaSetWriteHandler (line 956)`: Add same check before adding to
pending_list to prevent duplicates during write handler updates

   ```C
   /* does this connection has pending write data? */
   if (func) {
       if (rdma_conn->pending_list_node == NULL) {
           listAddNodeTail(pending_list, conn);
           rdma_conn->pending_list_node = listLast(pending_list);
       }
   } else if (rdma_conn->pending_list_node) {
       listDelNode(pending_list, rdma_conn->pending_list_node);
       rdma_conn->pending_list_node = NULL;
   }
   ```

3. `rdmaProcessPendingData (line 1786)`: Use iterator node 'ln' instead
of rdma_conn->pending_list_node when deleting, as the latter may have
been overwritten if the connection was re-added

   ```C
listDelNode(pending_list, ln); // Use ln, not
rdma_conn->pending_list_node
   rdma_conn->pending_list_node = NULL;
   ```

4. `rdmaProcessPendingData (line 1782-1787)`: Reorder operations to call
handlers before deleting from list, ensuring connection structure
remains valid during handler execution

   ```C
if (conn->state == CONN_STATE_ERROR || conn->state == CONN_STATE_CLOSED)
{
       /* Invoke handlers first, then remove from list */
       if (callHandler(conn, conn->read_handler)) {
           callHandler(conn, conn->write_handler);
       }

       listDelNode(pending_list, ln);
       rdma_conn->pending_list_node = NULL;

       ++processed;
       continue;
   }
   ```

The fix uses **pending_list_node** as both a list pointer and a
"**already in list**" flag:

  - NULL = not in list, safe to add
  - non-NULL = already in list, skip adding

This ensures each connection appears in pending_list at most once,
preventing use-after-free crashes.

Signed-off-by: Ada-Church-Closure <nunotabashinobu066@gmail.com>
`hpersistCommand` calls `addReplyArrayLen` before `lookupKeyWrite` +
`checkType`. When HPERSIST targets a non-hash key, the server writes a
RESP array header followed by a WRONGTYPE error — a malformed response
that permanently desynchronizes the client connection.

This moves `lookupKeyWrite` + `checkType` before `addReplyArrayLen`,
matching the pattern used by every other HFE command (e.g.
`hgetdelCommand`, `hexpireGenericCommand`).

Added a test for HPERSIST on a wrong-type key.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
## Summary
Fix a file descriptor leak in `connSocketBlockingConnect()` when
`aeWait()` times out.

## Bug
When `anetTcpNonBlockConnect()` succeeds but `aeWait()` times out (e.g.,
MIGRATE to an unreachable host), the fd is leaked because it was never
assigned to `conn->fd`. The caller's `connClose()` checks `conn->fd !=
-1` and skips cleanup.

## Fix
Assign `conn->fd = fd` immediately after `anetTcpNonBlockConnect()`
succeeds, before `aeWait()`. This way the caller's normal `connClose()`
cleanup path handles the fd on any error, which is consistent with how
the rest of the connection lifecycle works.

TLS connections also benefit since `connTLSBlockingConnect` delegates to
this function for the TCP layer.

## Reproducer
```
valkey-cli SET key hello
# Repeat against unreachable host:
for i in $(seq 1 30); do valkey-cli MIGRATE 192.0.2.1 6379 key 0 500; done
# Check: /proc/<pid>/fd shows 30 leaked socket fds
```

*This issue was generated by AI but verified, with love, by a human.*

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
If not cleared, the job may no longer be valid by the time the client
goes to cleanup. This dangling reference could cause a crash if you set
slot-migration-log-max-len to 0 and are very unlucky.

Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Remove eval script cache entries that belong to a scripting engine when
that engine is unregistered. This prevents the eval cache from retaining
dangling engine pointers and keeps the tracked script memory in sync
after engine shutdown.

The scripting engine unregister path now invokes a new eval cleanup
helper, which scans the cached scripts, drops matching entries from the
LRU list and dictionary, and adjusts cache memory accounting accordingly.

* scripting engine
* eval cache

Signed-off-by: Eran Ifrah <eifrah@amazon.com>
Free BYPOLYGON points before returning from invalid COUNT parsing paths
in GEOSEARCH/GEOSEARCHSTORE.

Closes #3567

---------

Signed-off-by: Su Ko <rhtn1128@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
When read() returns 0 (EOF/connection closed) in syncRead(), errno is
not set by POSIX, so it retains a stale value (typically 0). This causes
callers using connGetLastError() to log strerror(0) which is the
misleading string "Success".

Set errno = ECONNRESET on EOF in syncRead(), matching the existing
pattern used for the timeout case (errno = ETIMEDOUT).

Also set conn->last_errno = errno in connSocketSyncWrite,
connSocketSyncRead, and connSocketSyncReadLine wrappers, matching the
pattern used by their async counterparts connSocketWrite and
connSocketRead.

After this fix, replica logs will show:
  "I/O error reading bulk count from PRIMARY: Connection reset by peer"
instead of the misleading:
  "I/O error reading bulk count from PRIMARY: Success"

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Signed-off-by: djk1027 <djk9510271@gmail.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Daejun Kim <djk9510271@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
When XTRIM marks the last entry in a listpack node as deleted, lpNext()
returns NULL after the lp-count field (EOF). The delta calculation (p -
lp) on a NULL pointer is undefined behavior and produces a garbage
pointer, corrupting the listpack. A subsequent XREAD hitting the
corrupted node triggers the lpValidateNext assertion failure and crashes
the server.

Guard the delta calculation with a NULL check so the while(p) loop
terminates naturally when the last entry is reached.

Fixes #3569

Signed-off-by: Saurabh Kher <saurabh@amazon.com>
Co-authored-by: Saurabh Kher <saurabh@amazon.com>
chenshi5012 and others added 4 commits May 20, 2026 10:12
…et node disappears (#3596)

### Summary

This PR fixes a NULL pointer dereference (SIGSEGV) in
`connectSlotExportJob()`
(`src/cluster_migrateslots.c`) that can crash a Valkey cluster node,
causing a
denial-of-service condition.

### Root Cause

When `CLUSTER MIGRATESLOTS` is issued, a migration job is created with
state
`SLOT_EXPORT_CONNECTING`. On the next `clusterCron()` tick,
`proceedWithSlotMigration()` calls `connectSlotExportJob()`, which looks
up the
target node via `clusterLookupNode()`.

`clusterLookupNode()` can legitimately return `NULL` — for example, if
the
target node is removed from the cluster (e.g. via `CLUSTER FORGET`)
between the
time the migration job is created and the time the cron fires. This is a
realistic race condition in any cluster topology change scenario.

The return value was **never checked**, so the subsequent call to
`getNodeDefaultReplicationPort(n)` immediately dereferences the NULL
pointer,
crashing the process:

```c
// Before fix — vulnerable
clusterNode *n = clusterLookupNode(job->target_node_name, CLUSTER_NAMELEN);
int port = getNodeDefaultReplicationPort(n);  // SIGSEGV if n == NULL
serverLog(..., n->ip, port);                  // second dereference

Signed-off-by: chenshi5012 <chenshi5012@163.com>
Ensure deferred-length reply placeholders are created and resolved in
the same reply list that subsequent nested replies use when the deferred
reply buffer is active. This prevents malformed responses when module
callbacks build postponed-length arrays while a client is already
deferring output.

Add a regression test module and unit test that reproduce the issue
through keyspace notifications and verify the nested array reply is
serialized correctly. Register the new module with the module test
build.

Here is a small illustration of what happens with and without the fix:

## With the fix — placeholder goes to `c->deferred_reply`:

```
   c->buf:            (empty)
   c->reply:          (empty)
   c->deferred_reply: [+OK\r\n] [*2\r\n] [+first\r\n] [*2\r\n] [+a\r\n] [+b\r\n]
                       ^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                       SET reply  reply callback (outer array, inner array via POSTPONED_LEN)
   → commitDeferredReplyBuffer joins into c->reply
   → Wire: +OK\r\n *2\r\n +first\r\n *2\r\n +a\r\n +b\r\n  ✓
            OK      [ "first",        ["a",   "b"] ]
```

##    Without the fix — placeholder goes to `c->reply`:
```
   c->buf:            (empty)
   c->reply:          [*2\r\n]          ← placeholder filled by setDeferredArrayLen (WRONG LIST!)
   c->deferred_reply: [+OK\r\n] [*2\r\n] [+first\r\n] [+a\r\n] [+b\r\n]
                       ^^^^^^^^  outer array  "first"     "a"      "b"
   → commitDeferredReplyBuffer appends deferred_reply AFTER reply
   → Wire: *2\r\n +OK\r\n *2\r\n +first\r\n +a\r\n +b\r\n  ✗
           [ OK,          ["first",   "a"] ]  "b"  ← orphaned   
```

* networking
* tests/modules
* tests/unit/moduleapi

**Generated by CodeLite**

---------

Signed-off-by: Eran Ifrah <eifrah@amazon.com>
`VM_GetLRU`, `VM_SetLRU`, `VM_GetLFU`, and `VM_SetLFU` crash with
SIGSEGV when passed a NULL `ValkeyModuleKey` pointer. This happens
because all four functions dereference `key->value` without first
checking if `key` itself is NULL.

When a module opens a nonexistent key in `VALKEYMODULE_READ` mode,
`VM_OpenKey` returns NULL. If a module passes that NULL pointer into any
of these functions, the server crashes.

```
valkey-server --loadmodule tests/modules/misc.so
valkey-cli test.getlru nonexistent_key
```

**`src/module.c`** — Add a `!key` guard before dereferencing
`key->value` in all four functions:

```c
// Before:
if (!key->value) return VALKEYMODULE_ERR;

// After:
if (!key || !key->value) return VALKEYMODULE_ERR;
```

**`tests/modules/misc.c`** — Add early return after
`open_key_or_reply()` in `test_getlru`, `test_setlru`, `test_getlfu`,
and `test_setlfu`. The helper already sends the error reply to the
client when the key is not found, so the command handler just needs to
stop processing:

```c
ValkeyModuleKey *key = open_key_or_reply(ctx, argv[1], VALKEYMODULE_READ|VALKEYMODULE_OPEN_KEY_NOTOUCH);
if (!key) return VALKEYMODULE_OK;
```

```
valkey-cli test.getlru nonexistent_key
(error) key not found
```

Signed-off-by: Yaron Sananes <yaron.sananes@gmail.com>
Fixes #3607

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.