[backport] Backport sweep for 9.0#3735
Conversation
…onfigured (#2846) When dual-channel-replication is enabled, and replica-announce-ip is set, the RDB/AOF channel does not announce itself at this endpoint. This defaults to the IP address behind the NAT, or the Kubernetes Pod IP in our case. This means that if Sentinel is polling the primary for connected replicas, it will first see the ephemeral pod IP, then revert to the announce-ip - leaving behind the pod IP as a down replica. This PR configures the RDB/AOF channel to also announce itself at the announce-ip to prevent the stale replica. ## Testing I evaluated writing unit tests for this, but I am not sure of a way we can test an IP address different to localhost (127.0.0.1) that would fail without the fix. I did test on Kubernetes against 9.0 tag and verified the fix there too. ### Status quo On 9.0 image tag: ``` $ kubectl get pods -n valkey-baseline -o custom-columns=NAME:.metadata.name,POD-IP:.status.podIP NAME POD-IP valkey-primary-5bd78c8566-llb6k 10.244.0.25 valkey-replica-0 10.244.0.17 valkey-replica-1 10.244.0.13 $ kubectl get services -n valkey-baseline -o custom-columns=NAME:.metadata.name,CLUSTER-IP:.spec.clusterIP NAME CLUSTER-IP valkey-primary 10.96.147.28 valkey-replica-0 10.96.66.233 valkey-replica-1 10.96.57.230 ``` Logs below show that pod IP for valkey-primary-5bd78c8566-llb6k `10.244.0.25:6379` is being used for dual-channel replication. This should be its cluster IP `10.96.147.28` as this is what is set in replica-announce-ip. ``` 1:M 14 Nov 2025 17:57:51.750 * Replica 10.96.147.28:6379 asks for synchronization 1:M 14 Nov 2025 17:57:51.751 * Replica 10.244.0.25:6379 asks for synchronization 1:M 14 Nov 2025 17:57:56.135 * Dual channel replication: Sending to replica 10.244.0.25:6379 RDB end offset 1763269 and client-id 35 1:M 14 Nov 2025 17:57:56.140 * Replica 10.96.147.28:6379 asks for synchronization ``` ### This fix ``` $ kubectl get pods -n valkey-test -o custom-columns=NAME:.metadata.name,CLUSTER-IP:.status.podIP NAME POD-IP valkey-primary-594c9597b5-qqvdk 10.244.0.26 valkey-replica-0 10.244.0.10 valkey-replica-1 10.244.0.18 $ kubectl get services -n valkey-test -o custom-columns=NAME:.metadata.name,CLUSTER-IP:.spec.clusterIP NAME CLUSTER-IP valkey-primary 10.96.125.142 valkey-replica None valkey-replica-0 10.96.155.74 valkey-replica-1 10.96.64.111 valkey-sentinel None ``` Logs show that the Cluster IP is now being used for dual-channel replication. ``` 1:M 14 Nov 2025 17:57:49.923 * Replica 10.96.125.142:6379 asks for synchronization 1:M 14 Nov 2025 17:57:49.924 * Replica 10.96.125.142:6379 asks for synchronization 1:M 14 Nov 2025 17:57:54.913 * Dual channel replication: Sending to replica 10.96.125.142:6379 RDB end offset 1771247 and client-id 36 1:M 14 Nov 2025 17:57:54.916 * Replica 10.96.125.142:6379 asks for synchronization ``` Fixes #2338 Signed-off-by: Joseph Heyburn <jdheyburn@gmail.com>
Noticed a use-after-free/double-free bug while load-testing manual slot migrations. Was able to reliably trigger a process crash. Original trace: ``` Backtrace: 0 libsystem_platform.dylib 0x00000001917496a4 _sigtramp + 56 1 libsystem_pthread.dylib 0x000000019170f848 pthread_kill + 296 2 libsystem_c.dylib 0x00000001916189e4 abort + 124 3 libsystem_malloc.dylib 0x000000019151c174 malloc_vreport + 892 4 libsystem_malloc.dylib 0x000000019151fc90 malloc_report + 64 5 libsystem_malloc.dylib 0x000000019152421c ___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32 6 valkey-server 0x0000000104fc2174 _sdsMakeRoomFor + 780 7 valkey-server 0x0000000104fc30a0 sdscatfmt + 168 8 valkey-server 0x00000001050318b8 generateSyncSlotsEstablishCommand + 176 9 valkey-server 0x000000010503156c createSlotImportJob + 324 10 valkey-server 0x0000000105034fd4 clusterCommandSyncSlots + 1572 11 valkey-server 0x000000010502f370 clusterCommandSpecial + 2432 12 valkey-server 0x0000000105021e80 clusterCommand + 2992 13 valkey-server 0x0000000104fb59a8 call + 320 14 valkey-server 0x0000000104fcc708 processCommandAndResetClient + 3088 15 valkey-server 0x0000000104fc9c3c processInputBuffer + 440 16 valkey-server 0x0000000104fc9198 readQueryFromClient + 108 17 valkey-server 0x000000010508073c connSocketEventHandler + 136 18 valkey-server 0x0000000104fa49b4 aeProcessEvents + 316 19 valkey-server 0x0000000104fc081c main + 18408 20 dyld 0x000000019136eb98 start + 6076 ``` The bug seems to be that the result of `sdscatfmt` is being ignored. Since `sdscatfmt` potentially reallocs a new buffer when it needs to, and deallocs the old one, this can cause a use-after-free/double-free if the loop triggers a buffer-size increase. This PR fixes the bug. Signed-off-by: Tony Wooster <twooster@gmail.coM>
Made few changes to support TSAN build in valkey-search repository. Related: valkey-io/valkey-search#703 Signed-off-by: Baswanth Vegunta <baswanth@amazon.com> Co-authored-by: Baswanth Vegunta <baswanth@amazon.com>
…ty primary (#2811) ## Summary This PR handles a network race condition that would cause a replica to read stale cluster packet and then incorrectly promote itself to an empty primary within an existing shard. ## Issue "Migrated replica reports zero repl offset and rank, and fails to win election - sigstop" the test case in `replica-migration.tcl` is a known example where the network race condition can occur. Here's the timeline: - T1: Slot migration — R7 and R3 both try to replicate from R0. Only R3 succeeds, triggering a BGSAVE on R0, while R7 blocks in `receiveSynchronousResponse()`. - T2: While R7 is blocked, R0 is SIGSTOP'd by the test. R4 wins the election and becomes the new primary. R4 sends PINGs to R7 - These PINGs land in R7's kernel TCP receive buffer but are not read by R7 yet. - T3 (5s after T1, due to receiveSynchronousResponse): R7 wakes up: - It reads from inbound links and finds out the remote nodes have already closed their end (they detected R7 as FAIL), getting "I/O error: connection closed" on each. R7 calls `accept()` for the new connections that were established during the block. These connections carry data that was sent seconds ago while R7 was dead. - R7's outbound links are still valid, so it sends PING to R4. - T4: R7 receives and processes fresh PONG packet from R4 on outbound link, reconfiguring itself to follow R4. - T5: R7 reads stale PING packet of R4 via inbound link. This packet was generated R4 was following R0, so R7 incorrectly believes R4 is still following R0 now. And Reconfiguring itself as a replica of R0 from R4. - T6: R7 finds R0 is FAIL, so it starts an election and wins, and becomes an empty-primary. ## Analyze So in T4, R4 is the new primary, and R7 is reconfiguring itself as a replica. So in R7's view, R4 is the primary, and myself (R7) is a replica. And in T5, there is a stale packet from sender (R4), the stale packet is saying: sender (R4) is a replica and R0 is the primary. We originally had a logic for stale packet, meaning we would try to ignore stale packet that would cause exceptions. So in T5: - sender_claims_to_be_primary is false since R4 is saying it is a replica. - sender_last_reported_as_primary is true since in R7's view, R4 is a primary. - sender_claimed_primary is R0, and sender (R4) and sender_claimed_primary (R0) is in the same shard. - nodeEpoch(sender_claimed_primary) is R0's epoch. R0 is an old and dead (not yet) primary. - sender_claimed_config_epoch is R0's epoch since R4 is a replica, and R0 is R4's primary. - nodeEpoch(sender_claimed_primary) == sender_claimed_config_epoch, so the logic fail and we process a stale packet. So in this point, the packet is not a stale packet in R4 and R0's view. - But it is a stale packet in myself (R7) view. In R7's local view, R4 is the new primary and it should have a bigger epoch, that is nodeEpoch(sender) should > sender_claimed_config_epoch. ## Fix The PR fixes the issue by enhancing the existing guardrail logic against stale packet. Previously that logic only detects `nodeEpoch(sender_claimed_primary) > sender_claimed_config_epoch` as stale packet, now it also checks `nodeEpoch(sender) > sender_claimed_config_epoch` to make sure we have up-to-date primary-replica chain. Signed-off-by: Zhijun <dszhijun@gmail.com> Co-authored-by: Binbin <binloveplay1314@qq.com>
`zdiffAlgorithm2()` can break out early once the destination cardinality reaches zero. In that path, a temporary SDS created by `zuiSdsFromValue()` was left dirty and never released, because that cleanup normally happens on the next iterator step. This patch explicitly discards the dirty value before the early `break`. ``` ==11724== 11 bytes in 1 blocks are definitely lost in loss record 36 of 1,442 ==11724== at 0x4846828: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==11724== by 0x2FA974: ztrymalloc_usable_internal (zmalloc.c:156) ==11724== by 0x2FAAEA: zmalloc_usable (zmalloc.c:200) ==11724== by 0x282D25: _sdsnewlen (sds.c:102) ==11724== by 0x2830B2: sdsnewlen (sds.c:169) ==11724== by 0x2DB537: zuiSdsFromValue (t_zset.c:2290) ==11724== by 0x2DBE00: zdiffAlgorithm2 (t_zset.c:2502) ==11724== by 0x2DC085: zdiff (t_zset.c:2568) ==11724== by 0x2DD01F: zunionInterDiffGenericCommand (t_zset.c:2817) ==11724== by 0x2DD574: zdiffCommand (t_zset.c:2898) ==11724== by 0x29F5AE: call (server.c:3883) ==11724== by 0x2A1598: processCommand (server.c:4569) ``` Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com> Signed-off-by: Sarthak Aggarwal <sarthakaggarwal97@gmail.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Comment |
❌ Provenance Check AlertPotential code similarities detected with upstream repository.
This check was performed automatically by the Provenance Guard Action. |
Currently, when parsing querybuf, we are not checking for CRLF, instead we assume the last two characters are CRLF by default, as shown in the following example: ``` telnet 127.0.0.1 6379 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is '^]'. *3 $3 set $3 key $5 value12 +OK get key $5 value *3 $3 set $3 key $5 value12345 +OK -ERR unknown command '345', with args beginning with: ``` This should actually be considered a protocol error. When a bug occurs in the client-side implementation, we may execute incorrect requests (writing incorrect data is the most serious of these). --------- Signed-off-by: Binbin <binloveplay1314@qq.com>
The multiStateMemOverhead() function was incorrectly calculating the memory overhead for watched keys. It used sizeof(c->mstate->watched_keys) which is the size of the list structure itself, instead of sizeof(watchedKey) which is the actual per-key overhead. This was introduced in #1405. Signed-off-by: Binbin <binloveplay1314@qq.com>
In `evalMode()`, `argv2` is allocated with `zmalloc()` but freed with `free()`. On builds using jemalloc/tcmalloc, this can cause `free(): invalid pointer` Closes #3273 Signed-off-by: Su Ko <rhtn1128@gmail.com>
The `valkey-cli --cluster del-node` command fails when attempting to delete unreachable or failed nodes, reporting `No such node ID` even though the node exists in the cluster topology. The root cause is the command only loads information about reachable nodes, causing the lookup to fail. This PR added a new function for loading all nodes information to solve this. 1. Loading all nodes from gossip: - Added `clusterManagerLoadAllInfoFromNode()` that loads both reachable and unreachable nodes from cluster gossip - Extracts common logic into `clusterManagerLoadInfoCommon()` with an `include_unreachable` flag - Keeps the original `clusterManagerLoadInfoFromNode()` unchanged to avoid affecting existing callers 2. Added success message to be consistent with other cluster commands: `[OK] Node <id> removed from the cluster.` 3. Added test coverage for `del-node` which previously had none. 4. Load slot information from gossip for unreachable nodes in `clusterManagerNodeLoadInfo()` 5. Skip unreachable primaries in `clusterManagerNodeWithLeastReplicas()` ``` ./runtest --single unit/cluster/cli [ok]: del-node: Cannot delete node with slots (9 ms) [ok]: del-node: Delete reachable node without slots (23 ms) [ok]: del-node: Delete unreachable node without slots (1333 ms) [ok]: del-node: Cannot delete unreachable primary with slots (3368 ms) ``` ``` valkey-cli --cluster del-node 127.0.0.1:7000 eb837ea7c48908e5304eafd8b1b3ced57147c448 Could not connect to Valkey at 127.0.0.1:7002: Connection refused >>> Removing node eb837ea7c48908e5304eafd8b1b3ced57147c448 from cluster 127.0.0.1:7000 >>> Sending CLUSTER FORGET messages to the cluster... >>> WARNING: Could not connect to node 127.0.0.1:7002, unable to send CLUSTER RESET. [OK] Node eb837ea7c48908e5304eafd8b1b3ced57147c448 removed from the cluster. ``` Before ``` $ valkey-cli --cluster del-node <entry-node-ip>:<entry-node-port> <failed-node-id> Could not connect to Valkey at <target-node-ip>:<target-node-port>: Connection refused >>> Removing node <id> from cluster <entry-node-ip>:<entry-node-port> [ERR] No such node ID <id> ``` After ``` $ valkey-cli --cluster del-node <entry-node-ip>:<entry-node-port> <failed-node-id> Could not connect to Valkey at <target-node-ip>:<target-node-port>: Connection refused >>> Removing node <id> from cluster <entry-node-ip>:<entry-node-port> >>> Sending CLUSTER FORGET messages to the cluster... >>> WARNING: Could not connect to node <target-node-ip>:<target-node-port>, unable to send CLUSTER RESET. [OK] Node <id> removed from the cluster. ``` Fixes #3208 --------- Signed-off-by: Yang Zhao <zymy701@gmail.com>
In valkey.conf, slot-migration-max-failover-repl-bytes allows setting to -1 to disable the limit. ``` Setting this to -1 will disable this limit ``` But slot-migration-max-failover-repl-bytes is defined as MEMORY_CONFIG and memtoull() rejects negative inputs, making it impossible to set the value to -1 via config file or CONFIG SET. ``` >>> 'slot-migration-max-failover-repl-bytes "-1"' argument must be a memory value ``` Introduce SIGNED_MEMORY_CONFIG flag for memory configs that also accept plain negative number. When memtoull() fails and this flag is set, fall back to string2ll() for parsing. Use ll2string() for CONFIG GET and rewriteConfigNumericalOption() for CONFIG REWRITE when the value is negative. Add a serverAssert in initConfigValues() to enforce that PERCENT_CONFIG and SIGNED_MEMORY_CONFIG are never combined on the same config, since both use negative values with different semantics. This means we have had this issue since it was introduced in #1949. Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
❌ Provenance Check AlertPotential code similarities detected with upstream repository.
This check was performed automatically by the Provenance Guard Action. |
Some test cases write thoughsands of commands in a pipeline and afterwards read the replies. This can lead to TCP ACK being dropped and the connection broken. CLIENT REPLY OFF prevents this. "Main db not affected when fail to diskless load" in cluster/diskless-load-swapdb has been observed to be flaky. The others are just defensive but they follow the same pattern. Similar fixes in the past: #3430, #2483. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Increase time to wait for bgsave to start. This wait has been seen
failing in these test cases.
The test case loops with 'no', 'slow', 'fast', 'all' and 'timeout'
replicas, in tests/integration/replication.tcl
Example:
*** [err]: diskless fast replicas drop during rdb pipe in
tests/integration/replication.tcl
rdb child didn't terminate
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
…igs (#3440) rewriteConfigFormatMemory() and rewriteConfigBytesOption() used long long parameters to represent memory bytes, but configs like maxmemory are stored as unsigned long long and allow values up to ULLONG_MAX. When the value exceeds LLONG_MAX (e.g. 9223372036854775808), it overflows to negative when passed as long long, causing config rewrite to produce incorrect output like "maxmemory -8589934592gb". Change both functions to use unsigned long long, matching the actual semantics of memory byte values. This is consistent with how numericConfigGet() already handles MEMORY_CONFIG using ull2string(). There are some MEMORY_CONFIG are signed and can be negative: 1. maxmemory-clients is a PERCENT_CONFIG 2. slot-migration-max-failover-repl-bytes is a SIGNED_MEMORY_CONFIG (after #3443) None of them were affected: ``` static void numericConfigRewrite(standardConfig *config, const char *name, struct rewriteConfigState *state) { ... if (config->data.numeric.flags & PERCENT_CONFIG && value < 0) { rewriteConfigPercentOption(state, name, -value, config->data.numeric.default_value); } else if (config->data.numeric.flags & SIGNED_MEMORY_CONFIG && value < 0) { rewriteConfigNumericalOption(state, name, value, config->data.numeric.default_value); } else if (config->data.numeric.flags & MEMORY_CONFIG) { rewriteConfigBytesOption(state, name, value, config->data.numeric.default_value); ... ``` Signed-off-by: Binbin <binloveplay1314@qq.com>
…led (#3458) When close_asap flag is set, set bytes read to 0 In the readToQueryBuf, the c->nread represents the number of bytes read. When close_asap flag is set, there is a bug where the c->nread isn't reset to 0 and this breaks the invariant. IOThreads then incorrectly think there is data to read and results in a crash. This change fixes this bug. To elaborate on the race possible: 1. Let's say that a IO thread job for reading query from a client got enqueued as part of a epoll - https://github.com/valkey-io/valkey/blob/unstable/src/io_threads.c#L417. 2. Later the client gets freed async and is marked as close_asap - https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L2175 3. While processing the io_thread job for the client, it invokes iothreadReadQueryFromClient. Here, [`readToQueryBuf`](https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L6497) returns as a no-op since the client is marked close-asap. Also, the c->nread is not reset to 0 and count contain the value from a previous read. 4. Later parseInputBuffer [gets invoked](https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L6514). 5. The parseInputBuffer then [accesses the query_buf](https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L3864). The query_buf here would be null in resetSharedQueryBuf as part of beforeNextClient. Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>
…3498) There is a double free issue in the code. The error handling path called both decrRefCount(o) and streamFreeNACK(nack), but the nack was obtained from cgroup->pel via raxFind and is still referenced there. decrRefCount(o) frees it through freeStream -> streamFreeCG -> raxFreeWithCallback(cg->pel, zfree), so the explicit streamFreeNACK(nack) causes a double free. Remove the redundant streamFreeNACK(nack) call and add a regression test with a crafted corrupt payload that triggers the duplicate consumer PEL entry path. This was introduced in 492d8d0. Signed-off-by: Binbin <binloveplay1314@qq.com>
❌ Provenance Check AlertPotential code similarities detected with upstream repository.
This check was performed automatically by the Provenance Guard Action. |
❌ Provenance Check AlertPotential code similarities detected with upstream repository.
This check was performed automatically by the Provenance Guard Action. |
Fixes server crash when RDMA benchmark clients disconnect (part of #3345). This PR focuses on the server-side crash fix. The benchmark client-side fix will be submitted in a separate PR. ## Server Crash on Client Disconnect When interrupting valkey-benchmark with Ctrl+C, the server would crash with a segmentation fault in handleReadResult, accessing address 0x8 (NULL pointer dereference). ### Root Cause The same RDMA connection could be added to pending_list multiple times, leading to use-after-free: 1. When a client disconnects, rdmaHandleDisconnect() adds the connection to pending_list and sets conn->state = CONN_STATE_CLOSED 2. Before rdmaProcessPendingData() processes it, event-driven callbacks (e.g., connRdmaSetWriteHandler) could add the same connection again to pending_list 3. In rdmaProcessPendingData(), the first iteration processes and may free the connection, but the second iteration accesses the already freed connection → SIGSEGV at address 0x8 (NULL + offset) The problem was exacerbated because: - valkey-benchmark creates multiple concurrent connections - On Ctrl+C, these connections disconnect nearly simultaneously - Event callbacks can re-add connections before processing completes **Stack trace from crash:** handleReadResult+0x100 readQueryFromClient+0x63 rdmaProcessPendingData (0x1b1eb9) beforeSleep+0x82 ### Solution Four targeted fixes in `src/rdma.c`: 1. `rdmaHandleDisconnect (line 535)`: Check if pending_list_node is NULL before adding to pending_list to prevent duplicates ```C /* we can't close connection now, let's mark this connection as closed state */ if (rdma_conn->pending_list_node == NULL) { listAddNodeTail(pending_list, conn); rdma_conn->pending_list_node = listLast(pending_list); } ``` 2. `connRdmaSetWriteHandler (line 956)`: Add same check before adding to pending_list to prevent duplicates during write handler updates ```C /* does this connection has pending write data? */ if (func) { if (rdma_conn->pending_list_node == NULL) { listAddNodeTail(pending_list, conn); rdma_conn->pending_list_node = listLast(pending_list); } } else if (rdma_conn->pending_list_node) { listDelNode(pending_list, rdma_conn->pending_list_node); rdma_conn->pending_list_node = NULL; } ``` 3. `rdmaProcessPendingData (line 1786)`: Use iterator node 'ln' instead of rdma_conn->pending_list_node when deleting, as the latter may have been overwritten if the connection was re-added ```C listDelNode(pending_list, ln); // Use ln, not rdma_conn->pending_list_node rdma_conn->pending_list_node = NULL; ``` 4. `rdmaProcessPendingData (line 1782-1787)`: Reorder operations to call handlers before deleting from list, ensuring connection structure remains valid during handler execution ```C if (conn->state == CONN_STATE_ERROR || conn->state == CONN_STATE_CLOSED) { /* Invoke handlers first, then remove from list */ if (callHandler(conn, conn->read_handler)) { callHandler(conn, conn->write_handler); } listDelNode(pending_list, ln); rdma_conn->pending_list_node = NULL; ++processed; continue; } ``` The fix uses **pending_list_node** as both a list pointer and a "**already in list**" flag: - NULL = not in list, safe to add - non-NULL = already in list, skip adding This ensures each connection appears in pending_list at most once, preventing use-after-free crashes. Signed-off-by: Ada-Church-Closure <nunotabashinobu066@gmail.com>
`hpersistCommand` calls `addReplyArrayLen` before `lookupKeyWrite` + `checkType`. When HPERSIST targets a non-hash key, the server writes a RESP array header followed by a WRONGTYPE error — a malformed response that permanently desynchronizes the client connection. This moves `lookupKeyWrite` + `checkType` before `addReplyArrayLen`, matching the pattern used by every other HFE command (e.g. `hgetdelCommand`, `hexpireGenericCommand`). Added a test for HPERSIST on a wrong-type key. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
## Summary Fix a file descriptor leak in `connSocketBlockingConnect()` when `aeWait()` times out. ## Bug When `anetTcpNonBlockConnect()` succeeds but `aeWait()` times out (e.g., MIGRATE to an unreachable host), the fd is leaked because it was never assigned to `conn->fd`. The caller's `connClose()` checks `conn->fd != -1` and skips cleanup. ## Fix Assign `conn->fd = fd` immediately after `anetTcpNonBlockConnect()` succeeds, before `aeWait()`. This way the caller's normal `connClose()` cleanup path handles the fd on any error, which is consistent with how the rest of the connection lifecycle works. TLS connections also benefit since `connTLSBlockingConnect` delegates to this function for the TCP layer. ## Reproducer ``` valkey-cli SET key hello # Repeat against unreachable host: for i in $(seq 1 30); do valkey-cli MIGRATE 192.0.2.1 6379 key 0 500; done # Check: /proc/<pid>/fd shows 30 leaked socket fds ``` *This issue was generated by AI but verified, with love, by a human.* Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
If not cleared, the job may no longer be valid by the time the client goes to cleanup. This dangling reference could cause a crash if you set slot-migration-log-max-len to 0 and are very unlucky. Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Remove eval script cache entries that belong to a scripting engine when that engine is unregistered. This prevents the eval cache from retaining dangling engine pointers and keeps the tracked script memory in sync after engine shutdown. The scripting engine unregister path now invokes a new eval cleanup helper, which scans the cached scripts, drops matching entries from the LRU list and dictionary, and adjusts cache memory accounting accordingly. * scripting engine * eval cache Signed-off-by: Eran Ifrah <eifrah@amazon.com>
Free BYPOLYGON points before returning from invalid COUNT parsing paths in GEOSEARCH/GEOSEARCHSTORE. Closes #3567 --------- Signed-off-by: Su Ko <rhtn1128@gmail.com> Co-authored-by: Binbin <binloveplay1314@qq.com>
When read() returns 0 (EOF/connection closed) in syncRead(), errno is not set by POSIX, so it retains a stale value (typically 0). This causes callers using connGetLastError() to log strerror(0) which is the misleading string "Success". Set errno = ECONNRESET on EOF in syncRead(), matching the existing pattern used for the timeout case (errno = ETIMEDOUT). Also set conn->last_errno = errno in connSocketSyncWrite, connSocketSyncRead, and connSocketSyncReadLine wrappers, matching the pattern used by their async counterparts connSocketWrite and connSocketRead. After this fix, replica logs will show: "I/O error reading bulk count from PRIMARY: Connection reset by peer" instead of the misleading: "I/O error reading bulk count from PRIMARY: Success" --------- Signed-off-by: Abhishek Mathur <matshek@amazon.com> Signed-off-by: djk1027 <djk9510271@gmail.com> Co-authored-by: Abhishek Mathur <matshek@amazon.com> Co-authored-by: Daejun Kim <djk9510271@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
#3586) clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every invocation, then immediately rand() % primary_count. When called in a tight loop for uncovered slots, all calls within the same wall-clock second produce the identical seed, causing every uncovered slot to be assigned to the same primary node. Remove the srand() call since the PRNG is already seeded at startup (srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to advance its state across calls, distributing uncovered slots randomly across available primaries. --------- Signed-off-by: Abhishek Mathur <matshek@amazon.com> Co-authored-by: Abhishek Mathur <matshek@amazon.com> Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
When XTRIM marks the last entry in a listpack node as deleted, lpNext() returns NULL after the lp-count field (EOF). The delta calculation (p - lp) on a NULL pointer is undefined behavior and produces a garbage pointer, corrupting the listpack. A subsequent XREAD hitting the corrupted node triggers the lpValidateNext assertion failure and crashes the server. Guard the delta calculation with a NULL check so the while(p) loop terminates naturally when the last entry is reached. Fixes #3569 Signed-off-by: Saurabh Kher <saurabh@amazon.com> Co-authored-by: Saurabh Kher <saurabh@amazon.com>
…et node disappears (#3596) ### Summary This PR fixes a NULL pointer dereference (SIGSEGV) in `connectSlotExportJob()` (`src/cluster_migrateslots.c`) that can crash a Valkey cluster node, causing a denial-of-service condition. ### Root Cause When `CLUSTER MIGRATESLOTS` is issued, a migration job is created with state `SLOT_EXPORT_CONNECTING`. On the next `clusterCron()` tick, `proceedWithSlotMigration()` calls `connectSlotExportJob()`, which looks up the target node via `clusterLookupNode()`. `clusterLookupNode()` can legitimately return `NULL` — for example, if the target node is removed from the cluster (e.g. via `CLUSTER FORGET`) between the time the migration job is created and the time the cron fires. This is a realistic race condition in any cluster topology change scenario. The return value was **never checked**, so the subsequent call to `getNodeDefaultReplicationPort(n)` immediately dereferences the NULL pointer, crashing the process: ```c // Before fix — vulnerable clusterNode *n = clusterLookupNode(job->target_node_name, CLUSTER_NAMELEN); int port = getNodeDefaultReplicationPort(n); // SIGSEGV if n == NULL serverLog(..., n->ip, port); // second dereference Signed-off-by: chenshi5012 <chenshi5012@163.com>
Ensure deferred-length reply placeholders are created and resolved in
the same reply list that subsequent nested replies use when the deferred
reply buffer is active. This prevents malformed responses when module
callbacks build postponed-length arrays while a client is already
deferring output.
Add a regression test module and unit test that reproduce the issue
through keyspace notifications and verify the nested array reply is
serialized correctly. Register the new module with the module test
build.
Here is a small illustration of what happens with and without the fix:
## With the fix — placeholder goes to `c->deferred_reply`:
```
c->buf: (empty)
c->reply: (empty)
c->deferred_reply: [+OK\r\n] [*2\r\n] [+first\r\n] [*2\r\n] [+a\r\n] [+b\r\n]
^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SET reply reply callback (outer array, inner array via POSTPONED_LEN)
→ commitDeferredReplyBuffer joins into c->reply
→ Wire: +OK\r\n *2\r\n +first\r\n *2\r\n +a\r\n +b\r\n ✓
OK [ "first", ["a", "b"] ]
```
## Without the fix — placeholder goes to `c->reply`:
```
c->buf: (empty)
c->reply: [*2\r\n] ← placeholder filled by setDeferredArrayLen (WRONG LIST!)
c->deferred_reply: [+OK\r\n] [*2\r\n] [+first\r\n] [+a\r\n] [+b\r\n]
^^^^^^^^ outer array "first" "a" "b"
→ commitDeferredReplyBuffer appends deferred_reply AFTER reply
→ Wire: *2\r\n +OK\r\n *2\r\n +first\r\n +a\r\n +b\r\n ✗
[ OK, ["first", "a"] ] "b" ← orphaned
```
* networking
* tests/modules
* tests/unit/moduleapi
**Generated by CodeLite**
---------
Signed-off-by: Eran Ifrah <eifrah@amazon.com>
`VM_GetLRU`, `VM_SetLRU`, `VM_GetLFU`, and `VM_SetLFU` crash with SIGSEGV when passed a NULL `ValkeyModuleKey` pointer. This happens because all four functions dereference `key->value` without first checking if `key` itself is NULL. When a module opens a nonexistent key in `VALKEYMODULE_READ` mode, `VM_OpenKey` returns NULL. If a module passes that NULL pointer into any of these functions, the server crashes. ``` valkey-server --loadmodule tests/modules/misc.so valkey-cli test.getlru nonexistent_key ``` **`src/module.c`** — Add a `!key` guard before dereferencing `key->value` in all four functions: ```c // Before: if (!key->value) return VALKEYMODULE_ERR; // After: if (!key || !key->value) return VALKEYMODULE_ERR; ``` **`tests/modules/misc.c`** — Add early return after `open_key_or_reply()` in `test_getlru`, `test_setlru`, `test_getlfu`, and `test_setlfu`. The helper already sends the error reply to the client when the key is not found, so the command handler just needs to stop processing: ```c ValkeyModuleKey *key = open_key_or_reply(ctx, argv[1], VALKEYMODULE_READ|VALKEYMODULE_OPEN_KEY_NOTOUCH); if (!key) return VALKEYMODULE_OK; ``` ``` valkey-cli test.getlru nonexistent_key (error) key not found ``` Signed-off-by: Yaron Sananes <yaron.sananes@gmail.com>
Fixes #3607 Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Backport sweep for 9.0
Automated cherry-picks from PRs marked "To be backported".
Applied
valkey-cli --cluster del-nodefor unreachable nodesNeeds attention
These candidates could not be applied automatically and need a maintainer to follow up.
4 candidate(s)
Generated by valkey-ci-agent using Claude Code.