fix(java): enforce immediate timeouts by avifenesh · Pull Request #5264 · valkey-io/valkey-glide

avifenesh · 2026-01-29T11:48:24Z

Summary

enforce immediate Java-side timeouts and drop timed-out callbacks before JNI conversion
propagate request timeout to async registration and add native timeout markers with late-response cleanup
add timeout integration coverage and avoid false blocking-command timeouts in existing tests
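A minimal sketch of what "drop timed-out callbacks before JNI conversion" can look like, assuming pending requests live in a map of futures keyed by correlation ID. `LateResponseSketch`, `onNativeResponse`, and the `convert` function are hypothetical names, not GLIDE's actual `AsyncRegistry` API, and in the real client the conversion happens across JNI rather than in a plain Java function.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch: drop native responses whose request already timed out. */
final class LateResponseSketch {
    // Pending requests keyed by correlation ID.
    private final ConcurrentHashMap<Long, CompletableFuture<Object>> pending = new ConcurrentHashMap<>();
    // Number of raw responses actually converted (lets a test observe the skip).
    final AtomicInteger conversions = new AtomicInteger();

    CompletableFuture<Object> register(long corrId) {
        CompletableFuture<Object> f = new CompletableFuture<>();
        pending.put(corrId, f);
        // However the future completes (response, timeout, cancel), forget it.
        f.whenComplete((v, e) -> pending.remove(corrId, f));
        return f;
    }

    /** Called when the native layer hands over a raw response; returns true if it was delivered. */
    boolean onNativeResponse(long corrId, byte[] raw, Function<byte[], Object> convert) {
        CompletableFuture<Object> f = pending.get(corrId);
        if (f == null || f.isDone()) {
            return false; // late response: skip the (potentially expensive) conversion entirely
        }
        conversions.incrementAndGet();
        return f.complete(convert.apply(raw));
    }
}
```

The design choice is to check for a live, incomplete future before paying for conversion, so a response that arrives after the Java-side timeout costs only a map lookup.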

Performance

batch/getset (100B/1KB) runs remained similar to or better than baseline; 1KB at 65k/70k with timeout=20s + inflight=50k matched or improved vs 2.2.4

Additional fixes

SpotBugs cleanup: thread-safe temp dir init in glide.ffi.resolvers.NativeUtils and safer concurrency handling in glide.managers.ConnectionManager
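The PR does not show how the temp dir initialization was made thread-safe; one common pattern is double-checked locking on a `volatile` field, sketched below. `TempDirHolder` and the `"glide-native"` prefix are hypothetical and are not taken from `NativeUtils`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: create the extraction directory once, even under concurrent first use. */
final class TempDirHolder {
    // volatile: once assigned, every thread sees the fully created Path.
    private static volatile Path tempDir;

    private TempDirHolder() {}

    static Path get() {
        Path dir = tempDir;
        if (dir == null) { // fast path: no lock once initialised
            synchronized (TempDirHolder.class) {
                dir = tempDir;
                if (dir == null) { // re-check: another thread may have won the race
                    try {
                        dir = Files.createTempDirectory("glide-native");
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    tempDir = dir;
                }
            }
        }
        return dir;
    }
}
```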

Closes #5263

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1210a26b1


java/client/src/main/java/glide/internal/GlideCoreClient.java

avifenesh · 2026-01-29T12:27:12Z

Addressed this: batches now pass the per-batch timeout into async registration, so the Java-side orTimeout matches BatchOptions.timeout. CommandManager forwards BaseBatchOptions.timeout, and GlideCoreClient uses it when registering the batch future (falling back to the client timeout if unset).
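A minimal sketch of the fallback described above, assuming the per-batch timeout is a nullable `Integer` in milliseconds and that registration uses `CompletableFuture.orTimeout` as this comment states. `BatchTimeoutSketch`, `effectiveTimeoutMs`, and `registerBatch` are hypothetical helpers, not GLIDE's `CommandManager` or `GlideCoreClient` code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: pick the Java-side timeout for a batch before registering it. */
final class BatchTimeoutSketch {
    /**
     * @param batchTimeoutMs per-batch timeout from the batch options, or null if unset
     * @param clientTimeoutMs the client's configured request timeout
     */
    static long effectiveTimeoutMs(Integer batchTimeoutMs, long clientTimeoutMs) {
        // The batch setting wins; the client-wide timeout is only the fallback.
        return batchTimeoutMs != null ? batchTimeoutMs : clientTimeoutMs;
    }

    static <T> CompletableFuture<T> registerBatch(
            CompletableFuture<T> f, Integer batchTimeoutMs, long clientTimeoutMs) {
        return f.orTimeout(effectiveTimeoutMs(batchTimeoutMs, clientTimeoutMs), TimeUnit.MILLISECONDS);
    }
}
```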

avifenesh · 2026-01-29T12:57:14Z

CI failures were due to batch timeout tests expecting glide.api.models.exceptions.TimeoutException while CompletableFuture.orTimeout produced java.util.concurrent.TimeoutException. Updated AsyncRegistry to schedule timeouts that complete with Glide's TimeoutException and only mark native timeout when we set that exception.
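`CompletableFuture.orTimeout` always fails with `java.util.concurrent.TimeoutException`, so getting a library-specific exception requires completing the future yourself from a scheduled task. The sketch below shows that idea; `GlideTimeoutException` is a hypothetical stand-in for `glide.api.models.exceptions.TimeoutException`, and `nativeTimeoutMarks` stands in for whatever "mark native timeout" does in `AsyncRegistry`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for glide.api.models.exceptions.TimeoutException; hypothetical, not the real class. */
class GlideTimeoutException extends RuntimeException {
    GlideTimeoutException(String message) {
        super(message);
    }
}

final class CustomTimeoutSketch {
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "timeout-scheduler");
                t.setDaemon(true); // do not keep the JVM alive
                return t;
            });

    // Stand-in for "mark the request as timed out on the native side".
    static final AtomicInteger nativeTimeoutMarks = new AtomicInteger();

    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> f, long timeoutMs) {
        SCHEDULER.schedule(() -> {
            // completeExceptionally returns true only if this call completed the future,
            // so a request that already finished is never marked as timed out.
            if (f.completeExceptionally(
                    new GlideTimeoutException("Request timed out after " + timeoutMs + " ms"))) {
                nativeTimeoutMarks.incrementAndGet();
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
        return f;
    }
}
```

Using the boolean returned by `completeExceptionally` is what lets the native-side marker fire only when the timeout actually won the race against the response.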

java/client/src/main/java/glide/managers/ConnectionManager.java

java/client/src/main/java/glide/internal/AsyncRegistry.java

java/integTest/src/test/java/glide/SharedCommandTests.java

avifenesh · 2026-01-30T02:03:55Z

Addressed in 9a4db9e:

@jduo: Done
@xShinnRyuu on corrId: Leftovers, good catch
@xShinnRyuu on test helper: Done

java/integTest/src/test/java/glide/TimeoutBehaviorTests.java

avifenesh · 2026-01-30T02:16:08Z

Done

java/integTest/src/test/java/glide/TimeoutBehaviorTests.java

shohamazon

A few comments; the most concerning is the blocking commands issue.

java/client/src/main/java/glide/internal/AsyncRegistry.java

java/client/src/main/java/glide/internal/GlideCoreClient.java

java/client/src/main/java/glide/managers/ConnectionManager.java

java/integTest/src/test/java/glide/SharedCommandTests.java

java/integTest/src/test/java/glide/TimeoutBehaviorTests.java

avifenesh · 2026-02-01T14:25:44Z

@shohamazon All review comments have been addressed. Here's a summary of the changes:

Key Fixes

1. Cancellable Timeouts (Main Concern)

Replaced delayedExecutor with ScheduledExecutorService + ScheduledFuture:

Timeout tasks are now properly tracked and cancelled when requests complete
No more task accumulation - cancelled tasks are removed from the scheduler
Added proper cleanup in shutdown() and reset() methods
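With a `ScheduledThreadPoolExecutor`, a cancelled task stays in the work queue until its delay elapses unless `setRemoveOnCancelPolicy(true)` is set. The sketch below shows one way to get the "cancelled tasks are removed from the scheduler" behavior; `CancellableTimeoutSketch` is hypothetical, and the PR does not say whether `AsyncRegistry` uses this exact policy.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: one timeout task per request, cancelled as soon as the request completes. */
final class CancellableTimeoutSketch {
    final ScheduledThreadPoolExecutor scheduler = new ScheduledThreadPoolExecutor(1, r -> {
        Thread t = new Thread(r, "timeout-scheduler");
        t.setDaemon(true);
        return t;
    });

    CancellableTimeoutSketch() {
        // Without this, cancelled tasks linger in the queue until their delay elapses.
        scheduler.setRemoveOnCancelPolicy(true);
    }

    <T> CompletableFuture<T> withTimeout(CompletableFuture<T> f, long timeoutMs) {
        ScheduledFuture<?> task = scheduler.schedule(() -> {
            f.completeExceptionally(new RuntimeException("timed out after " + timeoutMs + " ms"));
        }, timeoutMs, TimeUnit.MILLISECONDS);
        // However the request finishes, drop its pending timeout task.
        // cancel(false): never interrupt the scheduler thread.
        f.whenComplete((v, e) -> task.cancel(false));
        return f;
    }

    int pendingTimeouts() {
        return scheduler.getQueue().size();
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```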

2. Blocking Commands (Critical Bug Fix)

Fixed GlideClusterClient.wait() and waitaof() to use submitBlockingCommand instead of submitNewCommand:

These commands have their own built-in timeouts that Rust handles
The Java-side timeout was incorrectly being applied, causing premature timeouts
This was the root cause of the wait_timeout_check cluster test failures
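A minimal sketch of the two submission paths, assuming the Java side enforces regular-command timeouts with `CompletableFuture.orTimeout`. `SubmitSketch` and its signatures are hypothetical; they borrow the names `submitNewCommand` and `submitBlockingCommand` from the comment but are not GLIDE's real `CommandManager` API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch: only non-blocking commands get a Java-side timeout. */
final class SubmitSketch {
    private final long requestTimeoutMs;

    SubmitSketch(long requestTimeoutMs) {
        this.requestTimeoutMs = requestTimeoutMs;
    }

    /** Regular commands: fail fast on the Java side if the reply is late. */
    <T> CompletableFuture<T> submitNewCommand(Supplier<CompletableFuture<T>> send) {
        return send.get().orTimeout(requestTimeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Blocking commands (WAIT, WAITAOF, BLPOP, XREAD BLOCK, ...): the timeout in the command's own arguments governs. */
    <T> CompletableFuture<T> submitBlockingCommand(Supplier<CompletableFuture<T>> send) {
        return send.get(); // deliberately no Java-side timeout
    }
}
```

Routing `wait()` through the first path is exactly the failure mode described above: a server that is correctly waiting gets cut off by the client's request timeout.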

3. Exception Handling

Changed ConnectionManager to use GlideException hierarchy instead of generic RuntimeException

4. Dead Code Removal

Removed unused internal method overloads:

3-arg register() method in AsyncRegistry
2-arg executeBatchAsync() in GlideCoreClient

5. Reverted Unnecessary Changes

Reverted SharedCommandTests.java to align with main - the blocking command tests work correctly with the default 250ms timeout because submitBlockingCommand skips Java-side timeouts

Test Results

2743 tests passed, 1 infrastructure failure (cluster connection refused - unrelated to code), 12 skipped

avifenesh · 2026-02-01T14:27:16Z

Note: These comments were written by Claude Opus 4.5 and approved by Avi.

Drop timed-out callbacks before JNI conversion and keep Java timeouts immediate. Add timeout behavior test and adjust blocking timeout coverage.

Refs valkey-io#5263
Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

shohamazon · 2026-02-02T09:50:46Z

Minor comments, mostly doc related. One question I have: do we want all those blocking commands to be handled by the core, or do we want to extract the timeout?
@shohamazon The timeouts should be handled by the wrappers, because returning on time is the contract with the user. It doesn't matter what the core says; until the result is returned to the user, it may take more time.

But if you're running blocking commands with no timeout (on the Java side), they are not handled by the wrappers.
@avifenesh

Add 5 comprehensive tests for Java-side timeout functionality:

1. request_completes_before_timeout - verifies normal operation
2. request_exceeds_timeout_throws_exception - validates TimeoutException
3. zero_timeout_uses_rust_default - confirms fallback to Rust timeout
4. timeout_task_cancelled_on_normal_completion - ensures cleanup works
5. different_clients_different_timeouts - tests isolated configurations

Tests cover both standalone and cluster modes (9 total test cases).

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

Address shohamazon's review comments about removed documentation:

- Restore class-level javadoc with responsibilities list
- Add field documentation for activeFutures, timeoutTasks, clientInflightCounts
- Document timeoutScheduler with its cancellation behavior
- Improve register() method documentation
- Add docs to all helper methods (enforceInflightLimit, scheduleTimeout, etc.)
- Document completeCallback and completeCallbackWithErrorCode with params
- Add inline comments explaining cancel(false) vs cancel(true) usage
- Shutdown now terminates timeoutScheduler with shutdownNow()

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

Address review comment about adding an && != 0 check.

Explain in javadoc why we check block != null rather than block != null && block != 0: BLOCK 0 means "block indefinitely" in Valkey/Redis, which is still a blocking command that should skip Java-side timeout enforcement.

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

avifenesh · 2026-02-02T10:01:27Z

@shohamazon by design, they should block forever.

avifenesh · 2026-02-02T10:05:17Z

Addressing Review Comments - Documentation & Design Decisions

AsyncRegistry Documentation (Commits `65b20d907`, `9434535ee`)

All documentation has been restored and enhanced:

Class-level javadoc - Restored responsibilities list, updated to include "Schedule optional Java-side timeouts with cancellable tasks"
Field documentation - All fields now documented:
- activeFutures - Thread-safe storage for active futures
- timeoutTasks - Scheduled timeout tasks mapped by correlation ID for cancellation
- clientInflightCounts - Per-client inflight request counters
- timeoutScheduler - Single-threaded scheduler with daemon thread
Method documentation - All methods documented including:
- register() - Full javadoc with @param tags
- scheduleTimeout() - Explains cancellable timeout behavior
- setupCleanup() - Explains atomic cleanup to avoid races
- completeCallback() / completeCallbackWithErrorCode() - Full javadoc
timeoutScheduler.shutdownNow() - Added to shutdown() method
cancel(true) vs cancel(false) - Added inline comments explaining:
- cancel(false) for timeout tasks - don't interrupt scheduler thread
- cancel(true) for user futures - may be blocked waiting

Blocking Commands Design Decision

Question: Should blocking commands use the command's timeout argument for Java-side timeout?

Answer: No, and here's why:

Server handles the timeout: Commands like BLPOP key 5 tell the server to block for 5 seconds. The server returns when:
- Data becomes available, OR
- The timeout expires
Java-side timeout would interfere: If we applied a Java-side 500ms timeout to BLPOP key 5, the Java client would timeout after 500ms even though the server was correctly waiting for 5 seconds.
BLOCK 0 means block forever: This is intentional behavior - the user wants to wait indefinitely until data arrives. Adding Java-side timeout would break this contract.
Rust core handles blocking correctly: The Rust core already has proper timeout handling for blocking commands based on the command arguments.

The isBlocking() check (Commit fbb33e3d7):

```java
// We check block != null, NOT block != null && block != 0
// Because BLOCK 0 = "block indefinitely" (still a blocking command!)
public boolean isBlocking() {
    return this.block != null;
}
```
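To make the null-versus-zero distinction concrete, here is a small hypothetical options class built around the same `isBlocking()` check. `ReadOptionsSketch` and `toArgs()` are illustrative only; the sole detail taken from the PR is that a `null` block means "no BLOCK argument" while `0` means "block indefinitely" (XREAD's BLOCK value is in milliseconds).

```java
import java.util.List;

/** Hypothetical options class; only the null-vs-zero semantics of BLOCK come from the PR. */
final class ReadOptionsSketch {
    private final Long block; // milliseconds; null = no BLOCK argument, 0 = block indefinitely

    ReadOptionsSketch(Long block) {
        this.block = block;
    }

    boolean isBlocking() {
        // BLOCK 0 is still a blocking read, so only a missing argument counts as non-blocking.
        return this.block != null;
    }

    /** Argument fragment a client would append to XREAD / XREADGROUP. */
    List<String> toArgs() {
        return block == null ? List.of() : List.of("BLOCK", Long.toString(block));
    }
}
```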

Summary of Documentation Changes vs Main

| Change | Reason |
|---|---|
| Updated "Timeouts handled by Rust" | Now Java also handles timeouts |
| Removed "Provide batched completion helpers" | Not accurate for new implementation |
| Added timeout-related docs | New functionality |
| Removed "Rust handles all timeout logic" comment | No longer true |

No documentation was unnecessarily removed - all changes reflect the new timeout functionality.

avifenesh · 2026-02-02T10:05:30Z

@shohamazon To clarify the blocking command timeout flow:

For blocking commands (BLPOP, BRPOP, XREAD BLOCK, etc.):

User calls client.blpop(keys, 5.0) → Server blocks for up to 5 seconds
User calls client.blpop(keys, 0) → Server blocks forever until data arrives
Java-side timeout is skipped intentionally via submitBlockingCommand()

Why skip Java-side timeout for blocking commands?

The timeout is in the command itself - the server knows when to return. If we added Java-side timeout:

BLPOP key 30 with 5s Java timeout → Java times out at 5s, but user expected 30s wait
BLPOP key 0 with any Java timeout → Breaks "block forever" behavior

The contract:

Non-blocking commands: Java-side timeout ensures quick failure if server is slow
Blocking commands: Server-side timeout (from command args) is the contract with the user

This matches how other Redis/Valkey clients handle blocking commands - they let the server control the timeout.

shohamazon

small comments, overall LGTM 🙂

java/integTest/src/test/java/glide/TimeoutTests.java

avifenesh · 2026-02-02T11:18:43Z

Addressing Test Review Comments

Changes Made (Commit `721f04d98`)

1. Removed unnecessary cleanup loop ✅

Removed the manual key deletion loop - test teardown handles it

2. Added blocking command test ✅

New blocking_command_uses_server_timeout_not_java_timeout test
Creates client with 200ms Java timeout
Runs BLPOP with 1s server timeout
Verifies it waits ~1 second (server timeout), NOT 200ms (Java timeout)
Asserts result is null (key doesn't exist)

3. Removed non-deterministic test ✅

Removed zero_timeout_uses_rust_default (didn't test anything meaningful)
Removed rapid_requests_do_not_leak_timeout_tasks (non-deterministic assumption)
Removed different_clients_different_timeouts (redundant)

4. Added deterministic cleanup test ✅

Added getPendingTimeoutCount() and getActiveFutureCount() to AsyncRegistry
New timeout_tasks_cleaned_up_after_completion test:
- Records initial counts before operations
- Runs 50 SET/GET pairs
- Waits 100ms for async cleanup
- Asserts counts return to initial values (no leaks)
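A minimal sketch of the leak-detection idea, assuming the registry keeps one map for active futures and one for timeout tasks and cleans both up from a single `whenComplete` hook. `MiniRegistry` is hypothetical; only the method names `getPendingTimeoutCount()` and `getActiveFutureCount()` come from the comment. In this sketch cleanup runs synchronously on the completing thread, so no 100 ms wait is needed; the real registry may clean up asynchronously, which is why the integration test waits.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical mini-registry exposing counts for deterministic leak checks. */
final class MiniRegistry {
    private final ConcurrentHashMap<Long, CompletableFuture<Object>> activeFutures = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<Long, ScheduledFuture<?>> timeoutTasks = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "mini-timeouts");
        t.setDaemon(true);
        return t;
    });

    /** Registers a request with a Java-side timeout and returns its correlation ID. */
    long register(long timeoutMs) {
        long id = nextId.incrementAndGet();
        CompletableFuture<Object> f = new CompletableFuture<>();
        activeFutures.put(id, f);
        timeoutTasks.put(id, scheduler.schedule(() -> {
            f.completeExceptionally(new RuntimeException("timeout"));
        }, timeoutMs, TimeUnit.MILLISECONDS));
        // Single cleanup point for every completion path (response, timeout, cancel).
        f.whenComplete((v, e) -> {
            activeFutures.remove(id);
            ScheduledFuture<?> task = timeoutTasks.remove(id);
            if (task != null) task.cancel(false);
        });
        return id;
    }

    /** Simulates a response arriving for the given correlation ID. */
    void complete(long id, Object value) {
        CompletableFuture<Object> f = activeFutures.get(id);
        if (f != null) f.complete(value);
    }

    int getPendingTimeoutCount() { return timeoutTasks.size(); }

    int getActiveFutureCount() { return activeFutures.size(); }
}
```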

Final Test Suite (4 tests, each runs standalone + cluster mode = 8 total)

request_completes_before_timeout - Normal operation
request_exceeds_timeout_throws_exception - TimeoutException on slow commands
blocking_command_uses_server_timeout_not_java_timeout - BLPOP uses server timeout
timeout_tasks_cleaned_up_after_completion - Deterministic leak detection

- Replace weak zero_timeout_uses_rust_default test with meaningful blocking_command_uses_server_timeout_not_java_timeout test
- Remove non-deterministic rapid_requests_do_not_leak_timeout_tasks test
- Remove redundant different_clients_different_timeouts test
- Remove unnecessary cleanup loop (test teardown handles it)

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

- Add getPendingTimeoutCount() and getActiveFutureCount() to AsyncRegistry for test observability
- Add blocking_command_uses_server_timeout_not_java_timeout test verifying BLPOP uses server-side timeout, not Java-side
- Add timeout_tasks_cleaned_up_after_completion test using registry methods to verify no leaked timeout tasks
- Remove non-deterministic and redundant tests per review feedback

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

--------- Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

--------- Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com> Signed-off-by: Shoham Elias <shohame@amazon.com>


This reverts commit 512feef.

This reverts commit 512feef. Signed-off-by: Shoham Elias <shohame@amazon.com>

* Revert "fix(java): enforce immediate timeouts (#5264)"

This reverts commit 512feef.

Signed-off-by: Shoham Elias <shohame@amazon.com>

* Revert "perf: Reduce mutex contention and avoid batch clone (#5230)"

This reverts commit f7bf94c.

Signed-off-by: Shoham Elias <shohame@amazon.com>

* changelog

Signed-off-by: Shoham Elias <shohame@amazon.com>

---------

Signed-off-by: Shoham Elias <shohame@amazon.com>

--------- Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

--------- Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com> Signed-off-by: Shoham Elias <shohame@amazon.com>

avifenesh requested a review from a team as a code owner January 29, 2026 11:48

chatgpt-codex-connector bot reviewed Jan 29, 2026

java/client/src/main/java/glide/internal/GlideCoreClient.java (outdated)

avifenesh requested review from affonsov, ikolomi, jamesx-improving, jbrinkman, jonathanl-bq, shohamazon and xShinnRyuu and removed request for jonathanl-bq January 29, 2026 12:32

xShinnRyuu requested a review from jduo January 29, 2026 23:04

jduo reviewed Jan 29, 2026

java/client/src/main/java/glide/managers/ConnectionManager.java (outdated)

xShinnRyuu reviewed Jan 30, 2026

java/client/src/main/java/glide/internal/AsyncRegistry.java (outdated)

xShinnRyuu reviewed Jan 30, 2026

java/integTest/src/test/java/glide/SharedCommandTests.java (outdated)

xShinnRyuu reviewed Jan 30, 2026

java/integTest/src/test/java/glide/TimeoutBehaviorTests.java (outdated)

avifenesh force-pushed the java-timeout-fix branch from 6c6053d to 9a4db9e January 30, 2026 02:05

jduo reviewed Jan 30, 2026

java/integTest/src/test/java/glide/TimeoutBehaviorTests.java (outdated)

shohamazon requested changes Feb 1, 2026

avifenesh force-pushed the java-timeout-fix branch 2 times, most recently from 24ae842 to 1237006 February 1, 2026 14:29

avifenesh added 3 commits February 1, 2026 16:30

fix(java): enforce immediate timeouts

87b18b4

Drop timed-out callbacks before JNI conversion and keep Java timeouts immediate. Add timeout behavior test and adjust blocking timeout coverage.

Refs valkey-io#5263

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

style(java): apply spotless

3214b9e

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

fix(java): honor batch timeout

e05c18d

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
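The first of these commits drops timed-out callbacks before JNI conversion. A minimal sketch of that idea in plain Java follows, under stated assumptions: `PendingRequests`, `register`, and `onNativeResponse` are hypothetical names, not GLIDE's `AsyncRegistry` API, and the `Supplier` stands in for the native-to-Java response conversion. A response that arrives after its future has already timed out finds no pending entry, so the conversion never runs.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: PendingRequests, register and onNativeResponse are illustrative
// names, not the actual GLIDE AsyncRegistry API. The Supplier passed to
// onNativeResponse stands in for the native-to-Java (JNI) response conversion.
final class PendingRequests {
    private final Map<Long, CompletableFuture<Object>> pending = new ConcurrentHashMap<>();

    /** Registers a request id and returns the future the caller waits on. */
    CompletableFuture<Object> register(long id) {
        CompletableFuture<Object> future = new CompletableFuture<>();
        pending.put(id, future);
        // However the future finishes (value, error or timeout), drop its entry so that
        // a late native response can no longer find it.
        future.whenComplete((value, error) -> pending.remove(id, future));
        return future;
    }

    /**
     * Delivers a native response. The conversion runs only if the request is still
     * pending; responses for requests that already timed out are discarded unconverted.
     *
     * @return true if this call completed the caller's future
     */
    boolean onNativeResponse(long id, Supplier<Object> convert) {
        CompletableFuture<Object> future = pending.remove(id);
        if (future == null || future.isDone()) {
            return false; // late response: skip the conversion entirely
        }
        return future.complete(convert.get());
    }

    /** Number of requests still waiting for a response. */
    int size() {
        return pending.size();
    }
}
```

Because cleanup is attached with `whenComplete`, a timeout, an error, and a normal completion all remove the map entry, so late responses in this sketch neither pay for conversion nor leave entries behind.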

avifenesh force-pushed the java-timeout-fix branch from 0cd111e to e1e391d February 2, 2026 09:54

avifenesh added 2 commits February 2, 2026 11:55

avifenesh force-pushed the java-timeout-fix branch from e1e391d to 9434535 February 2, 2026 09:55

avifenesh mentioned this pull request Feb 2, 2026

Java: Command Timeouts not getting honoured intermittently #5284 (open)

shohamazon approved these changes Feb 2, 2026

avifenesh added 2 commits February 2, 2026 13:21

avifenesh force-pushed the java-timeout-fix branch from 721f04d to db96bf9 February 2, 2026 11:21

shohamazon merged commit ed079d8 into valkey-io:main Feb 2, 2026
82 of 90 checks passed

shohamazon pushed a commit to shohamazon/glide-for-redis that referenced this pull request Feb 2, 2026

fix(java): enforce immediate timeouts (valkey-io#5264)

2fafff3

--------- Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

shohamazon pushed a commit to shohamazon/glide-for-redis that referenced this pull request Feb 2, 2026

fix(java): enforce immediate timeouts (valkey-io#5264)

0ac51b3

--------- Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com> Signed-off-by: Shoham Elias <shohame@amazon.com>

shohamazon pushed a commit to shohamazon/glide-for-redis that referenced this pull request Feb 2, 2026

fix(java): enforce immediate timeouts (valkey-io#5264)

7036517

--------- Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com> Signed-off-by: Shoham Elias <shohame@amazon.com>

shohamazon pushed a commit to shohamazon/glide-for-redis that referenced this pull request Feb 3, 2026

fix(java): enforce immediate timeouts (valkey-io#5264)

d554a26

--------- Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com> Signed-off-by: Shoham Elias <shohame@amazon.com>

shohamazon pushed a commit to shohamazon/glide-for-redis that referenced this pull request Feb 3, 2026

fix(java): enforce immediate timeouts (valkey-io#5264)

3d193f1

--------- Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com> Signed-off-by: Shoham Elias <shohame@amazon.com>

shohamazon pushed a commit that referenced this pull request Feb 3, 2026

fix(java): enforce immediate timeouts (#5264)

512feef

--------- Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com> Signed-off-by: Shoham Elias <shohame@amazon.com>

xShinnRyuu linked an issue Feb 3, 2026 that may be closed by this pull request

Java: Command Timeouts not getting honoured intermittently #5284 (open)

shohamazon added a commit that referenced this pull request Feb 8, 2026

Revert "fix(java): enforce immediate timeouts (#5264)"

56ef7d7

This reverts commit 512feef.

shohamazon added a commit to shohamazon/glide-for-redis that referenced this pull request Feb 8, 2026

Revert "fix(java): enforce immediate timeouts (valkey-io#5264)"

80fe6fc

This reverts commit 512feef.

shohamazon added a commit to shohamazon/glide-for-redis that referenced this pull request Feb 8, 2026

Revert "fix(java): enforce immediate timeouts (valkey-io#5264)"

0170439

This reverts commit 512feef. Signed-off-by: Shoham Elias <shohame@amazon.com>

Kaushik-Vijayakumar pushed a commit to Kaushik-Vijayakumar/valkey-glide that referenced this pull request Feb 17, 2026

fix(java): enforce immediate timeouts (valkey-io#5264)

8820d59

--------- Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

jduo pushed a commit to jduo/valkey-glide that referenced this pull request Feb 28, 2026

fix(java): enforce immediate timeouts (valkey-io#5264)

39663fa

--------- Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com> Signed-off-by: Shoham Elias <shohame@amazon.com>

avifenesh commented Jan 29, 2026 (edited)

Summary

Performance

Additional fixes

chatgpt-codex-connector bot left a comment

💡 Codex Review

avifenesh commented Jan 29, 2026

avifenesh commented Jan 29, 2026

avifenesh commented Jan 29, 2026

avifenesh commented Jan 30, 2026 (edited)

avifenesh commented Jan 30, 2026 (edited)

shohamazon left a comment

avifenesh commented Feb 1, 2026

Key Fixes

1. Cancellable Timeouts (Main Concern)

2. Blocking Commands (Critical Bug Fix)

3. Exception Handling

4. Dead Code Removal

5. Reverted Unnecessary Changes
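For the cancellable-timeout fix, one way to fail a request immediately on the Java side with GLIDE's own exception type, rather than the `java.util.concurrent.TimeoutException` that `CompletableFuture.orTimeout` produces, is to schedule the timeout manually and cancel it when the future finishes first. This is a minimal sketch, not the PR's implementation: `TimeoutScheduler`, `withTimeout`, and the nested `GlideTimeoutException` (a stand-in for `glide.api.models.exceptions.TimeoutException`) are hypothetical. The nullable `requestTimeoutMs` falling back to a client-wide default is an assumption modeled on the "honor batch timeout" commit, and the `onTimedOut` callback models marking the native side as timed out only when this code actually set the exception.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the PR's AsyncRegistry code. GlideTimeoutException is a
// stand-in for glide.api.models.exceptions.TimeoutException.
final class TimeoutScheduler {
    static final class GlideTimeoutException extends RuntimeException {
        GlideTimeoutException(String message) {
            super(message);
        }
    }

    private final ScheduledThreadPoolExecutor timer = createTimer();

    private static ScheduledThreadPoolExecutor createTimer() {
        // One daemon thread fires all timeouts, so it never keeps the JVM alive.
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1, r -> {
            Thread t = new Thread(r, "request-timeout-timer");
            t.setDaemon(true);
            return t;
        });
        // Remove cancelled timeouts from the queue right away instead of at their deadline.
        executor.setRemoveOnCancelPolicy(true);
        return executor;
    }

    /**
     * Arms a timeout on future. requestTimeoutMs (may be null) falls back to
     * clientTimeoutMs. If the timer fires first, the future fails with
     * GlideTimeoutException and onTimedOut runs; if the future finishes first,
     * the scheduled task is cancelled.
     */
    <T> CompletableFuture<T> withTimeout(
            CompletableFuture<T> future, Integer requestTimeoutMs, int clientTimeoutMs, Runnable onTimedOut) {
        long timeoutMs = requestTimeoutMs != null ? requestTimeoutMs : clientTimeoutMs;
        ScheduledFuture<?> task = timer.schedule(() -> {
            // completeExceptionally returns true only if this call completed the future,
            // so the timeout is reported only when we actually set the exception.
            if (future.completeExceptionally(
                    new GlideTimeoutException("Request timed out after " + timeoutMs + " ms"))) {
                onTimedOut.run();
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
        // Cancel the pending timer as soon as the future completes for any reason.
        future.whenComplete((result, error) -> task.cancel(false));
        return future;
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

Because `completeExceptionally` succeeds for only one caller, a response that races the timer either wins (and the timer is cancelled) or loses (and is dropped), and the sketch never reports a timeout for a request that actually completed.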

Test Results

avifenesh commented Feb 1, 2026 (edited)

shohamazon commented Feb 2, 2026 (edited)

avifenesh commented Feb 2, 2026

avifenesh commented Feb 2, 2026

Addressing Review Comments - Documentation & Design Decisions

AsyncRegistry Documentation (Commits 65b20d907, 9434535ee)

Blocking Commands Design Decision

Summary of Documentation Changes vs Main

avifenesh commented Feb 2, 2026

shohamazon left a comment

avifenesh commented Feb 2, 2026

Addressing Test Review Comments

Changes Made (Commit 721f04d98)

Final Test Suite (4 tests, each runs standalone + cluster mode = 8 total)
