
fix(java): enforce immediate timeouts #5264

Merged
shohamazon merged 17 commits into valkey-io:main from avifenesh:java-timeout-fix on Feb 2, 2026

@avifenesh
Member

@avifenesh avifenesh commented Jan 29, 2026

Summary

  • enforce immediate Java-side timeouts and drop timed-out callbacks before JNI conversion
  • propagate request timeout to async registration and add native timeout markers with late-response cleanup
  • add timeout integration coverage and avoid false blocking-command timeouts in existing tests
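
To make the first bullet concrete, here is a minimal sketch of dropping a late response before conversion, assuming a registry keyed by correlation ID. `TimeoutDropSketch`, `register`, `complete`, and the `convert` function (a stand-in for the JNI-to-Java conversion) are hypothetical names for illustration, not the actual AsyncRegistry API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch, not the real AsyncRegistry: a response that arrives
// after its request already timed out on the Java side is dropped before
// any conversion work is done.
class TimeoutDropSketch {
    // Pending requests keyed by correlation ID.
    private final ConcurrentHashMap<Long, CompletableFuture<Object>> active = new ConcurrentHashMap<>();
    // Number of responses that were actually converted (for illustration).
    final AtomicInteger conversions = new AtomicInteger();

    CompletableFuture<Object> register(long id) {
        CompletableFuture<Object> future = new CompletableFuture<>();
        active.put(id, future);
        // However the future completes (result, error or timeout), drop its entry.
        future.whenComplete((result, error) -> active.remove(id, future));
        return future;
    }

    // Called with the raw native response; convert stands in for the JNI conversion.
    void complete(long id, byte[] raw, Function<byte[], Object> convert) {
        CompletableFuture<Object> future = active.get(id);
        if (future == null || future.isDone()) {
            return; // already timed out or completed: skip the conversion entirely
        }
        conversions.incrementAndGet();
        future.complete(convert.apply(raw));
    }
}
```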

Performance

  • batch/getset (100B/1KB) runs stayed similar to or better than the baseline; 1KB runs at 65k/70k with timeout=20s and inflight=50k matched or improved on 2.2.4

Additional fixes

  • SpotBugs cleanup: thread-safe temp dir init in glide.ffi.resolvers.NativeUtils and safer concurrency handling in glide.managers.ConnectionManager

Closes #5263

@avifenesh avifenesh requested a review from a team as a code owner January 29, 2026 11:48

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1210a26b1


@avifenesh
Member Author

Addressed this: batches now pass the per-batch timeout into async registration, so the Java-side orTimeout matches BatchOptions.timeout. CommandManager forwards BaseBatchOptions.timeout, and GlideCoreClient uses it when registering the batch future (falling back to the client timeout if unset).
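
The fallback rule can be sketched as a tiny helper. `BatchTimeoutSketch.effectiveTimeoutMs` is a hypothetical name, and modeling an unset `BaseBatchOptions.timeout` as `null` is an assumption.

```java
// Hypothetical helper mirroring the rule described above: a per-batch timeout,
// when set, overrides the client-level request timeout used at registration.
class BatchTimeoutSketch {
    // batchTimeoutMs stands in for BaseBatchOptions.timeout (null when unset).
    static int effectiveTimeoutMs(Integer batchTimeoutMs, int clientTimeoutMs) {
        return batchTimeoutMs != null ? batchTimeoutMs : clientTimeoutMs;
    }
}
```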


@avifenesh
Member Author

CI failures were due to batch timeout tests expecting glide.api.models.exceptions.TimeoutException while CompletableFuture.orTimeout produced java.util.concurrent.TimeoutException. Updated AsyncRegistry to schedule timeouts that complete with Glide's TimeoutException, and to mark the native timeout only when it sets that exception.
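
A minimal sketch of the approach described above, assuming a single daemon `ScheduledExecutorService`. `GlideTimeoutException` stands in for `glide.api.models.exceptions.TimeoutException`, and `markNativeTimeout` is a hypothetical hook for the native timeout marker; the point is that the marker runs only when the timer is what actually completed the future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Stand-in for glide.api.models.exceptions.TimeoutException (hypothetical here).
class GlideTimeoutException extends RuntimeException {
    GlideTimeoutException(String message) {
        super(message);
    }
}

class GlideTimeoutSketch {
    // One daemon thread so pending timers never keep the JVM alive.
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor(
                    runnable -> {
                        Thread thread = new Thread(runnable, "timeout-scheduler");
                        thread.setDaemon(true);
                        return thread;
                    });

    static <T> CompletableFuture<T> withTimeout(
            CompletableFuture<T> future, long timeoutMs, Runnable markNativeTimeout) {
        ScheduledFuture<?> task =
                SCHEDULER.schedule(
                        () -> {
                            // true only if this call is what completed the future,
                            // so the native marker is set only for real timeouts.
                            if (future.completeExceptionally(
                                    new GlideTimeoutException("Request timed out"))) {
                                markNativeTimeout.run();
                            }
                        },
                        timeoutMs,
                        TimeUnit.MILLISECONDS);
        // Cancel the timer as soon as the future completes for any reason.
        future.whenComplete((result, error) -> task.cancel(false));
        return future;
    }
}
```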

@xShinnRyuu xShinnRyuu requested a review from jduo January 29, 2026 23:04
@avifenesh
Member Author

avifenesh commented Jan 30, 2026

Addressed in 9a4db9e:

@avifenesh
Member Author

avifenesh commented Jan 30, 2026

Done

Collaborator

@shohamazon shohamazon left a comment


A few comments; the most concerning is the blocking commands issue.

@avifenesh
Member Author

@shohamazon All review comments have been addressed. Here's a summary of the changes:

Key Fixes

1. Cancellable Timeouts (Main Concern)

Replaced delayedExecutor with ScheduledExecutorService + ScheduledFuture:

  • Timeout tasks are now properly tracked and cancelled when requests complete
  • No more task accumulation - cancelled tasks are removed from the scheduler
  • Added proper cleanup in shutdown() and reset() methods
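
A sketch of cancellable, tracked timeout tasks, assuming a `ScheduledThreadPoolExecutor`. Note that with the JDK's default policy a cancelled task stays in the work queue until its delay elapses; `setRemoveOnCancelPolicy(true)` is what removes it immediately. Whether the PR's scheduler sets that policy is not stated here, and all class and method names below are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutTaskTracker {
    // Single daemon scheduler thread for all timeout tasks.
    private final ScheduledThreadPoolExecutor scheduler =
            new ScheduledThreadPoolExecutor(1, runnable -> {
                Thread thread = new Thread(runnable, "timeout-scheduler");
                thread.setDaemon(true);
                return thread;
            });
    // Pending timeout tasks keyed by correlation ID, so they can be cancelled.
    private final ConcurrentHashMap<Long, ScheduledFuture<?>> timeoutTasks = new ConcurrentHashMap<>();

    TimeoutTaskTracker() {
        // Without this, cancelled tasks stay queued until their delay elapses.
        scheduler.setRemoveOnCancelPolicy(true);
    }

    <T> CompletableFuture<T> track(long id, CompletableFuture<T> future, long timeoutMs) {
        ScheduledFuture<?> task = scheduler.schedule(() -> {
            future.completeExceptionally(new TimeoutException("request " + id));
        }, timeoutMs, TimeUnit.MILLISECONDS);
        timeoutTasks.put(id, task);
        // On any completion, forget the task and cancel it without interrupting.
        future.whenComplete((result, error) -> {
            ScheduledFuture<?> pending = timeoutTasks.remove(id);
            if (pending != null) {
                pending.cancel(false);
            }
        });
        return future;
    }

    int pendingTimeouts() { return timeoutTasks.size(); }

    int queuedTasks() { return scheduler.getQueue().size(); }

    void shutdown() {
        scheduler.shutdownNow();
        timeoutTasks.clear();
    }
}
```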

2. Blocking Commands (Critical Bug Fix)

Fixed GlideClusterClient.wait() and waitaof() to use submitBlockingCommand instead of submitNewCommand:

  • These commands have their own built-in timeouts that Rust handles
  • The Java-side timeout was incorrectly being applied, causing premature timeouts
  • This was the root cause of the wait_timeout_check cluster test failures
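
The routing difference can be sketched as follows. Both methods are hypothetical simplifications of `submitNewCommand` and `submitBlockingCommand`, and `orTimeout` is used only for brevity (the PR itself schedules a timeout that fails with Glide's own `TimeoutException`).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical dispatch: only non-blocking commands get a Java-side deadline.
class CommandDispatchSketch {
    // Non-blocking commands: fail fast if the response is late.
    static <T> CompletableFuture<T> submitNewCommand(CompletableFuture<T> response, long requestTimeoutMs) {
        return response.orTimeout(requestTimeoutMs, TimeUnit.MILLISECONDS);
    }

    // Blocking commands (e.g. WAIT, WAITAOF, BLPOP): the timeout carried in the
    // command arguments governs, so no Java-side deadline is attached.
    static <T> CompletableFuture<T> submitBlockingCommand(CompletableFuture<T> response) {
        return response;
    }
}
```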

3. Exception Handling

Changed ConnectionManager to use GlideException hierarchy instead of generic RuntimeException

4. Dead Code Removal

Removed unused internal method overloads:

  • 3-arg register() method in AsyncRegistry
  • 2-arg executeBatchAsync() in GlideCoreClient

5. Reverted Unnecessary Changes

Reverted SharedCommandTests.java to align with main - the blocking command tests work correctly with the default 250ms timeout because submitBlockingCommand skips Java-side timeouts

Test Results

2743 tests passed, 1 infrastructure failure (cluster connection refused - unrelated to code), 12 skipped

@avifenesh
Member Author

avifenesh commented Feb 1, 2026

Note: These comments were written by Claude Opus 4.5 and approved by Avi.

@avifenesh avifenesh force-pushed the java-timeout-fix branch 2 times, most recently from 24ae842 to 1237006 February 1, 2026 14:29
Drop timed-out callbacks before JNI conversion and keep Java timeouts immediate. Add timeout behavior test and adjust blocking timeout coverage.

Refs valkey-io#5263

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
@shohamazon
Collaborator

shohamazon commented Feb 2, 2026

Minor comments, mostly doc-related. One question I have: do we want all those blocking commands to be handled by the core, or do we want to extract the timeout?
@shohamazon The timeouts should be handled by the wrappers, since timeouts must be returned on time; this is the contract with the user. It doesn't matter what the core says; until the result reaches the user, it may take time.

But if you're running blocking commands with no timeout (on the Java side), they are not handled by the wrappers
@avifenesh

Add 5 comprehensive tests for Java-side timeout functionality:

1. request_completes_before_timeout - verifies normal operation
2. request_exceeds_timeout_throws_exception - validates TimeoutException
3. zero_timeout_uses_rust_default - confirms fallback to Rust timeout
4. timeout_task_cancelled_on_normal_completion - ensures cleanup works
5. different_clients_different_timeouts - tests isolated configurations

Tests cover both standalone and cluster modes (9 total test cases).

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Address shohamazon's review comments about removed documentation:

- Restore class-level javadoc with responsibilities list
- Add field documentation for activeFutures, timeoutTasks, clientInflightCounts
- Document timeoutScheduler with its cancellation behavior
- Improve register() method documentation
- Add docs to all helper methods (enforceInflightLimit, scheduleTimeout, etc.)
- Document completeCallback and completeCallbackWithErrorCode with params
- Add inline comments explaining cancel(false) vs cancel(true) usage
- Shutdown now terminates timeoutScheduler with shutdownNow()

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Address review comment about adding an `&& block != 0` check.

Explain in javadoc why we check block != null rather than
block != null && block != 0: BLOCK 0 means "block indefinitely"
in Valkey/Redis, which is still a blocking command that should
skip Java-side timeout enforcement.

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
@avifenesh
Member Author

> Minor comments, mostly doc-related. One question I have: do we want all those blocking commands to be handled by the core, or do we want to extract the timeout?
> @shohamazon The timeouts should be handled by the wrappers, since timeouts must be returned on time; this is the contract with the user. It doesn't matter what the core says; until the result reaches the user, it may take time.
>
> But if you're running blocking commands with no timeout (on the Java side), they are not handled by the wrappers @avifenesh

@shohamazon By design, they should block forever.

@avifenesh
Member Author

Addressing Review Comments - Documentation & Design Decisions

AsyncRegistry Documentation (Commits 65b20d907, 9434535ee)

All documentation has been restored and enhanced:

  1. Class-level javadoc - Restored responsibilities list, updated to include "Schedule optional Java-side timeouts with cancellable tasks"

  2. Field documentation - All fields now documented:

    • activeFutures - Thread-safe storage for active futures
    • timeoutTasks - Scheduled timeout tasks mapped by correlation ID for cancellation
    • clientInflightCounts - Per-client inflight request counters
    • timeoutScheduler - Single-threaded scheduler with daemon thread
  3. Method documentation - All methods documented including:

    • register() - Full javadoc with @param tags
    • scheduleTimeout() - Explains cancellable timeout behavior
    • setupCleanup() - Explains atomic cleanup to avoid races
    • completeCallback() / completeCallbackWithErrorCode() - Full javadoc
  4. timeoutScheduler.shutdownNow() - Added to shutdown() method

  5. cancel(true) vs cancel(false) - Added inline comments explaining:

    • cancel(false) for timeout tasks - don't interrupt scheduler thread
    • cancel(true) for user futures - may be blocked waiting
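
The two cancellation rules can be sketched as follows (hypothetical helper names). One JDK detail worth knowing: `CompletableFuture.cancel` ignores its `mayInterruptIfRunning` argument, so `cancel(true)` on a user future simply completes it with a `CancellationException`, while on a running `ScheduledFuture` from a thread-pool scheduler `cancel(true)` would interrupt the worker thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledFuture;

class CancelSemanticsSketch {
    // Timeout tasks: cancel(false) so that, if the task is running right now,
    // the shared scheduler thread is not interrupted.
    static boolean cancelTimeoutTask(ScheduledFuture<?> task) {
        return task.cancel(false);
    }

    // User futures: CompletableFuture ignores mayInterruptIfRunning; cancel()
    // completes the future with a CancellationException, which releases any
    // thread blocked in get() or join().
    static boolean cancelUserFuture(CompletableFuture<?> future) {
        return future.cancel(true);
    }
}
```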

Blocking Commands Design Decision

Question: Should blocking commands use the command's timeout argument for Java-side timeout?

Answer: No, and here's why:

  1. Server handles the timeout: Commands like BLPOP key 5 tell the server to block for 5 seconds. The server returns when:

    • Data becomes available, OR
    • The timeout expires
  2. Java-side timeout would interfere: If we applied a Java-side 500ms timeout to BLPOP key 5, the Java client would timeout after 500ms even though the server was correctly waiting for 5 seconds.

  3. BLOCK 0 means block forever: This is intentional behavior - the user wants to wait indefinitely until data arrives. Adding Java-side timeout would break this contract.

  4. Rust core handles blocking correctly: The Rust core already has proper timeout handling for blocking commands based on the command arguments.

The isBlocking() check (Commit fbb33e3d7):

```java
// We check block != null, NOT block != null && block != 0
// Because BLOCK 0 = "block indefinitely" (still a blocking command!)
public boolean isBlocking() {
    return this.block != null;
}
```
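
To illustrate the check, here is a hypothetical options holder (`StreamReadOptionsSketch` is not the real GLIDE class) in which any `BLOCK` value, including `0`, marks the command as blocking and is forwarded to the server as-is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical options holder; not the real GLIDE options class.
class StreamReadOptionsSketch {
    // Block time in milliseconds; null means no BLOCK argument at all.
    private final Long block;

    StreamReadOptionsSketch(Long block) {
        this.block = block;
    }

    // BLOCK 0 ("block indefinitely") still counts as blocking.
    boolean isBlocking() {
        return block != null;
    }

    // Argument fragment sent to the server (illustrative only).
    List<String> toArgs() {
        List<String> args = new ArrayList<>();
        if (block != null) {
            args.add("BLOCK");
            args.add(Long.toString(block));
        }
        return args;
    }
}
```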

Summary of Documentation Changes vs Main

| Change | Reason |
|---|---|
| Updated "Timeouts handled by Rust" | Now Java also handles timeouts |
| Removed "Provide batched completion helpers" | Not accurate for new implementation |
| Added timeout-related docs | New functionality |
| Removed "Rust handles all timeout logic" comment | No longer true |

No documentation was unnecessarily removed - all changes reflect the new timeout functionality.

@avifenesh
Member Author

@shohamazon To clarify the blocking command timeout flow:

For blocking commands (BLPOP, BRPOP, XREAD BLOCK, etc.):

  1. User calls client.blpop(keys, 5.0) → Server blocks for up to 5 seconds
  2. User calls client.blpop(keys, 0) → Server blocks forever until data arrives
  3. Java-side timeout is skipped intentionally via submitBlockingCommand()

Why skip Java-side timeout for blocking commands?

The timeout is in the command itself - the server knows when to return. If we added Java-side timeout:

  • BLPOP key 30 with 5s Java timeout → Java times out at 5s, but user expected 30s wait
  • BLPOP key 0 with any Java timeout → Breaks "block forever" behavior

The contract:

  • Non-blocking commands: Java-side timeout ensures quick failure if server is slow
  • Blocking commands: Server-side timeout (from command args) is the contract with the user

This matches how other Redis/Valkey clients handle blocking commands - they let the server control the timeout.

Copy link
Copy Markdown
Collaborator

@shohamazon shohamazon left a comment


small comments, overall LGTM 🙂

@avifenesh
Member Author

Addressing Test Review Comments

Changes Made (Commit 721f04d98)

1. Removed unnecessary cleanup loop

  • Removed the manual key deletion loop - test teardown handles it

2. Added blocking command test

  • New blocking_command_uses_server_timeout_not_java_timeout test
  • Creates client with 200ms Java timeout
  • Runs BLPOP with 1s server timeout
  • Verifies it waits ~1 second (server timeout), NOT 200ms (Java timeout)
  • Asserts result is null (key doesn't exist)

3. Removed non-deterministic test

  • Removed zero_timeout_uses_rust_default (didn't test anything meaningful)
  • Removed rapid_requests_do_not_leak_timeout_tasks (non-deterministic assumption)
  • Removed different_clients_different_timeouts (redundant)

4. Added deterministic cleanup test

  • Added getPendingTimeoutCount() and getActiveFutureCount() to AsyncRegistry
  • New timeout_tasks_cleaned_up_after_completion test:
    • Records initial counts before operations
    • Runs 50 SET/GET pairs
    • Waits 100ms for async cleanup
    • Asserts counts return to initial values (no leaks)
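
The leak check can be sketched as a reusable helper. `RegistryCounters` is a hypothetical interface over the two new `AsyncRegistry` methods; instead of a fixed 100ms sleep, this variant polls until the counters return to baseline or a deadline passes, which is one way to keep such a test deterministic on slow CI hosts.

```java
import java.util.concurrent.TimeUnit;

class LeakCheckSketch {
    // Hypothetical view of the two counters added to AsyncRegistry.
    interface RegistryCounters {
        int getPendingTimeoutCount();

        int getActiveFutureCount();
    }

    // Runs the workload, then waits up to maxWaitMs for both counters to
    // return to their starting values; returns true if they did.
    static boolean countsReturnToBaseline(RegistryCounters registry, Runnable workload, long maxWaitMs) {
        int initialTimeouts = registry.getPendingTimeoutCount();
        int initialFutures = registry.getActiveFutureCount();
        workload.run();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        while (true) {
            if (registry.getPendingTimeoutCount() == initialTimeouts
                    && registry.getActiveFutureCount() == initialFutures) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // still above baseline at the deadline: likely a leak
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```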

Final Test Suite (4 tests, each runs standalone + cluster mode = 8 total)

  1. request_completes_before_timeout - Normal operation
  2. request_exceeds_timeout_throws_exception - TimeoutException on slow commands
  3. blocking_command_uses_server_timeout_not_java_timeout - BLPOP uses server timeout
  4. timeout_tasks_cleaned_up_after_completion - Deterministic leak detection

- Replace weak zero_timeout_uses_rust_default test with meaningful
  blocking_command_uses_server_timeout_not_java_timeout test
- Remove non-deterministic rapid_requests_do_not_leak_timeout_tasks test
- Remove redundant different_clients_different_timeouts test
- Remove unnecessary cleanup loop (test teardown handles it)

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
- Add getPendingTimeoutCount() and getActiveFutureCount() to AsyncRegistry
  for test observability
- Add blocking_command_uses_server_timeout_not_java_timeout test
  verifying BLPOP uses server-side timeout, not Java-side
- Add timeout_tasks_cleaned_up_after_completion test using registry
  methods to verify no leaked timeout tasks
- Remove non-deterministic and redundant tests per review feedback

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
@shohamazon shohamazon merged commit ed079d8 into valkey-io:main Feb 2, 2026
82 of 90 checks passed
shohamazon pushed a commit to shohamazon/glide-for-redis that referenced this pull request Feb 2, 2026
---------

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
shohamazon pushed a commit to shohamazon/glide-for-redis that referenced this pull request Feb 2, 2026
---------

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Signed-off-by: Shoham Elias <shohame@amazon.com>
shohamazon pushed a commit to shohamazon/glide-for-redis that referenced this pull request Feb 2, 2026
---------

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Signed-off-by: Shoham Elias <shohame@amazon.com>
shohamazon pushed a commit to shohamazon/glide-for-redis that referenced this pull request Feb 3, 2026
---------

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Signed-off-by: Shoham Elias <shohame@amazon.com>
shohamazon pushed a commit to shohamazon/glide-for-redis that referenced this pull request Feb 3, 2026
---------

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Signed-off-by: Shoham Elias <shohame@amazon.com>
shohamazon pushed a commit that referenced this pull request Feb 3, 2026
---------

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Signed-off-by: Shoham Elias <shohame@amazon.com>
@xShinnRyuu xShinnRyuu linked an issue Feb 3, 2026 that may be closed by this pull request
alexr-bq added a commit that referenced this pull request Feb 5, 2026
* feat(Java): Implement server management acl commands (#5132)

* Implement server management acl commands

Signed-off-by: Sasidharan Gopal <sasidharan.gopal94@gmail.com>

* Updated tests

Signed-off-by: Sasidharan Gopal <sasidharan.gopal94@gmail.com>

* Adding tests for acl load and acl save

Signed-off-by: Sasidharan Gopal <sasidharan.gopal94@gmail.com>

* Addressing review comments

Signed-off-by: Sasidharan Gopal <sasidharan.gopal94@gmail.com>

* Applying spotlessApply changes

Signed-off-by: Sasidharan Gopal <sasidharan.gopal94@gmail.com>

---------

Signed-off-by: Sasidharan Gopal <sasidharan.gopal94@gmail.com>
Co-authored-by: Thomas Zhou <54688146+xShinnRyuu@users.noreply.github.com>

* [Node] Fix to handle non-string types in toBuffersArray (#5166)

Fix to handle non-string types in toBuffersArray

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

* Make sure we handle IPV6 properly when extracting host and port. (#5104)

Signed-off-by: Sylvain Royer <sylvain.royer@smartnews.com>

* Update ffi to support register and unregister of pubsub callback post connection (#5178)

* Update ffi to support register and unregister of pubsub callback post connection

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>

* fmt

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>

* Run clippy

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>

* fmt

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>

* Fix test

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>

* Fix for wrong pass error type handling

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>

* fmt

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>

---------

Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>

* Python: Fix flaky pubsub tests, fix black lint (#5180)

* fixed sync cleanup

Signed-off-by: Lior Sventitzky <liorsve@amazon.com>

* fixed config interval test, increased workflow timeout

Signed-off-by: Lior Sventitzky <liorsve@amazon.com>

* adjusted lint to new black version

Signed-off-by: Lior Sventitzky <liorsve@amazon.com>

* fixed interval test

Signed-off-by: Lior Sventitzky <liorsve@amazon.com>

---------

Signed-off-by: Lior Sventitzky <liorsve@amazon.com>

* Core: Fix topology refresh reconnection issue when using refreshTopologyFromInitialNodes (#5155)

---------

Signed-off-by: Shoham Elias <shohame@amazon.com>

* Add CLAUDE.md for AI agent context (#5197)

- Hard constraints section (non-negotiable rules upfront)
- Rules grouped by trigger (always, when writing, before push, before PR)
- Project structure and architecture overview
- Context retrieval with triggers, start-with, and depends-on for just-in-time RAG

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

* Re-enable tests that were skipped due to issue #2277 (#5208)

* Fix: Remove DEFAULT_CLIENT_CREATION_TIMEOUT and honor user-provided connection timeout by centralizing timeout logic in ConnectionRequest (#5198)

* Core: Fix unnecessary unwrap() warning in test utilities (#5214)

Signed-off-by: James Duong <duong.james@gmail.com>

* Core: Fix unnecessary unwrap() warning in connection.rs (#5215)

- Replace `is_some()` check followed by `unwrap()` with `if let Some()` pattern matching

Signed-off-by: affonsov <67347924+affonsov@users.noreply.github.com>

* Python: Add inflight request limit support to sync client (#5201)

Extends the FFI layer and Python sync client to support the inflight_requests_limit configuration parameter, bringing feature parity with the async client.

Changes:
- FFI: Add reserve/release inflight request checks in command()
- Python sync config: Add inflight_requests_limit parameter to GlideClientConfiguration and GlideClusterClientConfiguration
- Tests: Add comprehensive tests at FFI and Python layers
  - FFI: test_inflight_request_limit_sync_client verifies config passing
  - Python: test_sync_inflight_request_limit with 12 test combinations (3 limits × 2 cluster modes × 2 protocols)

The inflight request limit prevents memory exhaustion and server overload by restricting the number of concurrent in-flight requests. When the limit is exceeded, commands return immediately with a "Reached maximum inflight requests" error.
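
The reserve/release check can be sketched in Java (the FFI change itself is in Rust; `InflightLimiter` and its methods are hypothetical). A reservation fails immediately once the limit is reached rather than queueing, which is what lets the command return the error right away.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reserve/release inflight limiter.
class InflightLimiter {
    private final int limit;
    private final AtomicInteger inflight = new AtomicInteger();

    InflightLimiter(int limit) {
        this.limit = limit;
    }

    // CAS loop: never lets the counter exceed the limit, even under contention.
    boolean tryReserve() {
        while (true) {
            int current = inflight.get();
            if (current >= limit) {
                return false; // caller reports "Reached maximum inflight requests"
            }
            if (inflight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Called once the request completes (successfully or not).
    void release() {
        inflight.decrementAndGet();
    }

    int inflight() {
        return inflight.get();
    }
}
```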

Signed-off-by: James Duong <duong.james@gmail.com>

* Python: Add OpenTelemetry support to sync client (#5204)

Adds OpenTelemetry support to the Python sync client, bringing it to feature parity with the async client. Includes
comprehensive refactoring to share configuration classes and test utilities between async and sync implementations.

## Changes

### Core Implementation
- Added opentelemetry.py module with OpenTelemetry singleton class for both async and sync clients
- Implemented span creation in _execute_command() and _execute_batch() methods
- Uses FFI create_named_otel_span() and create_batch_otel_span() functions
- Proper span cleanup with try/finally blocks
- Added runtime sampling control via get_sample_percentage() and set_sample_percentage() static methods

### Code Reuse & Refactoring
- **Shared Configuration**: Moved OpenTelemetryConfig, OpenTelemetryTracesConfig, and OpenTelemetryMetricsConfig to glide_shared module
- **Async Client Refactoring**: Added PyO3 conversion layer (_convert_to_pyo3_config()) to transform shared config to Rust FFI types at the boundary
- **Simplified API**: Both async and sync clients now use identical public APIs for OpenTelemetry configuration
- **Consolidated Test Utilities**: Created otel_test_utils.py with shared helper functions (read_and_parse_span_file,
check_spans_ready, build_timeout_error)

### Documentation
- Added OpenTelemetry section to README.md with configuration examples

## Migration Notes
- Existing async client code continues to work without changes
- Both clients now share the same configuration classes from glide_shared.opentelemetry
- OpenTelemetry can be initialized once per process and used by both async and sync clients

Signed-off-by: James Duong <duong.james@gmail.com>

* Set default route for CLIENT LIST to be Random (#5234)

Signed-off-by: Maayan Shani <maayan.shani@mail.huji.ac.il>

* Fix the default connection timeout for test usage to be 10000ms (#5236)

* Fix the default connection timeout for test usage to be 10000ms (10 seconds)

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

* trigger CI

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

---------

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

* Enhance pull request template with additional sections (#5171)

* Enhance pull request template with additional sections

Added sections for summary, issue link, features, implementation, limitations, and testing to the pull request template.

Signed-off-by: affonsov <67347924+affonsov@users.noreply.github.com>

* Update .github/pull_request_template.md

Co-authored-by: Taylor Curran <taylor.curran@improving.com>
Signed-off-by: affonsov <67347924+affonsov@users.noreply.github.com>

---------

Signed-off-by: affonsov <67347924+affonsov@users.noreply.github.com>
Co-authored-by: Thomas Zhou <54688146+xShinnRyuu@users.noreply.github.com>
Co-authored-by: Taylor Curran <taylor.curran@improving.com>

* Node: Migrate NAPI-RS from v2 to v3 (#5203)

* Node: Migrate NAPI-RS from v2 to v3

Migrate the Node.js client from NAPI-RS v2 to v3, including both the
Rust crate and CLI tooling.

Rust crate changes (napi 2 → 3.5):
- Type renames: JsUnknown → Unknown, JsObject → Object
- Function signatures: Env → &'a Env for lifetime-bound returns
- API changes: env.get_null() → Null.into_unknown(&env)
- Deprecated APIs replaced: create_buffer_with_data() → BufferSlice::from_data()
- Removed compat-mode by migrating to_unknown() → into_unknown(&env)

CLI changes (@napi-rs/cli 2 → 3.5.1):
- Config: napi.name → napi.binaryName, napi.triples → napi.targets
- Build flags: --zig --zig-abi-suffix=2.17 → --use-napi-cross (GNU)
- Build flags: --zig → --cross-compile (musl)
- Node.js requirement: >=16 → >=18

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

* Update Node.js version requirement to 18 or higher

Signed-off-by: Avi Fenesh <55848801+avifenesh@users.noreply.github.com>

---------

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Signed-off-by: Avi Fenesh <55848801+avifenesh@users.noreply.github.com>

* perf: Reduce mutex contention and avoid batch clone (#5230)

* perf: Reduce mutex contention and avoid batch clone

Two performance improvements:

1. Lock Optimization (glide-core cluster_async)
   Release mutex immediately after mem::take() instead of holding it
   during the entire request processing loop. This eliminates contention
   when multiple clients share the tokio runtime.

   Before: Mutex held while iterating and spawning futures
   After:  Mutex released immediately after draining the queue

2. Clone Removal (java executeBatchAsync)
   Take ownership of batch instead of cloning it before the async spawn.
   For large batches, this avoids expensive deep clones of command data.

   Before: let batch_clone = batch.clone(); // Expensive for large batches
   After:  Move batch directly into the async block

Both changes are safe:
- Lock optimization: mem::take atomically moves all requests out
- Clone removal: batch is consumed by the async block anyway
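
A Java analogue of the lock change, not the Rust code itself: swap the pending list out while holding the lock, then process it unlocked. `DrainThenProcess` is a hypothetical name; in the Rust change, `mem::take` plays the role of the swap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical queue illustrating "take under the lock, process outside it".
class DrainThenProcess<T> {
    private final Object lock = new Object();
    private List<T> pending = new ArrayList<>();

    void enqueue(T item) {
        synchronized (lock) {
            pending.add(item);
        }
    }

    // Holds the lock only long enough to swap in an empty list (like mem::take),
    // then handles the drained items without blocking other producers.
    int drainAndProcess(Consumer<T> handler) {
        List<T> drained;
        synchronized (lock) {
            drained = pending;
            pending = new ArrayList<>();
        }
        drained.forEach(handler);
        return drained.size();
    }
}
```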

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

* Clean up verbose comments

Signed-off-by: Ubuntu <ubuntu@ip-172-31-25-236.us-east-2.compute.internal>
Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

* fix: Use expect() for mutex lock consistency

Address Copilot review comment - use .expect(MUTEX_WRITE_ERR) instead
of if let Ok() for consistency with line 3079 and the rest of the
codebase. Mutex poisoning should not be silently ignored.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

* perf(java): Optimize UTF-8 string decoding

Replace decode().toString() with new String(bytes, UTF_8) for simpler
and more consistent decoding. Benchmarks show this is equivalent in
performance while being cleaner code.
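
The two decoding styles side by side (hypothetical class name). Per their Javadoc, both replace malformed input with the charset's default replacement string, so for valid UTF-8 they produce the same `String`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class Utf8DecodeSketch {
    // Previous style: decode into a CharBuffer via the Charset, then toString().
    static String viaCharsetDecode(byte[] bytes) {
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Style adopted in the commit: construct the String directly.
    static String viaStringConstructor(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```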

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

* fix(java): Ensure consistent byte order for direct buffer decoding

Set explicit BIG_ENDIAN byte order on duplicated buffer to ensure
consistent behavior across platforms.
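
A sketch of the explicit-order pattern, assuming a 4-byte big-endian value at the buffer's current position (the length-prefix field is hypothetical). `duplicate()` shares content but has its own position and byte order, so reading through it leaves the caller's buffer untouched.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class DuplicateOrderSketch {
    // Reads a hypothetical 4-byte length prefix without moving the shared
    // buffer's position or depending on whatever byte order it was given.
    static int readLengthPrefix(ByteBuffer shared) {
        ByteBuffer view = shared.duplicate().order(ByteOrder.BIG_ENDIAN);
        return view.getInt();
    }
}
```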

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

---------

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-25-236.us-east-2.compute.internal>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* Fix scriptKill_unkillable test with waitForNotBusy to prevent connection refused error (#5237)

Fix scriptKill_unkillable test with waitForNotBusy to prevent connection timeout

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

* Core: improve topology refresh reliability and handle ReadOnly errors (cherry-pick) (#5242)

---------

Signed-off-by: Shoham Elias <shohame@amazon.com>

* [Backport from 2.2] Java: add topology periodic checks config (#5229) (#5247)

Java: add topology periodic checks config (#5229)

Signed-off-by: Shoham Elias <shohame@amazon.com>

* Enable Windows integration test in workflow through WSL (#5112)

- Add x86_64-pc-windows-msvc target to install-engine workflow
- Configure WSL (Windows Subsystem for Linux) for Windows CI runners
- Update shell execution to use wsl-bash for Windows targets
- Fix environment variable passing with WSLENV for cross-platform compatibility
- Add WSL system configuration for cluster mode (vm.overcommit_memory, transparent_hugepage)
- Update Valkey installation verification to use absolute paths
- Enable engine installation on Windows by removing OS exclusion
- Update cache key generation to use step outputs instead of env variables
- Add parallel build flag (-j4) to Valkey make command for faster compilation
- Update Java CD workflow to support Windows builds
- Modify integration tests to work with Windows environment
- Update cluster manager and test utilities for cross-platform compatibility

Signed-off-by: affonsov <67347924+affonsov@users.noreply.github.com>

* Core: Disable aws-lc-rs CPU jitter entropy to fix TLS connection latency regression (#5223)

The aws-lc-rs library (used by rustls for TLS) introduced CPU jitter entropy
as the default entropy source in v1.14.1. This causes ~3x slower TLS
connection setup (~280ms vs ~90ms).

Since Cargo.lock is gitignored, each CI build resolves the latest aws-lc-rs
version, causing a performance regression starting with packages built after
aws-lc-rs 1.14.1 was released.

The fix adds AWS_LC_SYS_NO_JITTER_ENTROPY=1 to the root .cargo/config.toml,
which is inherited by all client builds and disables jitter entropy at
compile time. This falls back to OS entropy sources (/dev/urandom, getrandom,
RDRAND) which are sufficient for cryptographic purposes.

Reference: aws/aws-lc-rs#899

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Thomas Zhou <54688146+xShinnRyuu@users.noreply.github.com>

* Add missing CHANGELOG for java internal statistics support (#5251)

* Add missing CHANGELOG for java internal statistics support

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

* Update CHANGELOG.md

Co-authored-by: James Duong <duong.james@gmail.com>
Signed-off-by: Thomas Zhou <54688146+xShinnRyuu@users.noreply.github.com>

---------

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>
Signed-off-by: Thomas Zhou <54688146+xShinnRyuu@users.noreply.github.com>
Co-authored-by: James Duong <duong.james@gmail.com>

* Pin usage of CodeQL 2.23.9 to prevent Rust analyzer hanging (#5268)

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

* Fix fcall_readonly_function flaky test by removing unreliable wait assertion and adding in retry loop (#5246)

* Fix fcall_readonly_function flaky test by removing unreliable WAIT assertion and adding in retry loop

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

* Address feedback, lower poll time and increase retry count

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

---------

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

* Reduce upper inflight limit for Python from 1500 to 500 (#5266)

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

* Java: Add Windows setup instructions for GLIDE Java development (#5253)

docs(java): Add Windows setup instructions for GLIDE Java development

- Add note about WSL requirement for Windows users at the top of dependencies section
- Add Windows dependencies installation section with two options (winget and Chocolatey)
- Include detailed WSL installation and Valkey setup instructions for Windows users
- Add Windows-specific protoc installation instructions with PowerShell commands
- Clarify platform-specific PATH configuration notes for Linux/MacOS vs Windows
- Improve documentation clarity by adding "For Linux-x86_64:" label to existing protoc instructions
- Update PATH persistence notes to reflect platform differences

Signed-off-by: affonsov <67347924+affonsov@users.noreply.github.com>

* Reduce upper inflight limit for Python from 500 to 250 and increase blocking time (#5278)

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

* feat(java): add support for KEYS, MIGRATE, and WAITAOF commands (#5107)

* fix(java): enforce immediate timeouts (#5264)


---------

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>

* Drop support for Node.js 16.x and 18.x. Minimum supported version is now Node.js 20.x. (#5292)

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

* Refresh AWS credentials inside IAM token manager (#5282)

Signed-off-by: Maayan Shani <maayan.shani@mail.huji.ac.il>

* Core: parallelize DNS lookups during slot refresh (#5281)



---------

Signed-off-by: Shoham Elias <shohame@amazon.com>

* Python: Add dynamic PubSub support to sync client (#5272)

Implements dynamic PubSub functionality for the Python sync client, achieving
feature parity with the async client. This allows sync users to dynamically
subscribe/unsubscribe to channels at runtime and monitor subscription health.

Update the Rust FFI layer to report new pubsub statistics.

Unify config.py classes since there aren't differences in support between
sync and async.

Support the pubsub_reconciliation_interval_ms option.

Note that lazy subscription requests are not supported by design for the
sync client.

Signed-off-by: James Duong <duong.james@gmail.com>

* Go: Add ALLOW_NON_COVERED_SLOTS to cluster scan (#5277)

* Go: Support ALLOW_NON_COVERED_SLOTS flag

- Support scanning even if some slots are not covered
- Add test to verify that using an invalid cursor ID throws an error
- Add test to verify that terminating a cursor early does not leak memory

Signed-off-by: James Duong <duong.james@gmail.com>

* Go: Update CHANGELOG and README for cluster scan AllowNonCoveredSlots option

- Add entry to CHANGELOG.md for ALLOW_NON_COVERED_SLOTS flag support
- Add cluster scan documentation section to go/README.md with examples
- Document the new SetAllowNonCoveredSlots() option and its use case

Signed-off-by: James Duong <duong.james@gmail.com>

* Go: Fix linter formatting issues

- Fix field alignment in ClusterScanOptions struct
- Remove extra blank line in test file

Signed-off-by: James Duong <duong.james@gmail.com>

---------

Signed-off-by: James Duong <duong.james@gmail.com>

* Go: Support statistics and dynamic pubsub (#5280)

Implement support for dynamic pubsub commands and
retrieval of statistics, including pubsub statistics.

Add support for setting the pubsub reconciliation interval.

Closes #5254

Signed-off-by: James Duong <duong.james@gmail.com>

* Java: Add dynamic pubsub APIs and pubsub stats (#5269)

* Add support for dynamic subscription and unsubscription in Java.
* Add methods for retrieving subscription metrics.
* Add the pubsub reconciliation interval advanced option.

Fixes #5267.

Signed-off-by: James Duong <duong.james@gmail.com>

* Update default connectionTimeout for Java test client from 2000ms to 10000ms (#5309)

Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>

---------

Signed-off-by: Sasidharan Gopal <sasidharan.gopal94@gmail.com>
Signed-off-by: Thomas Zhou <thomaszhou64@gmail.com>
Signed-off-by: Sylvain Royer <sylvain.royer@smartnews.com>
Signed-off-by: Alex Rehnby-Martin <alex.rehnby-martin@improving.com>
Signed-off-by: Lior Sventitzky <liorsve@amazon.com>
Signed-off-by: Shoham Elias <shohame@amazon.com>
Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Signed-off-by: James Duong <duong.james@gmail.com>
Signed-off-by: affonsov <67347924+affonsov@users.noreply.github.com>
Signed-off-by: Maayan Shani <maayan.shani@mail.huji.ac.il>
Signed-off-by: Avi Fenesh <55848801+avifenesh@users.noreply.github.com>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-25-236.us-east-2.compute.internal>
Signed-off-by: Thomas Zhou <54688146+xShinnRyuu@users.noreply.github.com>
Co-authored-by: Sasidharan3094 <sasidharan.gopal94@gmail.com>
Co-authored-by: Thomas Zhou <54688146+xShinnRyuu@users.noreply.github.com>
Co-authored-by: Sylvain Royer <Sylvain-Royer@users.noreply.github.com>
Co-authored-by: Lior Sventitzky <liorsve@amazon.com>
Co-authored-by: Shoham Elias <116083498+shohamazon@users.noreply.github.com>
Co-authored-by: Avi Fenesh <55848801+avifenesh@users.noreply.github.com>
Co-authored-by: James Duong <duong.james@gmail.com>
Co-authored-by: affonsov <67347924+affonsov@users.noreply.github.com>
Co-authored-by: Maayan Shani <161942026+Maayanshani25@users.noreply.github.com>
Co-authored-by: Taylor Curran <taylor.curran@improving.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
shohamazon added a commit that referenced this pull request Feb 8, 2026
shohamazon added a commit to shohamazon/glide-for-redis that referenced this pull request Feb 8, 2026
shohamazon added a commit to shohamazon/glide-for-redis that referenced this pull request Feb 8, 2026
This reverts commit 512feef.

Signed-off-by: Shoham Elias <shohame@amazon.com>
shohamazon added a commit that referenced this pull request Feb 8, 2026
* Revert "fix(java): enforce immediate timeouts (#5264)"

This reverts commit 512feef.

Signed-off-by: Shoham Elias <shohame@amazon.com>

* Revert "perf: Reduce mutex contention and avoid batch clone (#5230)"

This reverts commit f7bf94c.

Signed-off-by: Shoham Elias <shohame@amazon.com>

* changelog

Signed-off-by: Shoham Elias <shohame@amazon.com>

---------

Signed-off-by: Shoham Elias <shohame@amazon.com>
Kaushik-Vijayakumar pushed a commit to Kaushik-Vijayakumar/valkey-glide that referenced this pull request Feb 17, 2026
---------

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
jduo pushed a commit to jduo/valkey-glide that referenced this pull request Feb 28, 2026
---------

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Signed-off-by: Shoham Elias <shohame@amazon.com>


Development

Successfully merging this pull request may close these issues:

- Java: Command Timeouts not getting honoured intermittently
- Java timeout handling should drop timed-out callbacks

4 participants