
Commit 19bcf33

feat(redis): Add Redis-based semantic caching and chat memory implementations
Add comprehensive Redis-backed features to enhance Spring AI:

* Add semantic caching for chat responses:
  - SemanticCache interface and Redis implementation using vector similarity
  - SemanticCacheAdvisor for intercepting and caching chat responses
  - Uses vector search to cache and retrieve responses based on query similarity
  - Support for TTL-based cache expiration
  - Improves response times and reduces API costs for similar questions

* Add Redis-based chat memory implementation:
  - RedisChatMemory using RedisJSON + RediSearch for conversation storage
  - Configurable RedisChatMemoryConfig with builder pattern support
  - Message TTL, ordering, multi-conversation and batch operations
  - Efficient conversation history retrieval using RediSearch indexes

* Add integration tests:
  - Comprehensive test coverage using TestContainers
  - Tests for semantic caching features and chat memory operations
  - Integration test for RedisVectorStore with VectorStoreChatMemoryAdvisor
  - Verify chat completion augmentation with vector store content

The Redis implementations enable efficient storage and retrieval of chat responses and conversation history, with semantic search capabilities and configurable persistence options.

Signed-off-by: Brian Sam-Bodden [email protected]
1 parent ce8e8b9 commit 19bcf33
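
For orientation, here is a minimal sketch of the `SemanticCache` contract as implied by the usage examples added to redis.adoc in this commit (set with and without a TTL, and a similarity-based get). The actual interface in the commit may declare additional members; the comments and Javadoc-free shape below are assumptions.

[source,java]
----
import java.time.Duration;
import java.util.Optional;

import org.springframework.ai.chat.model.ChatResponse;

// Sketch only: inferred from the redis.adoc usage examples in this commit;
// the real SemanticCache interface may expose more operations.
public interface SemanticCache {

    // Store a response keyed by the embedding of the query text.
    void set(String query, ChatResponse response);

    // Store a response that expires after the given time-to-live.
    void set(String query, ChatResponse response, Duration ttl);

    // Look up a cached response whose stored query is semantically similar
    // to the given query (similarity above the configured threshold).
    Optional<ChatResponse> get(String query);

}
----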

10 files changed: +1677 -0 lines changed


spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/redis.adoc

Lines changed: 65 additions & 0 deletions
@@ -9,6 +9,8 @@ link:https://redis.io/docs/interact/search-and-query/[Redis Search and Query] ex
 * Store vectors and the associated metadata within hashes or JSON documents
 * Retrieve vectors
 * Perform vector searches
+* Cache chat responses based on semantic similarity
+* Store and query conversation history

 == Prerequisites

@@ -152,6 +154,69 @@ is converted into the proprietary Redis filter format:
 @country:{UK | NL} @year:[2020 inf]
 ----

+=== Semantic Cache Usage
+
+The semantic cache provides vector similarity-based caching for chat responses, implemented as an advisor:
+
+[source,java]
+----
+// Create semantic cache
+SemanticCache semanticCache = DefaultSemanticCache.builder()
+    .embeddingModel(embeddingModel)
+    .jedisClient(jedisClient)
+    .similarityThreshold(0.95) // Optional: defaults to 0.95
+    .build();
+
+// Create cache advisor
+SemanticCacheAdvisor cacheAdvisor = SemanticCacheAdvisor.builder()
+    .cache(semanticCache)
+    .build();
+
+// Use with chat client
+ChatResponse response = ChatClient.builder(chatModel)
+    .build()
+    .prompt("What is the capital of France?")
+    .advisors(cacheAdvisor)
+    .call()
+    .chatResponse();
+
+// Manually interact with cache
+semanticCache.set("query", chatResponse);
+semanticCache.set("query", chatResponse, Duration.ofHours(1)); // With TTL
+Optional<ChatResponse> cached = semanticCache.get("similar query");
+----
+
+=== Chat Memory Usage
+
+RedisChatMemory provides persistent storage for conversation history:
+
+[source,java]
+----
+// Create chat memory
+RedisChatMemory chatMemory = RedisChatMemory.builder()
+    .jedisClient(jedisClient)
+    .timeToLive(Duration.ofHours(24)) // Optional: message TTL
+    .indexName("custom-memory-index") // Optional
+    .keyPrefix("custom-prefix") // Optional
+    .build();
+
+// Add messages
+chatMemory.add("conversation-1", new UserMessage("Hello"));
+chatMemory.add("conversation-1", new AssistantMessage("Hi there!"));
+
+// Add multiple messages
+chatMemory.add("conversation-1", List.of(
+    new UserMessage("How are you?"),
+    new AssistantMessage("I'm doing well!")
+));
+
+// Retrieve messages
+List<Message> messages = chatMemory.get("conversation-1", 10); // Last 10 messages
+
+// Clear conversation
+chatMemory.clear("conversation-1");
+----
+
 == Manual Configuration

 Instead of using the Spring Boot auto-configuration, you can manually configure the Redis vector store. For this you need to add the `spring-ai-redis-store` to your project:

vector-stores/spring-ai-redis-store/pom.xml

Lines changed: 7 additions & 0 deletions
@@ -101,6 +101,13 @@
             <scope>test</scope>
         </dependency>

+        <dependency>
+            <groupId>org.springframework.ai</groupId>
+            <artifactId>spring-ai-openai</artifactId>
+            <version>${project.parent.version}</version>
+            <scope>test</scope>
+        </dependency>
+
     </dependencies>

 </project>
SemanticCacheAdvisor.java (new file)

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
/*
 * Copyright 2023-2025 the original author or authors.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.springframework.ai.chat.cache.semantic;

import org.springframework.ai.chat.client.advisor.api.*;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.vectorstore.redis.cache.semantic.SemanticCache;
import reactor.core.publisher.Flux;

import java.util.Optional;

/**
 * An advisor implementation that provides semantic caching capabilities for chat
 * responses. This advisor intercepts chat requests and checks for semantically similar
 * cached responses before allowing the request to proceed to the model.
 *
 * <p>
 * This advisor implements both {@link CallAroundAdvisor} for synchronous operations and
 * {@link StreamAroundAdvisor} for reactive streaming operations.
 * </p>
 *
 * <p>
 * Key features:
 * <ul>
 * <li>Semantic similarity based caching of responses</li>
 * <li>Support for both synchronous and streaming chat operations</li>
 * <li>Configurable execution order in the advisor chain</li>
 * </ul>
 *
 * @author Brian Sam-Bodden
 */
public class SemanticCacheAdvisor implements CallAroundAdvisor, StreamAroundAdvisor {

    /** The underlying semantic cache implementation */
    private final SemanticCache cache;

    /** The order of this advisor in the chain */
    private final int order;

    /**
     * Creates a new semantic cache advisor with default order.
     * @param cache The semantic cache implementation to use
     */
    public SemanticCacheAdvisor(SemanticCache cache) {
        this(cache, Advisor.DEFAULT_CHAT_MEMORY_PRECEDENCE_ORDER);
    }

    /**
     * Creates a new semantic cache advisor with specified order.
     * @param cache The semantic cache implementation to use
     * @param order The order of this advisor in the chain
     */
    public SemanticCacheAdvisor(SemanticCache cache, int order) {
        this.cache = cache;
        this.order = order;
    }

    @Override
    public String getName() {
        return this.getClass().getSimpleName();
    }

    @Override
    public int getOrder() {
        return this.order;
    }

    /**
     * Handles synchronous chat requests by checking the cache before proceeding. If a
     * semantically similar response is found in the cache, it is returned immediately.
     * Otherwise, the request proceeds through the chain and the response is cached.
     * @param request The chat request to process
     * @param chain The advisor chain to continue processing if needed
     * @return The response, either from cache or from the model
     */
    @Override
    public AdvisedResponse aroundCall(AdvisedRequest request, CallAroundAdvisorChain chain) {
        // Check cache first
        Optional<ChatResponse> cached = cache.get(request.userText());

        if (cached.isPresent()) {
            return new AdvisedResponse(cached.get(), request.adviseContext());
        }

        // Cache miss - call the model
        AdvisedResponse response = chain.nextAroundCall(request);

        // Cache the response
        if (response.response() != null) {
            cache.set(request.userText(), response.response());
        }

        return response;
    }

    /**
     * Handles streaming chat requests by checking the cache before proceeding. If a
     * semantically similar response is found in the cache, it is returned as a single
     * item flux. Otherwise, the request proceeds through the chain and the final response
     * is cached.
     * @param request The chat request to process
     * @param chain The advisor chain to continue processing if needed
     * @return A Flux of responses, either from cache or from the model
     */
    @Override
    public Flux<AdvisedResponse> aroundStream(AdvisedRequest request, StreamAroundAdvisorChain chain) {
        // Check cache first
        Optional<ChatResponse> cached = cache.get(request.userText());

        if (cached.isPresent()) {
            return Flux.just(new AdvisedResponse(cached.get(), request.adviseContext()));
        }

        // Cache miss - stream from model
        return chain.nextAroundStream(request).collectList().flatMapMany(responses -> {
            // Cache the final aggregated response
            if (!responses.isEmpty()) {
                AdvisedResponse last = responses.get(responses.size() - 1);
                if (last.response() != null) {
                    cache.set(request.userText(), last.response());
                }
            }
            return Flux.fromIterable(responses);
        });
    }

    /**
     * Creates a new builder for constructing SemanticCacheAdvisor instances.
     * @return A new builder instance
     */
    public static Builder builder() {
        return new Builder();
    }

    /**
     * Builder class for creating SemanticCacheAdvisor instances. Provides a fluent API
     * for configuration.
     */
    public static class Builder {

        private SemanticCache cache;

        private int order = Advisor.DEFAULT_CHAT_MEMORY_PRECEDENCE_ORDER;

        /**
         * Sets the semantic cache implementation.
         * @param cache The cache implementation to use
         * @return This builder instance
         */
        public Builder cache(SemanticCache cache) {
            this.cache = cache;
            return this;
        }

        /**
         * Sets the advisor order.
         * @param order The order value for this advisor
         * @return This builder instance
         */
        public Builder order(int order) {
            this.order = order;
            return this;
        }

        /**
         * Builds and returns a new SemanticCacheAdvisor instance.
         * @return A new SemanticCacheAdvisor configured with this builder's settings
         */
        public SemanticCacheAdvisor build() {
            return new SemanticCacheAdvisor(cache, order);
        }

    }

}
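
The commit message lists TestContainers-based integration tests that are not included in this excerpt of the diff. As a rough sketch (not the commit's actual test code), a RedisChatMemory round-trip test could look like the following; the builder and accessor methods mirror the redis.adoc example above, while the container image, test class name, and assertions are illustrative assumptions.

[source,java]
----
import java.util.List;

import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

import redis.clients.jedis.JedisPooled;

import static org.assertj.core.api.Assertions.assertThat;

// Note: the import for RedisChatMemory is omitted because its package is not
// shown in this part of the diff.
@Testcontainers
class RedisChatMemoryRoundTripIT {

    // Redis Stack bundles RedisJSON and RediSearch, which the chat memory relies on.
    @Container
    static GenericContainer<?> redis = new GenericContainer<>("redis/redis-stack:latest").withExposedPorts(6379);

    @Test
    void storesAndRetrievesConversationHistory() {
        JedisPooled jedis = new JedisPooled(redis.getHost(), redis.getMappedPort(6379));

        // Builder methods taken from the redis.adoc example in this commit.
        RedisChatMemory chatMemory = RedisChatMemory.builder()
            .jedisClient(jedis)
            .build();

        chatMemory.add("conversation-1", new UserMessage("Hello"));
        chatMemory.add("conversation-1", new AssistantMessage("Hi there!"));

        List<Message> messages = chatMemory.get("conversation-1", 10);
        assertThat(messages).hasSize(2);

        chatMemory.clear("conversation-1");
        assertThat(chatMemory.get("conversation-1", 10)).isEmpty();
    }

}
----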
