Introduce SearchResult and SearchResults #3285

Closed
wants to merge 6 commits into from
2 changes: 1 addition & 1 deletion pom.xml
@@ -5,7 +5,7 @@

<groupId>org.springframework.data</groupId>
<artifactId>spring-data-commons</artifactId>
-<version>4.0.0-SNAPSHOT</version>
+<version>4.0.0-SEARCH-RESULT-SNAPSHOT</version>

<name>Spring Data Core</name>
<description>Core Spring concepts underpinning every Spring Data module.</description>
1 change: 1 addition & 0 deletions src/main/antora/modules/ROOT/nav.adoc
@@ -7,6 +7,7 @@
** xref:repositories/query-methods.adoc[]
** xref:repositories/definition.adoc[]
** xref:repositories/query-methods-details.adoc[]
** xref:repositories/vector-search.adoc[]
** xref:repositories/create-instances.adoc[]
** xref:repositories/custom-implementations.adoc[]
** xref:repositories/core-domain-events.adoc[]
167 changes: 167 additions & 0 deletions src/main/antora/modules/ROOT/pages/repositories/vector-search.adoc
@@ -0,0 +1,167 @@
[[vector-search]]
= Vector Search

With the rise of generative AI, vector databases have gained strong traction.
These databases enable efficient storage and querying of high-dimensional vectors, making them well-suited for tasks such as semantic search, recommendation systems, and natural language understanding.

Vector search is a technique that retrieves semantically similar data by comparing vector representations (also known as embeddings) rather than relying on traditional exact-match queries.
This approach enables intelligent, context-aware applications that go beyond keyword-based retrieval.

In the context of Spring Data, vector search opens new possibilities for building intelligent, context-aware applications, particularly in domains like natural language processing, recommendation systems, and generative AI.
By modelling vector-based querying using familiar repository abstractions, Spring Data lets developers integrate similarity search on vector-capable databases with the simplicity and consistency of the Spring Data programming model.

ifdef::vector-search-intro-include[]
include::{vector-search-intro-include}[]
endif::[]

[[vector-search.model]]
== Vector Model

To support vector search in a type-safe and idiomatic way, Spring Data introduces the following core abstractions:

* <<vector-search.model.vector,`Vector`>>
* <<vector-search.model.search-result,`SearchResults<T>` and `SearchResult<T>`>>
* <<vector-search.model.scoring,`Score`, `Similarity` and Scoring Functions>>

[[vector-search.model.vector]]
=== `Vector`

The `Vector` type represents an n-dimensional numerical embedding, typically produced by embedding models.
In Spring Data, it is defined as a lightweight wrapper around an array of floating-point numbers, ensuring immutability and consistency.
This type can be used as an input for search queries or as a property on a domain entity to store the associated vector representation.

====
[source,java]
----
Vector vector = Vector.of(0.23f, 0.11f, 0.77f);
----
====
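
The wrapper idea can be sketched in plain Java. The `SimpleVector` class below is a hypothetical stand-in written for illustration, not Spring Data's `Vector` implementation; it only shows how defensive copies keep a float-array wrapper immutable.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration; not Spring Data's Vector type.
final class SimpleVector {

	private final float[] values;

	private SimpleVector(float[] values) {
		this.values = values;
	}

	// Copy the input so later changes to the caller's array cannot leak in.
	static SimpleVector of(float... values) {
		return new SimpleVector(values.clone());
	}

	// Number of dimensions of the embedding.
	int size() {
		return values.length;
	}

	// Return a copy so callers cannot mutate the internal state.
	float[] toFloatArray() {
		return values.clone();
	}

	@Override
	public boolean equals(Object o) {
		return o instanceof SimpleVector other && Arrays.equals(values, other.values);
	}

	@Override
	public int hashCode() {
		return Arrays.hashCode(values);
	}

	@Override
	public String toString() {
		return Arrays.toString(values);
	}
}
```

Because both the factory method and the accessor copy the array, two instances created from the same numbers compare equal regardless of what happens to the source array afterwards.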

Using `Vector` in your domain model removes the need to work with raw arrays or lists of numbers, providing a more type-safe and expressive way to handle vector data.
This abstraction also allows for easy integration with various vector databases and libraries.
It also leaves room for vendor-specific optimizations such as binary or quantized vectors that do not map to a standard floating-point representation (`float` and `double` as defined by https://en.wikipedia.org/wiki/IEEE_754[IEEE 754]).
A domain object can have a vector property, which can be used for similarity searches.
Consider the following example:

ifdef::vector-search-model-include[]
include::{vector-search-model-include}[]
endif::[]

NOTE: Associating a vector with a domain object results in the vector being loaded and stored as part of the entity lifecycle, which may introduce additional overhead on retrieval and persistence operations.

[[vector-search.model.search-result]]
=== Search Results

The `SearchResult<T>` type encapsulates a single result of a vector similarity query.
It includes both the matched domain object and a relevance score that indicates how closely it matches the query vector.
This abstraction provides a structured way to handle result ranking and enables developers to easily work with both the data and its contextual relevance.

ifdef::vector-search-repository-include[]
include::{vector-search-repository-include}[]
endif::[]

In this example, the `searchByCountryAndEmbeddingNear` method returns a `SearchResults<Comment>` object, which contains a list of `SearchResult<Comment>` instances.
Each result includes the matched `Comment` entity and its relevance score.

The relevance score is a numerical value that indicates how closely the matched vector aligns with the query vector.
Depending on whether a score represents distance or similarity, a higher score can mean either a more distant match or a closer one.

The scoring function used to calculate this score can vary based on the underlying database, index or input parameters.
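
The effect of score direction on ranking can be sketched without any Spring Data types. The `Scored` record and both method names below are hypothetical and exist only for this illustration.

```java
import java.util.Comparator;
import java.util.List;

// Illustration only: ranking depends on whether the score is a distance or a similarity.
final class RankingSketch {

	// Hypothetical stand-in pairing a matched object with its score.
	record Scored<T>(T content, double score) {}

	// Distance scores: a smaller value is a closer match, so sort ascending.
	static <T> List<Scored<T>> rankByDistance(List<Scored<T>> results) {
		return results.stream()
				.sorted(Comparator.comparingDouble((Scored<T> r) -> r.score()))
				.toList();
	}

	// Similarity scores: a larger value is a closer match, so sort descending.
	static <T> List<Scored<T>> rankBySimilarity(List<Scored<T>> results) {
		return results.stream()
				.sorted(Comparator.comparingDouble((Scored<T> r) -> r.score()).reversed())
				.toList();
	}
}
```

The same list therefore yields opposite "best match" entries depending on which interpretation applies.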

[[vector-search.model.scoring]]
=== Score, Similarity, and Scoring Functions

The `Score` type holds a numerical value indicating the relevance of a search result.
It can be used to rank results based on their similarity to the query vector.
A score value is typically a floating-point number, and its interpretation (higher is better or lower is better) depends on the specific scoring function used.
Scores are a by-product of vector search and are not required for a successful search operation.
Score values are not part of a domain model and are therefore best represented as out-of-band data.

Generally, a `Score` is computed by a `ScoringFunction`.
The actual scoring function used to calculate this score depends on the underlying database and can be obtained from a search index or input parameters.

Spring Data declares constants for commonly used functions, such as:

Euclidean Distance:: Calculates the straight-line distance in n-dimensional space involving the square root of the sum of squared differences.
Cosine Similarity:: Measures the cosine of the angle between two vectors by first calculating the dot product and then dividing it by the product of the vectors' lengths.
Dot Product:: Computes the sum of element-wise multiplications.

The choice of similarity function can impact both the performance and semantics of the search and is often determined by the underlying database or index being used.
Spring Data adapts to the database's native scoring function capabilities and to whether the score can be used to limit results.
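
The three functions described above can be written out in a few lines of plain Java. This is a sketch for intuition only: `ScoringSketch` and its method names are hypothetical, and a database computes these values natively rather than through code like this.

```java
// Plain-Java sketch of the three functions; not Spring Data's implementation.
final class ScoringSketch {

	// Dot product: sum of element-wise multiplications.
	static double dotProduct(double[] a, double[] b) {
		requireSameLength(a, b);
		double sum = 0;
		for (int i = 0; i < a.length; i++) {
			sum += a[i] * b[i];
		}
		return sum;
	}

	// Euclidean distance: square root of the sum of squared differences.
	static double euclideanDistance(double[] a, double[] b) {
		requireSameLength(a, b);
		double sum = 0;
		for (int i = 0; i < a.length; i++) {
			double diff = a[i] - b[i];
			sum += diff * diff;
		}
		return Math.sqrt(sum);
	}

	// Cosine similarity: dot product divided by the product of both lengths.
	// Yields NaN if either vector has zero length.
	static double cosineSimilarity(double[] a, double[] b) {
		return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
	}

	private static void requireSameLength(double[] a, double[] b) {
		if (a.length != b.length) {
			throw new IllegalArgumentException("Vectors must have the same dimension");
		}
	}
}
```

Note the difference in direction: Euclidean distance shrinks as vectors get closer, while cosine similarity grows.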

ifdef::vector-search-scoring-include[]
include::{vector-search-scoring-include}[]
endif::[]

[[vector-search.methods]]
== Vector Search Methods

Vector search methods are defined in repositories using the same conventions as standard Spring Data query methods.
These methods return `SearchResults<T>` and require a `Vector` parameter to define the query vector.
The actual implementation depends on the internals of the underlying data store and its capabilities around vector search.

NOTE: If you are new to Spring Data repositories, make sure to familiarize yourself with the xref:repositories/core-concepts.adoc[basics of repository definitions and query methods].

Generally, you have the choice of declaring a search method using two approaches:

* Query Derivation
* Declaring a String-based Query

Vector Search methods must declare a `Vector` parameter to define the query vector.

[[vector-search.method.derivation]]
=== Derived Search Methods

A derived search method uses the name of the method to derive the query.
Vector Search supports the following keywords to run a Vector search when declaring a search method:

.Query predicate keywords
[options="header",cols="1,3"]
|===============
|Logical keyword|Keyword expressions
|`NEAR`|`Near`, `IsNear`
|`WITHIN`|`Within`, `IsWithin`
|===============

ifdef::vector-search-method-derived-include[]
include::{vector-search-method-derived-include}[]
endif::[]

Derived search methods are typically easier to read and maintain, as they rely on the method name to express the query intent.
However, a derived search method requires declaring a `Score`, `Range<Score>`, or `ScoringFunction` as the second argument to the `Near`/`Within` keyword to limit search results by their score.
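
What "limiting by score" means can be sketched as an in-memory filter. Everything in this block is hypothetical (`ScoreLimitSketch`, `Scored`, `within`, `atLeast`); a real store applies the limit inside the query. The single-value variant assumes a similarity-style score where higher is better, which is an assumption rather than a documented rule.

```java
import java.util.List;

// Illustration only: score limits expressed as in-memory filters.
final class ScoreLimitSketch {

	// Hypothetical stand-in pairing a matched object with its score.
	record Scored<T>(T content, double score) {}

	// Range limit: keep results whose score lies in [min, max], both bounds inclusive.
	static <T> List<Scored<T>> within(List<Scored<T>> results, double min, double max) {
		return results.stream()
				.filter(r -> r.score() >= min && r.score() <= max)
				.toList();
	}

	// Single-value limit, assuming higher scores mean closer matches.
	static <T> List<Scored<T>> atLeast(List<Scored<T>> results, double threshold) {
		return results.stream()
				.filter(r -> r.score() >= threshold)
				.toList();
	}
}
```

The inclusive bounds mirror `Score.between(min, max)`, which builds a range from two inclusive boundaries.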

[[vector-search.method.string]]
=== Annotated Search Methods

Annotated methods provide full control over the query semantics and parameters.
Unlike derived methods, they do not rely on method name conventions.

ifdef::vector-search-method-annotated-include[]
include::{vector-search-method-annotated-include}[]
endif::[]

With more control over the actual query, Spring Data can make fewer assumptions about the query and its parameters.
For example, `Similarity` normalization uses the native score function within the query to normalize the given similarity into a score predicate value and vice versa.
If an annotated query does not define the score, for example, then the score value in the returned `SearchResult<T>` will be zero.
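
To make the idea of normalization concrete, the sketch below converts between a native score and a similarity in the range 0 to 1, in both directions. The formulas are assumptions chosen for illustration; each data store defines its own normalization, and the class and method names are hypothetical.

```java
// Illustration only: assumed normalization formulas, not those of any specific store.
final class SimilarityNormalizationSketch {

	// Assumed mapping of a cosine value in [-1, 1] to a similarity in [0, 1].
	static double fromCosine(double cosine) {
		return (1 + cosine) / 2;
	}

	// Inverse mapping: similarity in [0, 1] back to a cosine value in [-1, 1].
	static double toCosine(double similarity) {
		return 2 * similarity - 1;
	}

	// Assumed mapping of a Euclidean distance in [0, inf) to a similarity in (0, 1].
	static double fromEuclideanDistance(double distance) {
		return 1 / (1 + distance);
	}

	// Inverse mapping: similarity in (0, 1] back to a Euclidean distance.
	static double toEuclideanDistance(double similarity) {
		return 1 / similarity - 1;
	}
}
```

The inverse direction is what turns a similarity threshold supplied by the caller into a predicate on the native score.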

[[vector-search.method.sorting]]
=== Sorting

By default, search results are ordered according to their score.
You can override sorting by using the `Sort` parameter:

.Using `Sort` in Repository Search Methods
====
[source,java]
----
interface CommentRepository extends Repository<Comment, String> {

	SearchResults<Comment> searchByEmbeddingNearOrderByCountry(Vector vector, Score score);

	SearchResults<Comment> searchByEmbeddingWithin(Vector vector, Score score, Sort sort);
}
----
====

Please note that custom sorting does not allow expressing the score as a sorting criterion.
You can only refer to domain properties.
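
The effect of overriding the default order can be sketched in memory. The `Comment` and `Scored` records below are hypothetical stand-ins (only the `country` property is implied by the method names above); the comparator reads a domain property and ignores the score.

```java
import java.util.Comparator;
import java.util.List;

// Illustration only: ordering results by a domain property instead of by score.
final class SortOverrideSketch {

	// Hypothetical domain type; only the country property is implied by the docs.
	record Comment(String country, String text) {}

	// Hypothetical stand-in pairing a matched object with its score.
	record Scored<T>(T content, double score) {}

	// Orders results by the comment's country; the score does not take part.
	static List<Scored<Comment>> orderByCountry(List<Scored<Comment>> results) {
		return results.stream()
				.sorted(Comparator.comparing((Scored<Comment> r) -> r.content().country()))
				.toList();
	}
}
```

A result with a lower score can therefore appear first once a custom sort is in place.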
1 change: 1 addition & 0 deletions src/main/java/org/springframework/data/domain/Page.java
@@ -69,4 +69,5 @@ static <T> Page<T> empty(Pageable pageable) {
*/
@Override
<U> Page<U> map(Function<? super T, ? extends U> converter);

}
4 changes: 2 additions & 2 deletions src/main/java/org/springframework/data/domain/Range.java
@@ -223,7 +223,7 @@ public boolean contains(T value, Comparator<T> comparator) {
/**
* Apply a mapping {@link Function} to the lower and upper boundary values.
*
-* @param mapper must not be {@literal null}. If the mapper returns {@code null}, then the corresponding boundary
+* @param mapper must not be {@literal null}. If the mapper returns {@literal null}, then the corresponding boundary
* value represents an {@link Bound#unbounded()} boundary.
* @return a new {@link Range} after applying the value to the mapper.
* @param <R> target type of the mapping function.
@@ -430,7 +430,7 @@ public boolean isInclusive() {
/**
* Apply a mapping {@link Function} to the boundary value.
*
-* @param mapper must not be {@literal null}. If the mapper returns {@code null}, then the boundary value
+* @param mapper must not be {@literal null}. If the mapper returns {@literal null}, then the boundary value
* corresponds with {@link Bound#unbounded()}.
* @return a new {@link Bound} after applying the value to the mapper.
* @param <R>
118 changes: 118 additions & 0 deletions src/main/java/org/springframework/data/domain/Score.java
@@ -0,0 +1,118 @@
/*
* Copyright 2025 the original author or authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.springframework.data.domain;

import java.io.Serializable;

import org.springframework.util.ObjectUtils;

/**
 * Value object representing a search result score computed via a {@link ScoringFunction}.
 * <p>
 * Encapsulates the numeric score and the scoring function used to derive it. Scores are primarily used to rank search
 * results. Depending on the {@link ScoringFunction} used, higher scores can indicate either a greater distance or a
 * higher similarity. Use the {@link Similarity} class to indicate a normalized score that effectively represents
 * similarity.
 * <p>
 * Instances of this class are immutable and suitable for use in comparison, sorting, and range operations.
 *
 * @author Mark Paluch
 * @since 4.0
 * @see Similarity
 */
public sealed class Score implements Serializable permits Similarity {

	private final double value;
	private final ScoringFunction function;

	Score(double value, ScoringFunction function) {
		this.value = value;
		this.function = function;
	}

	/**
	 * Creates a new {@link Score} from a plain {@code score} value using {@link ScoringFunction#unspecified()}.
	 *
	 * @param score the score value without a specific {@link ScoringFunction}.
	 * @return the new {@link Score}.
	 */
	public static Score of(double score) {
		return of(score, ScoringFunction.unspecified());
	}

	/**
	 * Creates a new {@link Score} from a {@code score} value using the given {@link ScoringFunction}.
	 *
	 * @param score the score value.
	 * @param function the scoring function that has computed the {@code score}.
	 * @return the new {@link Score}.
	 */
	public static Score of(double score, ScoringFunction function) {
		return new Score(score, function);
	}

	/**
	 * Creates a {@link Range} from the given minimum and maximum {@code Score} values.
	 *
	 * @param min the lower score value, must not be {@literal null}.
	 * @param max the upper score value, must not be {@literal null}.
	 * @return a {@link Range} over {@link Score} bounds.
	 */
	public static Range<Score> between(Score min, Score max) {
		return Range.from(Range.Bound.inclusive(min)).to(Range.Bound.inclusive(max));
	}

	/**
	 * Returns the raw numeric value of the score.
	 *
	 * @return the score value.
	 */
	public double getValue() {
		return value;
	}

	/**
	 * Returns the {@link ScoringFunction} that was used to compute this score.
	 *
	 * @return the associated scoring function.
	 */
	public ScoringFunction getFunction() {
		return function;
	}

	@Override
	public boolean equals(Object o) {
		if (!(o instanceof Score other)) {
			return false;
		}
		if (value != other.value) {
			return false;
		}
		return ObjectUtils.nullSafeEquals(function, other.function);
	}

	@Override
	public int hashCode() {
		return ObjectUtils.nullSafeHash(value, function);
	}

	@Override
	public String toString() {
		return function instanceof UnspecifiedScoringFunction ? Double.toString(value)
				: "%s (%s)".formatted(Double.toString(value), function.getName());
	}

}