Update embedders settings, hybrid search, and add tests for AI search methods #1087

Merged 27 commits on May 15, 2025
Commits
c125d64
Update embedders
Strift Mar 8, 2025
7f61d2c
Update embedders models
Strift Mar 8, 2025
93e8f69
Add docs
Strift Mar 8, 2025
d3aa65b
Allow updating embedders via update_settings
Strift Mar 8, 2025
2fcbc47
Refactor config validation to avoid duplicate code
Strift Mar 8, 2025
f9258e9
Update validation code
Strift Mar 8, 2025
8a4369d
Remove validation to let meilisearch handle it
Strift Mar 8, 2025
742ef5e
Remove unused parameters
Strift Mar 8, 2025
c4e26d7
Add hybrid search
Strift Mar 9, 2025
5e954ac
Add test for retrieving vectors
Strift Mar 9, 2025
05291f4
Add semanticHitCount test
Strift Mar 9, 2025
b064b0b
Update comment
Strift Mar 9, 2025
d5d928e
Add test for similar documents
Strift Mar 9, 2025
b49cb42
Fix linters errors
Strift Mar 9, 2025
ef1b771
Sort imports
Strift Mar 9, 2025
297b3e4
Update meilisearch/models/embedders.py
Strift Mar 20, 2025
c7c1700
Avoid repeating embedder type
Strift Mar 20, 2025
b1258c7
Remove docs
Strift Mar 20, 2025
8960bc2
Add unintentionally removed
Strift Mar 20, 2025
268aa4c
Fix mypy issues
Strift Mar 20, 2025
d8825aa
Add test for embedders fields
Strift Mar 26, 2025
b324323
Add tests for fields presence
Strift Mar 26, 2025
057377b
Split tests
Strift Mar 26, 2025
8082344
Merge branch 'main' into feat/add-embedders-settings
Strift Apr 2, 2025
e515f29
Fix missing imports
Strift Apr 2, 2025
dc98a1e
Merge branch 'main' into feat/add-embedders-settings
Strift Apr 8, 2025
44a68a5
Merge branch 'main' into feat/add-embedders-settings
brunoocasali May 13, 2025
21 changes: 21 additions & 0 deletions README.md
@@ -143,6 +143,27 @@ JSON output:
}
```

#### Hybrid Search <!-- omit in toc -->

Hybrid search combines traditional keyword search with semantic search for more relevant results. You need to have an embedder configured in your index settings to use this feature.

```python
# Using hybrid search with the search method
index.search(
    'action movie',
    {
        "hybrid": {"semanticRatio": 0.5, "embedder": "default"}
    }
)
```

The `semanticRatio` parameter (between 0 and 1) controls the balance between keyword search and semantic search:
- 0: Only keyword search
- 1: Only semantic search
- Values in between: A mix of both approaches

The `embedder` parameter specifies which configured embedder to use for the semantic search component.
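
As a rough illustration of how the two parameters fit together, the sketch below builds the `hybrid` options and rejects out-of-range ratios before any request is sent. The helper name `hybrid_options` is hypothetical and is not part of the client.

```python
from typing import Any, Dict


def hybrid_options(semantic_ratio: float, embedder: str = "default") -> Dict[str, Any]:
    """Build optional search parameters for a hybrid query (hypothetical helper)."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semanticRatio must be between 0 and 1")
    return {"hybrid": {"semanticRatio": semantic_ratio, "embedder": embedder}}


keyword_only = hybrid_options(0.0)   # only keyword search
semantic_only = hybrid_options(1.0)  # only semantic search
balanced = hybrid_options(0.5)       # a mix of both approaches

# With an index that has an embedder named "default" configured,
# the result can be passed as the second argument to search:
# index.search('action movie', balanced)
```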

#### Custom Search With Filters <!-- omit in toc -->

If you want to enable filtering, you must add your attributes to the `filterableAttributes` index setting.
122 changes: 84 additions & 38 deletions meilisearch/index.py
@@ -24,19 +24,22 @@
from meilisearch.config import Config
from meilisearch.errors import version_error_hint_message
from meilisearch.models.document import Document, DocumentsResults
from meilisearch.models.index import (
from meilisearch.models.embedders import (
Embedders,
Faceting,
EmbedderType,
HuggingFaceEmbedder,
IndexStats,
LocalizedAttributes,
OllamaEmbedder,
OpenAiEmbedder,
RestEmbedder,
UserProvidedEmbedder,
)
from meilisearch.models.index import (
Faceting,
IndexStats,
LocalizedAttributes,
Pagination,
ProximityPrecision,
RestEmbedder,
TypoTolerance,
UserProvidedEmbedder,
)
from meilisearch.models.task import Task, TaskInfo, TaskResults
from meilisearch.task import TaskHandler
@@ -277,14 +280,21 @@ def get_stats(self) -> IndexStats:
def search(self, query: str, opt_params: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
"""Search in the index.

https://www.meilisearch.com/docs/reference/api/search

Parameters
----------
query:
String containing the searched word(s)
opt_params (optional):
Dictionary containing optional query parameters.
Note: The vector parameter is only available in Meilisearch >= v1.13.0
https://www.meilisearch.com/docs/reference/api/search#search-in-an-index
Common parameters include:
- hybrid: Dict with 'semanticRatio' and 'embedder' fields for hybrid search
- vector: Array of numbers for vector search
- retrieveVectors: Boolean to include vector data in search results
- filter: Filter queries by an attribute's value
- limit: Maximum number of documents returned
- offset: Number of documents to skip

Returns
-------
@@ -298,7 +308,9 @@ def search(self, query: str, opt_params: Optional[Mapping[str, Any]] = None) ->
"""
if opt_params is None:
opt_params = {}

body = {"q": query, **opt_params}

return self.http.post(
f"{self.config.paths.index}/{self.uid}/{self.config.paths.search}",
body=body,
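
The request body in this hunk is the query merged with whatever optional parameters the caller passes. A minimal sketch of that merge, runnable without a server, is shown here; the function name `build_search_body` is made up for illustration and the parameter values are placeholders.

```python
from typing import Any, Dict, Mapping, Optional


def build_search_body(
    query: str, opt_params: Optional[Mapping[str, Any]] = None
) -> Dict[str, Any]:
    """Mirror the merge shown in the diff: 'q' first, then the optional parameters."""
    if opt_params is None:
        opt_params = {}
    return {"q": query, **opt_params}


body = build_search_body(
    "action movie",
    {
        "hybrid": {"semanticRatio": 0.5, "embedder": "default"},
        "retrieveVectors": True,
        "limit": 5,
    },
)
```

Because `opt_params` is unpacked last, a `q` key inside it would override the `query` argument.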
@@ -955,14 +967,7 @@ def get_settings(self) -> Dict[str, Any]:
)

if settings.get("embedders"):
embedders: dict[
str,
OpenAiEmbedder
| HuggingFaceEmbedder
| OllamaEmbedder
| RestEmbedder
| UserProvidedEmbedder,
] = {}
embedders: dict[str, EmbedderType] = {}
for k, v in settings["embedders"].items():
if v.get("source") == "openAi":
embedders[k] = OpenAiEmbedder(**v)
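
This hunk swaps a repeated inline union for the single name `EmbedderType`. The definition in `meilisearch/models/embedders.py` is not shown on this page, so the sketch below only illustrates the general pattern with two stand-in classes; it assumes the alias is a plain `Union`, which may differ from the real definition.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class OpenAiEmbedder:  # stand-in for illustration, not the client's model
    source: str = "openAi"


@dataclass
class UserProvidedEmbedder:  # stand-in for illustration, not the client's model
    source: str = "userProvided"


# One alias replaces the union that was previously spelled out at each annotation.
EmbedderType = Union[OpenAiEmbedder, UserProvidedEmbedder]

embedders: Dict[str, EmbedderType] = {
    "default": OpenAiEmbedder(),
    "manual": UserProvidedEmbedder(),
}
```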
@@ -988,6 +993,26 @@ def update_settings(self, body: MutableMapping[str, Any]) -> TaskInfo:
----------
body:
Dictionary containing the settings of the index.
Supported settings include:
- 'rankingRules': List of ranking rules
- 'distinctAttribute': Attribute for deduplication
- 'searchableAttributes': Attributes that can be searched
- 'displayedAttributes': Attributes to display in search results
- 'stopWords': Words ignored in search queries
- 'synonyms': Dictionary of synonyms
- 'filterableAttributes': Attributes that can be used for filtering
- 'sortableAttributes': Attributes that can be used for sorting
- 'typoTolerance': Settings for typo tolerance
- 'pagination': Settings for pagination
- 'faceting': Settings for faceting
- 'dictionary': List of custom dictionary words
- 'separatorTokens': List of separator tokens
- 'nonSeparatorTokens': List of non-separator tokens
- 'embedders': Dictionary of embedder configurations for AI-powered search
- 'searchCutoffMs': Maximum search time in milliseconds
- 'proximityPrecision': Precision for proximity ranking
- 'localizedAttributes': Settings for localized attributes

More information:
https://www.meilisearch.com/docs/reference/api/settings#update-settings

@@ -1000,7 +1025,8 @@ def update_settings(self, body: MutableMapping[str, Any]) -> TaskInfo:
Raises
------
MeilisearchApiError
An error containing details about why Meilisearch can't process your request. Meilisearch error codes are described here: https://www.meilisearch.com/docs/reference/errors/error_codes#meilisearch-errors
An error containing details about why Meilisearch can't process your request.
Meilisearch error codes are described here: https://www.meilisearch.com/docs/reference/errors/error_codes#meilisearch-errors
"""
if body.get("embedders"):
for _, v in body["embedders"].items():
@@ -1879,10 +1905,13 @@ def reset_non_separator_tokens(self) -> TaskInfo:
def get_embedders(self) -> Embedders | None:
"""Get embedders of the index.

Retrieves the current embedder configuration from Meilisearch.

Returns
-------
settings:
The embedders settings of the index.
Embedders:
The embedders settings of the index, or None if no embedders are configured.
Contains a dictionary of embedder configurations, where keys are embedder names.

Raises
------
@@ -1894,35 +1923,35 @@ def get_embedders(self) -> Embedders | None:
if not response:
return None

embedders: dict[
str,
OpenAiEmbedder
| HuggingFaceEmbedder
| OllamaEmbedder
| RestEmbedder
| UserProvidedEmbedder,
] = {}
embedders: dict[str, EmbedderType] = {}
for k, v in response.items():
if v.get("source") == "openAi":
source = v.get("source")
if source == "openAi":
embedders[k] = OpenAiEmbedder(**v)
elif v.get("source") == "ollama":
embedders[k] = OllamaEmbedder(**v)
elif v.get("source") == "huggingFace":
elif source == "huggingFace":
embedders[k] = HuggingFaceEmbedder(**v)
elif v.get("source") == "rest":
elif source == "ollama":
embedders[k] = OllamaEmbedder(**v)
elif source == "rest":
embedders[k] = RestEmbedder(**v)
elif source == "userProvided":
embedders[k] = UserProvidedEmbedder(**v)
else:
# Default to UserProvidedEmbedder for unknown sources
embedders[k] = UserProvidedEmbedder(**v)

return Embedders(embedders=embedders)
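
The `source` dispatch above, including the fallback to `UserProvidedEmbedder` for unknown sources, could also be written as a lookup table. This is an alternative sketch and not what the PR implements: `BuiltEmbedder`, `builder`, and `parse_embedders` are hypothetical names, and the stand-in class only records which model would have been constructed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping


@dataclass
class BuiltEmbedder:
    """Stand-in for the client's embedder models; records the chosen model name."""

    model_name: str
    config: Dict[str, Any]


def builder(model_name: str) -> Callable[..., BuiltEmbedder]:
    """Return a constructor-like callable for the given model name."""

    def build(**config: Any) -> BuiltEmbedder:
        return BuiltEmbedder(model_name, config)

    return build


# Maps the 'source' field of each configuration to the model that handles it.
BUILDERS: Dict[str, Callable[..., BuiltEmbedder]] = {
    "openAi": builder("OpenAiEmbedder"),
    "huggingFace": builder("HuggingFaceEmbedder"),
    "ollama": builder("OllamaEmbedder"),
    "rest": builder("RestEmbedder"),
    "userProvided": builder("UserProvidedEmbedder"),
}
# Unknown or missing sources fall back to the user-provided model, as in the diff.
FALLBACK = builder("UserProvidedEmbedder")


def parse_embedders(response: Mapping[str, Mapping[str, Any]]) -> Dict[str, BuiltEmbedder]:
    return {
        name: BUILDERS.get(config.get("source"), FALLBACK)(**config)
        for name, config in response.items()
    }


parsed = parse_embedders(
    {
        "local": {"source": "ollama", "model": "placeholder-model"},
        "other": {"source": "somethingNew"},
    }
)
```

A table keeps the mapping in one place, which matters here because the same chain appears in `get_settings`, `get_embedders`, and `update_embedders`.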

def update_embedders(self, body: Union[MutableMapping[str, Any], None]) -> TaskInfo:
"""Update embedders of the index.

Updates the embedder configuration for the index. The embedder configuration
determines how Meilisearch generates vector embeddings for documents.

Parameters
----------
body: dict
Dictionary containing the embedders.
Dictionary containing the embedders configuration.

Returns
-------
@@ -1933,13 +1962,28 @@ def update_embedders(self, body: Union[MutableMapping[str, Any], None]) -> TaskI
Raises
------
MeilisearchApiError
An error containing details about why Meilisearch can't process your request. Meilisearch error codes are described here: https://www.meilisearch.com/docs/reference/errors/error_codes#meilisearch-errors
An error containing details about why Meilisearch can't process your request.
Meilisearch error codes are described here: https://www.meilisearch.com/docs/reference/errors/error_codes#meilisearch-errors
"""
if body is not None and body.get("embedders"):
embedders: dict[str, EmbedderType] = {}
for k, v in body["embedders"].items():
source = v.get("source")
if source == "openAi":
embedders[k] = OpenAiEmbedder(**v)
elif source == "huggingFace":
embedders[k] = HuggingFaceEmbedder(**v)
elif source == "ollama":
embedders[k] = OllamaEmbedder(**v)
elif source == "rest":
embedders[k] = RestEmbedder(**v)
elif source == "userProvided":
embedders[k] = UserProvidedEmbedder(**v)
else:
# Default to UserProvidedEmbedder for unknown sources
embedders[k] = UserProvidedEmbedder(**v)

if body:
for _, v in body.items():
if "documentTemplateMaxBytes" in v and v["documentTemplateMaxBytes"] is None:
Member commented:

Is this handling done by Meili now?

Contributor Author replied:

Removing it did not trigger any test failure but it might simply be untested, so I added it back to avoid any unwanted side effects

del v["documentTemplateMaxBytes"]
body = {"embedders": {k: v.model_dump(by_alias=True) for k, v in embedders.items()}}

task = self.http.patch(self.__settings_url_for(self.config.paths.embedders), body)
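
The `documentTemplateMaxBytes` clean-up discussed in the review thread can be exercised in isolation. The sketch below assumes a body that maps embedder names to plain configuration dictionaries; the function name `drop_null_template_limit` and the sample values are hypothetical.

```python
from typing import Any, Dict, MutableMapping


def drop_null_template_limit(body: MutableMapping[str, Dict[str, Any]]) -> None:
    """Remove 'documentTemplateMaxBytes' from each embedder config when it is None."""
    for config in body.values():
        if "documentTemplateMaxBytes" in config and config["documentTemplateMaxBytes"] is None:
            del config["documentTemplateMaxBytes"]


payload = {
    "default": {"source": "userProvided", "documentTemplateMaxBytes": None},
    "sized": {"source": "userProvided", "documentTemplateMaxBytes": 400},
}
drop_null_template_limit(payload)
```

Only explicit `None` values are dropped; a configured limit such as `400` is left untouched.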

@@ -1948,6 +1992,8 @@ def update_embedders(self, body: Union[MutableMapping[str, Any], None]) -> TaskI
def reset_embedders(self) -> TaskInfo:
"""Reset embedders of the index to default values.

Removes all embedder configurations from the index.

Returns
-------
task_info: