Qdrant VSQL (VectorSQL) is an ambitious Python library aiming to bring a SQL-like query interface to Qdrant vector databases. Currently, it provides a robust solution for converting SQL-like WHERE
clauses into Qdrant Filter
objects, simplifying complex filtering logic.
Maybe Qdrant VSQL will evolve into a comprehensive SQL interface for Qdrant, potentially becoming a standard for vector database interactions. Imagine writing intuitive SQL queries for all your Qdrant operations, from filtering to vector similarity search, aggregation, and more!
- SQL-like Syntax: Write familiar
WHERE
clauses (e.g.,age > 30 AND city = 'New York'
). - Comprehensive Operator Support: Includes
=
,!=
,<>
,>
,>=
,<
,<=
,IN
,NOT IN
,BETWEEN
,NOT BETWEEN
,LIKE
,IS NULL
,IS NOT NULL
,IS EMPTY
,COUNT()
. - Logical Operators: Supports
AND
,OR
, andNOT
for complex query combinations. Supports intricate logical structures with nested operations. See more examples in the test suite. - Nested Field Support: Easily filter on nested payload fields using dot notation (e.g.,
user.address.city
). - Array Projection: Handle array fields with
[]
notation (e.g.,tags[] = 'urgent'
). - Type Handling: Automatically converts string, number, and boolean values to appropriate Qdrant types.
- Qdrant Native Output: Generates
qdrant_client.http.models.Filter
objects, ready for use with the Qdrant client. - ID Filtering: Automatically translates all
id
field operations (e.g.,id = 123
) into Qdrant's nativeHasIdCondition
for efficient point ID filtering. - LIKE Operator: When using the
LIKE
operator, it automatically translates to Qdrant'sMatchText
condition for full-text search capabilities. Otherwise it usesMatchValue
for strings. - Case-Insensitive: Operators and keywords are case-insensitive for flexible query writing.
You can install Qdrant VSQL directly from PyPI:
pip install qdrant-vsql
Here's how you can use Qdrant VSQL to build your Qdrant filter queries:
from qdrant_client import QdrantClient
from qdrant_vsql.filtering import where2filter
# Initialize Qdrant client (replace with your actual client setup)
qdrant_client = QdrantClient(":memory:")
# Example 1: Simple AND condition
query_str_1 = "city = 'London' AND age > 30"
qdrant_filter_1 = where2filter(query_str_1)
# Example 2: OR and IN conditions
query_str_2 = "status IN ('active', 'pending') OR priority = 'high'"
qdrant_filter_2 = where2filter(query_str_2)
# Example 3: NOT and BETWEEN conditions
query_str_3 = "NOT (price BETWEEN 100 AND 200) AND category IS NOT NULL"
qdrant_filter_3 = where2filter(query_str_3)
# Example 4: COUNT and LIKE conditions
query_str_4 = "COUNT(tags) >= 2 AND description LIKE 'product'"
qdrant_filter_4 = where2filter(query_str_4)
# Example 5: Datetime range condition
query_str_5 = "event_date BETWEEN '2023-01-01T00:00:00Z' AND '2023-12-31T23:59:59Z'"
qdrant_filter_5 = where2filter(query_str_5)
# Call Qdrant with your filter
hits = qdrant_client.query_points(
collection_name="my_collection",
query=np.random.rand(100),
query_filter=qdrant_filter_1,
limit=10
)
This section details the equivalences between common SQL-like WHERE
clause syntax and the corresponding Qdrant Filter
object structures. The where2filter
function translates these SQL expressions into Qdrant's native filtering language, enabling powerful and flexible queries.
A Qdrant Filter
object typically consists of must
, should
, and must_not
arrays, which represent AND
, OR
, and NOT
logical operations, respectively. Conditions within these arrays are combined to form complex queries.
SQL-like Syntax | Qdrant Filter Object Equivalent (simplified) | Notes |
---|---|---|
field = 'value' |
{"key": "field", "match": {"value": "value"}} |
Exact match for strings, numbers, booleans. |
field != 'value' or field <> 'value' |
must_not: [{"key": "field", "match": {"value": "value"}}] |
Negation of exact match. |
field IN ('val1', 'val2') |
{"key": "field", "match": {"any": ["val1", "val2"]}} |
Matches if field is any of the values. |
field NOT IN ('val1', 'val2') |
must_not: [{"key": "field", "match": {"any": ["val1", "val2"]}}] |
Matches if field is none of the values. |
field BETWEEN val1 AND val2 |
{"key": "field", "range": {"gte": val1, "lte": val2}} |
Inclusive range for numbers. |
field NOT BETWEEN val1 AND val2 |
must_not: [{"key": "field", "range": {"gte": val1, "lte": val2}}] |
Negation of inclusive range. |
field > value |
{"key": "field", "range": {"gt": value}} |
Greater than. |
field >= value |
{"key": "field", "range": {"gte": value}} |
Greater than or equal. |
field < value |
{"key": "field", "range": {"lt": value}} |
Less than. |
field <= value |
{"key": "field", "range": {"lte": value}} |
Less than or equal. |
field LIKE 'pattern' |
{"key": "field", "match": {"text": "pattern%"}} |
Full-text search. |
field IS NULL |
{"is_null": {"key": "field"}} |
Matches if field is null or does not exist. |
field IS NOT NULL |
must_not: [{"is_null": {"key": "field"}}] |
Matches if field is not null and exists. |
field IS EMPTY or field = [] |
{"is_empty": {"key": "field"}} |
Matches if field is missing, null , or an empty array. |
COUNT(field) >= value |
{"key": "field", "values_count": {"gte": value}} |
Filters by the number of values in an array field. |
parent.child = 'value' |
{"key": "parent.child", "match": {"value": "value"}} |
Accesses nested payload fields using dot notation. |
array[].field = 'value' |
{"key": "array[].field", "match": {"value": "value"}} |
Filters on elements within an array of objects. |
id = 123 or id = 'uuid_str' |
{"has_id": [123]} or {"has_id": ["uuid_str"]} |
Filters by point ID (integer or UUID). |
date_field >= 'YYYY-MM-DDTHH:MM:SSZ' |
{"key": "date_field", "range": {"gte": "YYYY-MM-DDTHH:MM:SSZ"}} |
Datetime range filtering. |
date_field BETWEEN 'date1' AND 'date2' |
{"key": "date_field", "range": {"gte": "date1", "lte": "date2"}} |
Inclusive range for datetimes. |
(cond1 OR cond2) AND NOT (cond3 OR cond4) |
must: [should: [cond1, cond2]], must_not: [should: [cond3, cond4]] |
Complex logical combinations using nested filters. |
Implement HasVectorCondition
, NestedCondition
and Geo (GeoBoundingBox
, GeoRadius
and GeoPolygon
).
We envision Qdrant VSQL evolving into a full-fledged SQL interface for Qdrant, allowing you to interact with your vector database using familiar SQL syntax for various operations. This idea was first discussed in the Qdrant GitHub issue #4026. This could simplify queries for the Qdrant dashboard, Jupyter Notebooks, Advanced Filtering Support in UIs, and more.
Here are some examples of what we aim to support:
SELECT score, vector, id, payload.brand_name
FROM mycollection
WHERE vector LIKE [0.12, 0.1, 0.99, -0.01]
SELECT score, vectors.img_vec, id, payload.qty_items
FROM mycollection
WHERE vectors.img_vec LIKE [0, 1.2, -0.2, 0.001]
SELECT score, id, payload.qty_items
FROM mycollection
WHERE vector LIKE [0.1, 0.2, -0.3, 0.11] AND payload.brand_name = 'Nokia'
LIMIT 50
OFFSET 20
SELECT score, id, payload.*
FROM mycollection
WHERE vector LIKE [0.11, -0.2, 0.3, 0.22] AND payload.brand_name = 'Nokia'
LIMIT 100
SELECT score, id
FROM mycollection
WHERE vector LIKE [0.01, -0.9, 0.11, 0.0]
AND (payload.brand_name IN ('Nokia', 'Alcatel', 'Sony') OR payload.qty_items >= 10)
AND payload.members IS NULL
LIMIT 100
SELECT id, payload.*
FROM mycollection
WHERE payload.brand_name MATCH 'cell'
AND COUNT(payload.members) > 100
AND payload.qty_items BETWEEN 10 AND 20
ORDER BY payload.members DESC
LIMIT 100
SELECT score, id
FROM mycollection
WHERE vector LIKE [0.01, -0.9, 0.11, 0.0]
AND (
payload.geofield INSIDE RECTANGLE(52.520711, 13.403683, 52.495862, 13.455868)
OR
payload.geofield INSIDE CIRCLE(52.520711, 13.403683, 1000)
OR
payload.geofield OUTSIDE POLYGON([12.444, 54.12], [24.77, 18.222], [99.91, 12.2])
)
We welcome contributions! If you have suggestions for improvements, new features, or bug fixes, please feel free to:
- Fork the repository.
- Clone your forked repository.
- Create a new branch for your feature or bug fix.
- Make your changes and ensure tests pass.
- Commit your changes with a clear message.
- Push your branch to your forked repository.
- Open a Pull Request to the
main
branch of this repository.
Please refer to our CONTRIBUTING.md
for more detailed guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
Made with β€οΈ for the community