Skip to content

raulcarlomagno/qdrant_vsql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

32 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Qdrant VSQL

Python Version License Build Tests Code Style PyPI Version PyPI Downloads

πŸš€ Overview

Qdrant VSQL (VectorSQL) is an ambitious Python library aiming to bring a SQL-like query interface to Qdrant vector databases. Currently, it provides a robust solution for converting SQL-like WHERE clauses into Qdrant Filter objects, simplifying complex filtering logic.

Maybe Qdrant VSQL will evolve into a comprehensive SQL interface for Qdrant, potentially becoming a standard for vector database interactions. Imagine writing intuitive SQL queries for all your Qdrant operations, from filtering to vector similarity search, aggregation, and more!

✨ Current Features (Filtering)

  • SQL-like Syntax: Write familiar WHERE clauses (e.g., age > 30 AND city = 'New York').
  • Comprehensive Operator Support: Includes =, !=, <>, >, >=, <, <=, IN, NOT IN, BETWEEN, NOT BETWEEN, LIKE, IS NULL, IS NOT NULL, IS EMPTY, COUNT().
  • Logical Operators: Supports AND, OR, and NOT for complex query combinations. Supports intricate logical structures with nested operations. See more examples in the test suite.
  • Nested Field Support: Easily filter on nested payload fields using dot notation (e.g., user.address.city).
  • Array Projection: Handle array fields with [] notation (e.g., tags[] = 'urgent').
  • Type Handling: Automatically converts string, number, and boolean values to appropriate Qdrant types.
  • Qdrant Native Output: Generates qdrant_client.http.models.Filter objects, ready for use with the Qdrant client.
  • ID Filtering: Automatically translates all id field operations (e.g., id = 123) into Qdrant's native HasIdCondition for efficient point ID filtering.
  • LIKE Operator: When using the LIKE operator, it automatically translates to Qdrant's MatchText condition for full-text search capabilities. Otherwise it uses MatchValue for strings.
  • Case-Insensitive: Operators and keywords are case-insensitive for flexible query writing.

Installation

You can install Qdrant VSQL directly from PyPI:

pip install qdrant-vsql

πŸ’‘ Usage Examples (Current Filtering)

Here's how you can use Qdrant VSQL to build your Qdrant filter queries:

from qdrant_client import QdrantClient
from qdrant_vsql.filtering import where2filter

# Initialize Qdrant client (replace with your actual client setup)
qdrant_client = QdrantClient(":memory:")

# Example 1: Simple AND condition
query_str_1 = "city = 'London' AND age > 30"
qdrant_filter_1 = where2filter(query_str_1)

# Example 2: OR and IN conditions
query_str_2 = "status IN ('active', 'pending') OR priority = 'high'"
qdrant_filter_2 = where2filter(query_str_2)

# Example 3: NOT and BETWEEN conditions
query_str_3 = "NOT (price BETWEEN 100 AND 200) AND category IS NOT NULL"
qdrant_filter_3 = where2filter(query_str_3)

# Example 4: COUNT and LIKE conditions
query_str_4 = "COUNT(tags) >= 2 AND description LIKE 'product'"
qdrant_filter_4 = where2filter(query_str_4)

# Example 5: Datetime range condition
query_str_5 = "event_date BETWEEN '2023-01-01T00:00:00Z' AND '2023-12-31T23:59:59Z'"
qdrant_filter_5 = where2filter(query_str_5)

# Call Qdrant with your filter
hits = qdrant_client.query_points(
    collection_name="my_collection",
    query=np.random.rand(100),
    query_filter=qdrant_filter_1,
    limit=10
)

πŸ—ΊοΈ SQL to Qdrant Filter Equivalences

This section details the equivalences between common SQL-like WHERE clause syntax and the corresponding Qdrant Filter object structures. The where2filter function translates these SQL expressions into Qdrant's native filtering language, enabling powerful and flexible queries.

A Qdrant Filter object typically consists of must, should, and must_not arrays, which represent AND, OR, and NOT logical operations, respectively. Conditions within these arrays are combined to form complex queries.

SQL-like Syntax Qdrant Filter Object Equivalent (simplified) Notes
field = 'value' {"key": "field", "match": {"value": "value"}} Exact match for strings, numbers, booleans.
field != 'value' or field <> 'value' must_not: [{"key": "field", "match": {"value": "value"}}] Negation of exact match.
field IN ('val1', 'val2') {"key": "field", "match": {"any": ["val1", "val2"]}} Matches if field is any of the values.
field NOT IN ('val1', 'val2') must_not: [{"key": "field", "match": {"any": ["val1", "val2"]}}] Matches if field is none of the values.
field BETWEEN val1 AND val2 {"key": "field", "range": {"gte": val1, "lte": val2}} Inclusive range for numbers.
field NOT BETWEEN val1 AND val2 must_not: [{"key": "field", "range": {"gte": val1, "lte": val2}}] Negation of inclusive range.
field > value {"key": "field", "range": {"gt": value}} Greater than.
field >= value {"key": "field", "range": {"gte": value}} Greater than or equal.
field < value {"key": "field", "range": {"lt": value}} Less than.
field <= value {"key": "field", "range": {"lte": value}} Less than or equal.
field LIKE 'pattern' {"key": "field", "match": {"text": "pattern%"}} Full-text search.
field IS NULL {"is_null": {"key": "field"}} Matches if field is null or does not exist.
field IS NOT NULL must_not: [{"is_null": {"key": "field"}}] Matches if field is not null and exists.
field IS EMPTY or field = [] {"is_empty": {"key": "field"}} Matches if field is missing, null, or an empty array.
COUNT(field) >= value {"key": "field", "values_count": {"gte": value}} Filters by the number of values in an array field.
parent.child = 'value' {"key": "parent.child", "match": {"value": "value"}} Accesses nested payload fields using dot notation.
array[].field = 'value' {"key": "array[].field", "match": {"value": "value"}} Filters on elements within an array of objects.
id = 123 or id = 'uuid_str' {"has_id": [123]} or {"has_id": ["uuid_str"]} Filters by point ID (integer or UUID).
date_field >= 'YYYY-MM-DDTHH:MM:SSZ' {"key": "date_field", "range": {"gte": "YYYY-MM-DDTHH:MM:SSZ"}} Datetime range filtering.
date_field BETWEEN 'date1' AND 'date2' {"key": "date_field", "range": {"gte": "date1", "lte": "date2"}} Inclusive range for datetimes.
(cond1 OR cond2) AND NOT (cond3 OR cond4) must: [should: [cond1, cond2]], must_not: [should: [cond3, cond4]] Complex logical combinations using nested filters.

🚧 Pending Features

Implement HasVectorCondition, NestedCondition and Geo (GeoBoundingBox, GeoRadius and GeoPolygon).

🎯 Future Vision

We envision Qdrant VSQL evolving into a full-fledged SQL interface for Qdrant, allowing you to interact with your vector database using familiar SQL syntax for various operations. This idea was first discussed in the Qdrant GitHub issue #4026. This could simplify queries for the Qdrant dashboard, Jupyter Notebooks, Advanced Filtering Support in UIs, and more.

Here are some examples of what we aim to support:

Vector Similarity Search

SELECT score, vector, id, payload.brand_name
FROM mycollection
WHERE vector LIKE [0.12, 0.1, 0.99, -0.01]

For named vectors:

SELECT score, vectors.img_vec, id, payload.qty_items
FROM mycollection
WHERE vectors.img_vec LIKE [0, 1.2, -0.2, 0.001]

Filtering Payload with LIMIT and OFFSET

SELECT score, id, payload.qty_items
FROM mycollection
WHERE vector LIKE [0.1, 0.2, -0.3, 0.11] AND payload.brand_name = 'Nokia'
LIMIT 50
OFFSET 20

Selecting All Payload Fields

SELECT score, id, payload.*
FROM mycollection
WHERE vector LIKE [0.11, -0.2, 0.3, 0.22] AND payload.brand_name = 'Nokia'
LIMIT 100

Complex Filtering

SELECT score, id
FROM mycollection
WHERE vector LIKE [0.01, -0.9, 0.11, 0.0]
AND (payload.brand_name IN ('Nokia', 'Alcatel', 'Sony') OR payload.qty_items >= 10)
AND payload.members IS NULL
LIMIT 100

Scroll API with Full-text, Value Count, Range, and Sorting

SELECT id, payload.*
FROM mycollection
WHERE payload.brand_name MATCH 'cell'
AND COUNT(payload.members) > 100
AND payload.qty_items BETWEEN 10 AND 20
ORDER BY payload.members DESC
LIMIT 100

Geo-Spatial Filtering

SELECT score, id
FROM mycollection
WHERE vector LIKE [0.01, -0.9, 0.11, 0.0]
AND (
payload.geofield INSIDE RECTANGLE(52.520711, 13.403683, 52.495862, 13.455868)
OR
payload.geofield INSIDE CIRCLE(52.520711, 13.403683, 1000)
OR
payload.geofield OUTSIDE POLYGON([12.444, 54.12], [24.77, 18.222], [99.91, 12.2])
)

🀝 Contributing

We welcome contributions! If you have suggestions for improvements, new features, or bug fixes, please feel free to:

  1. Fork the repository.
  2. Clone your forked repository.
  3. Create a new branch for your feature or bug fix.
  4. Make your changes and ensure tests pass.
  5. Commit your changes with a clear message.
  6. Push your branch to your forked repository.
  7. Open a Pull Request to the main branch of this repository.

Please refer to our CONTRIBUTING.md for more detailed guidelines.

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❀️ for the community