Commit 4e1c033
Query: Adds hybrid search query pipeline stage (#4794)
## Description
Adds a hybrid search query pipeline stage. It requires the new Direct
package and gateway to be available before it lights up.
Given an input SQL query such as:
```sql
SELECT TOP 100 c.text, c.abstract
FROM c
ORDER BY RANK RRF(FullTextScore(c.text, ['swim', 'run']), FullTextScore(c.abstract, ['energy']))
```
The new query plan (encoded below as XML instead of JSON for
readability) is as follows:
```
<queryRanges>
  <Item>{"min":[],"max":"Infinity","isMinInclusive":true,"isMaxInclusive":false}</Item>
</queryRanges>
<hybridSearchQueryInfo>
  <globalStatisticsQuery><![CDATA[
    SELECT
      COUNT(1) AS documentCount,
      [
        {
          totalWordCount: SUM(_FullTextWordCount(c.text)),
          hitCounts: [
            COUNTIF(FullTextContains(c.text, "swim")),
            COUNTIF(FullTextContains(c.text, "run"))
          ]
        },
        {
          totalWordCount: SUM(_FullTextWordCount(c.abstract)),
          hitCounts: [
            COUNTIF(FullTextContains(c.abstract, "energy"))
          ]
        }
      ] AS fullTextStatistics
    FROM c
  ]]></globalStatisticsQuery>
  <componentQueryInfos>
    <Item>
      <distinctType>None</distinctType>
      <top>200</top>
      <orderBy>
        <Item>Descending</Item>
      </orderBy>
      <orderByExpressions>
        <Item>_FullTextScore(c.text, ["swim", "run"], {documentdb-formattablehybridsearchquery-totaldocumentcount}, {documentdb-formattablehybridsearchquery-totalwordcount-0}, {documentdb-formattablehybridsearchquery-hitcountsarray-0})</Item>
      </orderByExpressions>
      <hasSelectValue>false</hasSelectValue>
      <rewrittenQuery><![CDATA[
        SELECT TOP 200
          c._rid,
          [
            {
              item: _FullTextScore(c.text, ["swim", "run"], {documentdb-formattablehybridsearchquery-totaldocumentcount}, {documentdb-formattablehybridsearchquery-totalwordcount-0}, {documentdb-formattablehybridsearchquery-hitcountsarray-0})
            }
          ] AS orderByItems,
          {
            payload: {
              text: c.text,
              abstract: c.abstract
            },
            componentScores: [
              _FullTextScore(c.text, ["swim", "run"], {documentdb-formattablehybridsearchquery-totaldocumentcount}, {documentdb-formattablehybridsearchquery-totalwordcount-0}, {documentdb-formattablehybridsearchquery-hitcountsarray-0}),
              _FullTextScore(c.abstract, ["energy"], {documentdb-formattablehybridsearchquery-totaldocumentcount}, {documentdb-formattablehybridsearchquery-totalwordcount-1}, {documentdb-formattablehybridsearchquery-hitcountsarray-1})
            ]
          } AS payload
        FROM c
        WHERE {documentdb-formattableorderbyquery-filter}
        ORDER BY _FullTextScore(c.text, ["swim", "run"], {documentdb-formattablehybridsearchquery-totaldocumentcount}, {documentdb-formattablehybridsearchquery-totalwordcount-0}, {documentdb-formattablehybridsearchquery-hitcountsarray-0}) DESC
      ]]></rewrittenQuery>
      <hasNonStreamingOrderBy>true</hasNonStreamingOrderBy>
    </Item>
    <Item>
      <distinctType>None</distinctType>
      <top>200</top>
      <orderBy>
        <Item>Descending</Item>
      </orderBy>
      <orderByExpressions>
        <Item>_FullTextScore(c.abstract, ["energy"], {documentdb-formattablehybridsearchquery-totaldocumentcount}, {documentdb-formattablehybridsearchquery-totalwordcount-1}, {documentdb-formattablehybridsearchquery-hitcountsarray-1})</Item>
      </orderByExpressions>
      <hasSelectValue>false</hasSelectValue>
      <rewrittenQuery><![CDATA[
        SELECT TOP 200
          c._rid,
          [
            {
              item: _FullTextScore(c.abstract, ["energy"], {documentdb-formattablehybridsearchquery-totaldocumentcount}, {documentdb-formattablehybridsearchquery-totalwordcount-1}, {documentdb-formattablehybridsearchquery-hitcountsarray-1})
            }
          ] AS orderByItems,
          {
            payload: {
              text: c.text,
              abstract: c.abstract
            },
            componentScores: [
              _FullTextScore(c.text, ["swim", "run"], {documentdb-formattablehybridsearchquery-totaldocumentcount}, {documentdb-formattablehybridsearchquery-totalwordcount-0}, {documentdb-formattablehybridsearchquery-hitcountsarray-0}),
              _FullTextScore(c.abstract, ["energy"], {documentdb-formattablehybridsearchquery-totaldocumentcount}, {documentdb-formattablehybridsearchquery-totalwordcount-1}, {documentdb-formattablehybridsearchquery-hitcountsarray-1})
            ]
          } AS payload
        FROM c
        WHERE {documentdb-formattableorderbyquery-filter}
        ORDER BY _FullTextScore(c.abstract, ["energy"], {documentdb-formattablehybridsearchquery-totaldocumentcount}, {documentdb-formattablehybridsearchquery-totalwordcount-1}, {documentdb-formattablehybridsearchquery-hitcountsarray-1}) DESC
      ]]></rewrittenQuery>
      <hasNonStreamingOrderBy>true</hasNonStreamingOrderBy>
    </Item>
  </componentQueryInfos>
  <take>100</take>
  <requiresGlobalStatistics>true</requiresGlobalStatistics>
</hybridSearchQueryInfo>
```
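`RRF` in the input query stands for Reciprocal Rank Fusion. As a rough illustration of how the ranked outputs of the two component queries could be fused before applying `take`, here is a minimal Python sketch. It assumes the commonly used formula, where each document's fused score is the sum of `1 / (k + rank)` over the component rankings, with `k = 60`. The constant, the function name `rrf_fuse`, the tie-breaking rule, and the treatment of documents missing from a ranking are all assumptions and are not taken from this commit.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, take=None):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first (one per component query).
    k: smoothing constant (60 is a common default; an assumption here).
    take: optional cap on the number of fused results.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document absent from a ranking simply contributes nothing.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused if take is None else fused[:take]


# Hypothetical rankings for the c.text and c.abstract component queries.
text_ranking = ["d1", "d2", "d3"]
abstract_ranking = ["d3", "d1", "d4"]
fused_top = rrf_fuse([text_ranking, abstract_ranking], take=3)
print(fused_top)  # ['d1', 'd3', 'd2']
```

In the plan above, each component query asks for `TOP 200` while the final `take` is 100, so the fusion step has more candidates per component than it ultimately returns.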
The global statistics query has a custom implementation inside
`HybridSearchCrossPartitionQueryPipelineStage` because it uses nested
aggregates. Each component query in the hybrid search query plan is
cross-partition, and we run them using the existing cross-partition
query pipelines.
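Since the statistics query is itself cross-partition, its per-partition results have to be combined on the client. The Python sketch below shows one way such a nested aggregation could work: sum `documentCount`, sum each `totalWordCount`, and add the `hitCounts` arrays element-wise. The input shape mirrors the `globalStatisticsQuery` in the plan above; the function name `merge_global_statistics` and the dict representation are assumptions, not the SDK's actual types.

```python
def merge_global_statistics(partition_results):
    """Combine per-partition statistics dicts into one global result."""
    merged = None
    for result in partition_results:
        if merged is None:
            # Copy the first result so the caller's data is not mutated.
            merged = {
                "documentCount": result["documentCount"],
                "fullTextStatistics": [
                    {
                        "totalWordCount": s["totalWordCount"],
                        "hitCounts": list(s["hitCounts"]),
                    }
                    for s in result["fullTextStatistics"]
                ],
            }
            continue
        merged["documentCount"] += result["documentCount"]
        for acc, s in zip(merged["fullTextStatistics"], result["fullTextStatistics"]):
            acc["totalWordCount"] += s["totalWordCount"]
            # Element-wise sum: one hit count per search term.
            acc["hitCounts"] = [a + b for a, b in zip(acc["hitCounts"], s["hitCounts"])]
    return merged


# Hypothetical results from two partitions for the example query.
partition_1 = {
    "documentCount": 10,
    "fullTextStatistics": [
        {"totalWordCount": 500, "hitCounts": [3, 4]},  # c.text: "swim", "run"
        {"totalWordCount": 200, "hitCounts": [2]},     # c.abstract: "energy"
    ],
}
partition_2 = {
    "documentCount": 5,
    "fullTextStatistics": [
        {"totalWordCount": 300, "hitCounts": [1, 0]},
        {"totalWordCount": 100, "hitCounts": [5]},
    ],
}
global_stats = merge_global_statistics([partition_1, partition_2])
print(global_stats["documentCount"])  # 15
```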
Note the use of placeholders such as
`{documentdb-formattablehybridsearchquery-totaldocumentcount}` in the
query plan. These must be replaced with values from the global statistics.
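The substitution can be pictured as plain string replacement. The Python sketch below fills the three placeholder kinds that appear in the plan (`totaldocumentcount`, `totalwordcount-N`, `hitcountsarray-N`) from a statistics dict. The function names, the dict shape, and the choice to render arrays as JSON are assumptions for illustration; the commit does not show how the SDK formats these values.

```python
import json

PREFIX = "documentdb-formattablehybridsearchquery"


def placeholder(name):
    """Build a placeholder token such as {<PREFIX>-totaldocumentcount}."""
    return "{" + PREFIX + "-" + name + "}"


def format_component_query(query, stats):
    """Replace hybrid search placeholders in `query` with global statistics."""
    query = query.replace(placeholder("totaldocumentcount"), str(stats["documentCount"]))
    for i, s in enumerate(stats["fullTextStatistics"]):
        query = query.replace(placeholder(f"totalwordcount-{i}"), str(s["totalWordCount"]))
        query = query.replace(placeholder(f"hitcountsarray-{i}"), json.dumps(s["hitCounts"]))
    return query


# Order-by expression of the first component query, copied from the plan.
template = (
    '_FullTextScore(c.text, ["swim", "run"], '
    "{documentdb-formattablehybridsearchquery-totaldocumentcount}, "
    "{documentdb-formattablehybridsearchquery-totalwordcount-0}, "
    "{documentdb-formattablehybridsearchquery-hitcountsarray-0})"
)
# Hypothetical global statistics values.
stats = {
    "documentCount": 15,
    "fullTextStatistics": [{"totalWordCount": 800, "hitCounts": [4, 4]}],
}
formatted = format_component_query(template, stats)
print(formatted)  # _FullTextScore(c.text, ["swim", "run"], 15, 800, [4, 4])
```

Because each token includes its closing brace, the placeholder for index 1 cannot accidentally match the one for index 10. The sketch leaves `{documentdb-formattableorderbyquery-filter}` untouched, since it is not one of the hybrid search placeholders.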
## Type of change
- [x] New feature (non-breaking change which adds functionality)
1 parent 57c681f commit 4e1c033
File tree
21 files changed (+1721 -170 lines changed)
- Microsoft.Azure.Cosmos
- src/Query/Core
- Pipeline
- Aggregate/Aggregators
- CrossPartition
- HybridSearch
- OrderBy
- Distinct
- QueryPlan
- tests
- Microsoft.Azure.Cosmos.EmulatorTests
- Documents
- Query
- Microsoft.Azure.Cosmos.Performance.Tests/Query
- Microsoft.Azure.Cosmos.Tests
- Query/Pipeline
- Tracing
Lines changed: 2 additions & 1 deletion
Lines changed: 91 additions & 123 deletions
Lines changed: 64 additions & 0 deletions
Lines changed: 67 additions & 0 deletions