(TS) Fix PGVector implementation, where vector distance was inverted. #4944
…tter. pgvector's `<=>` returns cosine distance, where lower is better. Convert it back into a bounded similarity score before returning it.
205ba19 to 81ccfe9
Please sign the CLA and remove the comment from the code to maintain consistency.
The CLA is signed and the comment is removed.
Can you add a test to verify this claim, please? Thanks!
…r) gets turned into score (higher is better) for ranking.
I've added a unit test. It mocks the Postgres response and checks the results: with the old code the test fails, and with the fix it passes. There's no testcontainers setup, and adding one feels like scope creep, so the test doesn't run the query against a real Postgres database to confirm that the mocked response matches what you would actually get. You can run the SQL in the PR body on any Postgres instance to confirm the behavior. I hope this test is sufficient.
@kartik-mem0 Anything else I can do to get this merged?
Please address this lint failure, as our CI failed here @zegerhoogeboom
@kartik-mem0 Sorry about that. Done.

The hybrid search pipeline expects semantic scores where higher is better. pgvector's `<=>` returns cosine distance, where lower is better. Convert it back into a bounded similarity score before returning it.
Linked Issue
No issue opened.
Description
In semantic search, the most similar documents have to be ranked highest. With PGVector, the scores were inverted, so the least relevant documents were presumably being surfaced first.
This fix actually returns the most similar documents.
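To illustrate the idea, here is a minimal TypeScript sketch of turning a cosine distance into a score and ranking by it. It is not the code from this PR: the names `cosineDistance`, `distanceToScore`, `rankByScore`, and `Row` are hypothetical, and clamping `1 - distance` to `[0, 1]` is only one assumed way to get a "bounded" score.

```typescript
// Hypothetical helpers for illustration; the PR's actual code is not shown here.

/** Cosine distance, 1 - cosine similarity, which is what pgvector's `<=>` returns. */
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Assumed conversion: distance in [0, 2] becomes a score in [0, 1], higher is better. */
function distanceToScore(distance: number): number {
  return Math.min(1, Math.max(0, 1 - distance));
}

/** Shape of a row as a query might return it (an assumption). */
interface Row {
  id: string;
  distance: number;
}

/** Convert each row's distance to a score and sort best match first. */
function rankByScore(rows: Row[]): { id: string; score: number }[] {
  return rows
    .map((row) => ({ id: row.id, score: distanceToScore(row.distance) }))
    .sort((x, y) => y.score - x.score);
}

// Rows with made-up distances, standing in for a mocked database response.
const ranked = rankByScore([
  { id: "far", distance: 1.5 },
  { id: "near", distance: 0 },
  { id: "mid", distance: 0.5 },
]);
console.log(ranked.map((r) => r.id).join(", ")); // near, mid, far
```

If the raw distance were passed through as the score, a pipeline that sorts descending would put `far` first, which matches the symptom described above.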
Type of Change
Test Coverage
I ran `.search()` in my (proprietary) app and got very bad results. After installing my local build with the fix, the results are relevant, as expected.
Run this SQL to confirm the distance function in PGVector indeed behaves as I'm describing:
which outputs:
Checklist