This repository was archived by the owner on Jan 2, 2025. It is now read-only.

Deduplicate after merging lexical results #1161

Merged
merged 2 commits into from
Dec 8, 2023

Conversation

ggordonhall
Contributor

No description provided.

@rmuller-ml
Contributor

Running MMR over the merged results will affect the RRF (hybrid) ranking, because MMR basically ranks by cosine similarity plus a diversity term, so the lexical results will go to the bottom.

What exactly are we trying to add to the lexical results? Filtering of overlapping chunks? File path diversity? Programming language diversity?
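
To make the concern above concrete, here is a minimal Python sketch of reciprocal rank fusion (RRF) followed by maximal marginal relevance (MMR). It is not the code from this PR: the function names, the `k=60` constant, the `lam` weight, and the toy embeddings are all assumptions chosen for illustration. It shows how a document that reached the fused list only through a lexical match (`"c"`) can drop to the bottom once MMR re-scores everything by cosine similarity to the query.

```python
import math


def rrf(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query, candidates, embeddings, lam=0.7):
    """Greedy MMR: repeatedly pick the candidate maximising
    lam * sim(query, d) - (1 - lam) * max sim(d, already selected)."""
    remaining = list(candidates)
    selected = []
    while remaining:
        def score(doc):
            relevance = cosine(query, embeddings[doc])
            redundancy = max(
                (cosine(embeddings[doc], embeddings[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: "c" is found only by the lexical search and its
# embedding is orthogonal to the query, so its cosine similarity is 0.
semantic_ranking = ["a", "b"]
lexical_ranking = ["c", "a"]
query_vec = [1.0, 0.0]
embeddings = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}

fused = rrf([semantic_ranking, lexical_ranking])
reranked = mmr(query_vec, fused, embeddings)
print(fused)     # RRF places the lexical-only hit "c" second
print(reranked)  # MMR moves "c" to the end
```

In this toy case RRF rewards `"c"` for being the top lexical hit, but MMR ignores the fused order entirely and scores `"c"` by its embedding alone, which is the effect described in the comment.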

@ggordonhall
Contributor Author

ggordonhall commented Dec 8, 2023

> Running MMR over the merged results will affect the RRF (hybrid) ranking, because MMR basically ranks by cosine similarity plus a diversity term, so the lexical results will go to the bottom.
>
> What exactly are we trying to add to the lexical results? Filtering of overlapping chunks? File path diversity? Programming language diversity?

We're trying to ensure that we have path and lang diversity in the final result list.
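
One way to get path and language diversity without re-scoring the fused list is a plain filter that walks the results in rank order. The sketch below is an assumption about how that could look, not the implementation merged in this PR: the `Chunk` type, the `diversify` function, and the `max_per_path` / `max_per_lang` caps are hypothetical names and values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical result type; field names are illustrative only.
    path: str
    lang: str
    start_line: int
    end_line: int


def overlaps(a, b):
    """True when two chunks come from the same file and share at least one line."""
    return (
        a.path == b.path
        and a.start_line <= b.end_line
        and b.start_line <= a.end_line
    )


def diversify(ranked, max_per_path=2, max_per_lang=3):
    """Walk the fused list in order, dropping overlapping chunks and
    capping how many results any one path or language may contribute."""
    kept = []
    per_path = Counter()
    per_lang = Counter()
    for chunk in ranked:
        if any(overlaps(chunk, other) for other in kept):
            continue  # duplicate or overlapping region of the same file
        if per_path[chunk.path] >= max_per_path:
            continue  # this file is already well represented
        if per_lang[chunk.lang] >= max_per_lang:
            continue  # this language is already well represented
        kept.append(chunk)
        per_path[chunk.path] += 1
        per_lang[chunk.lang] += 1
    return kept


ranked = [
    Chunk("a.rs", "rust", 1, 20),
    Chunk("a.rs", "rust", 10, 30),   # overlaps the first chunk
    Chunk("a.rs", "rust", 40, 60),
    Chunk("a.rs", "rust", 70, 90),   # exceeds the per-path cap
    Chunk("b.py", "python", 1, 10),
]
final = diversify(ranked)
```

Because the filter only removes items and never reorders them, the relative RRF order of the surviving results is preserved, lexical hits included.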

@ggordonhall ggordonhall merged commit 9f1b7b3 into main Dec 8, 2023
@ggordonhall ggordonhall deleted the gabriel/code-search-parity branch December 8, 2023 15:56

2 participants