Missing expected results in fuzzy search (no stemming) #375

@lucaong

Performing fuzzy search seems to miss some words within the given edit distance.
Here is one example (disabling stemming and all other pipeline functions to ensure that we are only observing the behavior of fuzzy search):

const l = lunr(function () {
  this.field('txt')
  this.pipeline.remove(lunr.stemmer)
  this.pipeline.remove(lunr.trimmer)
  this.pipeline.remove(lunr.stopWordFilter)
  this.searchPipeline.remove(lunr.stemmer)
  this.searchPipeline.remove(lunr.trimmer)
  this.searchPipeline.remove(lunr.stopWordFilter)

  ;[
    { id: 1, txt: 'coscienza' },
    { id: 2, txt: 'scienza' },
    { id: 3, txt: 'conoscienza' },
    { id: 4, txt: 'coscienzaxx' },
  ].forEach(line => this.add(line))
})

l.search('coscienza~2')
// => [ { ref: '3', score: ... }, { ref: '1', score: ... } ]

In the example above, I would expect the words scienza and coscienzaxx to also match, since each is at an edit distance of 2 from the query term coscienza (two deletions or two insertions at a word boundary).
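To double-check the distances claimed above independently of lunr, here is a minimal sketch that computes the plain Levenshtein distance (insertions, deletions, substitutions) between the query term and each indexed word. The `editDistance` helper is written for this illustration and is not part of lunr; lunr's own fuzzy matching may count edits differently (for example, transpositions), which does not matter for these particular words.

```javascript
// Plain Levenshtein distance using two rolling rows of the DP table.
// Hypothetical helper for illustration only, not a lunr API.
function editDistance(a, b) {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i] // distance from the first i chars of a to the empty string
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      )
    }
    prev = curr
  }
  return prev[b.length]
}

const query = 'coscienza'
const words = ['coscienza', 'scienza', 'conoscienza', 'coscienzaxx']
const distances = words.map(word => ({ word, distance: editDistance(query, word) }))
console.log(distances)
// coscienza => 0, scienza => 2, conoscienza => 2, coscienzaxx => 2
```

All four words are within distance 2 of the query under this definition, so by that measure none of them should be excluded.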

This is also visible if one observes the fuzzy TokenSet expansion for the term coscienza:

lunr.TokenSet.fromFuzzyString("coscienza", 2).toArray()
// => result contains `*scienza` and `coscienza`, but not `scienza` or `coscienza**`
// (in the context of fuzzy search the * token is not linked to itself, so it matches exactly 1 character)
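To make the consequence of that wildcard behavior concrete, here is a small sketch of the semantics described in the comment above, where each `*` consumes exactly one character. The `matchesSingleCharWildcard` helper is hypothetical and only models that description; it is not how lunr's TokenSet is implemented.

```javascript
// Models the semantics described above: each `*` matches exactly one character.
// Hypothetical helper for illustration only, not a lunr API.
function matchesSingleCharWildcard(pattern, word) {
  // With one-character wildcards, pattern and word must have the same length
  if (pattern.length !== word.length) return false
  for (let i = 0; i < pattern.length; i++) {
    if (pattern[i] !== '*' && pattern[i] !== word[i]) return false
  }
  return true
}

console.log(matchesSingleCharWildcard('*scienza', 'scienza'))        // false: the `*` needs one character
console.log(matchesSingleCharWildcard('*scienza', 'oscienza'))       // true
console.log(matchesSingleCharWildcard('coscienza**', 'coscienzaxx')) // true
```

Under this reading, `*scienza` cannot match scienza, and coscienzaxx could only match through an expansion such as `coscienza**`, which is absent from the expansion shown above.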

I am not sure if this is a bug or the intended behavior of fuzzy search. In the latter case, maybe it would deserve a mention in the documentation.

Thanks again for the great work!
