What?
Solr 8 (since Solr 5) has a documented bug in Lucene: when a special character is present in a quoted phrase, Lucene internally uses a SpanQuery, which loses track of the real position offsets (gaps) between words. As a result, a phrase that exists in the index with its tokens in the right order fails to match. Basically, complex ADO labels and whole phrases sent to Solr via the Lucene parser (no slop) will not match.
The solution is to move to a Lucene that has the patch, which right now means the newest one, Lucene 9, so Solr 9.
apache/lucene@98dafe2
Migrating to Solr 9 implies solrconfig, schema, field type and OCR plugin changes. Those will be dealt with in the new release of archipelago-deployment and deployment-live (tested and works very well), but for now we need to make the code compatible with both 8 and 9.
Solr 9 uses the Unified Highlighter by default. Because Drupal treats (and exposes via the UI) all Full Text Search API fields as a "group of things that are all equal", the unified highlighter will fail if any of these fields lacks the field properties needed to store offsets and term vector positions. And not just fail: it basically throws a Java exception and dies. So the idea here is to force the original highlight component, which is the default in 8, everywhere we are in charge of highlights.
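As a rough illustration of forcing the original highlighter, here is a minimal sketch that only builds plain Solr request parameters. `hl`, `hl.method` and `hl.fl` are standard Solr highlighting parameters; the helper name `forceOriginalHighlighter()` and the field name `ocr_text` are assumptions made for this example and are not taken from the module code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pin highlighting to the original highlighter.
 *
 * Works on a plain array of Solr request parameters, so the same values
 * can be sent to Solr 8 and Solr 9.
 */
function forceOriginalHighlighter(array $params, array $fields): array {
  $params['hl'] = 'true';
  // 'original' is the Solr 8 default; Solr 9 defaults to 'unified'.
  $params['hl.method'] = 'original';
  // Comma-separated list of fields to highlight.
  $params['hl.fl'] = implode(',', $fields);
  return $params;
}

// Usage: 'ocr_text' is an assumed field name, for illustration only.
$params = forceOriginalHighlighter(['q' => '"cute cats"'], ['ocr_text']);
echo http_build_query($params), PHP_EOL;
```

Setting the method explicitly on every request we build avoids depending on whichever default the running Solr version ships with.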
So:
First: make all this play with Solr 9. I will force the original highlighter to avoid unexpected issues like null pointer exceptions and classes that cannot be cast into others from Solr (the new version already surfaced those).
Second: parse and treat keys coming from a phrase differently from individual terms. I already built this, which can dissect direct queries into keywords:
protected function getKeywordsParseModeAware(QueryInterface $query, string $parse_mode_id) {
But it calls an inherited method, $this->flattenKeysArray($keys), that kills phrases, so I need to override it.
Third: on a highlight return, remove all HTML (so don't use the original highlight) IF at least one of the keys was a phrase (smart: less over-processing for the normal "cute cats" queries people will do), then apply links and highlights over those manually.
Fourth: no fourth.
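The second and third steps can be sketched roughly as follows. This is a standalone illustration, not the actual override of `flattenKeysArray()`: the function names `splitKeysPreservingPhrases()` and `highlightKeys()`, the `phrases`/`terms` array shape and the `<em>` wrapper are all assumptions for the example, and it only covers highlighting, not the links.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split a raw search string into phrases and terms.
 *
 * Double-quoted segments are kept whole instead of being flattened
 * into individual words.
 */
function splitKeysPreservingPhrases(string $input): array {
  $keys = ['phrases' => [], 'terms' => []];
  // Match either a double-quoted segment or a run of non-space characters.
  preg_match_all('/"([^"]+)"|(\S+)/u', $input, $matches, PREG_SET_ORDER);
  foreach ($matches as $match) {
    if (($match[2] ?? '') !== '') {
      $keys['terms'][] = $match[2];
    }
    else {
      $keys['phrases'][] = $match[1];
    }
  }
  return $keys;
}

/**
 * Hypothetical helper: re-highlight a snippet only when a phrase was used.
 */
function highlightKeys(string $snippet, array $keys, string $tag = 'em'): string {
  // No phrase in the query: keep the highlight Solr returned untouched.
  if (empty($keys['phrases'])) {
    return $snippet;
  }
  // Drop the markup Solr added, then highlight manually.
  $plain = strip_tags($snippet);
  $needles = array_merge($keys['phrases'], $keys['terms']);
  // Longest first, so a phrase wins over the single words it contains.
  usort($needles, fn(string $a, string $b): int => strlen($b) <=> strlen($a));
  $quoted = array_map(fn(string $n): string => preg_quote($n, '/'), $needles);
  $pattern = '/(' . implode('|', $quoted) . ')/iu';
  $replacement = '<' . $tag . '>$1</' . $tag . '>';
  return preg_replace($pattern, $replacement, $plain) ?? $plain;
}

// Usage: a phrase plus a single term.
$keys = splitKeysPreservingPhrases('"cute cats" dogs');
echo highlightKeys('<b>cute</b> cats and dogs', $keys), PHP_EOL;
```

Doing the replacement in a single regular expression pass keeps highlights from nesting, since each stretch of text is matched at most once.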
@alliomeria this is what I promised. Hope this makes sense