A new external search solution #1704

ang-zeyu · 2021-12-14T14:19:47Z

Hi folks,

Not sure if this is the right place to post this; feel free to close this issue.

I've just published a little pet project I have been working on this year.

It consists of a CLI file indexer, a search library powered by wasm (Rust / TypeScript), and a search UI (TypeScript).

There are 2 main motivations behind this project: better scalability, and a complete (indexer through to UI) prebuilt-index search solution.

Scalability
My original motivation in creating this project was primarily this issue: https://github.com/olivernn/lunr.js/issues/222. Most (if not all, afaik) existing client side search libraries can only generate a prebuilt index that is monolithic. This has obvious implications when your collection scales (i.e. you'll need to start considering a search SaaS / server), because it becomes near impossible to download the entire index in a timely manner. Holding the whole index in memory also becomes a problem.

The primary difference in approach here is the option of fragmenting the index into many separate files. At search time, only the files needed for the query terms are retrieved. The tradeoff between scalability, file bloat, and response time can be configured at several levels.
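To make the fragmentation idea concrete, here is a minimal sketch of how a query could be mapped to only the index files it needs. It is not the tool's actual layout: the shard count, the `index/shard_N.bin` naming, and the hash-based term assignment are all assumptions made up for illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical number of index fragments; not taken from the real tool.
const NUM_SHARDS: u64 = 64;

/// FNV-1a hash, so the term -> shard mapping is the same at index time
/// and at search time.
fn fnv1a(term: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in term.as_bytes() {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Which fragment holds the postings for a given term (assumed scheme).
fn shard_for_term(term: &str) -> u64 {
    fnv1a(term) % NUM_SHARDS
}

/// The set of fragment files a query needs. Only these would be fetched,
/// instead of downloading one monolithic index.
fn shards_for_query(query: &str) -> BTreeSet<String> {
    query
        .split_whitespace()
        .map(|term| term.to_lowercase())
        .map(|term| format!("index/shard_{}.bin", shard_for_term(&term)))
        .collect()
}
```

With this scheme a two-word query fetches at most two fragments regardless of collection size; raising the assumed shard count means smaller downloads per query but more files on disk, which is the scalability vs file bloat tradeoff mentioned above.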

Use of WebAssembly
I also wanted to see how far I could push the boundaries with compression schemes, and therefore pivoted to WebAssembly. The entire thing was originally built in pure TypeScript, which is measurably slower for low level byte-wise processing (not an apples to apples comparison, but worth mentioning that indexer speed also went from 10min -> 10s =P).

There were several other reasons, mostly related to more "fancy" features that I wanted to implement efficiently, for example:
- query term proximity ranking
- phrase queries
- get my feet wet with wasm (my first project with it =P)

A complete, "offline" prebuilt index search solution
The secondary motivation, quite simply, is providing a complete (indexer -> search library -> ui) search solution / replacement that can be built into other software (without something like algolia docsearch, which isn't always an option).

Back to why I'm posting this here

I've created a mdbook plugin that is basically a replacement for the search function here. It's built on top of the cli indexer and search ui.

It does a few extras vs the generic library:

- theming (using mdbook's css variables)
- automatic scaling: there are various ways to configure the indexer / search in terms of scalability, file bloat, and response time (see here if interested). The preprocessor detects the collection size using a simple ch.content.len() summation and adjusts these settings accordingly.
- replicate the "navigate searched terms" behaviour partially (quite literally just ctrl-c+v ed the doSearchOrMarkFromUrl function here for now =X)
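The automatic scaling step could look roughly like the sketch below. Only the ch.content.len() summation comes from the description above; the `Chapter` struct is a stand-in for mdbook's chapter type, and the `Preset` names and byte thresholds are hypothetical values chosen for illustration, not the plugin's real settings.

```rust
/// Stand-in for a book chapter; only the `content` field matters here.
struct Chapter {
    content: String,
}

/// Hypothetical scaling presets, from "one small index" to "heavily fragmented".
#[derive(Debug, PartialEq)]
enum Preset {
    Small,
    Medium,
    Large,
}

/// Collection size as the sum of chapter content lengths, in bytes.
fn total_content_len(chapters: &[Chapter]) -> usize {
    chapters.iter().map(|ch| ch.content.len()).sum()
}

/// Pick a preset from the collection size. Thresholds are assumptions.
fn choose_preset(total_bytes: usize) -> Preset {
    const SMALL_MAX: usize = 1_000_000; // assumed: up to ~1 MB of text
    const MEDIUM_MAX: usize = 20_000_000; // assumed: up to ~20 MB of text
    if total_bytes <= SMALL_MAX {
        Preset::Small
    } else if total_bytes <= MEDIUM_MAX {
        Preset::Medium
    } else {
        Preset::Large
    }
}
```

Note that String::len() counts bytes rather than characters, which is fine here since the goal is only a rough estimate of how large the index will be.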

I'd like to think of this plugin as just a proof of concept at this point for the cli indexer and search ui as well.
Some small things aren't quite on par with the implementation here yet, but should be straightforward to add in coming iterations:

- no option for breadcrumbs, yet (replaced with title -> heading -> body)
- keyboard event integration isn't quite there yet (e.g. escape)
- no search icon (the search bar is always there) - still trying to find a way to add this in within a preprocessor
- it depends on some existing implementations within mdbook (the search css theme variables mostly), but should be straightforward to pull over completely as well.
- the indexer cli tool has to be installed separately. I might find a way to build this in shortly.

Some other pluses this tool offers vs the default:

- typo tolerance
- phrase and boolean queries
- a few more I'll leave to the docs 😁

There are some obvious general downsides I would highlight as well:

- ❗ use of wasm -- no IE support (I might look into this in the future, but I think it's not quite worth it given the daily decreasing IE usage)
  - I also don't see this replacing mdbook's main search feature as such, at least as long as IE is a supported target.
- no client side indexing (unrelated to mdbook, but more generally as a client side search library); this is not within scope
❔

Would love to hear your thoughts!

ang-zeyu · 2021-12-14T21:33:09Z

Closing this after all, as there's nothing actionable given the IE limitation (leaving it here in case anyone would like to completely decouple / separate search from mdbook). =X

ang-zeyu closed this as completed Dec 14, 2021
