search: very slow and inefficient index initialization #739

Closed
evil-shrike opened this issue Apr 4, 2018 · 4 comments
Labels
help wanted Contributions are especially encouraged

Comments

@evil-shrike

Hi.
I have an index (search.js) with 35K rows (9 MB). It takes more than 35 seconds to initialize, so the user has to click in the search field and wait for half a minute. Obviously that's ridiculous.
I looked into the code where the index is initialized. It seems very inefficient.

        function createIndex() {
            index = new lunr.Index();
            index.pipeline.add(lunr.trimmer);
            index.field("name", {
                boost: 10
            });
            index.field("parent");
            index.ref("id");
            var rows = search.data.rows;
            var pos = 0;
            var length = rows.length;
            function batch() {
                var cycles = 0;
                while (cycles++ < 100) {
                    index.add(rows[pos]);
                    if (++pos == length) {
                        return setLoadingState(SearchLoadingState.Ready);
                    }
                }
                setTimeout(batch, 10);
            }
            batch();
        }

I understand that you're trying not to block the UI thread by splitting the work into tasks executed via setTimeout. But the end result is not good.
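
For a sense of scale, the timer delays alone can be estimated from the loop above. This is a rough sketch: it assumes each `setTimeout(batch, 10)` waits at least its nominal 10 ms and it ignores the cost of `index.add` itself. `minimumIdleMs` is a hypothetical helper, not part of the theme.

```javascript
// Lower bound on the time spent waiting in timers for the batching loop above.
// Assumption: every setTimeout waits at least `delayMs`; indexing cost is ignored.
function minimumIdleMs(rowCount, batchSize, delayMs) {
    if (rowCount <= 0) {
        return 0;
    }
    const batches = Math.ceil(rowCount / batchSize);
    // The final batch returns as soon as the last row is added,
    // so it does not schedule another timer.
    return (batches - 1) * delayMs;
}

console.log(minimumIdleMs(35000, 100, 10)); // 3490
```

Under these assumptions, about 3.5 of the reported 35 seconds is idle waiting at minimum, which suggests that most of the time goes to the `index.add` calls themselves.
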
But besides that, why do you add every row via Index.add?
The Lunr Index has a load method (https://lunrjs.com/docs/lunr.Index.html), which should be MUCH faster. All we need is to build a proper serialized index at build time instead of that custom structure with rows.

I have search on one of my static doc sites with lunr, and an 8 MB lunr index initializes pretty fast.

        $.getJSON("search_index.json", function (data) {
            if (lunr.multiLanguage) {
                lunr.multiLanguage('en', 'ru');
            }
            engine = lunr.Index.load(data.lunrIndex);
            doSearch(engine, term, data.pageIndex, container);
        });

(pageIndex is a custom structure for mapping urls to titles)
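
The build-time half of this approach is not shown above. Here is a minimal sketch of it, assuming the Lunr 2.x builder API (`lunr(function () { ... })`, with the resulting index serializable through `JSON.stringify`) and reusing the `id`, `name`, and `parent` fields from the theme code quoted earlier. `buildSearchPayload` and the shape of `pageIndex` are illustrative assumptions. The sketch uses Lunr's default pipeline, whereas the theme code adds only the trimmer. `lunr` is passed in as a parameter so the sketch does not depend on how the library is loaded.

```javascript
// Build-time sketch: create the index once and serialize it, so the browser
// only has to call lunr.Index.load() instead of index.add() for every row.
// Assumes Lunr 2.x; `lunr` is passed in, e.g. const lunr = require("lunr").
function buildSearchPayload(lunr, rows) {
    const index = lunr(function () {
        this.ref("id");
        this.field("name", { boost: 10 });
        this.field("parent");
        rows.forEach((row) => this.add(row));
    });

    // Hypothetical lookup table for rendering results (ref -> display data).
    const pageIndex = {};
    rows.forEach((row) => {
        pageIndex[row.id] = { name: row.name, parent: row.parent };
    });

    // JSON.stringify uses the index's toJSON() to produce the serialized form.
    return JSON.stringify({ lunrIndex: index, pageIndex: pageIndex });
}
```

The returned string could be written to a file such as `search_index.json` and loaded on the client with `lunr.Index.load(data.lunrIndex)`, as in the snippet above.
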

@aciccarello
Collaborator

Thanks for the report @evil-shrike. It'd be great to improve this. PRs are welcome 😄

@aciccarello aciccarello added enhancement help wanted Contributions are especially encouraged labels Apr 4, 2018
@evil-shrike
Author

evil-shrike commented Apr 5, 2018

@aciccarello the problem is that, as I understand it, search consists of two parts split across different repos: the JavascriptIndexPlugin plugin in typedoc (which knows nothing about lunr) and assets\js\main.js in a theme (which loads the index into a lunr Index). An additional problem is that every theme has to repeat the same logic to implement search.
Changing both of these parts isn't something a stranger can do, as it would break all themes.

Given that the search index is created in a plugin (JavascriptIndexPlugin), I believe this plugin should supply a client script for handling the created index. Other client scripts in themes should only need to know its API.
Something like this:

interface SearchResult {
    title: string; // title of the page where the term was found
    text: string;  // text in which the term was found
    url: string;   // URL of the page to go to
}
init(): Promise<void>;
search(term: string): Array<SearchResult>;

That search method could be available on some global object like typedoc.search (I don't know much about the client architecture here).
It's pretty similar to the current state, where the plugin creates search.js, but the code inside only assigns a JSON object. Instead, that script could do all the dirty work of initializing and searching.
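
One way to read this proposal is sketched below in plain JavaScript. It is an illustration under assumptions, not typedoc's actual API: `createSearch`, `loadPayload`, and the payload shape (`lunrIndex` plus a `pages` lookup keyed by ref) are hypothetical. The only Lunr calls used are `lunr.Index.load` and `index.search`.

```javascript
// Sketch of a plugin-supplied client module: themes call init() and search()
// and never touch the index format. All names here are hypothetical.
function createSearch(lunr, loadPayload) {
    let index = null;
    let pages = {};
    return {
        // Loads the prebuilt payload once; resolves when search() is usable.
        init() {
            return Promise.resolve(loadPayload()).then((payload) => {
                index = lunr.Index.load(payload.lunrIndex);
                pages = payload.pages;
            });
        },
        // Returns an array of { title, text, url } objects.
        search(term) {
            if (index === null) {
                throw new Error("search.init() has not completed");
            }
            return index.search(term).map((hit) => pages[hit.ref]);
        },
    };
}
```

A theme would then only need something like `createSearch(lunr, () => fetch("search_index.json").then((r) => r.json()))`, followed by `init()` and calls to `search(term)`.
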

@evil-shrike
Author

Also, just as a side note, let me suggest looking at the docfx implementation of search. It's really awesome, as it builds the lunr index in the background (via a Web Worker); see https://github.com/dotnet/docfx/blob/dev/src/docfx.website.themes/default/styles/search-worker.js
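
The background-indexing idea can be sketched as a small message handler. This is not docfx's code: the message shapes (`init`, `query`, `ready`, `results`) and the `createWorkerHandler` helper are assumptions made for illustration, and `buildIndex` stands in for whatever creates the Lunr index.

```javascript
// Sketch of a search worker's message protocol. Inside a real Web Worker this
// could be wired up as:
//   self.onmessage = createWorkerHandler(buildIndex, (m) => self.postMessage(m));
function createWorkerHandler(buildIndex, post) {
    let index = null;
    return function onMessage(event) {
        const msg = event.data;
        if (msg.type === "init") {
            // The expensive indexing work happens here, off the UI thread.
            index = buildIndex(msg.rows);
            post({ type: "ready" });
        } else if (msg.type === "query") {
            // Queries that arrive before the index is built get no hits.
            const hits = index === null ? [] : index.search(msg.term);
            post({ type: "results", term: msg.term, hits: hits });
        }
    };
}
```

The page would send `{ type: "init", rows }` once, wait for `ready`, and then send `query` messages as the user types, so the UI thread never runs the indexing loop.
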

@Gerrit0
Collaborator

Gerrit0 commented Apr 6, 2020

With #1252 we build the index at documentation time :)
