Improve index construction performance with bitmaps #758
I think this issue should be closed. The vocabulary subsets are now indexed as tensors. If we moved back to indexing a precursor to the torch representation, it would reduce memory consumption substantially, but generation would have unacceptable performance, with a worst-case estimate of roughly 25x slower.

Runtime performance and index size were compared for indexing as a bitmap versus indexing as a tensor.

Related discussions
For this to work in terms of both runtime and memory performance, we need a pure torch operation or a CUDA kernel implementation of bitmap compression. These are the discussions related to implementing it:
Please re-open if I'm missing something.
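To make the trade-off concrete, here is a minimal sketch of why a compressed bitmap index pushes work onto the generation hot path: every step would have to expand the bitmap of allowed token ids back into a tensor before masking the logits, which is exactly the round trip a pure torch operation or CUDA kernel would avoid. It assumes the pyroaring bindings and PyTorch; none of the names below are outlines internals.

```python
# Illustrative only: contrasts masking logits from a precomputed index tensor
# with decompressing a roaring bitmap of allowed token ids on every step.
# Assumes pyroaring and PyTorch; all names are hypothetical.
import time

import torch
from pyroaring import BitMap

VOCAB_SIZE = 32_000
allowed_ids = list(range(0, VOCAB_SIZE, 3))  # some vocabulary subset

# Option A: the index stores a ready-to-use tensor of allowed token ids.
allowed_tensor = torch.tensor(allowed_ids, dtype=torch.int64)

# Option B: the index stores a compressed bitmap (much smaller, but it must
# be expanded back into a tensor on the generation hot path).
allowed_bitmap = BitMap(allowed_ids)

def mask_with_tensor(logits: torch.Tensor) -> torch.Tensor:
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_tensor] = 0.0
    return logits + mask

def mask_with_bitmap(logits: torch.Tensor) -> torch.Tensor:
    # The bitmap -> tensor conversion runs at every generation step; this
    # Python-side round trip is what a torch op or CUDA kernel would remove.
    ids = torch.tensor(list(allowed_bitmap), dtype=torch.int64)
    mask = torch.full_like(logits, float("-inf"))
    mask[ids] = 0.0
    return logits + mask

logits = torch.randn(VOCAB_SIZE)
for step_fn in (mask_with_tensor, mask_with_bitmap):
    start = time.perf_counter()
    for _ in range(100):
        step_fn(logits)
    print(f"{step_fn.__name__}: {time.perf_counter() - start:.4f}s for 100 steps")
```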
What behavior of the library made you think about the improvement?
One of the internal improvements we have at .txt involves a significantly more efficient representation of the vocabulary subsets stored in the index. By replacing the use of standard sets with something like https://pypi.python.org/pypi/roaringbitmap, outlines could have the same improvements (see the sketch below).

How would you like it to behave?
No response
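As a rough sketch of the suggestion above, a vocabulary subset stored as a roaring bitmap supports the same membership tests and set algebra as a Python set while serializing to a fraction of the size. This uses the pyroaring bindings, whose set-like API is comparable to the linked roaringbitmap package; the variable names are illustrative, not outlines internals.

```python
# Rough comparison of a plain Python set of token ids with a roaring bitmap
# holding the same ids. Assumes pyroaring; sizes are approximate.
import sys

from pyroaring import BitMap

VOCAB_SIZE = 32_000
subset = range(0, VOCAB_SIZE, 2)  # token ids allowed in some hypothetical state

as_set = set(subset)
as_bitmap = BitMap(subset)

# Membership tests and set algebra behave the same way on both.
assert 4 in as_set and 4 in as_bitmap
assert set(as_set & {2, 4, 5}) == set(as_bitmap & BitMap({2, 4, 5}))

# Size comparison: the set container plus its int objects versus the
# bitmap's compressed serialized form.
set_bytes = sys.getsizeof(as_set) + sum(sys.getsizeof(i) for i in as_set)
bitmap_bytes = len(as_bitmap.serialize())
print(f"python set:     ~{set_bytes:,} bytes")
print(f"roaring bitmap: ~{bitmap_bytes:,} bytes")
```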