Improve index construction performance with bitmaps #758

Closed
brandonwillard opened this issue Mar 18, 2024 · 1 comment
Labels
enhancement · help wanted · optimization (Related to performance optimizations) · structured generation (Linked to structured generation)

Comments

@brandonwillard
Member

What behavior of the library made you think about the improvement?

One of the internal improvements we have at .txt involves a significantly more efficient representation of the vocabulary subsets stored in the index. By replacing the standard sets with something like https://pypi.python.org/pypi/roaringbitmap, outlines could see the same improvements.
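
For a rough sense of the proposal, a minimal sketch follows (the token ids are made up, and it assumes the roaringbitmap package's RoaringBitmap exposes the usual set interface):

from roaringbitmap import RoaringBitmap

# Hypothetical index entry: the subset of vocabulary token ids allowed from
# one FSM state (the ids here are illustrative only).
allowed_token_ids = set(range(0, 30000, 2))

# Proposed replacement: a compressed bitmap with a set-like interface.
allowed_token_ids_bm = RoaringBitmap(allowed_token_ids)

assert 42 in allowed_token_ids_bm
assert len(allowed_token_ids_bm) == len(allowed_token_ids)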

How would you like it to behave?

No response

@brandonwillard added the enhancement, optimization, structured generation, and help wanted labels Mar 18, 2024
@rlouf pinned this issue Apr 12, 2024
@rlouf moved this to Todo in Improve Outlines May 5, 2024
@rlouf unpinned this issue Jun 11, 2024
@lapp0
Contributor

lapp0 commented Jul 31, 2024

I think this issue should be closed.

The vocabulary subsets are now indexed as torch.Tensor, which gives runtime performance 50x faster than indexing as List[int] (per #1013). If the vocabulary subsets were indexed as any other type, an intermediary step would be necessary to convert them to torch.Tensor for mask construction.

If we moved back to indexing a precursor to the torch representation, memory consumption would drop substantially, but generation performance would be unacceptable: the worst-case estimate is roughly 25x the runtime of main's implementation (assuming a 30,000-token vocabulary with 15,000 legal tokens).
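
For reference, a minimal sketch of that intermediary step is below (illustrative only: the masking direction, token counts, and roaringbitmap import follow the benchmarks further down rather than outlines' actual code):

import math

import torch
from roaringbitmap import RoaringBitmap

num_tokens = 100000
# Hypothetical vocabulary subset: pretend every other token is legal.
legal_tokens_bm = RoaringBitmap(set(range(0, num_tokens, 2)))
logits = torch.rand(num_tokens, dtype=torch.float)

# The intermediary step: materialize the bitmap as an index tensor on every
# decoding step before the mask can be built.
legal_tokens_tensor = torch.tensor(list(legal_tokens_bm))
mask = torch.full_like(logits, -math.inf)
mask[legal_tokens_tensor] = logits[legal_tokens_tensor]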

Runtime Performance

Indexing as bitmap

>>> import math, sys, time
>>> import torch
>>> from roaringbitmap import RoaringBitmap
>>> num_tokens = 100000
>>> tot_time = 0
>>> legal_tokens_bm = RoaringBitmap(set(range(num_tokens)))
>>> logits = torch.rand(num_tokens, dtype=torch.float)
>>> for _ in range(1000):
...     start = time.time()
...     mask = torch.full_like(logits, -math.inf)
...     logits[legal_tokens_bm] = mask[legal_tokens_bm]
...     tot_time += time.time() - start
... 
>>> tot_time
39.47099304199219

Indexing as List[int]

>>> num_tokens = 100000
>>> tot_time = 0
>>> legal_tokens_list = list(range(num_tokens))
>>> logits = torch.rand(num_tokens, dtype=torch.float)
>>> for _ in range(1000):
...     start = time.time()
...     mask = torch.full_like(logits, -math.inf)
...     logits[legal_tokens_list] = mask[legal_tokens_list]
...     tot_time += time.time() - start
... 
>>> tot_time
20.02087378501892

Indexing as tensor

>>> num_tokens = 100000
>>> tot_time = 0
>>> legal_tokens_tensor = torch.tensor(range(num_tokens))
>>> logits = torch.rand(num_tokens, dtype=torch.float)
>>> for _ in range(1000):
...     start = time.time()
...     mask = torch.full_like(logits, -math.inf)
...     logits[legal_tokens_tensor] = mask[legal_tokens_tensor]
...     tot_time += time.time() - start
... 
>>> tot_time
0.06929612159729004

Size

>>> sys.getsizeof(RoaringBitmap(set(range(num_tokens))))
8228
>>> sys.getsizeof(list(range(num_tokens)))
800056
>>> t = torch.tensor(range(num_tokens)); t.element_size() * t.nelement()
800000

Related Discussions

For this to work in terms of both runtime and memory performance, we need a pure torch operation or a CUDA kernel implementation of bitmap compression. There are related discussions about implementing this.
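
As a rough illustration of the pure-torch direction (a sketch only, not outlines' implementation; the packing scheme and every name here are assumptions), legality flags could be bit-packed into a uint8 tensor and unpacked with tensor ops at mask-construction time:

import torch

num_tokens = 100000
# Hypothetical legality flags: pretend every other token is legal.
legal = torch.zeros(num_tokens, dtype=torch.bool)
legal[::2] = True

# Pack 8 legality flags per byte: ~12.5 KB here, versus ~800 KB for an int64
# index tensor over the full vocabulary (see the Size section above).
pad = (-num_tokens) % 8
flags = torch.cat([legal.to(torch.uint8), torch.zeros(pad, dtype=torch.uint8)])
bit_positions = torch.arange(8, dtype=torch.uint8)
packed = (flags.view(-1, 8) << bit_positions).sum(dim=1).to(torch.uint8)

# Unpack with pure torch ops and apply the mask to the logits.
unpacked = (packed.unsqueeze(1) >> bit_positions) & 1
mask = unpacked.flatten()[:num_tokens].to(torch.bool)
logits = torch.rand(num_tokens, dtype=torch.float)
masked_logits = torch.where(mask, logits, torch.full_like(logits, float("-inf")))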

Please re-open if I'm missing something.

@lapp0 closed this as completed Jul 31, 2024
@github-project-automation bot moved this from Todo to Done in Improve Outlines Jul 31, 2024