Improve index construction performance with bitmaps #758

Closed
brandonwillard opened this issue Mar 18, 2024 · 1 comment
Labels
enhancement · help wanted · optimization (Related to performance optimizations) · structured generation (Linked to structured generation)

Comments

@brandonwillard
Member

What behavior of the library made you think about the improvement?

One of the internal improvements we have at .txt involves a significantly more efficient representation of the vocabulary subsets stored in the index. By replacing the standard sets with something like https://pypi.python.org/pypi/roaringbitmap, outlines could see the same improvements.
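
For a rough sense of the proposal, a minimal sketch follows (the token ids are made up, and it assumes the roaringbitmap package's RoaringBitmap exposes the usual set interface):

from roaringbitmap import RoaringBitmap

# Hypothetical index entry: the subset of vocabulary token ids allowed from
# one FSM state (the ids here are illustrative only).
allowed_token_ids = set(range(0, 30000, 2))

# Proposed replacement: a compressed bitmap with a set-like interface.
allowed_token_ids_bm = RoaringBitmap(allowed_token_ids)

assert 42 in allowed_token_ids_bm
assert len(allowed_token_ids_bm) == len(allowed_token_ids)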

How would you like it to behave?

No response

@brandonwillard added the enhancement, optimization, structured generation, and help wanted labels Mar 18, 2024
@rlouf pinned this issue Apr 12, 2024
@rlouf moved this to Todo in Improve Outlines May 5, 2024
@rlouf unpinned this issue Jun 11, 2024
@lapp0
Contributor

lapp0 commented Jul 31, 2024

I think this issue should be closed.

The vocabulary subsets are now indexed as torch.Tensor, which gives runtime performance 50x faster than indexing as List[int] (per #1013). If the vocabulary subsets were indexed as any other type, an intermediary step would be necessary to convert them to torch.Tensor for mask construction.

If we moved back to indexing a precursor to the torch representation, memory consumption would drop substantially, but generation performance would be unacceptable: the worst-case estimate is roughly 25x the runtime of main's implementation (assuming a 30,000-token vocabulary with 15,000 legal tokens).
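
For reference, a minimal sketch of that intermediary step is below (illustrative only: the masking direction, token counts, and roaringbitmap import follow the benchmarks further down rather than outlines' actual code):

import math

import torch
from roaringbitmap import RoaringBitmap

num_tokens = 100000
# Hypothetical vocabulary subset: pretend every other token is legal.
legal_tokens_bm = RoaringBitmap(set(range(0, num_tokens, 2)))
logits = torch.rand(num_tokens, dtype=torch.float)

# The intermediary step: materialize the bitmap as an index tensor on every
# decoding step before the mask can be built.
legal_tokens_tensor = torch.tensor(list(legal_tokens_bm))
mask = torch.full_like(logits, -math.inf)
mask[legal_tokens_tensor] = logits[legal_tokens_tensor]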

Runtime Performance

Indexing as bitmap

>>> import math, sys, time
>>> import torch
>>> from roaringbitmap import RoaringBitmap
>>> num_tokens = 100000
>>> tot_time = 0
>>> legal_tokens_bm = RoaringBitmap(set(range(num_tokens)))
>>> logits = torch.rand(num_tokens, dtype=torch.float)
>>> for _ in range(1000):
...     start = time.time()
...     mask = torch.full_like(logits, -math.inf)
...     logits[legal_tokens_bm] = mask[legal_tokens_bm]
...     tot_time += time.time() - start
... 
>>> tot_time
39.47099304199219

Indexing as List[int]

>>> num_tokens = 100000
>>> tot_time = 0
>>> legal_tokens_list = list(range(num_tokens))
>>> logits = torch.rand(num_tokens, dtype=torch.float)
>>> for _ in range(1000):
...     start = time.time()
...     mask = torch.full_like(logits, -math.inf)
...     logits[legal_tokens_list] = mask[legal_tokens_list]
...     tot_time += time.time() - start
... 
>>> tot_time
20.02087378501892

Indexing as tensor

>>> num_tokens = 100000
>>> tot_time = 0
>>> legal_tokens_tensor = torch.tensor(range(num_tokens))
>>> logits = torch.rand(num_tokens, dtype=torch.float)
>>> for _ in range(1000):
...     start = time.time()
...     mask = torch.full_like(logits, -math.inf)
...     logits[legal_tokens_tensor] = mask[legal_tokens_tensor]
...     tot_time += time.time() - start
... 
>>> tot_time
0.06929612159729004

Size

>>> sys.getsizeof(RoaringBitmap(set(range(num_tokens))))
8228
>>> sys.getsizeof(list(range(num_tokens)))
800056
>>> t = torch.tensor(range(num_tokens)); t.element_size() * t.nelement()
800000

Related Discussions

For this to work in terms of both runtime and memory performance, we need a pure torch operation or a CUDA kernel implementation of bitmap compression. There are related discussions about implementing this.
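
As a rough illustration of the pure-torch direction (a sketch only, not outlines' implementation; the packing scheme and every name here are assumptions), legality flags could be bit-packed into a uint8 tensor and unpacked with tensor ops at mask-construction time:

import torch

num_tokens = 100000
# Hypothetical legality flags: pretend every other token is legal.
legal = torch.zeros(num_tokens, dtype=torch.bool)
legal[::2] = True

# Pack 8 legality flags per byte: ~12.5 KB here, versus ~800 KB for an int64
# index tensor over the full vocabulary (see the Size section above).
pad = (-num_tokens) % 8
flags = torch.cat([legal.to(torch.uint8), torch.zeros(pad, dtype=torch.uint8)])
bit_positions = torch.arange(8, dtype=torch.uint8)
packed = (flags.view(-1, 8) << bit_positions).sum(dim=1).to(torch.uint8)

# Unpack with pure torch ops and apply the mask to the logits.
unpacked = (packed.unsqueeze(1) >> bit_positions) & 1
mask = unpacked.flatten()[:num_tokens].to(torch.bool)
logits = torch.rand(num_tokens, dtype=torch.float)
masked_logits = torch.where(mask, logits, torch.full_like(logits, float("-inf")))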

Please re-open if I'm missing something.

@lapp0 closed this as completed Jul 31, 2024
@github-project-automation bot moved this from Todo to Done in Improve Outlines Jul 31, 2024