read coding tables in one go if there is sufficient input #90
for t in 0..nGroups {
    let mut curr = GET_BITS!(strm, s, 5);
    for i in 0..alphaSize {
        loop {
It should be possible to bound this loop (I think it's 20 bits of code at most, so 40 bits total?), but it's hard to be confident.
I fuzzed this for many hours using this size of 20. So it holds at least for files encoded using the current version (which uses 17 bits at most, while earlier bzip2 versions used up to 20 bits).
Force-pushed from 1c3ddbd to 920297e.
        s.len[usize::from(t)][usize::from(i)] = curr as u8;
    }
}
What code is this block based on?
This bit of C code:
Lines 322 to 335 in d5d181b
/*--- Now the coding tables ---*/
for (t = 0; t < nGroups; t++) {
   GET_BITS(BZ_X_CODING_1, curr, 5);
   for (i = 0; i < alphaSize; i++) {
      while (True) {
         if (curr < 1 || curr > 20) RETURN(BZ_DATA_ERROR);
         GET_BIT(BZ_X_CODING_2, uc);
         if (uc == 0) break;
         GET_BIT(BZ_X_CODING_3, uc);
         if (uc == 0) curr++; else curr--;
      }
      s->len[t][i] = curr;
   }
}
It's all scattered around in our implementation here, but if we assume sufficient input, we can basically copy the C code.
This is an improvement especially for smaller files (the CI benchmarks just show noise...).
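For readers following along, here is a minimal, self-contained sketch of the "copy the C code when input is sufficient" idea. It is not the code in this PR: `BitReader`, `TableError`, and `read_coding_tables` are hypothetical names invented for illustration, and the sketch reads from an in-memory slice instead of going through `GET_BITS!` and the stream state. It assumes the caller falls back to the existing resumable state machine whenever `NeedMoreInput` is returned.

```rust
/// MSB-first bit reader over an in-memory slice (hypothetical helper,
/// standing in for the real stream state).
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // current position, in bits
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }

    /// Read one bit, or `None` when the input is exhausted.
    fn bit(&mut self) -> Option<u8> {
        let byte = *self.data.get(self.pos / 8)?;
        let bit = (byte >> (7 - self.pos % 8)) & 1;
        self.pos += 1;
        Some(bit)
    }

    /// Read `n` bits (n <= 8) as an unsigned value, MSB first.
    fn bits(&mut self, n: u32) -> Option<u8> {
        let mut v = 0u8;
        for _ in 0..n {
            v = (v << 1) | self.bit()?;
        }
        Some(v)
    }
}

#[derive(Debug, PartialEq)]
enum TableError {
    /// Ran out of input; the caller is assumed to retry on the slow path.
    NeedMoreInput,
    /// A code length left the range 1..=20 (BZ_DATA_ERROR in the C code).
    DataError,
}

/// Straight-line translation of the C loop: a 5-bit start value per group,
/// then per symbol a run of "1x" pairs (x = 0: increment, x = 1: decrement)
/// terminated by a single 0 bit.
fn read_coding_tables(
    r: &mut BitReader<'_>,
    n_groups: usize,
    alpha_size: usize,
) -> Result<Vec<Vec<u8>>, TableError> {
    let mut len = vec![vec![0u8; alpha_size]; n_groups];
    for t in 0..n_groups {
        let mut curr = r.bits(5).ok_or(TableError::NeedMoreInput)?;
        for i in 0..alpha_size {
            loop {
                if !(1..=20).contains(&curr) {
                    return Err(TableError::DataError);
                }
                if r.bit().ok_or(TableError::NeedMoreInput)? == 0 {
                    break;
                }
                if r.bit().ok_or(TableError::NeedMoreInput)? == 0 {
                    curr += 1;
                } else {
                    curr -= 1;
                }
            }
            len[t][i] = curr;
        }
    }
    Ok(len)
}
```

The sketch checks for exhausted input on every read to stay safe on arbitrary data. A real fast path would presumably check the available input once up front and drop the per-bit checks, which is where the simplification over the scattered state-machine version would come from.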