-
Notifications
You must be signed in to change notification settings - Fork 221
Don't write code during size computation #547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be better to rebase the second patch on top of master to expedite the fix though, since the first patch is unlikely to be merged AS-IS.
BTW, I didn't notice this fatal error when reviewing first, but we might need to clarify the expectations of that "review" or risk bigger issues later.
I think the first patch is closer to landing, since all requests were done. This one, however, is not necessary in its best form. Maybe we could do it in a nicer way, and testing is also a question. The new asserts do some testing, but is it enough? |
Ouch. During the It appeared that other places in the compiler were unconditionally writing data out, and I'd uneasily just accepted that. Does it allocate an up-front amount of space equal to the size of the parsed input, so that if you have 10 META codes, there'll be space for 10 OP codes, and you only need to make the OP write conditional for cases where the compiler inflates the output? PCRE2 triggers tons of lint-rule violations in Microsoft's in-house linters, due to widespread use of writes to pointers that don't have attached length information (for example, not checking each and every write to
Sounds good to just write a big test. 3000 characters isn't going to break the test suite, I hope? |
Luckily, no. There is a block of storage available, but the the idea is that after each "item" is written and its length ascertained, the pointer is reset to the start of the block. I forget the details offhand, but it is supposed to be clever enough to stay within the allotted memory. (It's over a decade since I implemented this, probably nearer two, so the details are no longer in mind. I'd have to read the code to refresh....) |
The idea is that the majority of opcodes have a fixed size. PCRE2 uses a stack allocated buffer, where you can write them during length computation. Then https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_compile.c#L10210 |
Patch is updated. I still don't like it. It has no test, and maybe LINK_SIZE checks needed as well. The |
Thank you @zherczeg and @PhilipHazel for explaining. That makes good sense, and I see the need to fix it for ECLASS, which does indeed write an unbounded amount of data to its opcode. |
I am thinking about a possible test. It would be a huge test with a repeated character sequence. Since the asserts check the reset mechanic, it would not add more to the existing tests. The I have no better idea to fix this, but I would be happy if future code changes would make this code nicer as a side effect. |
Agreed. The fix looks good, but I'll try and reduce the complexity when I optimize away the recursive descent. We should have tests of the form: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve, but ideally should add a test before merging
src/pcre2_compile_class.c
Outdated
first = FALSE; | ||
} | ||
|
||
DONE: | ||
PCRE2_ASSERT(!first); /* Confirm that we found something. */ | ||
PCRE2_ASSERT(lengthptr == NULL || | ||
(code == code_start && *pcode == code_start)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You added an extra condition to the assertion here - *pcode == code_start
, which the other assertion doesn't have. I don't think it's needed, since there's literally no code in the function which touches pcode
.
Test added. Without hte patch, it fails with: |
Is there any changes needed for this patch? |
I think is ready and doubly "approved" |
This patch is in the top of #546
When I worked on the other patch, I was thinking that writing unknown amount of data when the length is computed can be an issue. It looks like fuzzers already caught this issue, so here is a patch. Can be combined with #546, but can be independent form it.
Adding a test is difficult because
COMPILE_WORK_SIZE
is(3000*LINK_SIZE)
and a really huge test is needed. What shall we do? Add a really huge test? E.g./[\x{200}&&\x{200}&&....\x{200}]/alt_extended_class