Skip to content

regex: check the platform name for longer than expected values#349

Draft
carenas wants to merge 4 commits into
zherczeg:masterfrom
carenas:push-zupmqwpqmrox
Draft

regex: check the platform name for longer than expected values#349
carenas wants to merge 4 commits into
zherczeg:masterfrom
carenas:push-zupmqwpqmrox

Conversation

@carenas

@carenas carenas commented Oct 27, 2025

Copy link
Copy Markdown
Contributor

DO NOT MERGE

Additional changes to validate platform name "API" from PCRE2

@carenas carenas force-pushed the push-zupmqwpqmrox branch 2 times, most recently from c18df0a to 37a4030 Compare October 28, 2025 05:01
Comment thread sljit_src/sljitNativeARM_64.c
Comment thread regex_src/regexMain.c
return 1;
}

if (strlen(platform_name) >= PLATFORM_NAME_BUFFERSIZE_LIMIT) {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a char*, why should it have a limit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that this is the PCRE2 specific test, and at least there it is assumed that the buffer has a "max expected" size that is part of the "API".

might make more sense once I push the PCRE2 side as well.

here I was "recognizing" that PCRE2 makes several assumptions from its dependencies (including sljit) that are not being checked consistently, and this is IMHO how we could do it.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow that part, but why PCRE2 has any limit on this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in PCRE2 there are several limits that are documented as max possible values and used in statically allocated buffers, this is one of them; see PCRE2Project/pcre2#834 for the PCRE2 side.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not happy about buffer limits. The sljit simply wants to return with a const char*, which can be printed on analyzed. There should be no limit for that. Could we somehow avoid this limit?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The limit is somewhat vaguely documented. But I would vote for updating that documentation to say, "applications should query dynamically for the required buffer size". I don't think we need a new API.

There's just one RISC-V CPU that exceeds the buffer size that was suggested (but not promised) by the previous docs. And we have no evidence that customer applications have actually run into this issue, apart from pcre2test itself which crashes because it doesn't use a big enough buffer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apart from pcre2test itself which crashes because it doesn't use a big enough buffer.

while pcre2test could crash in some cases, the failure was really planed as a way to prevent a possible overflow like this to go unnoticed and indeed worked as expected.

I would argue though that while it makes sense for pcre2_config() to work like this historically, it is difficult to defend its current use in a world of secure programming languages, considering that it can be abused so easily and it is fragile, so creating a new API might be worth IMHO.

I also think that the current mode of returning a PCRE2_SPTR through the where void * and that was introduced in the first release of PCRE2 doesn't seem to be very useful and could encourage the use of static buffers because of the alignment issues, so the new API probably should go back at producing a plain char * by default that was part of PCRE1.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no alignment issues to passing a buffer from malloc() or a static buffer PCRE2_UCHAR buf[...].

It is not great, in hindsight, that pcre2_config() lacks a buffer-size argument. Passing in a writable buffer without specifying its size is not modern. However, the API works fine if you call it to discover the required buffer size, allocate, then call again. It's not the most modern pattern, but it's not vulnerable to anything.

I would potentially be willing to make another version of pcre2_config with an extra argument... but the benefit is very minor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would potentially be willing to make another version of pcre2_config with an extra argument... but the benefit is very minor.

I was thinking it will be better to create a new API specially designed to extract these "strings" and make sure that it includes a parameter for buffersize, maybe pcre2_strlcpy()? and eventually deprecate the pcre2_config() that use a void * to return strings in favor of it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, a pcre2_config_str(int what, PCRE2_UCHAR *buf, PCRE2_SIZE bufsize) function.

But it's not strictly necessary, since the current API does work, if callers just call it twice, passing NULL to check they have a big enough buffer.

@carenas carenas changed the title check the platform name for longer than expected values regex: check the platform name for longer than expected values Oct 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants