Simple pattern blows out JIT stack #156

kohler · 2022-11-02T12:48:41Z

Hi! Thanks so much for this project.

The following relatively simple pattern has unexpected bad behavior on JIT:

/\A(?:[^\\')]|'(?:[^'\\]|\\[\S\s])*')*/s

On long strings, non-JIT PCRE performs much faster than JIT, and on an ~8192-character string, PCRE overflows the default JIT stack size.

Failing test available here: kohler@c142ad0


kohler · 2022-11-02T12:51:48Z

This semantically equivalent regex seems to behave OK:

/\A(?:[^\\')]*(?:'(?:[^'\\]|\\[\S\s])*'|))*/s
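One way to sanity-check the claimed equivalence is to compare both patterns on a few subjects. The sketch below uses Python's `re` module, which is a different backtracking engine from PCRE2, so it only checks that the matched prefixes agree and says nothing about JIT stack use. The sample subjects are made up for illustration and are not taken from the failing test.

```python
import re

# Both patterns from the comments above; PCRE's /s flag maps to re.DOTALL.
ORIGINAL = re.compile(r"\A(?:[^\\')]|'(?:[^'\\]|\\[\S\s])*')*", re.DOTALL)
REWRITTEN = re.compile(r"\A(?:[^\\')]*(?:'(?:[^'\\]|\\[\S\s])*'|))*", re.DOTALL)

# Hypothetical sample subjects chosen to hit each branch of the pattern.
SAMPLES = [
    "",
    "abc",
    "x)y",
    "a'b\\'c'd",
    "'unterminated",
    "ab" * 50 + "'q'" + "cd" * 50,
]


def matched_prefix(pattern, subject):
    """Return the text matched at the start of the subject.

    Both patterns can match the empty string, so match() never returns None.
    """
    return pattern.match(subject).group(0)


def patterns_agree(subject):
    """True if both patterns match exactly the same prefix of the subject."""
    return matched_prefix(ORIGINAL, subject) == matched_prefix(REWRITTEN, subject)


for sample in SAMPLES:
    print(repr(sample[:20]), patterns_agree(sample))
```

Agreement on a handful of inputs is evidence rather than proof of equivalence.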

PhilipHazel · 2022-11-03T16:18:51Z

Which release of PCRE? And what is the content of the strings you are matching? (Do they match or not match?) Using the current 10.40 release I do not see a slowdown with a string consisting of "abcde" repeated many times, but I do see the JIT stack overflow when the repeat is more than 272. The JIT maintainer may wish to comment.

kohler · 2022-11-03T16:29:57Z

Many releases of PCRE, including HEAD. The content matches. Please see the following commit to PCRE2’s testdata/testinput1 for an example string in pcre2test format.

kohler@c142ad0

zherczeg · 2022-11-04T06:18:10Z

This looks like a * * test, where every character can be matched. The [\S\s] looks like a dotall. This requires a lot of stack data. Atomic blocks should help. Is there any optimization which optimizes this case in the interpreter? I can check this later.

PhilipHazel · 2022-11-04T16:31:46Z

I don't think there's anything special about this in the interpreter. (But my memory isn't what it used to be.)

kohler · 2022-11-04T16:46:12Z

I was surprised by this behavior mostly because this grammar doesn’t theoretically require any lookbehind or backtracking at all.

zherczeg · 2022-11-04T17:46:47Z

Regex engines have different engine types:
https://zherczeg.github.io/sljit/regex_compare.html

PCRE2 is more like a classic programming language, where you define the operations. For example, if you implement a bubble sort in C, and quicksort in JS, you cannot say C is slow just because JS is faster in this case.

kohler · 2022-11-05T01:44:37Z

@zherczeg, is there no special case in the backtracking engine for a pattern like (a<PAT1>|[^a]<PAT2>), where the two branches of the choice are disjoint, and the correct branch can be detected with one character of lookahead?

If there is no such special case currently, do you think adding one would be interesting, and do you have any advice about how to do so?

kohler · 2022-11-05T02:39:56Z

In particular, call the case I care about (A|B)*C. Here:

The match is greedy.
The first character in A cannot start B or C. (NB this includes the case when C is empty at the end of the pattern.)
Either A is fixed-length or the last character in A cannot start B or C.
The first character in B cannot start A or C.
Either B is fixed-length or the last character in B cannot start A or C.

Given this, as soon as the ith repetition of (A|B)* matches, it is safe to throw away the backtracking information associated with the 1st-ith repetitions of (A|B)*. The final match will either fail or start with ≥i repetitions of (A|B).
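The conditions above can be written down as a small check over first-character and last-character sets. This is only a sketch of the conditions as stated in this comment: the function name, its parameters, and the printable-ASCII alphabet are assumptions made for illustration, and it does not prove the conditions are sufficient for a real engine.

```python
import string

# Illustrative universe of characters; a real engine would work on code units.
ALPHABET = set(string.printable)


def can_discard_backtracking(first_a, last_a, a_fixed_length,
                             first_b, last_b, b_fixed_length,
                             first_c):
    """Check the stated conditions for a greedy (A|B)*C.

    Each first_*/last_* argument is the set of characters that can begin or
    end that sub-pattern. An empty C at the end of the pattern is modelled
    as an empty first_c set.
    """
    a_ok = first_a.isdisjoint(first_b | first_c) and (
        a_fixed_length or last_a.isdisjoint(first_b | first_c))
    b_ok = first_b.isdisjoint(first_a | first_c) and (
        b_fixed_length or last_b.isdisjoint(first_a | first_c))
    return a_ok and b_ok


# The pattern from the original report: A = [^\\')], B = a quoted string,
# C = empty (end of pattern).
not_special = ALPHABET - set("\\')")
report_pattern_ok = can_discard_backtracking(
    first_a=not_special, last_a=not_special, a_fixed_length=True,
    first_b={"'"}, last_b={"'"}, b_fixed_length=False,
    first_c=set(),
)

# A pattern whose branches overlap, such as ([^A]|B)*: "B" can start either
# branch, so the conditions do not hold.
overlapping_ok = can_discard_backtracking(
    first_a=ALPHABET - {"A"}, last_a=ALPHABET - {"A"}, a_fixed_length=True,
    first_b={"B"}, last_b={"B"}, b_fixed_length=True,
    first_c=set(),
)

print(report_pattern_ok, overlapping_ok)
```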

zherczeg · 2022-11-05T08:55:46Z

These are static compiler optimizations, and there is a project for that:
https://github.com/zherczeg/repan

If you implement your suggested optimization there, it can rewrite:
/\A(?:[^\\')]|'(?:[^'\\]|\\[\S\s])*')*/s -> /\A(?:[^\\')]|'(?:[^'\\]|\\[\S\s])*+')*+/s

The key thing: it does not matter which engine you use; the rewritten pattern runs faster because, as you said, the engine can throw away the backtracking info. The generic use case: you have your pattern set, you use repan at compile time to generate optimized versions, and you use those at runtime. In this case doing complex regex optimizations is basically free.

JetXujing · 2023-04-27T11:45:36Z

I have simplified the problem reproduction method. I can also reproduce the problem by using the following method.

[root@localhost ~]# cat re
/([^A]|B)*/
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

[root@localhost ~]# pcre2test -jit re
PCRE2 version 10.39 2021-10-29
/([^A]|B)*/
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Failed: error -46: JIT stack limit reached

In addition, I found that when the string length is 1023, no error is reported, but when the string length is 1024, the error is reported. It looks like 1024 is the threshold.

zherczeg · 2023-04-27T14:19:52Z

The default stack is 32K, you should use the jit stack to allocate more. But the pattern is still inefficient. If you make a bubble sort in C, and it is slow, don't blame the compiler.
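Putting the 32K default together with the 1024-character threshold reported above gives a rough per-iteration cost. This is back-of-envelope arithmetic only: it assumes the whole default stack is consumed by per-iteration backtracking data with no fixed overhead, and the real frame layout is not stated in this thread.

```python
DEFAULT_JIT_STACK_BYTES = 32 * 1024   # "The default stack is 32K"
FAILING_SUBJECT_LENGTH = 1024         # first length reported to fail for /([^A]|B)*/


def bytes_per_iteration(stack_bytes, iterations):
    """Upper bound on stack bytes used per loop iteration (assumes no overhead)."""
    return stack_bytes / iterations


def estimated_stack_bytes(subject_length, per_iteration):
    """Rough stack size needed for a subject of the given length."""
    return subject_length * per_iteration


estimate = bytes_per_iteration(DEFAULT_JIT_STACK_BYTES, FAILING_SUBJECT_LENGTH)
print(f"about {estimate:.0f} bytes per matched character")
print(f"about {estimated_stack_bytes(8192, estimate) / 1024:.0f} KiB for 8192 characters")
```

Other patterns will have a different per-iteration cost, so the number applies to this reproduction only.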

JetXujing · 2023-04-28T03:50:14Z

Yes, I tried to use the jitstack parameter and it didn't report an error.

[root@localhost ~]# pcre2test -jit re
PCRE2 version 10.39 2021-10-29
/([^A]|B)*/jitstack=1024
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXPXX
 0: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXPXX
 1: X

I tried the kohler/pcre2@c142ad0 case and added jitstack to the regular expression. No error was reported.
/\A(?:[^\\')]|'(?:[^'\\]|\\[\S\s])*')*/s,subject_literal,jitstack=8192

Could we consider adding a hint to the error message, for example: the default stack space is not enough; consider using heap space through jitstack?

NWilson · 2025-02-27T14:42:56Z

Is part of the problem here that the JIT stack is fixed-size? The interpreter puts its backtracking data on the heap, and it can grow very very large before PCRE2 refuses to allocate any more memory.

But if I understand correctly: the JIT uses the stack for its backtracking data, and that stack has a size which is fixed up-front before matching begins. If the stack stored offsets rather than actual pointers, then you could grow the stack when its size limit is reached (by doubling and copying the old one).
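The doubling-and-copying idea can be sketched with a toy stack whose frames refer to each other by offset. This is a model of the idea only, not how the PCRE2 JIT lays out its stack; the class name, frame format, and sizes are all assumptions for illustration.

```python
import struct


class OffsetStack:
    """Toy backtracking stack whose frames link to each other by offset."""

    FRAME = struct.Struct("<qq")   # (offset of previous frame, payload)
    NO_FRAME = -1

    def __init__(self, initial_bytes=32):
        self.buf = bytearray(initial_bytes)
        self.top = self.NO_FRAME   # offset of the newest frame
        self.used = 0              # bytes currently in use
        self.grow_count = 0        # how many times the buffer was replaced

    def push(self, payload):
        if self.used + self.FRAME.size > len(self.buf):
            # Double and copy. Offsets stay valid because they are relative
            # to the start of the buffer, not absolute addresses.
            new_size = max(len(self.buf) * 2, self.FRAME.size)
            bigger = bytearray(new_size)
            bigger[:self.used] = self.buf[:self.used]
            self.buf = bigger
            self.grow_count += 1
        self.FRAME.pack_into(self.buf, self.used, self.top, payload)
        self.top = self.used
        self.used += self.FRAME.size

    def pop(self):
        if self.top == self.NO_FRAME:
            raise IndexError("pop from empty stack")
        previous, payload = self.FRAME.unpack_from(self.buf, self.top)
        self.used = self.top
        self.top = previous
        return payload


stack = OffsetStack(initial_bytes=32)
for value in range(10):
    stack.push(value)
print(stack.grow_count, [stack.pop() for _ in range(10)])
```

Had the frames stored absolute addresses into the old buffer, every saved link would be stale after the copy.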

zherczeg wrote: "The default stack is 32K, you should use the jit stack to allocate more. But the pattern is still inefficient. If you make a bubble sort in C, and it is slow, don't blame the compiler."

The user's problem isn't that it's slow (although that was mentioned). Primarily, the bubble sort should complete rather than give up due to a 32K limit. We don't need to make the JIT fast for regexes that are inherently slow. But it should run to completion at least!

zherczeg · 2025-02-27T14:59:36Z

JIT can grow the stack to a user specified maximum. You can allocate a 1GB address space, and let JIT grow the currently allocated space.

NWilson · 2025-02-27T15:05:51Z

So why does the user complain about "overflowing the default stack"? Does it not grow by default?

zherczeg · 2025-02-27T17:25:56Z

By default it uses machine stack, and that cannot grow.

NWilson · 2025-02-28T20:48:02Z

I think I understand now. The missing piece of the puzzle, for me, is that I see that the "machine stack" implementation does not grow but just allocates 32K on the stack. I somehow thought that "uses the machine stack" would be bumping the actual stack pointer, and hence be growable up to several megabytes.

Your growable stack implementation works by reserving memory, and allocating pages within that reserved space, so you never have to actually move the stack into a fresh allocation.

Hmm.

Could there be some best-of-both default, where it starts with a stack which is allocated with a simple malloc (or block on the machine stack), but when that runs out, memcpies the data to a fresh buffer?

I sympathise with the user. When you use the interpreter, the out-of-the-box experience is not always fast but the regexes do run to completion. When you use the JIT, the same ought to be true.

zherczeg · 2025-03-01T07:04:07Z

The JIT stack contains stack locations as absolute pointers. If you move the stack, you get a crash. Making them relative is possible. However, the stack base is not stored in a register, and allocating another register is always complicated on systems with a low number of registers. I also don't know the runtime cost.

It would be a bigger task to rework the code to use relative addresses, but definitely doable. Finding all the locations in the code that store a stack pointer is not trivial.

Summary: time-consuming work that might not be worth it in the end, so it has risks.

NWilson · 2025-03-01T21:06:05Z

I see. Various languages and JITs struggle with this. Many garbage collectors don't support compacting (moving) allocations, for example.

How about this for a compromise: if using the "machine stack" for the JIT, we could catch the error condition for stack exhaustion, and allocate a growable one automatically (growing up to the limit specified for the interpreter's heap allocations, perhaps), and then simply re-run the matching.

This is clearly much less efficient than magically moving the stack to a larger space and continuing matching. We'd also have to deallocate the JIT stack as well, rather than reuse it for subsequent match attempts on the same thread. However, it's an improvement if we assume that users would rather see their regexes return an answer!

ltrzesniewski · 2025-03-01T22:36:40Z

Note that "simply" re-running the matching will be visible when using callouts, and this could cause unwanted side-effects.
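A toy model makes the side effect concrete. Nothing here is PCRE2 code: `JitStackLimit`, `run_match`, and `match_with_retry` are hypothetical names, the "match" just walks the subject using one stack frame per character, and the callout is a plain Python callable.

```python
class JitStackLimit(Exception):
    """Stands in for the 'JIT stack limit reached' error in this model."""


def run_match(subject, stack_frames, callout):
    """Walk the subject, invoking the callout and using one frame per character."""
    used = 0
    for position, _char in enumerate(subject):
        callout(position)
        used += 1
        if used > stack_frames:
            raise JitStackLimit(position)
    return len(subject)


def match_with_retry(subject, small_frames, large_frames, callout):
    """Try with the small stack first; on exhaustion, re-run with the large one."""
    try:
        return run_match(subject, small_frames, callout)
    except JitStackLimit:
        return run_match(subject, large_frames, callout)


seen = []
end = match_with_retry("x" * 10, small_frames=4, large_frames=100,
                       callout=seen.append)
print(end, seen)
```

In this model the callout fires for positions 0 to 4 during the failed first attempt and then again for every position during the retry, so any state the callout mutates is touched twice.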

zherczeg · 2025-03-02T11:48:13Z

There is a question of what use case you want to solve. For a simple application, which runs one or two regexes on 1-2 KB of text, the interpreter is good enough. For complex applications that rely on regexes, the developers use the JIT exec API with a JIT stack, and do the implementation in an afternoon. The point is: they have the control. They can decide which threads use how much (max) memory for regex. There are no hidden costs. They can also free the memory when it is not needed.

NWilson · 2025-03-03T12:10:56Z

OK, I'm convinced! Thank you.

kohler mentioned this issue Nov 2, 2022

PCRE JIT produces wrong result on ~8192-byte strings php/php-src#9869

Open

SolitaryGrass mentioned this issue May 31, 2023

internal_dfa_match, a stack overflow occurred due to recursive calls. #258

Closed

addisoncrump mentioned this issue Nov 11, 2023

Large lookahead/behind range repetitions use excessive stack during JIT compilation #329

Closed

NWilson added the JIT Relating to the JIT feature label Dec 6, 2024
