Skip to content

Simple pattern blows out JIT stack #156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
kohler opened this issue Nov 2, 2022 · 23 comments
Open

Simple pattern blows out JIT stack #156

kohler opened this issue Nov 2, 2022 · 23 comments
Labels
JIT Relating to the JIT feature

Comments

@kohler
Copy link

kohler commented Nov 2, 2022

Hi! Thanks so much for this project.

The following relatively simple pattern has unexpected bad behavior on JIT:

/\A(?:[^\\')]|'(?:[^'\\]|\\[\S\s])*')*/s

On long strings, non-JIT PCRE performs much faster than JIT, and on an ~8192-character string, PCRE overflows the default JIT stack size.

Failing test available here: kohler@c142ad0

@kohler
Copy link
Author

kohler commented Nov 2, 2022

This semantically equivalent regex seems to behave OK:

/\A(?:[^\\')]*(?:'(?:[^'\\]|\\[\S\s])*'|))*/s

@PhilipHazel
Copy link
Collaborator

Which release of PCRE? And what is the content of the strings you are matching? (Do they match or not match?) Using the current 10.40 release I do not see a slowdown with a string consisting of "abcde" repeated many times, but I do see the JIT stack overflow when the repeat is more than 272. The JIT maintainer may wish to comment.

@kohler
Copy link
Author

kohler commented Nov 3, 2022

Many releases of PCRE, including HEAD. The content matches. Please see this following commit, to PCRE2’s testdata/testinput1, for an example string in pcre2test format.

kohler@c142ad0

@zherczeg
Copy link
Collaborator

zherczeg commented Nov 4, 2022

This looks like a * * test, where every character can be matched. The [\S\s] looks like a dotall. This requires a lot of stack data. Atomic blocks should help. Is there any optimization which optimizes this case in the interpreter? I can check this later.

@PhilipHazel
Copy link
Collaborator

I don't think there's anything special about this in the interpreter. (But my memory isn't what it used to be.)

@kohler
Copy link
Author

kohler commented Nov 4, 2022

I was surprised by this behavior mostly because this grammar doesn’t theoretically require any lookbehind or backtracking at all.

image

@zherczeg
Copy link
Collaborator

zherczeg commented Nov 4, 2022

Regex engines have different engine types:
https://zherczeg.github.io/sljit/regex_compare.html

PCRE2 is more like a classic programming language, where you define the operations. For example, if you implement a bubble sort in C, and quicksort in JS, you cannot say C is slow just because JS is faster in this case.

@kohler
Copy link
Author

kohler commented Nov 5, 2022

@zherczeg, is there no special case in the backtracking engine for a pattern like (a<PAT1>|[^a]<PAT2>), where the two branches of the choice are disjoint, and the correct branch can be detected with one character of lookahead?

If there is no such special case currently, do you think adding one would be interesting, and do you have any advice about how to do so?

@kohler
Copy link
Author

kohler commented Nov 5, 2022

In particular, call the case I care about (A|B)*C. Here:

  • The match is greedy.
  • The first character in A cannot start B or C. (NB this includes the case when C is empty at the end of the pattern.)
  • Either A is fixed-length or the last character in A cannot start B or C.
  • The first character in B cannot start A or C.
  • Either B is fixed-length or the last character in B cannot start A or C.

Given this, as soon as the ith repetition of (A|B)* matches, it is safe to throw away the backtracking information associated with the 1st-ith repetitions of (A|B)*. The final match will either fail or start with ≥i repetitions of (A|B).

@zherczeg
Copy link
Collaborator

zherczeg commented Nov 5, 2022

These are static compiler optimizations, and there is a project for that:
https://github.com/zherczeg/repan

If you implement your suggested optimization there, it can rewrite:
/\A(?:[^\\')]|'(?:[^'\\]|\\[\S\s])*')*/s -> /\A(?:[^\\')]|'(?:[^'\\]|\\[\S\s])*+')*+/s

The key thing: it does not matter which engine you use, it runs faster, because, as you said, the engine can throw away the backtracking info. The generic use case: you have your patter set, you use repan at compile time to generate their optimized version, and use them at runtime. In this case doing complex regex optimizations is basically free.

@JetXujing
Copy link
Contributor

I have simplified the problem reproduction method. I can also reproduce the problem by using the following method.

[root@localhost ~]# cat re
/([^A]|B)*/
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

[root@localhost ~]# pcre2test -jit re
PCRE2 version 10.39 2021-10-29
/([^A]|B)*/
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Failed: error -46: JIT stack limit reached

In addition, I found that when the string length is 1023, no error is reported, but when the string length is 1024, the error is reported. It looks like 1024 is the threshold.

@zherczeg
Copy link
Collaborator

The default stack is 32K, you should use the jit stack to allocate more. But the pattern is still inefficient. If you make a bubble sort in C, and it is slow, don't blame the compiler.

@JetXujing
Copy link
Contributor

JetXujing commented Apr 28, 2023

Yes, I tried to use the jitstack parameter and it didn't report an error.

[root@localhost ~]# pcre2test -jit re
PCRE2 version 10.39 2021-10-29
/([^A]|B)*/jitstack=1024
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXPXX
 0: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXPXX
 1: X

I tried the kohler/pcre2@c142ad0 case and added jitstack to the regular expression. No error was reported.
/\A(?:[^\\')]|'(?:[^'\\]|\\[\S\s])*')*/s,subject_literal,jitstack=8192

Can we consider adding a reminder log: the stack space used by default is not enough, consider using heap space through jitstack.

@NWilson
Copy link
Member

NWilson commented Feb 27, 2025

Is part of the problem here that the JIT stack is fixed-size? The interpreter puts its backtracking data on the heap, and it can grow very very large before PCRE2 refuses to allocate any more memory.

But if I understand correctly: the JIT uses the stack for its backtracking data, and that stack has a size which is fixed up-front before matching begins. If the stack stores offsets rather than actual pointers, then you could grow the stack when its size limit is reaching (by doubling and copying the old one).

The default stack is 32K, you should use the jit stack to allocate more. But the pattern is still inefficient. If you make a bubble sort in C, and it is slow, don't blame the compiler.

The user's problem isn't that it's slow (although that was mentioned). Primarily, the bubble sort should complete rather than give up due to a 32K limit. We don't need to make the JIT fast for regexes that are inherently slow. But it should run to completion at least!

@zherczeg
Copy link
Collaborator

JIT can grow the stack to a user specified maximum. You can allocate a 1GB address space, and let JIT grow the currently allocated space.

@NWilson
Copy link
Member

NWilson commented Feb 27, 2025

So why does the user complain about "overflowing the default stack"? Does it not grow by default?

@zherczeg
Copy link
Collaborator

By default it uses machine stack, and that cannot grow.

@NWilson
Copy link
Member

NWilson commented Feb 28, 2025

I think I understand now. The missing piece of the puzzle, for me, is that I see that the "machine stack" implementation does not grow but just allocates 32K on the stack. I somehow thought that "uses the machine stack" would be bumping the actual stack pointer, and hence be growable up to several megabytes.

Your growable stack implementation works by reserving memory, and allocating pages within that reserved space, so you never have to actually move the stack into a fresh allocation.

Hmm.

Could there be some best-of-both default, where it starts with a stack which is allocated with a simple malloc (or block on the machine stack), but when that runs out, memcpies the data to a fresh buffer?

I sympathise with the user. When you use the interpreter, the out-of-the-box experience is not always fast but the regexes do run to completion. When you use the JIT, the same ought to be true.

@zherczeg
Copy link
Collaborator

zherczeg commented Mar 1, 2025

JIT stack contains stack locations as absolute pointers. If you move the stack, you get a crash. Making them relative is possible. However, the stack base is not stored in a register, and allocating another is always complicated on systems with low number of registers. I also don't know its runtime costs.

It would be a bigger task to rework the code to use relative addresses, but definitely doable. Finding all locations in the code which stores stack pointer is not trivial.

Summary: a time consuming work, which might not worth at the end, so it has risks.

@NWilson
Copy link
Member

NWilson commented Mar 1, 2025

I see. Various languages and JITs struggle with this. Many garbage collectors don't support compacting (moving) allocations, for example.

How about this for a compromise: if using the "machine stack" for the JIT, we could catch the error condition for stack exhaustion, and allocate a growable one automatically (growing up to the limit specified for the interpreter's heap allocations, perhaps), and then simply re-run the matching.

This is clearly much less efficient than magically moving the stack to a larger space and continuing matching. We'd also have to deallocate the JIT stack as well, rather than reuse it for subsequent match attempts on the same thread. However, it's an improvement if we assume that users would rather see their regexes return an answer!

@ltrzesniewski
Copy link
Contributor

Note that "simply" re-running the matching will be visible when using callouts, and this could cause unwanted side-effects.

@zherczeg
Copy link
Collaborator

zherczeg commented Mar 2, 2025

There is a question of what use case you want to solve. For a simple application, which runs 1-2 regex on 1-2 k text, the interpreter is good enough. As for complex applications with regex, the developers use the jit exec api with jit stack, and do the implementation in an afternoon. The point is: they have the control. They can decide which threads use how much (max) memory for regex. There are no hidden costs. They can also free the memory when unneeded.

@NWilson
Copy link
Member

NWilson commented Mar 3, 2025

OK, I'm convinced! Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
JIT Relating to the JIT feature
Projects
None yet
Development

No branches or pull requests

6 participants