-
Notifications
You must be signed in to change notification settings - Fork 210
Simple pattern blows out JIT stack #156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This semantically equivalent regex seems to behave OK:
|
Which release of PCRE? And what is the content of the strings you are matching? (Do they match or not match?) Using the current 10.40 release I do not see a slowdown with a string consisting of "abcde" repeated many times, but I do see the JIT stack overflow when the repeat is more than 272. The JIT maintainer may wish to comment. |
Many releases of PCRE, including HEAD. The content matches. Please see this following commit, to PCRE2’s |
This looks like a * * test, where every character can be matched. The |
I don't think there's anything special about this in the interpreter. (But my memory isn't what it used to be.) |
Regex engines have different engine types: PCRE2 is more like a classic programming language, where you define the operations. For example, if you implement a bubble sort in C, and quicksort in JS, you cannot say C is slow just because JS is faster in this case. |
@zherczeg, is there no special case in the backtracking engine for a pattern like If there is no such special case currently, do you think adding one would be interesting, and do you have any advice about how to do so? |
In particular, call the case I care about
Given this, as soon as the |
These are static compiler optimizations, and there is a project for that: If you implement your suggested optimization there, it can rewrite: The key thing: it does not matter which engine you use, it runs faster, because, as you said, the engine can throw away the backtracking info. The generic use case: you have your patter set, you use repan at compile time to generate their optimized version, and use them at runtime. In this case doing complex regex optimizations is basically free. |
I have simplified the problem reproduction method. I can also reproduce the problem by using the following method.
In addition, I found that when the string length is 1023, no error is reported, but when the string length is 1024, the error is reported. It looks like 1024 is the threshold. |
The default stack is 32K, you should use the jit stack to allocate more. But the pattern is still inefficient. If you make a bubble sort in C, and it is slow, don't blame the compiler. |
Yes, I tried to use the jitstack parameter and it didn't report an error.
I tried the kohler/pcre2@c142ad0 case and added jitstack to the regular expression. No error was reported. Can we consider adding a reminder log: the stack space used by default is not enough, consider using heap space through jitstack. |
Is part of the problem here that the JIT stack is fixed-size? The interpreter puts its backtracking data on the heap, and it can grow very very large before PCRE2 refuses to allocate any more memory. But if I understand correctly: the JIT uses the stack for its backtracking data, and that stack has a size which is fixed up-front before matching begins. If the stack stores offsets rather than actual pointers, then you could grow the stack when its size limit is reaching (by doubling and copying the old one).
The user's problem isn't that it's slow (although that was mentioned). Primarily, the bubble sort should complete rather than give up due to a 32K limit. We don't need to make the JIT fast for regexes that are inherently slow. But it should run to completion at least! |
JIT can grow the stack to a user specified maximum. You can allocate a 1GB address space, and let JIT grow the currently allocated space. |
So why does the user complain about "overflowing the default stack"? Does it not grow by default? |
By default it uses machine stack, and that cannot grow. |
I think I understand now. The missing piece of the puzzle, for me, is that I see that the "machine stack" implementation does not grow but just allocates 32K on the stack. I somehow thought that "uses the machine stack" would be bumping the actual stack pointer, and hence be growable up to several megabytes. Your growable stack implementation works by reserving memory, and allocating pages within that reserved space, so you never have to actually move the stack into a fresh allocation. Hmm. Could there be some best-of-both default, where it starts with a stack which is allocated with a simple malloc (or block on the machine stack), but when that runs out, memcpies the data to a fresh buffer? I sympathise with the user. When you use the interpreter, the out-of-the-box experience is not always fast but the regexes do run to completion. When you use the JIT, the same ought to be true. |
JIT stack contains stack locations as absolute pointers. If you move the stack, you get a crash. Making them relative is possible. However, the stack base is not stored in a register, and allocating another is always complicated on systems with low number of registers. I also don't know its runtime costs. It would be a bigger task to rework the code to use relative addresses, but definitely doable. Finding all locations in the code which stores stack pointer is not trivial. Summary: a time consuming work, which might not worth at the end, so it has risks. |
I see. Various languages and JITs struggle with this. Many garbage collectors don't support compacting (moving) allocations, for example. How about this for a compromise: if using the "machine stack" for the JIT, we could catch the error condition for stack exhaustion, and allocate a growable one automatically (growing up to the limit specified for the interpreter's heap allocations, perhaps), and then simply re-run the matching. This is clearly much less efficient than magically moving the stack to a larger space and continuing matching. We'd also have to deallocate the JIT stack as well, rather than reuse it for subsequent match attempts on the same thread. However, it's an improvement if we assume that users would rather see their regexes return an answer! |
Note that "simply" re-running the matching will be visible when using callouts, and this could cause unwanted side-effects. |
There is a question of what use case you want to solve. For a simple application, which runs 1-2 regex on 1-2 k text, the interpreter is good enough. As for complex applications with regex, the developers use the jit exec api with jit stack, and do the implementation in an afternoon. The point is: they have the control. They can decide which threads use how much (max) memory for regex. There are no hidden costs. They can also free the memory when unneeded. |
OK, I'm convinced! Thank you. |
Hi! Thanks so much for this project.
The following relatively simple pattern has unexpected bad behavior on JIT:
On long strings, non-JIT PCRE performs much faster than JIT, and on an ~8192-character string, PCRE overflows the default JIT stack size.
Failing test available here: kohler@c142ad0
The text was updated successfully, but these errors were encountered: