
Commit 65c89ff

pcre2test: avoid printing invalid utf trail in partial match
When match_invalid_utf is enabled, invalid UTF-8 data can't match, but it was mistakenly being printed as part of a partial match even though the ovector correctly didn't include it, as shown by:

  PCRE2 version 10.34 2019-11-21
    re> /(?<=..)X/match_invalid_utf,allvector
  data> XX\x80\=ph,ovector=1
  Partial match: \x{80}
  ** ovector[1] is not equal to the subject length: 2 != 3
   0: 2 2

Fix the logic to print the empty match that was returned instead and, as a side effect, avoid a buffer overread when trying to decode UTF-8 that was missing code units.

Fixes: PCRE2Project#235
1 parent 9323329 commit 65c89ff

4 files changed: 17 additions, 1 deletion

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -66,6 +66,10 @@ testtemp2
 testtemp2grep
 testtry
 testtrygrep
+testSinput
+testbtables
+testsaved1
+testsaved2

 m4/libtool.m4
 m4/ltoptions.m4

src/pcre2test.c

Lines changed: 1 addition & 1 deletion
@@ -8064,7 +8064,7 @@ for (gmatched = 0;; gmatched++)
 rubriclength += 15;

 PCHARS(backlength, pp, leftchar, ovector[0] - leftchar, utf, outfile);
-PCHARSV(pp, ovector[0], ulen - ovector[0], utf, outfile);
+PCHARSV(pp, ovector[0], ovector[1] - ovector[0], utf, outfile);

 if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used)
   fprintf(outfile, " (JIT)");
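
The one-line change above swaps the printed length from "everything up to the end of the subject" to "exactly what the ovector reports". The following is a minimal sketch of that difference, not pcre2test's code: `print_span`, `old_span_len`, and `new_span_len` are hypothetical names, and `print_span` is only a simplified stand-in for `PCHARSV` that escapes each non-printable byte as `\x{hh}`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for PCHARSV: formats `len` bytes of `subject`,
   starting at `start`, into `out`. Printable ASCII is copied as-is and
   every other byte is escaped as \x{hh}. Returns the number of characters
   written (excluding the terminating NUL). Requires outsize > 0. */
static size_t print_span(char *out, size_t outsize,
                         const unsigned char *subject,
                         size_t start, size_t len)
{
  size_t used = 0;
  out[0] = '\0';
  for (size_t i = 0; i < len; i++)
    {
    unsigned char c = subject[start + i];
    int n;
    if (c >= 0x20 && c < 0x7f)
      n = snprintf(out + used, outsize - used, "%c", c);
    else
      n = snprintf(out + used, outsize - used, "\\x{%02x}", c);
    if (n < 0 || (size_t)n >= outsize - used)
      {
      out[used] = '\0';   /* drop the truncated piece and stop */
      break;
      }
    used += (size_t)n;
    }
  return used;
}

/* Length used before the fix: from the match start to the end of the
   subject, which can include bytes the matcher never accepted. */
static size_t old_span_len(const size_t *ovector, size_t ulen)
{
  assert(ovector[0] <= ulen);
  return ulen - ovector[0];
}

/* Length used after the fix: only what the ovector says was matched. */
static size_t new_span_len(const size_t *ovector)
{
  assert(ovector[0] <= ovector[1]);
  return ovector[1] - ovector[0];
}
```

For the subject `XX\x80` from the commit message (length 3, ovector `2 2`), `old_span_len` gives 1, so the invalid byte `\x80` is printed, while `new_span_len` gives 0, so the partial match is printed as empty.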

testdata/testinput10

Lines changed: 4 additions & 0 deletions
@@ -506,6 +506,10 @@
 \= Expect no match
 ab\x80cdef\=ph

+/(?<=..)X/match_invalid_utf
+XX\x80\=ph
+XX\xef\=ph
+
 /ab$/match_invalid_utf
 ab\x80cdeab
 \= Expect no match
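
The second new subject, `XX\xef`, exercises the overread mentioned in the commit message: `0xef` is the lead byte of a three-byte UTF-8 sequence, but the subject ends right after it, so a decoder that trusts the lead byte would read two bytes past the end. The sketch below shows one way to guard against that. It is an illustration only, with hypothetical helper names (`utf8_seq_len`, `utf8_seq_fits`), and it is not how pcre2test decodes UTF-8. It checks only the announced length, not the validity of the continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes a UTF-8 sequence should occupy according to its lead
   byte, or 0 if the byte cannot start a sequence (a continuation byte
   such as 0x80, or an invalid lead byte). */
static size_t utf8_seq_len(unsigned char lead)
{
  if (lead < 0x80) return 1;
  if ((lead & 0xe0) == 0xc0) return 2;
  if ((lead & 0xf0) == 0xe0) return 3;
  if ((lead & 0xf8) == 0xf0) return 4;
  return 0;
}

/* Returns 1 if the sequence announced by s[pos] lies entirely inside
   the buffer of `len` bytes, and 0 if decoding it would read past the
   end or if s[pos] is not a lead byte at all. */
static int utf8_seq_fits(const unsigned char *s, size_t len, size_t pos)
{
  size_t need;
  assert(pos < len);
  need = utf8_seq_len(s[pos]);
  return need != 0 && need <= len - pos;
}
```

With the three-byte subject `XX\xef`, `utf8_seq_fits` at offset 2 returns 0 because three bytes are needed and only one remains. With `XX\x80` it also returns 0, because `0x80` is a continuation byte and cannot start a sequence.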

testdata/testoutput10

Lines changed: 8 additions & 0 deletions
@@ -1646,6 +1646,14 @@ Partial match: ab
 ab\x80cdef\=ph
 No match

+/(?<=..)X/match_invalid_utf
+XX\x80\=ph
+Partial match:
+** ovector[1] is not equal to the subject length: 2 != 3
+XX\xef\=ph
+Partial match:
+** ovector[1] is not equal to the subject length: 2 != 3
+
 /ab$/match_invalid_utf
 ab\x80cdeab
 0: ab
