@@ -1547,6 +1547,39 @@ the next two sections), and the terminating closing square bracket. However,
escaping other non-alphanumeric characters does no harm.
.
.
+ .SH "PERL EXTENDED CHARACTER CLASSES"
+ .rs
+ PCRE2 supports Perl's "(?[...])" extended character class syntax. This can
+ be used to perform set operations, such as intersection.
+ .P
+ The syntax permitted within "(?[...])" is quite different to ordinary character
+ classes. Inside the extended class, there is an expression syntax consisting of
+ "atoms", operators, and ordinary parentheses "()" used for grouping. The allowed
+ atoms are any escaped characters or sets such as "\en" or "\ed", POSIX classes
+ such as "[:alpha:]", and ordinary character classes, which may be nested as atoms
+ within an extended class. For example, in "(?[\ed & [...]])" the nested ordinary
+ class "[...]" follows the ordinary rules for character classes, in which
+ parentheses are not metacharacters, and character literals and ranges are
+ permitted. However, when outside an ordinary character class (such as in "(?[...
+ + (...)])") character literals and ranges may not be used, as they are not atoms
+ in the extended syntax. The extended syntax does not introduce any additional
+ escape sequences, so "(?[\ey])" contains an unknown escape, just as "[\ey]"
+ does.
+ .P
+ In the extended syntax, ^ does not negate a class (except within an
+ ordinary class nested inside an extended class); it is instead a binary
+ operator.
+ .P
+ The binary operators are "&" (intersection), "|" or "+" (union), "-"
+ (subtraction) and "^" (symmetric difference). These are left-associative and
+ "&" has higher (tighter) precedence, while the others have equal lower
+ precedence. The one prefix unary operator is "!" (complement), with highest
+ precedence.
+ .P
+ A Perl extended character class always has the /xx modifier turned on within
+ it.
+ .
+ .
.SH "UTS#18 EXTENDED CHARACTER CLASSES"
.rs
The PCRE2_ALT_EXTENDED_CLASS option enables an alternative to Perl's "(?[...])"
@@ -1560,18 +1593,19 @@ character becomes an additional metacharacter within classes, denoting the start
of a nested class, so a literal "[" must be escaped as "\e[".
.P
Secondly, within the UTS#18 extended syntax, there are additional operators
- "||", "&&" and "--" which denote character class union, intersection, and
- subtraction respectively. In standard Perl syntax, these would simply be
- needlessly-repeated literals (except for "-" which can denote a range). These
- operators can be used in constructs such as "[\ep{L}--[QW]]" for "Unicode
- letters, other than Q and W". A literal "-" at the end of a range must be
- escaped (so while "[--1]" in Perl syntax is the range from hyphen to "1", it
- must be escaped as "[\e--1]" in UTS#18 extended classes).
+ "||", "&&", "--" and "~~" which denote character class union, intersection,
+ subtraction, and symmetric difference respectively. In standard Perl syntax,
+ these would simply be needlessly-repeated literals (except for "-" which can
+ denote a range). These operators can be used in constructs such as
+ "[\ep{L}--[QW]]" for "Unicode letters, other than Q and W". A literal "-" at
+ the end of a range must be escaped (so while "[--1]" in Perl syntax is the
+ range from hyphen to "1", it must be escaped as "[\e--1]" in UTS#18 extended
+ classes).
.P
The specific rules in PCRE2 are that classes can be nested:
"[...[B]...[^C]...]". The individual class items (literal characters, literal
ranges, properties such as \ed or \ep{...}, and nested classes) can be
- combined by juxtaposition or by an operator "||", "&&", or "--".
+ combined by juxtaposition or by an operator "||", "&&", "--", or "~~".
Juxtaposition is the implicit union operator, and binds more tightly than any
explicit operator. Precedence between the explicit operators is not defined,
so mixing operators is a syntax error (thus "[A&&B--C]" is an error, but
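The precedence rules stated in the new PERL EXTENDED CHARACTER CLASSES section ("!" binds tightest, then "&", then "|", "+", "-" and "^" at one shared level, all left-associative) can be checked against a small model. The sketch below is not PCRE2 code and does not call PCRE2. It is a hypothetical Python evaluator over pre-split tokens, where `evaluate`, `chars`, `UNIVERSE` and `ATOMS` are names invented for illustration and the universe is assumed to be ASCII so that complement stays finite.

```python
# Hypothetical model of the "(?[...])" operator rules described in the patch.
# Assumption: characters are limited to ASCII so "!" has a finite result.
UNIVERSE = frozenset(range(128))


def evaluate(tokens, atoms):
    """Evaluate a pre-split token list to a frozenset of code points."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def unary():
        tok = take()
        if tok == "!":                  # prefix complement, highest precedence
            return UNIVERSE - unary()
        if tok == "(":                  # ordinary parentheses group a sub-expression
            value = low()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        return atoms[tok]               # atom lookup; unknown names raise KeyError

    def high():                         # "&" binds tighter than the other binaries
        value = unary()
        while peek() == "&":
            take()
            value = value & unary()
        return value

    def low():                          # "|", "+", "-", "^": equal, left-associative
        value = high()
        while peek() in ("|", "+", "-", "^"):
            op = take()
            rhs = high()
            if op in ("|", "+"):
                value = value | rhs
            elif op == "-":
                value = value - rhs
            else:
                value = value ^ rhs
        return value

    result = low()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + repr(tokens[pos]))
    return result


def chars(code_points):
    """Render a set of code points as a sorted string."""
    return "".join(sorted(map(chr, code_points)))


# Illustrative atoms standing in for things like "\d" or a nested "[...]".
ATOMS = {
    "digit": frozenset(map(ord, "0123456789")),
    "odd": frozenset(map(ord, "13579")),
    "a": frozenset({ord("a")}),
}

print(chars(evaluate(["a", "|", "digit", "&", "odd"], ATOMS)))  # 13579a
print(chars(evaluate(["digit", "-", "odd", "|", "a"], ATOMS)))  # 02468a
```

The first call groups as a | (digit & odd) because "&" is tighter. The second groups as (digit - odd) | a because the lower-precedence operators associate to the left, which is why "a" survives the subtraction.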