The JSON Schema specification describes valid regular expression patterns in Section 6.4 through a series of recommendations:
- they SHOULD be valid according to ECMA-262, with Unicode enabled
- they MUST NOT be assumed to be anchored
- they SHOULD conform to a subset of the regex syntax for portability (schema authors SHOULD limit themselves to the following regular expression tokens), namely:
  - individual Unicode characters, e.g. `Caractères accentués…`
  - simple character classes `[abc]` and range character classes `[a-z]`
  - complemented simple character classes `[^abc]`
  - complemented range character classes `[^a-z]`
  - repetitions `+ * ? {x} {x,y} {x,}` and their lazy `…?` counterparts
  - anchors `^ $`, grouping `(...)` and alternation `|`
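To make the subset concrete, here is a minimal Python sketch of how I read that token list. It is only my interpretation, not anything the spec defines: `TOKEN` and `first_unsupported` are hypothetical names, and the sketch assumes that backslash escapes and the `.` wildcard are outside the subset and that a class must contain at least one character. It checks tokens only, not overall syntax (balanced parentheses, quantifier placement).

```python
import re
from typing import Optional

# One token of the subset as listed above. Assumptions (mine, not the spec's):
# backslash escapes and "." are outside the subset, and classes are non-empty.
TOKEN = re.compile(r"""
    \[ (?: \^ [^\]\\]+ | [^\]\\^] [^\]\\]* ) \]   # [abc] [a-z] [^abc] [^a-z]
  | [+*?] \??                                     # + * ? and their lazy variants
  | \{ [0-9]+ (?: , [0-9]* )? \} \??              # {x} {x,} {x,y} and lazy variants
  | [\^$()|]                                      # anchors, grouping, alternation
  | [^\\.\[\]{}+*?^$()|]                          # any other single character
""", re.VERBOSE)


def first_unsupported(pattern: str) -> Optional[int]:
    """Return the index of the first token outside the subset, or None."""
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            return pos
        pos = m.end()
    return None


print(first_unsupported("^[a-z]+(foo|bar)*?$"))  # None
print(first_unsupported("a.c"))                  # 1
print(first_unsupported(r"\p{Letter}"))          # 0
```

Under these assumptions `[]` and `[^]` are also reported as unsupported, which is a choice the spec text does not currently settle either way.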
Here are some questions:
- I read these SHOULDs as meaning that a validator is not expected to implement full ECMA-262, only the prescribed subset, which seems reasonable enough if the spec is meant to be portable across languages. Is that right?
- The subset description seems a little fuzzy and incomplete. It could be clarified on the following points:
  - How are characters escaped, or not, in character classes? Should POSIX rules be assumed, i.e. no escapes, only clever positioning? Is `\` accepted before some characters? Before all characters? How can the characters `] - ^ \` be included in a class?
  - Can a character class be empty? Can a complemented class be empty? (I would suggest no in both cases.)
  - What about the wildcard `.`? I guess it is allowed? If so, does it reject `\n`? And `\r`? Or should single-line mode be assumed?
  - What about predefined character classes such as `\s \w \d \S \W \D`: are they supported? Also inside classes?
  - What about advanced character classes such as `\p{Letter}`? I guess not? If they are, which ones are expected to be supported?
- As far as the test suite is concerned, it seems to me that it should be limited to cases covered by the subset, which is currently not the case (it uses `.` and `\p{Letter}`).
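To illustrate why these points matter for portability, here is a small sketch of what one non-ECMA engine, Python's standard `re` module, does with the constructs in question. The helper names `accepts` and `compiles` are mine. As far as I understand ECMA-262, `.` also rejects `\r` there, `\d` is limited to `[0-9]`, and `[]` is a valid class that matches nothing, so a validator built directly on `re` would disagree on each of these.

```python
import re


def accepts(pattern: str, text: str) -> bool:
    """True if the pattern matches anywhere in text (unanchored search)."""
    return re.search(pattern, text) is not None


def compiles(pattern: str) -> bool:
    """True if Python's re module accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(accepts(".", "\n"))          # False: "." rejects \n by default
print(accepts(".", "\r"))          # True: but it accepts \r
print(accepts(r"\d", "\u0663"))    # True: \d covers non-ASCII decimal digits
print(accepts(r"[\d]", "\u0663"))  # True: same behaviour inside a class
print(compiles("[]"))              # False: an empty class is a syntax error
print(accepts("[]a]", "]"))        # True: POSIX-style positioning of "]"
print(compiles(r"\p{Letter}"))     # False: \p{...} is not supported by re
```

None of this says what the spec should require; it only shows that the open questions above have different answers in practice depending on the host language.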