Regular expression pattern specification and testing clarifications

The JSON Schema specification describes valid regular expression patterns in [Section 6.4](https://json-schema.org/draft/2020-12/json-schema-core#name-regular-expressions) through a series of recommendations:

- they SHOULD be valid according to ECMA-262, with Unicode support enabled
- they MUST NOT be assumed to be anchored
- they SHOULD conform to a subset of the regex syntax for portability (_schema authors SHOULD limit themselves to the following regular expression tokens_), which is:
  - individual Unicode characters, e.g. `Caractères accentués…`
  - simple and range character classes `[abc] [a-z]`
  - complemented simple character classes `[^abc]`
  - complemented range character classes `[^a-z]`
  - repetitions `+ * ? {x} {x,y} {x,}` and their lazy `…?` counterparts
  - anchors `^ $`, grouping `(...)` and alternation `|`
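To make the subset easier to reason about, here is a minimal token checker sketched in Python. `uses_portable_subset` is a hypothetical helper, not part of the specification or of any library. It bakes in one possible answer to the open points raised in question 2: every backslash escape, the `.` wildcard, and empty classes are rejected, and `-` is treated as a range only when it sits between two characters. It checks token usage only and is not a full ECMA-262 parser.

```python
import re

# Quantifier forms listed in the subset: {x}, {x,y}, {x,}
BRACES = re.compile(r"\{(\d+)(?:(,)(\d*))?\}")


def uses_portable_subset(pattern: str) -> bool:
    """Return True if `pattern` only uses the Section 6.4 token subset.

    Hypothetical helper. Assumption: since the subset list does not
    mention them, backslash escapes, '.' and empty classes are rejected.
    """
    i, n = 0, len(pattern)
    depth = 0            # number of currently open '(' groups
    can_repeat = False   # True right after an atom (char, class or group)
    while i < n:
        c = pattern[i]
        if c in "\\.]}":
            return False            # escapes, wildcard, stray closers
        if c == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                return False        # unterminated class
            body = pattern[i + 1:end]
            if body.startswith("^"):
                body = body[1:]     # complemented class
            if not body or "\\" in body or "[" in body:
                return False        # empty class or escapes inside it
            # Validate a-z style ranges: the low end must not exceed the high end.
            k = 0
            while k < len(body):
                if k + 2 < len(body) and body[k + 1] == "-":
                    if body[k] > body[k + 2]:
                        return False
                    k += 3
                else:
                    k += 1
            i, can_repeat = end + 1, True
            continue
        if c in "*+?{":
            if not can_repeat:
                return False        # quantifier with nothing to repeat
            if c == "{":
                m = BRACES.match(pattern, i)
                if m is None:
                    return False    # e.g. "{,3}" or a lone "{"
                if m.group(3) and int(m.group(1)) > int(m.group(3)):
                    return False    # {x,y} with x > y
                i = m.end()
            else:
                i += 1
            if i < n and pattern[i] == "?":
                i += 1              # lazy counterpart
            can_repeat = False
            continue
        if c == "(":
            depth += 1
            can_repeat = False      # so "(?:" is rejected at the "?"
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False        # unbalanced ")"
            can_repeat = True
        elif c in "^$|":
            can_repeat = False      # anchors and alternation are not repeatable
        else:
            can_repeat = True       # ordinary literal character
        i += 1
    return depth == 0
```

Under these assumptions, `^[a-z]+$` and `colou?r|col(ou)*r` pass, while `a.b`, `\p{Letter}+`, `[^]` and `(?:ab)+` are rejected.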

Here are some questions:

1. I understand these SHOULDs to mean that a validator is **not** expected to implement full ECMA-262, but _only_ the prescribed subset, which seems reasonable enough if the spec is meant to be portable across languages. Is that right?

2. The subset description seems a little fuzzy and incomplete. It could be clarified with respect to the following points:

   - How are characters escaped, if at all, in character classes? Should it assume POSIX, i.e. no escapes and careful positioning instead? Accept `\` before some characters? Before all characters? How can the characters `] - ^ \` be included in a class?
   - Can a character class be empty? Can a complemented class be empty? (I would suggest no in both cases.)
   - What about the wildcard `.`? I guess it is allowed? If so, does it reject `\n`? And `\r`? Or should it assume single-line mode?
   - What about the predefined character classes `\s \w \d \S \W \D`: are they supported? Also inside classes?
   - What about advanced character classes such as `\p{Letter}`? I guess not? If they are, which ones are expected to be supported?

3. As far as the test suite is concerned, ISTM it should be limited to cases covered by the subset, which is currently not the case (it uses `.` and `\p{Letter}`).
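These questions matter in practice because a validator built on a non-ECMA-262 engine will disagree on exactly these tokens. The sketch below uses Python's standard `re` module, chosen only as an illustration, to show the unanchored semantics and three divergences from ECMA-262: `.` and `\r`, `\d` and non-ASCII digits, and `\p{…}` property escapes.

```python
import re

# Unanchored semantics: a "pattern" matches if it is found anywhere in the
# string, which corresponds to re.search rather than re.fullmatch.
print(bool(re.search("[a-z]+", "123abc")))     # True
print(bool(re.fullmatch("[a-z]+", "123abc")))  # False

# ".": Python's re only excludes "\n" by default, so "\r" matches.
# ECMA-262 "." (without the s flag) also excludes "\r", U+2028 and U+2029.
print(bool(re.fullmatch(".", "\r")))  # True in Python, no match in ECMA-262
print(bool(re.fullmatch(".", "\n")))  # False

# "\d": Python str patterns match any Unicode decimal digit,
# while ECMA-262 "\d" is [0-9] only, even with the u flag.
print(bool(re.fullmatch(r"\d", "\u0663")))            # ARABIC-INDIC DIGIT THREE: True
print(bool(re.fullmatch(r"\d", "\u0663", re.ASCII)))  # False

# "\p{...}": valid in ECMA-262 with the u flag, rejected by Python's re.
try:
    re.compile(r"\p{Letter}")
except re.error as err:
    print("re.error:", err)
```

A test suite that exercises `.` or `\p{Letter}` therefore tests engine-specific behavior rather than the portable subset.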

Regular expression pattern specification and testing clarifications #898

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Regular expression pattern specification and testing clarifications #898

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions