Regular expression pattern specification and testing clarifications #898

@zx80

The JSON Schema specification describes valid regular expression patterns in Section 6.4 through a series of recommendations:

  • they SHOULD be valid for ECMA-262 with unicode enabled
  • they MUST NOT be assumed to be anchored
  • they SHOULD conform to a subset of the regex syntax for portability (schema authors SHOULD limit themselves to the following regular expression tokens), which is:
    • individual Unicode characters, e.g. Caractères accentués…
    • simple character classes [abc] and range character classes [a-z]
    • complemented simple character classes [^abc]
    • complemented range character classes [^a-z]
    • repetitions + * ? {x} {x,y} {x,} and their lazy …? counterparts
    • anchors ^ $, grouping (...) and alternation |
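To illustrate the "not anchored" point above, here is a minimal sketch in Python. The helper name `matches_pattern` is hypothetical, and Python's `re` module is not an ECMA-262 engine; the example only relies on tokens from the portable subset, where the two are expected to agree.

```python
import re

def matches_pattern(pattern: str, value: str) -> bool:
    """Hypothetical helper: apply a pattern without implicit anchoring.

    re.search looks for a match anywhere in the value, which is the
    unanchored behaviour described above. re.fullmatch would anchor
    the pattern implicitly at both ends.
    """
    return re.search(pattern, value) is not None

# "es" occurs inside "expression", so the unanchored pattern matches.
print(matches_pattern("es", "expression"))    # True
# Explicit anchors from the subset restrict the match to the whole string.
print(matches_pattern("^es$", "expression"))  # False
```

A validator built on an API that anchors by default would have to switch to a search-style call (or wrap the pattern) to get this behaviour.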

Here are some questions:

  1. I read these SHOULDs as meaning that a validator is not expected to implement full ECMA-262, only the prescribed subset, which seems reasonable if the spec is meant to be portable across languages. Is that right?

  2. The subset description seems a little fuzzy and incomplete. It could be clarified on the following points:

    • How are characters escaped, or not, in character classes? Should POSIX rules be assumed, i.e. no escaping, only careful positioning? Is \ accepted before some characters? Before all characters? How can the characters ] - ^ \ be included in a class?
    • Can a character class be empty? Can a complemented class be empty? (I would suggest no in both cases.)
    • What about the wildcard .? I guess it is allowed. If so, does it reject \n? And \r? Or should single-line mode be assumed?
    • What about predefined character classes \s \w \d \S \W \D: are they supported? Also inside classes?
    • What about advanced character classes such as \p{Letter}? I guess not. If yes, which ones are expected to be supported?
  3. As for the test suite, it seems to me that it should be limited to cases covered by the subset, which is currently not the case (it uses . and \p{Letter}).
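The questions in point 2 matter because engines really do disagree on these constructs. The sketch below probes Python's `re` module only; the helper names `dot_matches` and `compiles` are hypothetical, and the remarks about ECMA-262 in the comments are stated from the ECMAScript definition rather than checked by this code.

```python
import re

def dot_matches(ch: str, flags: int = 0) -> bool:
    """Hypothetical probe: does the wildcard match this single character?"""
    return re.fullmatch(".", ch, flags) is not None

def compiles(pattern: str) -> bool:
    """Hypothetical probe: does this engine accept the pattern at all?"""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Wildcard: Python rejects only \n by default.
# ECMA-262 also excludes \r, U+2028 and U+2029 unless the s flag is set.
print(dot_matches("\n"))             # False
print(dot_matches("\r"))             # True
print(dot_matches("\n", re.DOTALL))  # True

# Empty class: Python refuses "[]", whereas ECMA-262 accepts it
# as a class that matches nothing.
print(compiles("[]"))                # False

# POSIX-style positioning: in Python a leading ] is a literal member.
# In ECMA-262 the same text reads as an empty class followed by "a]".
print(re.fullmatch(r"[]a]", "]") is not None)        # True

# Backslash escapes for ] - ^ \ inside a class work in Python.
print(re.fullmatch(r"[\]\-\^\\]", "-") is not None)  # True
```

Any construct on which engines diverge like this is a candidate either for an explicit rule in the subset or for exclusion from the portable test cases.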
