Grammar

The Reference grammar is written in markdown code blocks using a modified BNF-like syntax (with a blend of regex and other arbitrary things). The mdbook-spec extension parses these rules and converts them to a renderable format, including railroad diagrams.

The code block should have a lang string with the word "grammar", a comma, and the category of the grammar, like this:

```grammar,items
ProductionName -> SomeExpression
```

The category is used to group similar productions on the grammar summary page in the appendix.

Grammar syntax

The syntax for the grammar itself is pretty close to what is described in the Notation chapter, though there are some rendering differences.

A "root" production, marked with `@root`, is one that is not used in any other production.
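As a small illustration, the sketch below marks a top-level production as a root. The names `ExampleCrate` and `ExampleItem` are hypothetical and not part of the Reference, and the block assumes a single space separates `@root` from the production name.

```grammar,items
@root ExampleCrate -> ExampleItem*

ExampleItem -> `example`
```

Here `ExampleCrate` is not referenced by any other production, so it qualifies as a root, while `ExampleItem` is used inside `ExampleCrate` and therefore is not marked.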

The syntax for the grammar itself (written in itself, hopefully that's not too confusing) is:

Grammar -> Production+

BACKTICK -> U+0060

LF -> U+000A

Production -> `@root`? Name ` ->` Expression

Name -> <Alphanumeric or `_`>+

Expression -> Sequence (` `* `|` ` `* Sequence)*

Sequence -> (` `* AdornedExpr)+

AdornedExpr -> ExprRepeat Suffix? Footnote?

Suffix -> ` _` <not underscore, unless in backtick>* `_`

Footnote -> `[^` ~[`]` LF]+ `]`

ExprRepeat ->
      Expr1 `?`
    | Expr1 `*?`
    | Expr1 `*`
    | Expr1 `+?`
    | Expr1 `+`
    | Expr1 `{` Range? `..` Range? `}`
    | Expr1

Range -> [0-9]+

Expr1 ->
      Unicode
    | NonTerminal
    | Break
    | Terminal
    | Charset
    | Prose
    | Group
    | NegativeExpression

Unicode -> `U+` [`A`-`Z` `0`-`9`]{4..4}

NonTerminal -> Name

Break -> LF ` `+

Terminal -> BACKTICK ~[LF]+ BACKTICK

Charset -> `[` (` `* Characters)+ ` `* `]`

Characters ->
      CharacterRange
    | CharacterTerminal
    | CharacterName

CharacterRange -> BACKTICK <any char> BACKTICK `-` BACKTICK <any char> BACKTICK

CharacterTerminal -> Terminal

CharacterName -> Name

Prose -> `<` ~[`>` LF]+ `>`

Group -> `(` ` `* Expression ` `* `)`

NegativeExpression -> `~` ( Charset | Terminal | NonTerminal )

The general format is a series of productions separated by blank lines. The expressions are:

| Expression | Example | Description |
|------------|---------|-------------|
| Unicode | U+0060 | A single Unicode character. |
| NonTerminal | FunctionParameters | A reference to another production by name. |
| Break | | This is used internally by the renderer to detect line breaks and indentation. |
| Terminal | \`example\` | A sequence of exact characters, surrounded by backticks. |
| Charset | [ \`A\`-\`Z\` \`0\`-\`9\` \`_\` ] | A choice from a set of characters, space separated. There are three different forms. |
| CharacterRange | [ \`A\`-\`Z\` ] | A range of characters; each character should be in backticks. |
| CharacterTerminal | [ \`x\` ] | A single character, surrounded by backticks. |
| CharacterName | [ LF ] | A nonterminal, referring to another production. |
| Prose | \<any ASCII character except CR\> | An English description of what should be matched, surrounded by angle brackets. |
| Group | (\`,\` Parameter)+ | Groups an expression for the purpose of precedence, such as applying a repetition operator to a sequence of other expressions. |
| NegativeExpression | ~[\` \` LF] | Matches anything except the given Charset, Terminal, or Nonterminal. |
| Sequence | \`fn\` Name Parameters | A sequence of expressions, which must match in order. |
| Alternation | Expr1 \| Expr2 | Matches only one of the given expressions, separated by the vertical pipe character. |
| Suffix | \_except \[LazyBooleanExpression\]\_ | Adds a suffix to the previous expression to provide an additional English description, rendered in subscript. This can contain limited markdown, but try to avoid anything except basics like links. |
| Footnote | \[^extern-safe\] | Adds a footnote, which can supply extra information that may be helpful to the user. The footnote itself should be defined outside of the code block like a normal markdown footnote. |
| Optional | Expr? | The preceding expression is optional. |
| Repeat | Expr\* | The preceding expression is repeated 0 or more times. |
| Repeat (non-greedy) | Expr\*? | The preceding expression is repeated 0 or more times without being greedy. |
| RepeatPlus | Expr+ | The preceding expression is repeated 1 or more times. |
| RepeatPlus (non-greedy) | Expr+? | The preceding expression is repeated 1 or more times without being greedy. |
| RepeatRange | Expr{2..4} | The preceding expression is repeated a number of times within the specified range. Either bound can be omitted, which works just like Rust ranges. |
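To show how these pieces combine, here is a minimal sketch of a grammar block that uses a line break with indentation, terminals, nonterminals, a group, the optional and repeat operators, character sets, a suffix, a repeat range, and a footnote. All production names (`ExampleFunction`, `ExampleParams`, `ExampleName`, `ExampleWord`, `ExampleDigits`) and the footnote label `example-note` are made up for illustration and do not exist in the Reference.

```grammar,items
ExampleFunction ->
    `fn` ExampleName `(` ExampleParams? `)`

ExampleParams -> ExampleName (`,` ExampleName)* `,`?

ExampleName -> ExampleWord _except `fn`_

ExampleWord -> [`a`-`z` `A`-`Z` `_`] [`a`-`z` `A`-`Z` `0`-`9` `_`]*

ExampleDigits -> [`0`-`9`]{2..4}[^example-note]
```

Following the Footnote row in the table, the `[^example-note]` marker would need a matching markdown footnote definition elsewhere on the page, outside the code block.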

Automatic linking

The plugin automatically adds markdown link definitions for all the production names on every page. If you want to link directly to a production name, all you need to do is surround it in square brackets, like `[ArrayExpression]`.

In some cases there might be name collisions with the automatic linking of rule names. In that case, disambiguate with the `grammar-` prefix, such as `[Type][grammar-Type]`. You can also do that if you just feel like being more explicit.