-
Notifications
You must be signed in to change notification settings - Fork 7
Description
Character groups are the most common kind of groups in Farkle (the other one, token groups originates from GOLD Parser and can be represented in Farkle's grammars but cannot be created in the builder as of Farkle 6). When inside a character group, the tokenizer tries to match a token with the DFA, and if it did not find anything of interest (that ends the group or starts a nested group), consumes one character from the input before trying again. This is very inefficient because we invoke the DFA once per character.
Farkle 6 tried to improve this by leveraging MemoryExtensions.IndexOf(Any) to quickly jump to the start of a possibly interesting token. While it avoids repeated invocations of the DFA, it is unfortunately incorrect because it assumes that groups end with either a new line, or a literal of the name of their ending symbol. While both GOLD Parser's and Farkle 6's groups only end with these, their grammar files can encode groups that don't. This is an assumption we don't want to carry over to Farkle 7.
Here's what we will do instead. For each group, we will use a special DFA that matches only its end symbol and the start symbol of its nesting groups, if any. If we have say a group that starts with {, ends with } and supports nesting with itself, we would emit a DFA that matches the regex .*[{}] and invoke that DFA, consuming all the characters it read. This approach is definitely faster than the correct naive way.
One question is how to encode these extra DFAs? One way that requires the least changes to the grammar format is to append them to the grammar's main DFA, and introduce a new structure that maps groups to the DFA state to start from when inside them. Token groups will start from the state 0, using the main DFA. For a little more speed gains, we can specify that if a group uses a non-zero initial DFA state, the tokenizer does not have to check if a group nesting is actually valid.
This is too compelling to not land in Farkle 7.0, but needs the builder first.