More robust interpolation parsing #597

mbostock · 2024-01-23T23:33:49Z

This rearchitects our Markdown parser to be more robust. The intent is to fix issues such as:

Support inline expressions for attributes (and in other places) #32
uncaught TypeError: HTML comments with template literals and string interpolation #375
Complex inline expressions aren’t parsed correctly #396
Duplicate cells during incremental update with HEAD + META #379 (maybe)
Parse failure with text before a style element? #252 (maybe)
the blank line quirk when using HTML in Markdown

Rather than implementing inline expression parsing as a markdown-it plugin (which depends on markdown-it’s internal tokenization), we implement a preprocessing step that converts inline expressions ${…} to HTML and extracts the JavaScript source expressions. This HTML is then ignored by markdown-it since HTML is allowed within Markdown. We will continue to parse fenced code blocks using markdown-it, since these are already supported by Markdown, and since these always generate nodes.

Furthermore, by employing the same HTML parsing state machine as Hypertext Literal, we can be exact about where the expressions start and end, and in what context they need to be evaluated (for example, as an attribute or as a node).

For example, this inline expression:

Hello, ${'world'}!

is compiled to:

Hello, <!-- o:1 -->!

In addition, the 'world' JavaScript expression is extracted so that it can be used to define a runtime variable, which is then displayed on the client, replacing the generated comment. (Note that we could generate <span id="cell-1"></span> instead of the comment here, and we may end up doing that, but I’ll need to change the client to support evaluating dynamic attributes anyway, so I’m opportunistically seeing if this helps remove the wrapper span #11.)
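To make the preprocessing step concrete, here is a minimal sketch of the text-context case only. It is not the parser in this PR: the names `extractInline` and `findClose` are hypothetical, and the brace matching is deliberately naive (it skips braces inside simple quoted strings, but does not handle comments, regular expressions, or nested template literals).

```typescript
// Hypothetical sketch; not the implementation from this PR.
interface Extracted {
  html: string; // source with each ${…} replaced by a comment placeholder
  expressions: string[]; // JavaScript sources; expressions[i] has id i + 1
}

// Returns the index of the "}" that closes an expression whose body starts at
// `start`, ignoring braces inside simple quoted strings; -1 if unterminated.
function findClose(source: string, start: number): number {
  let depth = 1;
  let quote: string | null = null;
  for (let i = start; i < source.length; ++i) {
    const c = source.charAt(i);
    if (quote !== null) {
      if (c === "\\") ++i; // skip the escaped character
      else if (c === quote) quote = null;
    } else if (c === '"' || c === "'" || c === "`") {
      quote = c;
    } else if (c === "{") {
      ++depth;
    } else if (c === "}" && --depth === 0) {
      return i;
    }
  }
  return -1;
}

function extractInline(source: string): Extracted {
  const expressions: string[] = [];
  let html = "";
  let i = 0;
  while (i < source.length) {
    const open = source.indexOf("${", i);
    if (open < 0) break;
    const close = findClose(source, open + 2);
    if (close < 0) break; // unterminated: leave the rest as literal text
    expressions.push(source.slice(open + 2, close));
    html += source.slice(i, open) + `<!-- o:${expressions.length} -->`;
    i = close + 1;
  }
  return {html: html + source.slice(i), expressions};
}
```

Calling `extractInline("Hello, ${'world'}!")` yields the compiled HTML shown above plus a one-element list of sources, which markdown-it never needs to see.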

Similarly, this:

<div class=${'red'}>color</div>

is compiled to:

<div o:class="1">color</div>

where o: is a special prefix that denotes a dynamic attribute that will be computed on the client. The exact compiled HTML syntax is still to be determined; the current approach will require using a TreeWalker to find comments and attributes, and then binding each one to the runtime variable with the associated identifier.
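The role of the state machine can be illustrated with a much smaller one than Hypertext Literal’s. The sketch below is an assumption-laden toy, not the PR’s code: `compileDynamic` is a hypothetical name, it tracks only two states (text and tag), it assumes expressions contain no nested braces, and it treats every `<` as the start of a tag.

```typescript
// Hypothetical sketch; far simpler than Hypertext Literal's state machine.
type State = "text" | "tag";

function compileDynamic(source: string): {html: string; expressions: string[]} {
  const expressions: string[] = [];
  let html = "";
  let state: State = "text";
  let i = 0;
  while (i < source.length) {
    if (source.startsWith("${", i)) {
      const close = source.indexOf("}", i + 2); // assumes no nested braces
      if (close < 0) break; // unterminated: leave the rest as literal text
      const id = expressions.length + 1;
      if (state === "tag") {
        // Inside a tag we only support an unquoted attribute value: name=${…}
        const attr = /([^\s"'<>\/=]+)=$/.exec(html);
        if (attr === null) throw new Error("unsupported expression position");
        html = html.slice(0, attr.index) + `o:${attr[1]}="${id}"`;
      } else {
        html += `<!-- o:${id} -->`; // text context: comment placeholder
      }
      expressions.push(source.slice(i + 2, close));
      i = close + 1;
      continue;
    }
    const c = source.charAt(i);
    if (state === "text" && c === "<") state = "tag";
    else if (state === "tag" && c === ">") state = "text";
    html += c;
    ++i;
  }
  return {html: html + source.slice(i), expressions};
}
```

The same `${…}` syntax produces a comment in the text context and an o:-prefixed attribute in the attribute context, which is why knowing the tokenizer state at the point of interpolation matters.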

The last thing to fix here is probably to diff the parsed HTML rather than diffing the parsed Markdown pieces. This would alleviate the requirement that each Markdown “piece” corresponds to exactly one HTML element, which introduces the blank line quirk. Of course, not all of these depend on each other, so I might try to decouple them and approach this more incrementally.

mbostock · 2024-02-03T18:10:57Z

If we also eliminate the wrapper span #11, then this should in theory also be able to support cases like this:

<style>

body {
  background: ${color};
}

</style>

where color is a reactive variable! Pretty amazing.

mkotelnikov · 2024-03-07T08:47:33Z

Hello,

@statewalker/tknz - is a tokenizer (parser) for HTML / MD / syntaxes.

It is very small (30k non minimised, non compressed) with no dependencies
Produces well formed ASTs from documents with exact start/end positions for each token
Opening/closing HTML tags are balanced (by default)
Allows embedding of inline codes
Produces well-formed hierarchy of document sections based on headers

It seems that it covers most issues mentioned here.
You can check it here: https://observablehq.com/@kotelnikov/statewalker-tknz

mbostock · 2024-06-01T05:58:27Z

We now diff HTML, and #1416 removes the wrapper span, so two of the pieces have fallen into place.

Yet one limitation I see with this approach is the assumption that the HTML tokenizer state machine can be applied to Markdown (as-is). I think it can in about 98% of cases because nearly all of Markdown is text context, but here’s at least one example where it fails:

[link](https://example.com/${"path"})

In the above case, ${"path"} would appear to the HTML tokenizer to be in the text context, but in fact it’s interpolating into the href attribute. There’s also the more pathological case where the same expression is interpolated into both the text and an attribute simultaneously:

<https://example.com/${"path"}>

And I have no idea how auto-linkify should work in this case…

https://example.com/${"path"}

These cases aren’t currently handled in main, either: the ${"path"} isn’t recognized as an inline expression. And it’s pretty easy to rewrite this as HTML if for some reason you wanted this dynamic behavior. But it suggests that the tokenizer would at least need to recognize Markdown’s link syntax. I probably need to read the CommonMark spec to see if there are other cases.
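As a rough illustration of what recognizing Markdown’s link syntax could mean, the sketch below classifies each `${` on a line by the Markdown construct it falls in. Everything here is hypothetical: `classifyInterpolations` is not part of the codebase, the two regular expressions only approximate CommonMark’s inline-link and autolink rules, and the bare-URL (linkify) case is left classified as text because its intended behavior is unresolved above.

```typescript
// Hypothetical sketch; the patterns only approximate CommonMark's rules.
type MarkdownContext = "link-destination" | "autolink" | "text";

interface Span {
  start: number;
  end: number;
  context: MarkdownContext;
}

// Records the extent of every match of `pattern` (which must have the g flag).
function collect(line: string, pattern: RegExp, context: MarkdownContext, spans: Span[]): void {
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(line)) !== null) {
    spans.push({start: m.index, end: m.index + m[0].length, context});
  }
}

function classifyInterpolations(line: string): MarkdownContext[] {
  const spans: Span[] = [];
  // The "](destination)" part of an inline link: [text](destination)
  collect(line, /\]\([^)\s]*\)/g, "link-destination", spans);
  // An autolink: <scheme:rest>
  collect(line, /<[A-Za-z][A-Za-z0-9+.-]*:[^\s<>]*>/g, "autolink", spans);
  const contexts: MarkdownContext[] = [];
  for (let i = line.indexOf("${"); i >= 0; i = line.indexOf("${", i + 2)) {
    let context: MarkdownContext = "text";
    for (const span of spans) {
      if (span.start <= i && i < span.end) context = span.context;
    }
    contexts.push(context);
  }
  return contexts;
}
```

An HTML-only tokenizer would report the text context for all three of the examples above; a check along these lines would at least let the preprocessor flag the first two rather than silently emit a placeholder in the wrong context.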

parseInterpolate

c3034dc

mbostock changed the title ~~parseInterpolate~~ More robust interpolation parsing Jan 23, 2024

This was referenced Jan 29, 2024

inputs search throws error #616

Closed

Backslashes aren’t unescaped within PRE elements #636

Closed

Fil mentioned this pull request Feb 2, 2024

fix client error when the target is missing (e.g. it's in a comment) #378

Closed

mbostock mentioned this pull request Feb 3, 2024

css #670

Draft

This was referenced Feb 17, 2024

Use <code> blocks instead of interpolated strings in Markdown backing files #829

Closed

Some Inline JS not rendering within html within md #900

Closed

mbostock mentioned this pull request Mar 6, 2024

Add an option to allow LaTeX embeddings using $ and $$ notation #981

Open

mbostock mentioned this pull request Mar 11, 2024

centralize md instance as part of the normalized configuration #1034

Merged

This was referenced Jun 3, 2024

no wrapper span #1416

Merged

robust placeholder #1425

Merged

mbostock closed this in #1416 Jun 5, 2024
