Skip to content

Conversation

@sunjay
Copy link
Contributor

@sunjay sunjay commented Feb 9, 2019

Fixes #95.

Hi @kivikakk! I highly recommend reviewing this one commit at a time from my first commit to the latest one. The changes will make much more sense that way. My commit messages detail exactly what I'm trying to do in each step.


My code works and the tests all pass (at least locally), but because of the complexity of the code, I still don't really fully understand some of what I did. I started by copying and pasting the entire close_bracket_match function and then modifying it to work for the new BrokenReference node that I added. I didn't really understand what was going on in that function, so I didn't attempt to fix the duplication this created.

I tried to add comments throughout the codebase to clarify things I did end up understanding, but most of it is still very unclear for me.

To show you it works, here's the AST for the following markdown: (from the reference_links test in tests.rs)

This [is] [legit], [very][honestly] legit and [*very* cool][broken] and [*em*pty][].

[legit]: ok
[honestly]: sure "hm"
Full AST

Node {
    data: RefCell {
        value: Ast {
            value: Document,
            start_line: 0,
            content: b"",
            open: false,
            last_line_blank: false
        }
    },
    children: [
        Node {
            data: RefCell {
                value: Ast {
                    value: Paragraph,
                    start_line: 1,
                    content: b"This [is] [legit], [very][honestly] legit and [*very* cool][broken] and [*em*pty][].\n",
                    open: false,
                    last_line_blank: true
                }
            },
            children: [
                Node {
                    data: RefCell {
                        value: Ast {
                            value: Text(b"This "),
                            start_line: 0,
                            content: b"",
                            open: false,
                            last_line_blank: false
                        }
                    },
                    children: []
                },
                Node {
                    data: RefCell {
                        value: Ast {
                            value: BrokenReference(
                                NodeBrokenRef {
                                    label: b"is",
                                    is_image: false,
                                    ref_type: Shortcut
                                }
                            ),
                            start_line: 0,
                            content: b"",
                            open: false,
                            last_line_blank: false
                        }
                    },
                    children: [
                        Node {
                            data: RefCell {
                                value: Ast {
                                    value: Text(b"is"),
                                    start_line: 0,
                                    content: b"",
                                    open: false,
                                    last_line_blank: false
                                }
                            },
                            children: []
                        }
                    ]
                },
                Node {
                    data: RefCell {
                        value: Ast {
                            value: Text(b" "),
                            start_line: 0,
                            content: b"",
                            open: false,
                            last_line_blank: false
                        }
                    },
                    children: []
                },
                Node {
                    data: RefCell {
                        value: Ast {
                            value: Link(
                                NodeLink {
                                    url: b"ok",
                                    title: b""
                                }
                            ),
                            start_line: 0,
                            content: b"",
                            open: false,
                            last_line_blank: false
                        }
                    },
                    children: [
                        Node {
                            data: RefCell {
                                value: Ast {
                                    value: Text(b"legit"),
                                    start_line: 0,
                                    content: b"",
                                    open: false,
                                    last_line_blank: false
                                }
                            },
                            children: []
                        }
                    ]
                },
                Node {
                    data: RefCell {
                        value: Ast {
                            value: Text(b", "),
                            start_line: 0,
                            content: b"",
                            open: false,
                            last_line_blank: false
                        }
                    },
                    children: []
                },
                Node {
                    data: RefCell {
                        value: Ast {
                            value: Link(
                                NodeLink {
                                    url: b"sure",
                                    title: b"hm"
                                }
                            ),
                            start_line: 0,
                            content: b"",
                            open: false,
                            last_line_blank: false
                        }
                    },
                    children: [
                        Node {
                            data: RefCell {
                                value: Ast {
                                    value: Text(b"very"),
                                    start_line: 0,
                                    content: b"",
                                    open: false,
                                    last_line_blank: false
                                }
                            },
                            children: []
                        }
                    ]
                },
                Node {
                    data: RefCell {
                        value: Ast {
                            value: Text(b" legit and "),
                            start_line: 0,
                            content: b"",
                            open: false,
                            last_line_blank: false
                        }
                    },
                    children: []
                },
                Node {
                    data: RefCell {
                        value: Ast {
                            value: BrokenReference(
                                NodeBrokenRef {
                                    label: b"broken",
                                    is_image: false,
                                    ref_type: Full
                                }
                            ),
                            start_line: 0,
                            content: b"",
                            open: false,
                            last_line_blank: false
                        }
                    },
                    children: [
                        Node {
                            data: RefCell {
                                value: Ast {
                                    value: Emph,
                                    start_line: 0,
                                    content: b"",
                                    open: false,
                                    last_line_blank: false
                                }
                            },
                            children: [
                                Node {
                                    data: RefCell {
                                        value: Ast {
                                            value: Text(b"very"),
                                            start_line: 0,
                                            content: b"",
                                            open: false,
                                            last_line_blank: false
                                        }
                                    },
                                    children: []
                                }
                            ]
                        },
                        Node {
                            data: RefCell {
                                value: Ast {
                                    value: Text(b" cool"),
                                    start_line: 0,
                                    content: b"",
                                    open: false,
                                    last_line_blank: false
                                }
                            },
                            children: []
                        }
                    ]
                },
                Node {
                    data: RefCell {
                        value: Ast {
                            value: Text(b" and "),
                            start_line: 0,
                            content: b"",
                            open: false,
                            last_line_blank: false
                        }
                    },
                    children: []
                },
                Node {
                    data: RefCell {
                        value: Ast {
                            value: BrokenReference(
                                NodeBrokenRef {
                                    label: b"*em*pty",
                                    is_image: false,
                                    ref_type: Collapsed
                                }
                            ),
                            start_line: 0,
                            content: b"",
                            open: false,
                            last_line_blank: false
                        }
                    },
                    children: [
                        Node {
                            data: RefCell {
                                value: Ast {
                                    value: Emph,
                                    start_line: 0,
                                    content: b"",
                                    open: false,
                                    last_line_blank: false
                                }
                            },
                            children: [
                                Node {
                                    data: RefCell {
                                        value: Ast {
                                            value: Text(b"em"),
                                            start_line: 0,
                                            content: b"",
                                            open: false,
                                            last_line_blank: false
                                        }
                                    },
                                    children: []
                                }
                            ]
                        },
                        Node {
                            data: RefCell {
                                value: Ast {
                                    value: Text(b"pty"),
                                    start_line: 0,
                                    content: b"",
                                    open: false,
                                    last_line_blank: false
                                }
                            },
                            children: []
                        }
                    ]
                },
                Node {
                    data: RefCell {
                        value: Ast {
                            value: Text(b"."),
                            start_line: 0,
                            content: b"",
                            open: false,
                            last_line_blank: false
                        }
                    },
                    children: []
                }
            ]
        }
    ]
}

You'll notice that there are now BrokenReference nodes that contain the references that could not be resolved. For more details, see the documentation in the code for NodeValue::BrokenReference and the documentation for NodeBrokenRef.

These new nodes are transparently rendered as text if they are not replaced by the time they get to the common mark or HTML renderer.


I also took the liberty to bump the minor (0.x) version of the crate since this is a breaking change. Feel free to revert that or do whatever you want with any of this code. I also have a follow up PR to make debugging a lot easier if you are interested. I'll send that once this is merged. 😄

@sunjay sunjay mentioned this pull request Feb 9, 2019
@sunjay
Copy link
Contributor Author

sunjay commented Feb 10, 2019

@kivikakk I seem to have broken some spec tests. I really don't know the code well enough to fix the issue. Could you help me out?

@kivikakk
Copy link
Owner

@sunjay This is looking amazing! Thank you so much for your time and effort here. Let's see:

Example 503 (lines 7903-7907) Links
[link [foo [bar]]](/uri)

--- expected HTML
+++ actual HTML
@@ -1 +1 @@
-<p><a href="/uri">link [foo [bar]]</a></p>
+<p>[link [foo [bar]]](/uri)</p>

This should produce a link to /uri according to the spec, but with these changes it no longer produces any link. The relevant part of the spec reads:

The link text may contain balanced brackets, but not unbalanced ones, unless they are escaped:

Here's what local testing shows different between cmark-gfm and comrak as of this PR:

$ cat >test
[hi](/uri)

[hi [](/uri)

[hi []](/uri)
$ ~/github/cmark-gfm/build/src/cmark-gfm test
<p><a href="/uri">hi</a></p>
<p>[hi <a href="/uri"></a></p>
<p><a href="/uri">hi []</a></p>
$ cargo run -- test
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
     Running `target/debug/comrak test`
<p><a href="/uri">hi</a></p>
<p>[hi <a href="/uri"></a></p>
<p>[hi []](/uri)</p>
$

I've pushed a commit to this branch which adds a comrak test for this, and a commit to fix the issue, which was to remove the following:

        if !is_image {
            let mut i = brackets_len as i32 - 1;
            while i >= 0 {
                if !self.brackets[i as usize].image {
                    if !self.brackets[i as usize].active {
                        break;
                    } else {
                        self.brackets[i as usize].active = false;
                    }
                }
                i -= 1;
            }
        }

The function of this code is to disable brackets surrounding the current processed bracket pair; i.e. so that only the innermost link in the following Markdown is parsed: Here [I think I want [xyz][xyz] this](yes) -- [xyz][xyz] should be linked if there's a matching reference, but that itself should not then be wrapped in another link. (The if !is_image check is because we don't want to disable outer brackets when processing an image, otherwise we wouldn't be able to nest an image within a link.)

Note we do this only when actually constructing a link from brackets, and not in case of failure. See cmark-gfm's behaviour:

$ ~/github/cmark-gfm/build/src/cmark-gfm
Here [I think I want [xyz][xyz] this](maybe)

[xyz]: hi
<p>Here [I think I want <a href="hi">xyz</a> this](maybe)</p>
$ ~/github/cmark-gfm/build/src/cmark-gfm
Here [I think I want [xyz][xyz] this](maybe)
<p>Here <a href="maybe">I think I want [xyz][xyz] this</a></p>
$

The solution for broken references is just to do no such disabling.


I've pushed a commit for this fix; there are a few other compliance issues that speak to deeper issues. For example:

Example 514 (lines 7990-7994) Links
*foo [bar* baz]

--- expected HTML
+++ actual HTML
@@ -1 +1 @@
-<p><em>foo [bar</em> baz]</p>
+<p>*foo [bar* baz]</p>

The non-matching brackets are intended to treated as plain text, such that this example has an emphasis placed. Because we create a node that encompasses all of [bar* baz], the closing asterisk isn't visible to emphasis processing.

I've pushed a comrak test for this, but haven't the time to devise a solution; I hope this guidance might be enough to see you through!

@sunjay
Copy link
Contributor Author

sunjay commented Feb 11, 2019

Thank you for your help! Your explanation is very helpful. 😄

I am not sure if the *foo [bar* baz] issue is actually fixable with this approach. You can't view "foo [bar" as text on its own because BrokenReference nodes are not the same as text nodes. That means that Text("foo ") won't get combined with the start of BrokenReference("[bar...") when looking to potentially parse the emphasis.

If we can't think of something to address this, maybe a broken link callback would be better after all? It seems like that would simplify a lot of the code and avoid the issues we're running into. Maybe this is why pulldown-cmark went with that approach? I will try it out and get back to you.

@sunjay
Copy link
Contributor Author

sunjay commented Feb 12, 2019

Closed in favor of #100. Thanks @kivikakk!

@sunjay sunjay closed this Feb 12, 2019
@sunjay sunjay deleted the broken-refs branch February 12, 2019 02:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Process broken links

2 participants