Skip to content

Conversation

@brson
Copy link
Collaborator

@brson brson commented Mar 27, 2018

Hi @kivikakk.

Here's a fix for a panic in the parser. This is the patch I committed to Reddit's downstream fork, but I'm only just learning the codebase, and would love to have your feedback on the correct fix. The patch contains a massive comment indicative of my low understanding of the code in general, so I understand if you want that slimmed down.

Problem description from the commit follows.

Fix a corner case in the ATX header parser

The parser would previously crash in some cases of empty ATX headers with
trailing hashes. This case isn't captured by the CommonMark spec.

The bad cases look like this:

 ###    ####

(note the leading space here is not part of the bug - it's just so git commit
doesn't throw these lines away as comments)

In these cases the line parser advances first_nonspace all the way to the
second set of hashes, while the chop_trailing_spaces routine for stripping
the trailing hashes also trims leading whitespace, leading to a buffer
overrun as add_text_to_container thinks the content of the line
is only the leading "###" but that the text begins at the start
of the start of "####".

This hasn't been detected previously because the spec only contains
a test case for

 ### ###

and standard CommonMark always eats exactly one space after the initial ATX
hashes, leaving the line parser's offset equal to first_nonspace, and
correctly adding empty text to the header container.

The spec might want an additional test case containing multiple spaces in an
empty bookended ATX header.

cc @SSJohns

The parser would previously crash in some cases of empty ATX headers with
trailing hashes. This case isn't captured by the CommonMark spec.

The bad cases look like this:

```
 ###    ####
```

(note the leading space here is not part of the bug - it's just so `git commit`
doesn't throw these lines away as comments)

In these cases the line parser advances `first_nonspace` all the way to the
second set of hashes, while the `chop_trailing_spaces` routine for stripping
the trailing hashes also trims leading whitespace, leading to a buffer
overrun as `add_text_to_container` thinks the content of the line
is only the leading "###" but that the text begins at the start
of the start of "####".

This hasn't been detected previously because the spec only contains
a test case for

```
 ### ###
```

and standard CommonMark always eats exactly one space after the initial ATX
hashes, leaving the line parser's `offset` equal to `first_nonspace`, and
correctly adding empty text to the header container.

The spec might want an additional test case containing multiple spaces in an
empty bookended ATX header.
@kivikakk
Copy link
Owner

This looks good to me; I think because we're actually shortening the string buffer with truncate we're hitting this panic; cmark just reduces the len of the buffer, but the underlying allocated memory remains and so it doesn't hit this.

Thank you!

@kivikakk kivikakk merged commit a8d94e6 into kivikakk:master Mar 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants