Fix a corner case in the ATX header parser #53

brson · 2018-03-27T17:18:56Z

Here's a fix for a panic in the parser. This is the patch I committed to Reddit's downstream fork, but I'm only just learning the codebase, and would love to have your feedback on the correct fix. The patch contains a massive comment indicative of my low understanding of the code in general, so I understand if you want that slimmed down.

Problem description from the commit follows.

Fix a corner case in the ATX header parser

The parser would previously crash in some cases of empty ATX headers with
trailing hashes. This case isn't captured by the CommonMark spec.

The bad cases look like this:

 ###    ####

(note the leading space here is not part of the bug - it's just so git commit
doesn't throw these lines away as comments)

In these cases the line parser advances first_nonspace all the way to the
second set of hashes, while the chop_trailing_spaces routine for stripping
the trailing hashes also trims leading whitespace, leading to a buffer
overrun as add_text_to_container thinks the content of the line
is only the leading "###" but that the text begins at the start
of the start of "####".

This hasn't been detected previously because the spec only contains
a test case for

 ### ###

and standard CommonMark always eats exactly one space after the initial ATX
hashes, leaving the line parser's offset equal to first_nonspace, and
correctly adding empty text to the header container.

The spec might want an additional test case containing multiple spaces in an
empty bookended ATX header.

cc @SSJohns

The parser would previously crash in some cases of empty ATX headers with trailing hashes. This case isn't captured by the CommonMark spec. The bad cases look like this: ``` ### #### ``` (note the leading space here is not part of the bug - it's just so `git commit` doesn't throw these lines away as comments) In these cases the line parser advances `first_nonspace` all the way to the second set of hashes, while the `chop_trailing_spaces` routine for stripping the trailing hashes also trims leading whitespace, leading to a buffer overrun as `add_text_to_container` thinks the content of the line is only the leading "###" but that the text begins at the start of the start of "####". This hasn't been detected previously because the spec only contains a test case for ``` ### ### ``` and standard CommonMark always eats exactly one space after the initial ATX hashes, leaving the line parser's `offset` equal to `first_nonspace`, and correctly adding empty text to the header container. The spec might want an additional test case containing multiple spaces in an empty bookended ATX header.

kivikakk · 2018-03-29T02:36:59Z

This looks good to me; I think because we're actually shortening the string buffer with truncate we're hitting this panic; cmark just reduces the len of the buffer, but the underlying allocated memory remains and so it doesn't hit this.

Thank you!

kivikakk merged commit a8d94e6 into kivikakk:master Mar 29, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix a corner case in the ATX header parser #53

Fix a corner case in the ATX header parser #53

Uh oh!

brson commented Mar 27, 2018

Uh oh!

kivikakk commented Mar 29, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Fix a corner case in the ATX header parser #53

Fix a corner case in the ATX header parser #53

Uh oh!

Conversation

brson commented Mar 27, 2018

Uh oh!

kivikakk commented Mar 29, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants