This repository was archived by the owner on Dec 15, 2022. It is now read-only.

Forbid comments with more than two dashes #87

Merged
8 commits merged into atom:master on Jul 1, 2018


steventango
Contributor

Description of the Change

Any comment ending with more than two dashes (for example `--->`) is now scoped as `invalid.illegal.bad-comments-or-CDATA.xml`.

Alternate Designs

Benefits

The grammar no longer accepts the `--->` comment ending, which the XML grammar forbids.

Possible Drawbacks

Applicable Issues

#84

@steventango steventango changed the title "🐛 Forbid comments from ending in triple dash" to "🐛 Forbid comments with more than two dashes" on Jun 27, 2018
Contributor

@winstliu winstliu left a comment


Thanks for the PR, @TangSteven! Just one question :).

Additionally, would you mind adding a new spec for this to ensure that this behavior doesn't regress? Thanks again 🙇

@@ -403,3 +403,9 @@
            'name': 'punctuation.definition.comment.xml'
        'end': '--%?>'
        'name': 'comment.block.xml'
        'patterns': [
          {
            'match': '-{3,}>'
Contributor


Is there a reason this is `{3,}` and not `{3}`?

Contributor Author


Well, if three dashes are invalid, any run of more than two should be invalid as well.

So if the closing arrow were four dashes long (`---->`), `{3}` would highlight only `--->` and leave the first dash unmarked, whereas `{3,}` highlights the entire arrow (`---->`), which makes more sense.
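The difference can be checked with Python's `re` module. This is a sketch that assumes Python's leftmost, greedy matching behaves like the Oniguruma engine used by TextMate grammars for patterns this simple; it runs both candidate `match` patterns against a four-dash arrow.

```python
import re

arrow = "---->"  # a closing arrow with four dashes

# Fixed count: the match at index 0 fails (a 4th '-' follows the three dashes),
# so the leftmost match starts one dash later and the first '-' is left unmarked.
fixed = re.search(r"-{3}>", arrow)

# Open-ended count: the greedy '-{3,}' consumes every dash before the '>'.
open_ended = re.search(r"-{3,}>", arrow)

print(fixed.span(), fixed.group())            # (1, 5) --->
print(open_ended.span(), open_ended.group())  # (0, 5) ---->
```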

Contributor


Hmm, I feel like we should only be highlighting the last three.
@Ingramz any opinions on this?

@steventango
Contributor Author

I'm not quite sure how to add a new spec; would you mind pointing me in the right direction?

@winstliu
Contributor


Yeah, of course! Specs are added to https://github.com/atom/language-xml/blob/master/spec/xml-spec.coffee. There aren't many specs for this language yet, but you can look at language-html's specs for inspiration.

@Ingramz

Ingramz commented Jun 28, 2018

The XML specification defines the grammar for comments as follows:

Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

Simplified:

Comment ::= '<!--' (CharNoDash | ('-' CharNoDash))* '-->'

This can be read as: `--` is illegal unless it is immediately followed by `>`.
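The production can be transcribed almost literally into a regular expression. This is a sketch: `COMMENT` and `is_well_formed_comment` are hypothetical names, and `[^-]` stands in for `(Char - '-')` while ignoring the spec's restrictions on which code points count as `Char`.

```python
import re

# Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
# Each body item is either a non-dash, or a single dash followed by a non-dash.
COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_well_formed_comment(text: str) -> bool:
    """Return True if `text` is exactly one comment matching the production."""
    return COMMENT.fullmatch(text) is not None


for sample in [
    "<!-- Hello World -->",     # True
    "<!-- Hello -- World -->",  # False: '--' inside the body
    "<!--- Hello World -->",    # True: one '-' followed by a non-dash is allowed
    "<!-- Hello World --->",    # False: the body may not end in '-'
]:
    print(f"{sample!r}: {is_well_formed_comment(sample)}")
```

Note that the last sample is rejected even though it has no `--` in the middle of the body: the extra dash before `-->` has no non-dash after it, which is exactly the `--->` case this pull request set out to catch.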

In its current state, this pull request covers only the example given in the specification, not the general case, so input such as the following is still accepted:

<!-- Hello -- World -->

Do note that the following is still valid:

<!--- Hello World -->

I think it would be easier to add a rule inside the comment body that searches for `--(?!>)` and highlights that match as the source of the error. That way it doesn't matter whether there are 3, 4 or 10 hyphens, or whether they belong to the comment body or the closing delimiter. It should also be cleaner to maintain in the future.
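A rough illustration of the proposed rule, assuming Python's `re` treats `--(?!>)` the same way the grammar's regex engine does. Each string below stands for the text that follows the opening `<!--`, which is what a rule nested in the comment body would scan; `BAD_DOUBLE_HYPHEN` and `bodies` are hypothetical names.

```python
import re

# The proposed rule: any '--' that is not immediately followed by '>'.
BAD_DOUBLE_HYPHEN = re.compile(r"--(?!>)")

# Text after the opening '<!--' of each comment.
bodies = {
    "dashes in body": " Hello -- World -->",
    "three-dash end": " Hello World --->",
    "ten-dash end": " Hello World " + "-" * 10 + ">",
    "valid end": " Hello World -->",
    "leading dash": "- Hello World -->",
}

for label, body in bodies.items():
    m = BAD_DOUBLE_HYPHEN.search(body)
    # Offset of the first offending '--', or None if the body is fine.
    print(label, "->", m.start() if m else None)
```

This prints 7, 13, 13, None and None: the same pattern flags a `--` in the middle of the body as well as three- and ten-dash endings, while a normal `-->` and a leading single dash pass.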

Edit: The JSP (`<%-- --%>`) and XML (`<!-- -->`) comments should also be split into separate rules, as I think the double hyphen is disallowed only in XML comments, not in JSP.

@winstliu winstliu changed the title "🐛 Forbid comments with more than two dashes" to "Forbid comments with more than two dashes" on Jun 28, 2018
@steventango
Contributor Author

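I added a negative lookbehind, `(?<!<[!%])`, because without it the rule incorrectly matches the `<!--` of the next comment. Example:

[Screenshot: highlighting without the lookbehind]

With the negative lookbehind:
[Screenshot: highlighting with the negative lookbehind]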
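What the lookbehind changes can be shown in isolation with Python's `re`, which supports fixed-width negative lookbehinds like this one. This is only a sketch of the regex behaviour, not of how the grammar engine decides where a comment ends; `plain`, `guarded` and `next_comment` are hypothetical names.

```python
import re

# Rule without the lookbehind: any '--' not followed by '>'.
plain = re.compile(r"--(?!>)")
# Lookbehind from the thread: skip a '--' directly preceded by '<!' or '<%'.
guarded = re.compile(r"(?<!<[!%])--(?!>)")

next_comment = "<!-- b -->"

# Without the lookbehind, the '--' of the opening '<!--' (index 2) is flagged.
print(plain.search(next_comment).start())
# With it, the opener is skipped and the closing '-->' is excluded by the lookahead.
print(guarded.search(next_comment))

# A genuine error later in a comment is still caught.
print(guarded.search("<!-- a --- b").start())
```

The first call prints 2, the second prints `None`, and the third prints 7.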

How would I break apart the JSP and XML comments?

@Ingramz

Ingramz commented Jun 29, 2018

Just create two separate rules like this:

  'comments':
    'patterns': [
      # JSP comments: <%-- ... --%>
      {
        'begin': '<%--'
        'captures':
          '0':
            'name': 'punctuation.definition.comment.xml'
        'end': '--%>'
        'name': 'comment.block.xml'
      }
      # XML comments: <!-- ... -->
      {
        'begin': '<!--'
        'captures':
          '0':
            'name': 'punctuation.definition.comment.xml'
        'end': '-->'
        'name': 'comment.block.xml'
      }
    ]

Then work only on the second one.
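The effect of splitting the rules is that the double-hyphen check only ever sees XML comments. A minimal sketch of that policy, assuming the comment text has already been isolated (the grammar does this with its begin/end rules); `check_comment` is a hypothetical helper, and the JSP branch reflects the thread's belief that JSP has no such restriction.

```python
import re

BAD_DOUBLE_HYPHEN = re.compile(r"--(?!>)")


def check_comment(comment: str):
    """Return the offset of the first illegal '--' in a comment, or None.

    Only XML comments are checked; JSP comments are accepted as-is.
    """
    if comment.startswith("<%--"):
        return None  # JSP comment: no double-hyphen rule applied
    if comment.startswith("<!--"):
        # Start scanning after the opening '<!--' so it is never flagged.
        m = BAD_DOUBLE_HYPHEN.search(comment, 4)
        return m.start() if m else None
    raise ValueError(f"not a comment: {comment!r}")


print(check_comment("<%-- a -- b --%>"))  # None
print(check_comment("<!-- a -- b -->"))   # 7
print(check_comment("<!-- a --->"))       # 7
```

Because the scan starts after the opener, this sketch needs no lookbehind, which is in line with the suggestion that follows.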

The lookbehind should be unnecessary: from the point where the error is first registered, the rest of the document will be invalid anyway. To make it a little clearer where the error comes from, we can highlight only the first occurrence by using a begin/end rule:

        'patterns': [
          {
            'begin': '--(?!>)'
            'beginCaptures':
              '0':
                'name': 'invalid.illegal.bad-comments-or-CDATA.xml'
          }
        ]
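In plain regex terms, the choice here is between marking every offending `--` and marking only the first one. The sketch below contrasts the two with Python's `re`; it approximates the outcome only and does not model how the grammar engine scopes a `begin` rule (the later commit note in this thread argues that `match` should be used instead of a `begin` with no `end`). `body`, `every` and `first` are hypothetical names.

```python
import re

BAD_DOUBLE_HYPHEN = re.compile(r"--(?!>)")

body = " a -- b -- c --->"  # text after '<!--' with three errors

# A rule that fires on every occurrence marks each offending '--'.
every = [m.span() for m in BAD_DOUBLE_HYPHEN.finditer(body)]

# Highlighting only the first occurrence stops at the first hit.
first = BAD_DOUBLE_HYPHEN.search(body)

print(every)         # [(3, 5), (8, 10), (13, 15)]
print(first.span())  # (3, 5)
```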

@steventango
Contributor Author

Updated!

@winstliu winstliu merged commit 7b54428 into atom:master Jul 1, 2018
vfcp added a commit to vfcp/language-xml that referenced this pull request Oct 26, 2020
atom#96 already provides all the details about the issues fixed here.

atom#87 (comment) has the correct code, but the merge included some extra indentation, which prevents the rule from working properly.

In relation to atom#91, a `begin` without an `end` or `while` was added, but this is not valid: a `begin` must always have a corresponding `end` or `while`. `match` should be used instead of `begin`.

4 participants