
Account for curly quotes in syntax definition. #141

Closed
PwshPally opened this issue Aug 24, 2018 · 10 comments · Fixed by #173

Comments

@PwshPally

Confirmed using different themes.

Environment

  • Editor and Version (VS Code, Atom, Sublime): VS Code
  • Your primary theme: Abyss

Issue Description

When I use a regular expression that contains a ', the editor treats it as the beginning or end of a text string. In every theme I tested, the color coding shows all of the following code as a string until it encounters another '.

Screenshots

[screenshot]

Expected Behavior

[screenshot]

Code Samples

Param (
# User account to be modified, this can be any value that Get-ADUser will accept
[Parameter(Mandatory=$true)]
$User,
[Parameter(Mandatory=$true)]
[Int32]$TicketNumber,
# New last name of the person
[ValidatePattern(“^[-'a-z]$”)]
[Parameter(Mandatory=$true)]
$NewLastName,
# User's middle name
[ValidatePattern(“^[-'a-z]$”)]
$NewMiddleName,
# User's new first name,
[ValidatePattern(“^[-'a-z]*$”)]
$NewFirstName,
# Defines how many characters from the first name will be used. Default is 1.
[Int32]$AlternateAliasFirstNameLength = 1,
# For manually specifying the Display Name
[string]$DisplayName,
[Parameter(DontShow)]
[ValidateSet($True,$False)]
[switch]$Approved = $false
)

@omniomi
Contributor

omniomi commented Aug 24, 2018

Hi @PSPally, there's something funky about your quotation marks. They look like U+201C and U+201D, which are known as "Left Double Quotation Mark" and "Right Double Quotation Mark." If you use U+0022, the normal "Quotation Mark," I don't see the issue.

“ and ” vs "

Did you copy that code from a document editor like Microsoft Word or WordPad? You may wish to look at an extension like https://marketplace.visualstudio.com/items?itemName=jinhyuk.replace-curly-quotes if this is a common occurrence.

[before/after screenshot]

Param (
    # User account to be modified, this can be any value that Get-ADUser will accept
    [Parameter(Mandatory = $true)]
    $User,
    [Parameter(Mandatory = $true)]
    [Int32]$TicketNumber,
    # New last name of the person
    [ValidatePattern("^[-'a-z]$")]
    [Parameter(Mandatory = $true)]
    $NewLastName,
    # User's middle name
    [ValidatePattern("^[-'a-z]$")]
    $NewMiddleName,
    # User's new first name,
    [ValidatePattern("^[-'a-z]*$")]
    $NewFirstName,
    # Defines how many characters from the first name will be used. Default is 1.
    [Int32]$AlternateAliasFirstNameLength = 1,
    # For manually specifying the Display Name
    [string]$DisplayName,
    [Parameter(DontShow)]
    [ValidateSet($True, $False)]
    [switch]$Approved = $false
)
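If pasted code turns up curly quotes like these often, a short script can help. The minimal Python sketch below reports which curly quotes a line contains and swaps them for ASCII quotes, similar in spirit to the extension linked above. Some assumptions apply. The character set is taken from the ranges discussed later in this thread. `CURLY_TO_ASCII` and `report_curly_quotes` are hypothetical names. The script also rewrites curly quotes that were meant as literal text inside strings or comments.

```python
import unicodedata

# Assumed set of curly quotes (the ranges discussed in this thread), mapped to ASCII.
CURLY_TO_ASCII = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201A": "'", "\u201B": "'",
    "\u201C": '"', "\u201D": '"', "\u201E": '"',
})


def report_curly_quotes(line):
    """Return (column, code point, Unicode name) for every curly quote in line."""
    return [
        (col, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for col, ch in enumerate(line)
        if ord(ch) in CURLY_TO_ASCII  # maketrans keys the table by code point
    ]


line = "[ValidatePattern(\u201C^[-'a-z]$\u201D)]"  # as pasted in the original report
for hit in report_curly_quotes(line):
    print(hit)
# (17, 'U+201C', 'LEFT DOUBLE QUOTATION MARK')
# (27, 'U+201D', 'RIGHT DOUBLE QUOTATION MARK')
print(line.translate(CURLY_TO_ASCII))
# [ValidatePattern("^[-'a-z]$")]
```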

@PwshPally
Author

This is why Notepad will never die...

The text version of the code was copied from the PowerShell ISE.
It's the single quote (apostrophe, U+0027) in the regular expressions that is triggering it. I could probably put a + on the right side of those lines to limit how much of the code it affects.

@msftrncs
Contributor

That's crazy! PowerShell accepts those characters as double quotes! PowerShell doesn't even care whether they are used in the proper order. EditorSyntax does not accept them at all.

@omniomi
Contributor

omniomi commented Aug 25, 2018

@PSPally, it's not so much that the ' is triggering it as that your curly/smart quotes are not accounted for in the syntax definition. PowerShell does a lot to accommodate characters like curly quotes and em dashes. Rich text editors insert these characters, and they then get copied into scripts. I'm not sure the syntax should account for them, since best practice should be to clean them up.

@tylerl0706, should we account for curly quotes, em dashes, etc. in EditorSyntax?

@msftrncs
Contributor

My opinion (which doesn't count much) is that if PowerShell's developers took the time to support it, so should PowerShell editors.

[screenshot]

@DarkLite1

I don't think this is exactly the same issue:
[screenshot]

@omniomi
Contributor

omniomi commented Sep 13, 2018

@DarkLite1, you're correct. Reopening your original issue.

@omniomi changed the title from "The ' in a regular expression is being treated as the start (or end) of a string" to "Account for curly quotes in syntax definition." on Sep 13, 2018
@omniomi
Contributor

omniomi commented Oct 2, 2018

So, PowerShell not only supports curly quotes but also doesn't care which ones you use. That is, you can write “this is a string“ with both quotation marks being U+201C "Left Double Quotation Mark." This needs to be accounted for in double-quoted strings (U+0022, U+201C, and U+201D), single-quoted strings (U+0027, U+2018, and U+2019), double-quoted here-strings, and single-quoted here-strings.
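The minimal Python sketch below shows that the double quotes need not pair up. In its pattern, any of U+0022, U+201C, or U+201D may open or close the string. This is a simplification and not the grammar's actual rule. It ignores backtick escapes, doubled quotes, and `$()` subexpressions. `DQ_STRING` is a hypothetical name.

```python
import re

# Any of these three characters may open or close a double-quoted string.
DQ = '"\u201C\u201D'
# Opening quote, then any run of non-quote characters, then a closing quote.
# Simplified: escapes, doubled quotes and $() subexpressions are not handled.
DQ_STRING = re.compile(f"[{DQ}][^{DQ}]*[{DQ}]")

examples = [
    '"plain ASCII"',
    "\u201Cthis is a string\u201C",  # both ends U+201C, as in the example above
    "\u201Dreversed\u201C",          # right mark used to open, left mark to close
    '"mixed\u201D',
]
for s in examples:
    print(bool(DQ_STRING.fullmatch(s)), s)
```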

The regex currently used, '(?!'), needs to become (?:'|\x{2018}|\x{2019})(?!'|\x{2018}|\x{2019}). (This could be condensed to ('|\x{2018}|\x{2019})(?!\1), but I need to make sure the capture group doesn't change the scope assignments.)
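One quick way to compare the current end pattern with the proposed one is to run both through Python's `re` module, which spells Oniguruma's `\x{2018}` as `\u2018`. This is an illustrative sketch only, and `OLD_END` and `NEW_END` are hypothetical names. Python's engine is not the one editors use for TextMate grammars. However, alternation and negative lookahead behave the same way in both.

```python
import re

# Current end pattern: an ASCII apostrophe not followed by another one.
OLD_END = re.compile(r"'(?!')")
# Proposed end pattern: any of the three single quotes, not followed by any of them.
NEW_END = re.compile("(?:'|\u2018|\u2019)(?!'|\u2018|\u2019)")

body = "cn=William\u2019"  # string contents closed with U+2019

print(OLD_END.search(body))          # None: the curly closing quote is missed
print(NEW_END.search(body).start())  # 10: the curly quote ends the string

# The first character of a doubled (escaped) quote is never an end match.
print(NEW_END.match("\u2018'"))      # None
```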

This also needs to be accounted for in escaping. Take $X = 'cn=William O''Brian': the ' in "O'Brian" is escaped by the preceding '. You can use any combination of quotes there ('', ‘', '‘, and so on). To account for this, the regex goes from '' to (?:'|\x{2018}|\x{2019}){2}.
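A minimal Python sketch can test the proposed escape pattern. It finds the doubled quote in several mixes of quote characters. `ESCAPE` is a hypothetical name, and `\u2018`/`\u2019` stand in for Oniguruma's `\x{2018}`/`\x{2019}`.

```python
import re

# Proposed escape pattern: any two single-quote characters in a row.
ESCAPE = re.compile("(?:'|\u2018|\u2019){2}")

# The same escaped apostrophe in O'Brian, written with different quote mixes.
samples = ["O''Brian", "O\u2018'Brian", "O'\u2018Brian", "O\u2019\u2018Brian"]
for s in samples:
    m = ESCAPE.search(s)
    print(repr(s), m.span())  # every sample matches at span (1, 3)

# A lone quote is not an escape sequence.
print(ESCAPE.search("O'Brian"))  # None
```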

It's going to take a bit to make sure I get this right.

@msftrncs
Contributor

msftrncs commented Oct 3, 2018

@omniomi, regarding '(?!') and using \1 in the solution: \1 only matches exactly the text it captured, so it would reject only a repeat of the same quote character. This might not be quite what you wanted. Maybe you meant \g<1>, but …
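The short Python sketch below demonstrates the point about `\1`. The names are hypothetical. Python's `re` has no Oniguruma-style `\g<1>` subroutine call, so only the backreference form is shown. With `\1`, the lookahead rejects only a repeat of the *same* character. As a result, a mixed escaped pair such as ‘' is wrongly accepted as an end quote, while the explicit alternation rejects it.

```python
import re

# Condensed form: capture the quote, then reject only if the *captured* character repeats.
BACKREF_END = re.compile("('|\u2018|\u2019)(?!\\1)")
# Explicit form: reject if *any* single-quote character follows.
ALT_END = re.compile("(?:'|\u2018|\u2019)(?!'|\u2018|\u2019)")

mixed_pair = "\u2018'"  # U+2018 followed by U+0027, an escaped quote to PowerShell

print(BACKREF_END.match(mixed_pair))  # a match: \1 is U+2018 and the next char is not U+2018
print(ALT_END.match(mixed_pair))      # None
```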

Instead, since I believe this regex is used in an END clause, it might be better to use applyEndPatternLast: true so that the patterns for escaped quotes can match first.

Also, backreferences in an END refer to captures in BEGIN; that much I'm sure of. I couldn't get a subroutine call to work in an END, though it worked fine in a MATCH (where {2} works better anyway), so maybe I didn't have the syntax right.
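The toy Python model below shows why applyEndPatternLast matters. It mimics how a TextMate-style tokenizer might pick the next match inside a begin/end region. It is an illustrative assumption, not vscode-textmate's code. The earliest match wins. When the end pattern and an inner pattern match at the same position, the end pattern wins unless applyEndPatternLast is set. The model pairs a plain end pattern (no lookahead) with the {2} escape pattern. Under these rules, the escaped quote in O‘'Brian is consumed as an escape only when the end pattern is applied last. `scan_single_quoted` is a hypothetical helper.

```python
import re

QUOTE = "['\u2018\u2019]"
END = re.compile(QUOTE)             # plain end pattern: any single quote
ESCAPE = re.compile(QUOTE + "{2}")  # inner pattern: a doubled quote in any mix


def scan_single_quoted(text, apply_end_pattern_last):
    """Return the index just past the closing quote, or None if unterminated.

    text is everything after the opening quote. Toy model: the earliest match
    wins; on a tie the end pattern wins unless apply_end_pattern_last is True.
    """
    pos = 0
    while True:
        end = END.search(text, pos)
        if end is None:
            return None  # unterminated string
        esc = ESCAPE.search(text, pos)
        escape_first = esc is not None and (
            esc.start() < end.start()
            or (esc.start() == end.start() and apply_end_pattern_last)
        )
        if not escape_first:
            return end.end()
        pos = esc.end()  # consume the escaped pair and keep scanning


text = "cn=William O\u2018'Brian\u2019"  # contents after the opening quote
print(scan_single_quoted(text, apply_end_pattern_last=True))   # 20: ends after Brian and its closing quote
print(scan_single_quoted(text, apply_end_pattern_last=False))  # 13: the quote after O ends the string early
```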

@msftrncs
Contributor

msftrncs commented Oct 4, 2018

@omniomi, you may want to use [\"\\x{201C}-\\x{201E}] (double quotes) and ['\\x{2018}-\\x{201B}] (single quotes) as the basis for the quote patterns. They all seem to be supported by PowerShell. However, I have had trouble pasting more than one of them at a time directly into a PowerShell prompt, although the VS Code integrated terminal seems to accept them. There is also a double quote at U+201F, but PowerShell does not seem to accept it. I'm not sure why they missed that one; it would be the double-quote counterpart of the single quote at U+201B.
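A small Python sketch can sanity-check these ranges. It sorts U+2018 through U+201F into the two suggested classes. `classify` is a hypothetical helper. The sketch assumes the single-quote class is meant to include the ASCII apostrophe U+0027. It reflects only the ranges as written here and is not a check against PowerShell itself. U+201F falls outside both classes, consistent with the observation that PowerShell does not accept it.

```python
import re

# Python spelling of the suggested classes (the grammar's JSON writes \\x{201C} etc.).
DOUBLE_QUOTE = re.compile('["\u201C-\u201E]')  # U+0022, U+201C..U+201E
SINGLE_QUOTE = re.compile("['\u2018-\u201B]")  # U+0027, U+2018..U+201B


def classify(ch):
    """Say which quote class, if any, a single character falls into."""
    if DOUBLE_QUOTE.fullmatch(ch):
        return "double"
    if SINGLE_QUOTE.fullmatch(ch):
        return "single"
    return "neither"


for cp in [0x22, 0x27, *range(0x2018, 0x2020)]:
    print(f"U+{cp:04X} {classify(chr(cp))}")
```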
