Update AnalyzerConfig.cs s_propertyMatcherPattern regex key #82585

Open
Daynvheur wants to merge 2 commits into dotnet:main from Daynvheur:main

Conversation

@Daynvheur Daynvheur commented Mar 2, 2026

Closes #55431
Allows space characters in the key name as per specification.

Regex tests: https://regex101.com/r/l0FeUH/9 (I changed \s to [^\S\r\n] there only, to account for the multiline context; see notes)

PR related to issue #55431 (comment)
Closes PR !78459

Notes:

  • To avoid breaking the current parsing, inline comments are still supported. Removing the trailing ([#;].*)? part would solve the issue.
  • The implementation evaluates one line at a time; I changed \s to [^\S\r\n] only in the multiline testing context.
  • [\w\.\-_] ["a word character, or a dot, or a dash, or an underscore"] is much slower than the previously proposed [^=:\s] ["any character that is not whitespace, an equals sign, or a colon"], and can't be parsed by multiline online tools with the evaluation list linked above, so I kept the regex test link as it was initially.
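The second note (about \s versus [^\S\r\n]) can be illustrated with a minimal sketch. It uses Python's re module as a stand-in for the .NET engine, simplified patterns without the trailing comment group, and hypothetical names (HSPACE, KEY_WITH_S, KEY_WITH_HSPACE); it is not the pattern from the regex101 link. It assumes the key part is allowed to contain whitespace, and shows why a multiline test harness needs the narrower class: \s also matches line breaks, so a key could swallow the previous line.

```python
import re

# Horizontal whitespace only: any whitespace character except CR and LF.
HSPACE = r"[^\S\r\n]"

# Simplified pattern whose key part accepts \s (which includes newlines).
KEY_WITH_S = re.compile(
    r"^\s*([\w\.\-_\s]+?)\s*[=:]\s*(.*?)\s*$",
    re.MULTILINE,
)

# Same idea with every \s replaced by the horizontal-only class.
KEY_WITH_HSPACE = re.compile(
    rf"^{HSPACE}*((?:[\w\.\-_]|{HSPACE})+?){HSPACE}*[=:]{HSPACE}*(.*?){HSPACE}*$",
    re.MULTILINE,
)

# Two lines evaluated as one multiline blob, as an online tester would do.
text = "not a pair\nindent_style = space\n"

with_s = KEY_WITH_S.search(text)
with_hspace = KEY_WITH_HSPACE.search(text)

# With \s the key runs across the line break; with HSPACE it stays on one line.
print(repr(with_s.group(1)))
print(repr(with_hspace.group(1)))
```

When each line is matched on its own, as the notes say the implementation does, there is no line break inside the input and the two classes behave the same for ordinary spaces and tabs.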

@Daynvheur Daynvheur requested review from a team as code owners March 2, 2026 17:27
@dotnet-policy-service dotnet-policy-service Bot added the Community (the pull request was submitted by a contributor who is not a Microsoft employee) and VSCode labels Mar 2, 2026
@Daynvheur

Author

@dotnet-policy-service agree

@Daynvheur

Author

This is a retry of PR #78459, with a much simpler regex that only allows the space-related characters where they are due.

Comment thread src/Compilers/Core/Portable/CommandLine/AnalyzerConfig.cs Outdated
Comment thread src/Compilers/Core/CodeAnalysisTest/Analyzers/AnalyzerConfigTests.cs Outdated
Comment thread src/Compilers/Core/CodeAnalysisTest/Analyzers/AnalyzerConfigTests.cs Outdated
@CyrusNajmabadi

Contributor

I'm somewhat confused/worried about this PR. It says that its purpose is to close out #55431. However, #55431 is about having spaces in category names. And i don't see a single test that actually seems to be validating this behavior, or trying to validate the exact cases reported in that bug.

This makes me feel like this hasn't been deeply looked at or thought through.

@Daynvheur

Daynvheur commented Mar 3, 2026

Author

I'm somewhat confused/worried about this PR. It says that its purpose is to close out #55431. However, #55431 is about having spaces in category names.

I understand the confusion; it comes from two distinct flavours of "categories":

  • What issue #55431 (".editorconfig for diagnostic category with spaces") treats as a category is a key-level category (dotnet_analyzer_diagnostic.category-Minor Code Smell.severity = suggestion, dotnet_analyzer_diagnostic.category-Major Code Smell.severity = suggestion).
  • What AnalyzerConfig.cs considers a category is a bracket-surrounded .editorconfig line, referred to as a "glob".

@CyrusNajmabadi: The goal here is to allow key names that contain spaces, acting as hint-level (sub-)categories inside the language-level category. The "category" from the .editorconfig glob parsing is not involved, and is supposed to already handle spaces correctly where relevant.

Addendum (#82585 (comment)): the test case for a key containing spaces was already there; I only allowed for it in the unit test assertions.

Comment thread src/Compilers/Core/CodeAnalysisTest/Analyzers/AnalyzerConfigTests.cs Outdated
Comment thread src/Compilers/Core/CodeAnalysisTest/Analyzers/AnalyzerConfigTests.cs Outdated
Comment thread src/Compilers/Core/Portable/CommandLine/AnalyzerConfig.cs Outdated
@CyrusNajmabadi

Contributor

I am not comfortable with this change. It seems like it is easy to end up in situations with hangs, without people understanding why that happens. Not being able to prove that the changes will not lead to other such cases adds a ton of risk here for very low gain.

@Daynvheur

Author

very low gain.

That is your point of view.
I understand that, but I have to point out that, as a user, I'm in a very bad spot: I am professionally required to use SonarLint, a tool that complies with the specification but not with the implementation.
The Sonar team has zero motivation to comply with unspecified, Roslyn-implemented behaviour (despite it never having worked before, and I am very ungrateful to them for that, as the root cause).
The Roslyn team has zero motivation to comply with their own specification, despite everything I'm trying to do to narrow this down, make it easier, and accommodate them.

The root problem breaks down like this:

  • Specification is:

For each line:

  • Remove all leading and trailing whitespace.
  • Process the remaining text as specified for its type below.

The types of lines are:

  • Blank: Contains nothing. Blank lines are ignored.
  • Comment: starts with a ; or a #. Comment lines are ignored.
  • Section Header: starts with a [ and ends with a ]. These lines define globs; see Glob Expressions.
    May contain any characters between the square brackets (e.g., [ and ] and even spaces and tabs are allowed).
    Forward slashes (/) are used as path separators.
    Backslashes (\) are not allowed as path separators (even on Windows).
  • Key-Value Pair (or Pair): contains a key and a value, separated by an =. See Supported Pairs.

Key: The part before the first = on the line.
Value: The part, if any, after the first = on the line.
Keys and values are trimmed of leading and trailing whitespace, but include any whitespace that is between non-whitespace characters.
If a value is not provided, then the value is an empty string (i.e., "" in C or Python).

Any line that is not one of the above is invalid.

"Key: The part before the first = on the line" is exactly [^:=]+, excluding patterns that form globs.

  • Implementation is:
    // Matches EditorConfig section header such as "[*.{js,py}]", see https://editorconfig.org for details
    private const string s_sectionMatcherPattern = @"^\s*\[(([^#;]|\\#|\\;)+)\]\s*([#;].*)?$";

    // Matches EditorConfig property such as "indent_style = space", see https://editorconfig.org for details
    private const string s_propertyMatcherPattern = @"^\s*([\w\.\-_]+)\s*[=:]\s*(.*?)\s*([#;].*)?$";

([\w\.\-_]+)\s*[=:] limits the character range to word-type characters, dots, dashes and underscores explicitly, which is not the part before the first = on the line.

  • Smoothed out implementation is
    private const string s_propertyMatcherPattern = """
       #trim leading spaces
       ^\s*
       #match key (which have to contain letters, digits, underscores, dots, dashes, and optional spaces) as a non-capturing multiple-part pattern, excluding trailing spaces of the entire group
       ((?:[\w\.\-_]+\s*?)+)\s*
       #separation between key and value, trimming leading spaces
       [=:]\s*
       #match value as optionally anything (excluding trailing spaces) until a hash or semicolon
       (.*?)\s*
       #match comment as optionally anything after a hash or a semicolon (which should not be allowed per version 0.15.0 of the specifications, see https://spec.editorconfig.org/#no-inline-comments)
       ([#;].*)?$
       """;

This is an overly verbose and unusual way to write a regex.
((?:[\w\.\-_]+\s*?)+)\s* is the same as the current implementation, except that it allows space-separated multi-part keys. The hangs in the parser could be worked around either by allowing more characters (I only tested \, but others might have the same drawback) with ((?:[\w\.\-_\\]+\s*?)+)\s*, or by disallowing more characters with ((?:[\w\.\-_&&[^\\]]+\s*?)+)\s*. Both are fine with me; the explicit allowance would be closer to the specification.
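The cause of the hangs is not established anywhere in this thread. One plausible reading, offered only as an assumption, is that the nested quantifier in ((?:[\w\.\-_]+\s*?)+) lets a backtracking engine split a run of key characters in exponentially many ways when the rest of the line cannot match (for example when a \ follows the run). The sketch below uses Python's re module as a stand-in for the .NET engine, so timings on .NET may differ; CURRENT, NESTED and time_failed_match are hypothetical names.

```python
import re
import time

# Pattern quoted above as the current implementation.
CURRENT = re.compile(r"^\s*([\w\.\-_]+)\s*[=:]\s*(.*?)\s*([#;].*)?$")

# "Smoothed out" pattern quoted above, written on one line (assumed equivalent).
NESTED = re.compile(r"^\s*((?:[\w\.\-_]+\s*?)+)\s*[=:]\s*(.*?)\s*([#;].*)?$")


def time_failed_match(pattern, run_length):
    """Match a line made of `run_length` word characters followed by a backslash.

    The backslash is neither a key character nor a separator, so the match
    must fail; the elapsed time shows how much backtracking that failure costs.
    """
    line = "a" * run_length + "\\"
    start = time.perf_counter()
    result = pattern.match(line)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Run lengths are kept small on purpose: if the assumption holds, the nested
# pattern's cost roughly doubles with each extra character.
for n in (10, 14, 18):
    _, t_current = time_failed_match(CURRENT, n)
    _, t_nested = time_failed_match(NESTED, n)
    print(f"run={n:2d}  current={t_current:.6f}s  nested={t_nested:.6f}s")
```

On input that does match, both patterns agree on ordinary keys, which would explain why the problem is hard to spot by reading the pattern alone.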

I am glad you are taking the time needed to review the PR and to respond whenever either of us needs more data. I must insist that the current situation is bad on my end, and I'm doing everything I can to get this polished enough that it can eventually receive a fair, honest, and complete review.

@CyrusNajmabadi

Contributor

I appreciate the work you’ve put into this. I completely understand the motivation behind these changes, and I know how frustrating it can be to work around the current inefficiencies. However, we're in a difficult position regarding the risk this introduces to the compiler's stability.

The primary concern is that the PR explicitly mentions the potential for indefinite hangs, but those hangs aren't clearly identifiable or predictable just by reviewing the logic. Accepting a change where the failure mode is both high-severity and structurally opaque creates an unacceptable risk for our users. Since we don't yet have a definitive root cause or a way to prove that other scenarios won't be impacted, we can't merge this in its current state.

We definitely want to find a path forward here, but we have to prioritize stability and ensure we aren't introducing regressions that are nearly impossible to diagnose in the wild. To move this forward, we really need to get to the bottom of why these patterns are triggering hangs and build in the necessary safety guardrails. Once the underlying issue is understood and mitigated, I’d be happy to take another look.

@Daynvheur

Author

I simplified the regex once more, since the multiparting was superfluous: allowing for \s inside the key part was just ([\w\.\-_\s]+?)\s* all along.

@CyrusNajmabadi: is it worth it to keep the inline regex comment, only for those two characters added?
I made a two-stage commit:

Both are identical, and the more recent one blends in better with the state of main, but I can revert to the commented version if that is preferable.

@CyrusNajmabadi

Contributor

is it worth it to keep the inline regex comment, only for those two characters added?

I'm ok with not having the inline-regex comment in this case.

@Daynvheur

Author

I'm force-pushing the branch with the latest commit, which changes 3 characters:

  • added \s as an allowed character in the key part
  • changed + to +? so that the key part matches non-greedily and leaves the trailing whitespace to the \s* outside the group
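The effect of those three characters can be checked with a small sketch. The PROPOSED pattern below is reconstructed from the two bullet points and the current pattern quoted earlier in this conversation; the actual diff is not shown here, so treat it as an assumption. Python's re module stands in for the .NET engine, and GREEDY_VARIANT and parse_pair are hypothetical names added for illustration.

```python
import re

# Current pattern, as quoted earlier in this conversation.
CURRENT = re.compile(r"^\s*([\w\.\-_]+)\s*[=:]\s*(.*?)\s*([#;].*)?$")

# Reconstructed proposal: \s added to the key class, + changed to +?.
PROPOSED = re.compile(r"^\s*([\w\.\-_\s]+?)\s*[=:]\s*(.*?)\s*([#;].*)?$")

# Hypothetical variant that adds \s but keeps the greedy +, for comparison.
GREEDY_VARIANT = re.compile(r"^\s*([\w\.\-_\s]+)\s*[=:]\s*(.*?)\s*([#;].*)?$")


def parse_pair(pattern, line):
    """Return (key, value) if the line matches the pattern, else None."""
    match = pattern.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


plain = "indent_style = space"
spaced = "dotnet_analyzer_diagnostic.category-Minor Code Smell.severity = suggestion"

for name, pattern in (("current", CURRENT),
                      ("proposed", PROPOSED),
                      ("greedy", GREEDY_VARIANT)):
    print(name, parse_pair(pattern, plain), parse_pair(pattern, spaced))
```

In this sketch the current pattern rejects the key from issue #55431, the proposed pattern accepts it with the key trimmed, and the greedy variant accepts it but keeps the trailing space inside the key, which is what the +? change avoids.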

@Daynvheur

Author

@CyrusNajmabadi: the scope of the modification is now reduced to 3 characters, and there are no failing tests left.
If you see more to be added, or some other part needs work, please let me know.

Comment thread src/Compilers/Core/Portable/CommandLine/AnalyzerConfig.cs
@Daynvheur

Daynvheur commented Mar 6, 2026

Author

I re-pushed (forced) the last commit in the hope that CI would have a better outcome.

Edit: It worked.

@Daynvheur

This comment was marked as off-topic.

@Daynvheur

This comment was marked as duplicate.

@Daynvheur

Author

I marked my previous comment about the regex as Off-Topic, since it may distract from the core change here.

Is there something I can do to keep this issue moving forward?

EXT-Gweltaz added 2 commits May 19, 2026 14:18
goo.bar baz.quux ztesch.blah = ...
(Retried)