version_extract_regex: how it works #5153

wesley-dean · 2025-04-06T15:35:29Z

I've been trying to figure out how to capture the version of a tool while writing a plugin.

Initially, I assumed that the regex used a capture to extract the version string from the output of running the linter_name with cli_version_arg_name appended to it. In my case, the linter_name was j2lint and cli_version_arg_name was --version, so the command to be run would be:

j2lint --version

The output of running that command normally was:

# j2lint --version
Jinja2-Linter Version v1.2.0

I looked at the example in the schema documentation for version_extract_regex and saw the following example:

"(?<=npm-groovy-lint version )\\d+(\\.\\d+)+"

That is, a positive look-behind for the string npm-groovy-lint version followed by one or more digits followed by at least one group of . followed by one or more digits.
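To make the look-behind behaviour concrete, here is a minimal Python sketch. It assumes the doubled backslashes in the schema example are JSON string escaping, so the regex engine sees single backslashes. The `sample_output` text is hypothetical, made up to show a case with more than one version number in the output.

```python
import re

# The schema example with single backslashes, i.e. what the regex
# engine would see once the JSON string has been decoded (assumption).
PATTERN = re.compile(r"(?<=npm-groovy-lint version )\d+(\.\d+)+")

# Hypothetical tool output that contains two version numbers.
sample_output = "java version 17.0.9\nnpm-groovy-lint version 11.1.1\n"

match = PATTERN.search(sample_output)

# The look-behind has to match, but it is not part of the returned text.
extracted = match.group() if match else None
print(extracted)
```

The first version number is skipped because it is not preceded by the look-behind text, and the prefix itself does not appear in the extracted value.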

I thought that something similar would work for j2lint:

(?<=Jinja2-Linter Version v)\\d+(\\.\\d+)+

It did not. The message I received was:

Unable to extract version with regex re.compile('(?<=Jinja2-Linter Version v)\\d+(\\.\\d+)+') from Jinja2-Linter Version v1.2.0
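One detail that may help when reading this message: Python prints a compiled pattern as a string literal, so every backslash in the pattern shows up doubled. The sketch below uses only the standard `re` module, not MegaLinter itself. It assumes the log line above is verbatim and was produced by formatting a compiled pattern. Under those assumptions, the doubled backslashes in the message correspond to single backslashes in the pattern, and that pattern does match the version text when tested in isolation.

```python
import re

version_output = "Jinja2-Linter Version v1.2.0"

# Single backslashes: the form the regex engine needs to receive.
pattern = re.compile(r"(?<=Jinja2-Linter Version v)\d+(\.\d+)+")

# repr() renders the pattern as a Python string literal,
# which doubles each backslash.
shown_in_log = repr(pattern)
print(shown_in_log)

# Outside MegaLinter, the same pattern matches the version text.
match = pattern.search(version_output)
extracted = match.group() if match else None
print(extracted)
```

This does not explain the failure, but it suggests the cause may lie somewhere other than the regex syntax, for example in what the version output actually contained at run time.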

Per a previous discussion, I went to regex101.com to verify that the syntax was correct, and the pattern matched there.

I tried many variations of this approach and none of them worked.

Next, I looked in the code to try to find where the call is actually made.

Here's the setting of the default value: https://github.com/oxsecurity/megalinter/blob/main/megalinter/Linter.py#L142

which is:

r"\d+(\.\d+)+"

Here's where it's used: https://github.com/oxsecurity/megalinter/blob/main/megalinter/Linter.py#L1156-L1166

Reading through the documentation for re.search(), I verified that it returns a match object covering the entire matching portion of the string being searched. That is, the code is not looking at any capture groups, but rather at the entire match. That's why the positive look-behind was used in the example in the schema documentation: it has to match, but it is not included in the result.
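The difference between the whole match and a capture group can be seen with the default pattern and the j2lint output quoted earlier. This is a small standalone sketch using the standard `re` module; the variable names are illustrative only.

```python
import re

DEFAULT_PATTERN = re.compile(r"\d+(\.\d+)+")

m = DEFAULT_PATTERN.search("Jinja2-Linter Version v1.2.0")

# group() with no arguments (same as group(0)) is the entire match.
whole_match = m.group()

# group(1) is only the capture group; for a repeated group,
# Python keeps the last repetition.
last_capture = m.group(1)

print(whole_match)
print(last_capture)
```

The "2" in "Jinja2-Linter" is not picked up because the pattern requires at least one dot-and-digits group after the first run of digits.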

Then, I looked at the documentation for re.Match.group() so that I could fully understand what's happening at line 1161: it takes the whole match from the pattern search, calls str.split() on it with no separator (so any run of whitespace separates components), then calls str.join() on the resulting sequence with a "." string.
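The steps described above can be sketched roughly as follows. This is a reconstruction from the description, not the MegaLinter source; `extract_version` is a hypothetical helper name, and the second call uses a made-up pattern and input to show the effect of the split and join.

```python
import re


def extract_version(version_output, pattern=r"\d+(\.\d+)+"):
    """Hypothetical helper mirroring the described search/split/join steps."""
    m = re.search(pattern, version_output)
    if m is None:
        # The described code reports an error here; this sketch returns None.
        return None
    # Whole match, split on any whitespace, re-joined with dots.
    return ".".join(m.group().split())


print(extract_version("Jinja2-Linter Version v1.2.0"))
print(extract_version("tool 1 2 3", r"\d+( \d+)+"))
print(extract_version("no version here"))
```

For a match with no whitespace in it, the split and join leave the text unchanged; they only matter when the matched version is spread across whitespace-separated parts.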

I thought that perhaps it might be a YAML-related quoting issue, so I tried to use double quotes and single quotes in the version_extract_regex with no luck.

I also tried the same regex but with explicitly-specified character sets and wildcards (i.e., instead of \d, use [0-9]; instead of \., use [.]) which also should work according to regex101.com.

Lastly, I thought maybe the 2 or the dash or the spaces may be throwing things off, so I tried one with wildcards to replace those characters in the look-behind. That still didn't work.

So, after all of that, because j2lint --version only returns one line -- the version with some junk in front of it -- I opted to set a version_extract_regex of r"\d+(\.\d+)+" (i.e., the default, but explicitly specified in the configuration file) and running MegaLinter no longer throws the regex error.

My key take-aways were:

- the code uses the whole match of the pattern, not any capture groups within the pattern
- the default regex does a good job by itself for semver versions
- the look-ahead / look-behind from the schema example is helpful when multiple version numbers may be returned, so that we're sure we're capturing the desired semver version
- quoting and escaping still matter: the regex engine has to receive a single \ before d or ., so the YAML quoting style must neither consume that \ nor pass a doubled one through
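The escaping point can be illustrated without YAML at all by handing the regex engine the two forms directly. As far as I understand YAML, backslashes are literal in unquoted and single-quoted scalars and are escape characters in double-quoted scalars, so which form arrives depends on the quoting style. The sketch below only simulates the two possible results as Python strings; it does not parse any YAML.

```python
import re

version_output = "Jinja2-Linter Version v1.2.0"

# One backslash reaches the engine: \d is the digit class.
single = r"\d+(\.\d+)+"

# Two backslashes reach the engine: a literal backslash followed by "d".
doubled = r"\\d+(\\.\\d+)+"

single_result = re.search(single, version_output)
doubled_result = re.search(doubled, version_output)

print(single_result.group() if single_result else None)
print(doubled_result)
```

Both patterns compile without error, which is why a doubled backslash can go unnoticed: the second pattern is valid, it just looks for backslash characters that the version output does not contain.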

However, I'm still not sure why the positive look-behind example I was using didn't work.

So, my questions are:

A. Is what I wrote above correct?
B. Does anyone have any thoughts on why my look-behind isn't working properly?

nvuillam (Maintainer) · 2025-04-06T19:12:59Z

I usually build my regexes with ChatGPT and test them with https://regex101.com/ :)

(regex syntax is a mystery to me 🤣 )

nvuillam (Maintainer) · 2025-04-06T19:13:46Z

In your case you can probably keep the default regex, which looks for semver :)
