version_extract_regex: how it works #5153
wesley-dean started this conversation in General
Replies: 2 comments
-
I usually build my regexes with ChatGPT and test them with https://regex101.com/ :) (regex syntax is a mystery to me 🤣)
-
In your case you can probably keep the default regex, which looks for semver :)
-
I've been trying to figure out how to capture the version of a tool while writing a plugin.
Initially, I assumed that the regex used a capture group to extract the version string from the output of running the `linter_name` with `cli_version_arg_name` appended to it. In my case, the `linter_name` was `j2lint` and `cli_version_arg_name` was `--version`, so the command to be run would be `j2lint --version`.
The output of running that command normally was:
I looked at the example in the schema documentation for version_extract_regex and saw the following example:
That is, a positive look-behind for the string `npm-groovy-lint version`, followed by one or more digits, followed by at least one group of `.` followed by one or more digits.
I thought that if I used something similar with `j2lint`, it would look like `(?<=Jinja2-Linter Version v)\\d+(\\.\\d+)+` and it would still work. The message I received was:
Per a previous discussion, I went to regex101.com to verify that the syntax was correct and here are the results
I tried many variations of this approach and none of them worked.
Next, I looked in the code to try to find where the call is actually made.
Here's the setting of the default value: https://github.com/oxsecurity/megalinter/blob/main/megalinter/Linter.py#L142
which is:
Here's where it's used: https://github.com/oxsecurity/megalinter/blob/main/megalinter/Linter.py#L1156-L1166
Reading through the documentation for re.search(), I verified that the match it returns covers the entire matching portion of the string being searched. That is, the code isn't looking at any capture groups, but rather at the entire matched pattern. That's why the positive look-behind was used in the example in the schema documentation: it has to match, but it is not included in the result.
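To make that concrete, here is a minimal sketch of the extraction step as I understand it. The `extract_version` helper and the sample output string are my own assumptions for illustration (they are not MegaLinter's actual code or j2lint's verbatim output); the only behavior taken from the source is "search, take the whole match, split on whitespace, join with a dot".

```python
import re

# Hypothetical tool output, assumed for illustration only.
SAMPLE_OUTPUT = "Jinja2-Linter Version v1.1.0"


def extract_version(pattern, output):
    """Hypothetical helper: whole match, whitespace split, dot join."""
    match = re.search(pattern, output)
    if match is None:
        return None
    # group() with no arguments returns the whole match, not a capture group.
    return ".".join(match.group().split())


# The look-behind must match, but it is not part of the result.
lookbehind = extract_version(r"(?<=Jinja2-Linter Version v)\d+(\.\d+)+", SAMPLE_OUTPUT)

# The default-style pattern finds the first dotted run of digits.
default = extract_version(r"\d+(\.\d+)+", SAMPLE_OUTPUT)

# A plain prefix (no look-behind) leaks into the extracted value.
with_prefix = extract_version(r"Version v\d+(\.\d+)+", SAMPLE_OUTPUT)

print(lookbehind, default, with_prefix)
```

With this sample string, the look-behind pattern and the default-style pattern both yield the bare version, while the plain-prefix pattern drags the prefix into the result. That is the reason a look-behind (rather than a capture group) is needed when the prefix has to be part of the match condition.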
Then, I looked at the documentation for re.Match.group() so that I could fully understand what's happening at line 1161: it first calls str.split() with no separator (so, one or more whitespace characters separate components) on the results of the pattern search, then it calls str.join() on that sequence with a `.` string.
I thought that perhaps it might be a YAML-related quoting issue, so I tried to use double quotes and single quotes in the `version_extract_regex`, with no luck.
I also tried the same regex but with explicitly-specified character sets and wildcards (i.e., instead of `\d`, use `[0-9]`; instead of `\.`, use `[.]`), which also should work according to regex101.com.
Lastly, I thought maybe the `2` or the dash or the spaces might be throwing things off, so I tried one with wildcards to replace those characters in the look-behind. That still didn't work.
So, after all of that, because `j2lint --version` only returns one line (the version with some junk in front of it), I opted to set a `version_extract_regex` of `r"\d+(\.\d+)+"` (i.e., the default, but explicitly specified in the configuration file), and running MegaLinter no longer throws the regex error.
My key take-aways were:
`\` as escaping a character in the YAML but not passing that `\` along to the regex engine
However, I'm still not sure why the positive look-behind example I was using didn't work.
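The backslash point can be checked in isolation. The sketch below is only one possible angle on the problem, not a confirmed cause: it assumes a hypothetical sample output and compares the pattern as it looks after JSON decoding (where `\\` is an escape for a single backslash) with the same text reaching the regex engine with the doubled backslashes intact, as would happen if a loader kept them literally.

```python
import json
import re

# Hypothetical tool output, assumed for illustration only.
SAMPLE_OUTPUT = "Jinja2-Linter Version v1.1.0"

# In JSON source, "\\d" decodes to a single backslash followed by "d".
json_source = r'"(?<=Jinja2-Linter Version v)\\d+(\\.\\d+)+"'
decoded = json.loads(json_source)

# Assumption: the same text copied somewhere that keeps backslashes
# literally, so the regex engine sees two backslashes.
doubled = r"(?<=Jinja2-Linter Version v)\\d+(\\.\\d+)+"

decoded_match = re.search(decoded, SAMPLE_OUTPUT)
doubled_match = re.search(doubled, SAMPLE_OUTPUT)

print(decoded)        # pattern with single backslashes
print(decoded_match)  # a match object covering the version
print(doubled_match)  # None: "\\d" means a literal backslash, then "d"
```

Both patterns compile, so a doubled backslash would not raise a syntax error; it would simply never match, because `\\d+` asks for a literal backslash followed by the letter `d`. Whether that is what happened in my configuration I can't say for sure.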
So, my questions are:
A. Is what I wrote above correct?
B. Does anyone have any thoughts on why my look-behind isn't working properly?