Description
Per PEP 257 (the docstring PEP), "Any indentation in the first line of the docstring (i.e., up to the first newline) is insignificant and removed." Empirically, this means I can't write a lex rule whose pattern starts with spaces or tabs. In some cases this can be worked around by setting t_ignore for C-like languages and/or by tracking indentation with lexpos for Python-like indentation syntax.
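To make the stripping concrete, here is a minimal sketch using the standard library's `inspect.cleandoc`, which applies PEP 257-style cleanup to a string. It only illustrates the rule quoted above. Whether a given Python version applies this cleanup to `__doc__` before PLY reads the rule is an assumption on my part, and the helper name `leading_ws_survives` is made up for this example.

```python
import inspect


def leading_ws_survives(pattern: str) -> bool:
    """Return True if PEP 257-style cleanup leaves the pattern unchanged."""
    return inspect.cleandoc(pattern) == pattern


raw_rule = "  +"      # hypothetical rule text meant as "one or more spaces"
class_rule = "[ ]+"   # same intent, written as a character class

# The leading spaces on the first line are removed by the cleanup,
# so the rule text is no longer what was written.
cleaned = inspect.cleandoc(raw_rule)

print(repr(cleaned))
print(leading_ws_survives(raw_rule), leading_ws_survives(class_rule))
```

The character-class form starts with "[" rather than whitespace, so there is nothing at the start of the first line to strip.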
The PLY documentation states "Although it is possible to define a regular expression rule for whitespace in a manner similar to t_newline()...", but the t_newline docstring itself starts with whitespace. It happens to be "\n", a kind of whitespace that does work at the start of a docstring. I suggest that the documentation explicitly say to use "[ ]" rather than " " to recognize (ASCII) spaces.
There is a helpful sentence in the documentation explaining that regular expressions are compiled with the re.X flag, which means unescaped whitespace is ignored, but the suggested \s matches more than just spaces and tabs. I suggest extending the workaround for "#" to cover whitespace as well: make it a character class, as above, e.g. "[ ]*".
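A minimal sketch of the character-class workaround, using only the standard `re` module so it runs without PLY installed. The rule functions `t_SPACES` and `t_HASH` are hypothetical names written in PLY's docstring style. Compiling `__doc__` directly with `re.VERBOSE` is a simplification of what a lexer generator does, not PLY's actual code path.

```python
import re


def t_SPACES(t):
    r"[ ]+"
    return t


def t_HASH(t):
    r"[#]"
    return t


# Inside a character class, re.VERBOSE keeps a literal space and a
# literal "#", so neither is dropped nor read as a comment marker.
spaces = re.compile(t_SPACES.__doc__, re.VERBOSE)
hash_mark = re.compile(t_HASH.__doc__, re.VERBOSE)

# For comparison: spaces and tabs only, versus everything \s covers.
blanks = re.compile(r"[ \t]+", re.VERBOSE)
any_ws = re.compile(r"\s+", re.VERBOSE)

text = "  \t\nx"
print(repr(spaces.match(text).group()))   # spaces only
print(repr(blanks.match(text).group()))   # spaces and tabs
print(repr(any_ws.match(text).group()))   # also consumes the newline
```

The comparison at the end shows why \s is a poor substitute when newlines are tracked by a separate rule: it consumes the "\n" as well.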
This is a small issue, but one that took me a fair bit of time to track down. Interestingly, my old code that used a pattern like " +" worked until sometime in the past year or two. Perhaps this Python change altered the handling of leading whitespace?
Hopefully this bug report will at least save someone else from going down a rabbit hole.