Docstrings swallow leading whitespace

Per the docstring [PEP](https://peps.python.org/pep-0257/), "Any indentation in the first line of the docstring (i.e., up to the first newline) is insignificant and removed." In practice this means I can't write a lex rule whose docstring pattern starts with literal spaces or tabs. In some cases this can be worked around by setting t_ignore (for C-like languages) and/or by tracking indentation with lexpos (for Python-like indentation syntax).
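To illustrate the rule quoted from the PEP, here is a minimal sketch using `inspect.cleandoc` from the standard library, which follows the PEP's indentation handling. The patterns are made-up examples, and I'm assuming that whatever strips a real rule's docstring treats the first line the same way.

```python
import inspect

# Hypothetical rule patterns, written the way they might appear as docstrings.
patterns = ["  [a-z]+", "\t[0-9]+", "[ ]+[a-z]+"]

# cleandoc() removes indentation from the first line, as PEP 257 describes.
results = {raw: inspect.cleandoc(raw) for raw in patterns}

for raw, cleaned in results.items():
    print(repr(raw), "->", repr(cleaned))
```

The first two patterns lose their leading whitespace. The third keeps its space because the string itself starts with "[", so there is no leading indentation to remove.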

The PLY documentation states "Although it is possible to define a regular expression rule for whitespace in a manner similar to t_newline()...", but t_newline itself uses a docstring that begins with a whitespace pattern, "\n", and that one does work at the start of a docstring. I suggest the documentation explicitly say to use "[ ]*" to recognize (ASCII) spaces rather than " *".
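A small sketch of the difference, without needing PLY installed. The rule names `t_spaces_literal` and `t_spaces_class` are hypothetical, and I'm assuming t_newline is written with a raw-string docstring as in the PLY documentation's example. In that case the docstring starts with a backslash character rather than an actual newline, which may be why it survives.

```python
def t_newline(t):
    r'\n+'
    return t

def t_spaces_literal(t):
    r' +'
    return t

def t_spaces_class(t):
    r'[ ]+'
    return t

# Show what each rule's docstring actually contains at runtime.
# The literal-space version may or may not keep its leading space,
# depending on the Python version; the other two are unaffected.
for rule in (t_newline, t_spaces_literal, t_spaces_class):
    print(rule.__name__, repr(rule.__doc__))
```

Printing the docstrings this way is a quick check of what pattern a lexer would actually be handed.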

There is a helpful sentence in the documentation noting that regular expressions are compiled with the re.X flag, which means unescaped whitespace is ignored, but its suggestion to use \s will capture more than just spaces and tabs. I suggest extending the workaround for "#" to cover whitespace as well: make it a character class, "[ ]*", just like above.
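Here is a rough sketch of the re.X behavior using only the `re` module. The sample text and variable names are mine, and this only approximates how a lexer would compile its rules.

```python
import re

text = "  \t\nx"  # two spaces, a tab, a newline, then a letter

candidates = {
    "any_whitespace": r"\s+",    # also matches tabs and newlines
    "spaces_only": r"[ ]+",      # character class: space kept under re.X
    "spaces_and_tabs": r"[ \t]+",
}

# Record how many leading characters each pattern consumes.
ends = {name: re.compile(pat, re.X).match(text).end()
        for name, pat in candidates.items()}
print(ends)

# A bare leading space is dropped under re.X, leaving "+" with nothing to repeat.
try:
    re.compile(" +", re.X)
    literal_space_error = False
except re.error:
    literal_space_error = True
print(literal_space_error)
```

With this input, \s+ runs through the newline, while the character classes stop where expected.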

This is a small issue, but one that took me a fair bit of time to track down. Interestingly, my old code that used a pattern like " +" worked until sometime in the past year or two. Perhaps [this Python change](https://github.com/python/cpython/issues/81283) altered the handling of leading whitespace?

Hopefully this bug report will at least save someone else from going down a rabbit hole.

Docstrings swallow leading whitespace #312
