Docstrings swallow leading whitespace #312

@thomaspinckney3

Per PEP 257, the docstring PEP: "Any indentation in the first line of the docstring (i.e., up to the first newline) is insignificant and removed." Empirically, this means I can't write a lex rule whose pattern starts with spaces or tabs. In some cases this can be worked around by setting t_ignore (for C-like languages) and/or by tracking indentation with lexpos (for Python-like indentation syntax).
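To make the effect concrete, here is a minimal sketch using only the standard library. It uses inspect.cleandoc as a stand-in for the PEP 257 cleanup; what the interpreter itself stores in __doc__ depends on the Python version, so that value is printed rather than assumed. The rule names t_SPACES_plain and t_SPACES_class are hypothetical.

```python
import inspect
import re


# Hypothetical PLY-style rules: the docstring is the token's regular expression.
def t_SPACES_plain(t):
    " +"
    return t


def t_SPACES_class(t):
    "[ ]+"
    return t


# PEP 257-style cleanup as implemented by inspect.cleandoc:
# leading whitespace on the first line is dropped.
plain = inspect.cleandoc(" +")
bracketed = inspect.cleandoc("[ ]+")
print(repr(plain), repr(bracketed))

# What the interpreter actually stored (version dependent for the first rule).
print(repr(t_SPACES_plain.__doc__), repr(t_SPACES_class.__doc__))

# Once the leading space is gone, the remaining pattern is not a valid regex.
try:
    re.compile(plain)
    plain_compiles = True
except re.error:
    plain_compiles = False
print(plain_compiles)
```

The character-class form survives the cleanup because its first character is "[" rather than whitespace.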

The PLY documentation states "Although it is possible to define a regular expression rule for whitespace in a manner similar to t_newline()...", but t_newline is a misleading model: its docstring also starts with whitespace, yet that whitespace is a "\n", which does work at the start of a docstring. I suggest the documentation explicitly say to use "[ ]" rather than " " to recognize (ASCII) spaces.

There is a helpful sentence in the documentation about how regular expressions are compiled with the re.X flag, which means un-escaped whitespace is ignored. However, the suggested fix of using \s will capture more than just spaces and tabs. I suggest extending the work-around for "#" characters to cover whitespace as well: as above, make it a character class such as "[ ]*".
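A short sketch of the re.X behaviour described here, again using only the standard library and assuming the lexer compiles its patterns with re.VERBOSE as the documentation says. The helper name compiles and the sample text are illustrative only.

```python
import re

# Assumption: patterns are compiled with re.VERBOSE (re.X).
FLAGS = re.VERBOSE


def compiles(pattern):
    """Return True if the pattern is a valid regex under FLAGS."""
    try:
        re.compile(pattern, FLAGS)
        return True
    except re.error:
        return False


# A bare space is ignored in verbose mode, leaving "+" with nothing to repeat.
bare_space_ok = compiles(r" +")

spaces = re.compile(r"[ ]+", FLAGS)    # ASCII spaces only
blanks = re.compile(r"[ \t]+", FLAGS)  # spaces and tabs
any_ws = re.compile(r"\s+", FLAGS)     # also newlines and other whitespace

text = " \t\n x"
space_end = spaces.match(text).end()
blank_end = blanks.match(text).end()
any_ws_end = any_ws.match(text).end()

# "#" starts a comment in verbose mode unless it sits in a character class.
bare_hash_end = re.compile(r"#+", FLAGS).match("## c").end()
class_hash_end = re.compile(r"[#]+", FLAGS).match("## c").end()

print(bare_space_ok, space_end, blank_end, any_ws_end)
print(bare_hash_end, class_hash_end)
```

The character class keeps the match limited to the characters listed, whereas \s also consumes the newline, which matters if newlines are handled by a separate rule such as t_newline.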

This is a small issue, but one that took me a fair bit of time to track down. Interestingly, my old code that used a pattern like " +" worked until sometime in the past year or two. Perhaps this Python change altered the handling of leading whitespace?

Hopefully this bug report will at least save someone else from going down a rabbit hole.
