Skip to content

Statement is unreachable when checking if Match[str].group(...) type is None #7363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Jackenmen opened this issue Aug 18, 2019 · 3 comments
Closed
Labels
bug mypy got something wrong false-positive mypy gave an error on correct code needs discussion priority-1-normal topic-plugins The plugin API and ideas for new plugins topic-reachability Detecting unreachable code topic-strict-optional

Comments

@Jackenmen
Copy link

Hi, I am not entirely sure if this is the right place to report it so please tell me if I should submit the issue somewhere else.

Issue type: Bug
Problematic code:

from typing import Match, Optional
import re


def some_function(some_match: Match[str]) -> bool:
    maybe_match_group = some_match.group(1)
    if maybe_match_group is None:
        group_matched = True
    else:
        group_matched = False
    return group_matched


pattern = re.compile("(str)?")
match = pattern.fullmatch("")
if match is not None:
    print(some_function(match))

Error message:

main.py:8: error: Statement is unreachable

Expected behaviour:
No error as the statement is reachable (and it's used when you run this code). I tried using Match[Optional[str]] to fix this but it's not valid argument for Match.
Another thing to note is that this code will not error if I'll just use Match without specifying type of string.

Python version:: 3.7.4
Mypy version: 0.730, occurs on master as well
Mypy flags: --warn-unreachable

@Michael0x2a
Copy link
Collaborator

This is a legitimate bug/limitation in mypy. In short, what's happening is that the Match.group(...) function is currently defined in typeshed to only ever return a str or a byte, depending on whether you have a Match[str] or a Match[byte].

This is obviously incorrect, as you noted. And due to this incorrect annotation, the --warn-unreachable flag ends up assuming that the "if the group is None" branch will never be run.

We actually did experiment with having the return be an Optional, but it ended up causing regressions in various codebases we were testing on. E.g. some snippet of code was using a regex where there would always be N groups and so rightfully skipped the 'is this group None?' checks.

The reason why not specifying the type of the string works is because such a type is equivalent to doing Match[Any]. So, group() ends up also returning Any.

cc @ilevkivskyi, since we were just talking about this the other day. Maybe writing a plugin is the right approach after all? Doing re.compile(...).groups will at least get you the upper bound on the number of groups there'll be, and doing sre_parse.parse(...) seems to get us an AST of the regex we could perhaps try analyzing.

@ilevkivskyi
Copy link
Member

OK, it looks like we need some compromise here, possible plan is:

  • Infer non-optional type for .group() and .group(0)
  • Infer precise type for .group(n) when a plugin can
  • Fallback to Any in other cases
  • For multi-argument calls infer precise type when plugin can
  • Fallback to optional type in multi-argument calls

This should:

  • fix all bugs with --warn-unreachable
  • not generate too many false positives for optional types
  • still have some decent type checking for str/bytes

@JelleZijlstra @srittau

@ilevkivskyi ilevkivskyi added bug mypy got something wrong false-positive mypy gave an error on correct code needs discussion priority-1-normal topic-plugins The plugin API and ideas for new plugins topic-strict-optional labels Aug 19, 2019
Michael0x2a added a commit to Michael0x2a/mypy that referenced this issue Oct 27, 2019
This pull request adds a plugin to make mypy infer more
precise types when grabbing regex groups: the plugin will
when possible analyze original regex to deduce whether a
given group is required or not.

```
from typing_extensions import Final, Literal
import re

pattern: Final = re.compile("(a)(b)*")
match: Final = pattern.match("")
if match:
    reveal_type(match.groups())  # Revealed type is Tuple[str, Optional[str]]
    reveal_type(match.group(0))  # Revealed type is str
    reveal_type(match.group(1))  # Revealed type is str
    reveal_type(match.group(2))  # Revealed type is Optional[str]

    index: int
    reveal_type(match.group(index))  # Revealed type is Optional[str]

    # Error: Regex has 3 total groups, given group number 5 is too big
    match.group(5)
```

To track this information, I added in an optional 'metadata' dict
field to the Instance class, similar to the metadata dict for
plugins in TypeInfos. We skip serializing this dict if it does not
contain any data.

A limitation of this plugin is that both the pattern and the match
variables must be declared to be final. Otherwise, we just default
to using whatever types are defined in typeshed.

This is because we set and erase the metadata field in exactly the
same way we set and erase the `last_known_value` field in Instances:
both kinds of info are "transient" and are unsafe to keep around if
the variable reference is mutable.

This limitation *does* end up limiting the usefulness of this plugin
to some degree: it won't support common patterns like the below, since
variables aren't allowed to be declared final inside loops:

```
for line in file:
    match = pattern.match(line)
    if match:
        ...
```

Possibly we can remove this limitation by making mypy less aggressive
about removing this transient info by tracking the "lifetime" of this
sort of data in some way?

This pull request should mostly address
python#7363, though it's unclear if it
really fully resolves it: we might want to do something about the
limitation described above and re-tune typeshed first.

The other mostly unrelated change this PR makes is to refactor some of
the helper functions in checker.py into typeops.py so I could use them
more cleanly in the plugin.
Michael0x2a added a commit to Michael0x2a/mypy that referenced this issue Oct 27, 2019
This pull request adds a plugin to make mypy infer more
precise types when grabbing regex groups: the plugin will
when possible analyze original regex to deduce whether a
given group is required or not.

```
from typing_extensions import Final, Literal
import re

pattern: Final = re.compile("(a)(b)*")
match: Final = pattern.match("")
if match:
    reveal_type(match.groups())  # Revealed type is Tuple[str, Optional[str]]
    reveal_type(match.group(0))  # Revealed type is str
    reveal_type(match.group(1))  # Revealed type is str
    reveal_type(match.group(2))  # Revealed type is Optional[str]

    index: int
    reveal_type(match.group(index))  # Revealed type is Optional[str]

    # Error: Regex has 3 total groups, given group number 5 is too big
    match.group(5)
```

To track this information, I added in an optional 'metadata' dict
field to the Instance class, similar to the metadata dict for
plugins in TypeInfos. We skip serializing this dict if it does not
contain any data.

A limitation of this plugin is that both the pattern and the match
variables must be declared to be final. Otherwise, we just default
to using whatever types are defined in typeshed.

This is because we set and erase the metadata field in exactly the
same way we set and erase the `last_known_value` field in Instances:
both kinds of info are "transient" and are unsafe to keep around if
the variable reference is mutable.

This limitation *does* end up limiting the usefulness of this plugin
to some degree: it won't support common patterns like the below, since
variables aren't allowed to be declared final inside loops:

```
for line in file:
    match = pattern.match(line)
    if match:
        ...
```

Possibly we can remove this limitation by making mypy less aggressive
about removing this transient info by tracking the "lifetime" of this
sort of data in some way?

This pull request should mostly address
python#7363, though it's unclear if it
really fully resolves it: we might want to do something about the
limitation described above and re-tune typeshed first.

The other mostly unrelated change this PR makes is to refactor some of
the helper functions in checker.py into typeops.py so I could use them
more cleanly in the plugin.
@AlexWaygood AlexWaygood added the topic-reachability Detecting unreachable code label Mar 27, 2022
@AlexWaygood
Copy link
Member

Typeshed improvements mean that this can no longer be reproduced in 0.920+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug mypy got something wrong false-positive mypy gave an error on correct code needs discussion priority-1-normal topic-plugins The plugin API and ideas for new plugins topic-reachability Detecting unreachable code topic-strict-optional
Projects
None yet
Development

No branches or pull requests

4 participants