Skip to content

Sort modules in dependency cycles based on dependencies #1530

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
JukkaL opened this issue May 13, 2016 · 7 comments · Fixed by #1736
Closed

Sort modules in dependency cycles based on dependencies #1530

JukkaL opened this issue May 13, 2016 · 7 comments · Fixed by #1736
Labels

Comments

@JukkaL
Copy link
Collaborator

JukkaL commented May 13, 2016

Topo sort modules within dependency cycles to get a more consistent build ordering and fewer bogus errors due to ordering issues. Specifically, each import would have a priority (high/low).
Topo sort modules in a cycle by considering high-priority dependencies first. We can use low-priority dependencies as secondary sort criteria.

We'd assign a high priority in these cases:

  • from m import ... at module top level (not within a function) so that one of the imported names is not a module
  • Imported module referenced at module top level outside import statements (independent of the kind of import)

Exceptions:

  • Anything within if False: is ignored and can't affect ordering.
  • References to a module within type annotations are ignored. (Optimally, we'd only ignore string-literal-quoted references and comment annotations, but we don't have that information yet.)

Intuitively, we want to make build ordering approximate the order in which modules are initialized at runtime.

Example:

from util import helper_func   # high, at top level
import model  # high, used at top level (as base class)
import logger  # low, use only within function

class Person(model.Model):
    def check(self) -> None:
        from validators import validate    # low, within function

def log(s: str) -> None:
    logger.log(s)

Expected benefits:

  1. A base class will usually be type checked before a derived class (if defined in different modules), so attributes and decorated methods in the base class would have inferred types when type checking the derived class.

  2. Less likely to access an imported type alias that hasn't been processed yet, but only if type alias is imported using from m import ....

  3. Order of checking modules within a cycle would be more predictable. This is useful, as ordering changes may result in "Cannot determine type" errors that seem to come out of nowhere.

  4. Cycles caused by mypy-only imports within if False: blocks won't affect the ordering of a build.

This would partially solve #481. The rules above still don't cover all interesting problems but they should give a clear improvement for a lot of code.

This could perhaps be implemented as a new pass in process_stale_scc, where we'd sort the strongly connected component before the main semantic analysis pass.

@JukkaL JukkaL added the feature label May 13, 2016
@JukkaL JukkaL added this to the 0.4.x milestone May 13, 2016
@gvanrossum
Copy link
Member

gvanrossum commented Jun 10, 2016

Test case (reduced from a real-world example):

File x.py:

class Base:
    pass
def foo() -> int:
    import y
    return y.Sub.attr

File y.py:

import x
class Sub(x.Base):
    attr = 0

This passes when run as mypy x.py but fails with mypy y.py:

y.py:1: note: In module imported here:
x.py: note: In function "foo":
x.py:5: error: Cannot determine type of 'attr'

@gvanrossum
Copy link
Member

gvanrossum commented Jun 22, 2016

FWIW, that test case is not easily solved with the suggested heuristic. The heuristic suggests processing x.py first, y.py second (so that the base class is processed before the subclass), but if you do that it will always give that error. :-(

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jun 22, 2016

Yeah, it wouldn't help here. However, the order of processing would be more stable so at least the error would be less likely to appear after an unrelated change.

There are additional things we could do instead, but it's not clear whether they are worth the extra complexity.

One approach that I think we've discussed before is to run full type checking first on module top levels and __init__ methods and type check function and method bodies in a separate pass. This wouldn't help with non-__init__ methods that define attributes, but maybe this is rare enough that the need to annotate those won't feel like a burden. Also, this may complicate other things such as more fine-grained incremental checking, but not necessarily. The motivation should be clear: we'd infer types for most things that are visible outside the module before we type check most things that need those types, so this should fix many of the "Cannot determine type of x" errors.

Another idea it to have another light-weight attribute type inference pass that runs before main type inference that can only deal with simple cases, such as these:

class A:
    x = 0   # type of `x` is clearly `int`

    def __init__(self, y: str) -> None:
        self.y = y   # again this is simple

Some potential heuristics we might use in the light-weight pass:

  • Only look at class bodies and __init__ methods. Skip other method and function bodies. This way this could be pretty fast.
  • There must only be a single assignment to an attribute in the initialization scope, so we don't need to worry about partial types.
  • Assignment must not be nested within a statement such as if or a loop, as these can be tricky.
  • Initializer expression has a simple type. For example, it can be a literal or refer to a local variable with an explicit (not inferred) type that doesn't get assigned to. It can also be a call to a function/class with a simple (non-generic) return type.

The motivation is that if this would infer types for, say, 2/3 of attributes (not a proper estimate) before type checking, the issue would be much less severe and the cases where a type annotation would be required would often be non-trivial so arguably the annotations may improve clarity.

@gvanrossum
Copy link
Member

OK, then I'll go ahead and submit a PR for what I have. The rules I've implemented so far:

  • top-level from M import ... gets priority HIGH
  • top-level import M gets priority MEDIUM
  • all imports inside a function get priority LOW

I don't yet use information about how the imported variable gets used yet; I'm not sure that information is easily available before the semantic analysis starts.

I need a new example though -- something that actually fails due to an import cycle and where the proposed algorithm improves things.

@gvanrossum
Copy link
Member

I ran this over our internal server repo and had to silence a few errors. A light-weight type checker for class attributes would definitely have helped -- all errors were about class attributes with a simple int or str literal as value. So I think I'd like to tackle that next. (The other repo was a red herring.)

gvanrossum pushed a commit that referenced this issue Jun 24, 2016
This thwarts certain kinds of errors due to circular dependencies (see #1530).
@gvanrossum
Copy link
Member

gvanrossum commented Jun 24, 2016

I'm working on some code to infer types from simple literal assignments here: https://github.com/python/mypy/tree/class_attrs

@rwbarton you might want to follow along. There are some tests failing; so far I've identified some failures in the weak-mode tests and some others due to tests for other features relying on undefined types. I've shut up the weak-mode test failures by simply skipping this feature when any weak options are set. I'll shut up the others by making the expressions slightly more complex. We'll see what else fails.

UPDATE: More failures due to the rendering of variables with known types being different (no '*' following the name).

@rwbarton
Copy link
Contributor

I'm working on some code to infer types from simple literal assignments here: https://github.com/python/mypy/tree/class_attrs

Yes, this looks right to me.

gvanrossum added a commit that referenced this issue Jun 27, 2016
…forms (#1736)

This partially addresses #1530 (but does not fully fix it).

The new algorithm always processes x before y, while the old one gives
an error when y is processed before x:
```
x.py:4: note: In module imported here:
y.py: note: In class "Sub":
y.py:3: error: Cannot determine type of 'attr'
```

I am working on additional heuristics in https://github.com/python/mypy/tree/class_attrs.
gvanrossum pushed a commit that referenced this issue Jun 27, 2016
This thwarts certain kinds of errors due to circular dependencies (see #1530).
gvanrossum pushed a commit that referenced this issue Jun 30, 2016
This thwarts certain kinds of errors due to circular dependencies (see #1530).
JukkaL pushed a commit that referenced this issue Jun 30, 2016
Set the type of `x = 0` and similar in the semantic analyzer. This thwarts certain kinds of errors due to circular dependencies (see #1530). Skip this business if weak_opts is non-empty.

Disable lightweight type check if build target < TYPE_CHECK.

Only do the lightweight type inferencing thing outside functions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants