Hygienic mypy plugin: don't import Django into the mypy process #1384
Comment by @adamchainz #1337 (comment)
Original comment by @sobolevn #1337 (comment)
Original comment by @flaeppe #1337 (comment)
I suspected this as well; reimplementing all of Django's logic in our plugin may be too big and messy to maintain. But the current situation is messy as well, does anyone really disagree? My thinking was to implement something in between the two approaches: the plugin would work off a cache of model metadata. When our mypy plugin needs to update the cache of Django model metadata, it would invoke a subprocess that may import Django, discover all the internals it needs, and return them in some format. If this subprocess crashes, we can emit some warnings and continue with the old cache instead of failing entirely. This would solve the main issues I see with the current design.
But I might just be underestimating how difficult this cache discovery or management would be.
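A minimal sketch of the subprocess idea described above, with hypothetical names (`DISCOVERY_SCRIPT`, `refresh_model_cache` are illustrations, not the actual plugin API). A real helper would call `django.setup()` and walk the app registry; here it emits a static payload so the sketch stays self-contained:

```python
import json
import subprocess
import sys

# Hypothetical helper script body. A real implementation would call
# django.setup() and walk django.apps.apps.get_models(); this placeholder
# emits a static payload so the sketch is runnable on its own.
DISCOVERY_SCRIPT = """
import json, sys
metadata = {"customers.Customer": {"managers": ["objects", "orders"]}}
json.dump(metadata, sys.stdout)
"""

def refresh_model_cache(old_cache: dict) -> dict:
    """Run discovery in a subprocess; fall back to the old cache on failure."""
    result = subprocess.run(
        [sys.executable, "-c", DISCOVERY_SCRIPT],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # The subprocess crashed (e.g. broken settings): warn and keep old data
        # instead of failing the whole mypy run.
        print("warning: model discovery failed, using stale cache", file=sys.stderr)
        return old_cache
    return json.loads(result.stdout)

cache = refresh_model_cache({})
```

The key property is that a crash in the discovery step degrades to stale results rather than taking down the whole type check.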
One approach that can be helpful here is to begin by restricting direct manipulation of Django objects to the DjangoContext class. All other code inside the plugin would deal with lightweight objects returned by DjangoContext methods that serve as representations of the underlying Django model metadata. This way it's possible to map out the extent of the information we would have to obtain and cache from Django, without committing to a particular implementation. Once the rest of the plugin is using these lightweight metadata objects, it becomes possible to construct them inside DjangoContext from information gathered by a separate isolated process, for example.
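A sketch of what such lightweight objects might look like (the names `FieldMeta` and `ModelMeta` are illustrative assumptions, not the actual DjangoContext API): plain dataclasses that carry only the metadata the plugin consumes, so nothing outside DjangoContext ever touches a real Django object.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class FieldMeta:
    name: str
    is_relation: bool = False
    related_model: Optional[str] = None  # "app_label.Model" when is_relation

@dataclass(frozen=True)
class ModelMeta:
    fullname: str  # e.g. "orders.models.Order"
    fields: Tuple[FieldMeta, ...] = ()
    managers: Tuple[str, ...] = ("objects",)

# DjangoContext methods would return objects like these instead of real model
# classes, regardless of whether the data came from the runtime or from a
# separate discovery process.
order = ModelMeta(
    fullname="orders.models.Order",
    fields=(
        FieldMeta("customer", is_relation=True, related_model="customers.Customer"),
    ),
)
```

Because these objects are frozen and contain only strings and tuples, they are trivially cacheable and serializable, which is exactly what a subprocess-based backend would need.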
I'd be very interested to work on this. We have multiple large Django projects at work, and their startup times make it quite painful to work with mypy, especially when you want fast feedback, e.g. through an editor integration. It's also quite annoying that you lose all feedback while editing if there's an error anywhere in the codebase; normally mypy would be happy processing the rest of the files. Does anyone have an overview of what features currently rely on the runtime? I'm most familiar with the ORM parts of the plugin and know that it's used extensively there, but I also feel like it should be possible to extract the required information statically. Yes, we end up implementing some Django logic, but I don't think it'd be that bad. The one thing I'm unsure about is whether it'd be possible to "backfill" previous models with related managers as we discover them, or if we need some kind of global model state available before we can fully count the models as processed. I like the idea of running a separate process to load the Django state, however that'd still have all the downsides of loading up a large and slow codebase. And if we want it to be performant it has to be cached, which means dealing with cache invalidation. I suspect the effort required to get a good caching strategy to work here would be quite high.
A lot of the pieces relying on the runtime for models can partly be found here (triggered early on by mypy): django-stubs/mypy_django_plugin/transformers/models.py, lines 590 to 601 at 0117348.
Although I think the general case is that usage of the runtime is quite scattered over the whole plugin. I agree with others that there's probably a better abstraction layer to be had to sort things out.
What about starting to get better compatibility with the mypy daemon? That should probably speed things up, if possible.
Hmm, okay, so looking around the code it seems like the Django runtime is used for basically two things: 1) models and related metadata, and 2) settings. I believe it should be possible to solve both of those cases without the runtime, but it would probably have some limitations for things that are very dynamic (i.e. managers defined with a dynamic superclass and other things that are not easily resolved using static analysis). For those cases it should be possible to add some manual type hints to help the plugin understand that a manager exists, similar to how you'd add hints if you were trying to type a Django project without the plugin.
Yeah, that would definitely be a nice improvement, though I'm completely unfamiliar with the daemon, so I wouldn't know where to start. I tried setting up dmypy through pylsp just the other day and couldn't get it to work. I'm not entirely sure what the problem was, but I suspect it was somehow related to the Django plugin.
Mypy already caches types in parsed files and handles cache invalidation.
FWIW I've tried dmypy on multiple occasions to automatically typecheck files after saving, and found it to be quite unreliable. Proper invalidation for discovered Django models would be helpful for dmypy usage anyway: with our current architecture, the user has to manually restart dmypy every time they change models to get accurate results.
I was curious about what it would take to set up the models without the runtime information, so I've spent a few hours today playing around with setting up a plugin from scratch. My progress so far can be found here: https://github.com/ljodal/django-stubs-poc (at the time of writing on this commit). The readme describes some of the complexity I've had to work around to get the dependencies between models set up. It's not ideal, since there's no way of telling what related managers exist just by looking at a single model definition. It is however possible to make this work well enough, I think. My idea is basically to build up some state with the information needed to add all the dynamic runtime attributes. Another approach to getting this is to manually resolve all the model files and parse the files using the […]
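The backfill idea above can be sketched in plain Python (all names here are hypothetical, not from the PoC): keep a mapping of related managers that are waiting for their target model, and drain it whenever a model is first processed.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# target model label -> (related manager name, source model) pairs not yet attached
pending_related: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
# model label -> manager names attached so far
attached: Dict[str, Set[str]] = {}

def process_model(label: str, relations: List[Tuple[str, str]]) -> None:
    """relations: (target_label, related_name) pairs declared on this model."""
    attached.setdefault(label, {"objects"})
    for target, related_name in relations:
        if target in attached:
            attached[target].add(related_name)  # target already seen: attach now
        else:
            pending_related[target].append((related_name, label))  # backfill later
    # Drain anything that was waiting for this model to appear.
    for related_name, _source in pending_related.pop(label, []):
        attached[label].add(related_name)

# Order is processed before Customer, so Customer's "orders" manager is backfilled:
process_model("orders.Order", [("customers.Customer", "orders")])
process_model("customers.Customer", [])
```

The open question from the comment above maps directly onto this sketch: whether mypy's plugin hooks fire in an order that lets the drain step actually mutate an already-analyzed model's TypeInfo, or whether all models must be known before any is finalized.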
Nice experimentation!
I'm not sure I agree with this statement. By statically analysing a model and its MRO we should be able to find all relations the model declares, I hope? I'm thinking that as long as we can always figure out which other model the current model relates to, it should be possible to establish additional attributes (read: managers) on the related model. In addition to that, Django comes with a couple of tricks and interfaces to figure out where and which the related model is. What I'm thinking is that the plugin should assume all models are registered; it feels out of scope for it to attempt to figure out whether the Django runtime considers a model registered or not. So yeah, it isn't trivial. From your README:
We could definitely also process all […]
Mypy has already parsed the AST and massaged it into its internal data structures, found in nodes.py, where a Python file's (module's) root object is called MypyFile. Couldn't we utilise those structures better? In the end those are what we need in order to report back to mypy, so we might as well use them to begin with?
Let me clarify with an example:

```python
# customers/models.py
class Customer(models.Model):
    ...

# orders/models.py
class Order(models.Model):
    customer = models.ForeignKey("customers.Customer", related_name="orders")
```

Here there's no way of telling that Customer has an orders manager by looking at customers/models.py alone.
This is absolutely possible, but we also need to know the app label, which can be defined in a different file.
Yes, that's what I'm thinking as well, although depending on the dependency graph mypy might not load all models, so we need another step to ensure all models are loaded. Hence my processing of […]
I don't think conflicts are allowed, so I assume that shouldn't be too much of a problem?
Yeah, let me clarify a bit here: yes, mypy does parse everything we need (at least if we tell it to), but as far as I can tell it doesn't make that available in any of the plugin interfaces in a way that makes it possible for us to use it. So yes, mypy has all the state and context we need, but it's not made available to plugins in a way that lets us define these relationships easily. Hence all my trickery to declare the dependency hierarchy through the settings file.
Right, that's true, didn't think of that. The […]
Could it be some sort of first "milestone" to refactor towards swapping out the full runtime setup for something like:

```python
settings = import_module(xyz.django_settings_module)
installed_apps = settings.INSTALLED_APPS
```

to collect the context needed to find all registered models? That would allow us to touch less runtime code at least, and should hopefully not be as slow. But we still fall under the prerequisite that we can resolve everything without the Django runtime. I just want to say that I'm all open to getting hold of changes that replace usage of the runtime with solutions through mypy. Hopefully we've collected quite a bunch of regular use cases in our test suite that will reveal requirements once one starts fiddling.
I think a very good first step would be to refactor how the runtime logic is used. Right now we directly access e.g. the app registry when transforming the models, which we're not going to be able to do if we don't fire up the Django runtime. If we instead load the Django project and collect all the state we need into e.g. some dataclasses, then we can later look into populating that state by reading the source rather than doing it at runtime. This would also allow us to make the "backend" for that state swappable, so it's possible to opt in/out of loading the runtime as needed for each project. I've also been wondering if just loading the settings module would be a decent middle ground, but ideally I'd like to not do that either. Again, maybe that can be one option among multiple, where you can choose how to get the Django state? There are going to be cases that work today that we cannot possibly support by just statically inspecting the code.
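The swappable-backend idea could be sketched with a `typing.Protocol` (all class and method names here are hypothetical, the stubs stand in for real introspection): the plugin depends only on the interface, and the runtime-loading and static-analysis implementations can be swapped per project.

```python
from typing import Dict, Protocol, Tuple

class DjangoStateBackend(Protocol):
    """Interface the plugin would depend on; concrete backends are swappable."""
    def get_model_managers(self, fullname: str) -> Tuple[str, ...]: ...

class RuntimeBackend:
    """Would call django.setup() and inspect the app registry (stubbed here)."""
    def get_model_managers(self, fullname: str) -> Tuple[str, ...]:
        return ("objects",)  # placeholder for real runtime introspection

class StaticBackend:
    """Would populate its state by parsing source files (stubbed here)."""
    def __init__(self, state: Dict[str, Tuple[str, ...]]) -> None:
        self._state = state

    def get_model_managers(self, fullname: str) -> Tuple[str, ...]:
        return self._state.get(fullname, ("objects",))

def plugin_logic(backend: DjangoStateBackend) -> Tuple[str, ...]:
    # The rest of the plugin only ever sees the protocol, never Django objects.
    return backend.get_model_managers("customers.Customer")

managers = plugin_logic(StaticBackend({"customers.Customer": ("objects", "orders")}))
```

With structural typing, neither backend has to inherit from anything; an opt-in flag in the plugin config could select which one to construct.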
Another option is to ask for additional hooks from mypy: either a way to defer in the "get additional deps" callback (so we can wait until the settings module has been discovered), or a separate hook that allows us to first find the settings module before we process the models. Django does a lot of "try importing this module" (e.g. for […]).
We might want to keep JSON compatibility in mind here, in case this state should/would go into mypy's cache metadata.
I like the idea of a backend interface, allowing it to change quite independently or take different approaches. I'm not sure about promoting multiple backends, though, but it would definitely make for a nice deprecation tool.
We could definitely phase stuff out over time, but still make changes step by step, trying to keep things reasonably backwards compatible.
Could we investigate? Anything you have off the top of your head?
This is definitely an option, but I think we should then define some sort of clear approach as to the why/what/how of exactly what more is needed. What I'm saying is that we should present something that is thought through, that we know is needed, and that would work, so that we increase our chances of actually getting it and don't just become a nuisance.
Yeah, but that can be done even with a dataclass. I like the approach taken by the dataclasses plugin in mypy, and I've also used it in other mypy plugins I've written: it has a dataclass that represents the metadata, with serialize and deserialize methods.
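A sketch of that serialize/deserialize pattern, tying it to the JSON-compatibility point raised earlier (the `ModelMetadata` class and its fields are hypothetical, the pattern mirrors mypy's own dataclasses plugin):

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class ModelMetadata:
    """Metadata the plugin would store, e.g. in mypy's TypeInfo metadata."""
    fullname: str
    managers: List[str]

    def serialize(self) -> dict:
        # JSON-compatible form, so it survives mypy's cache round-trip.
        return {"fullname": self.fullname, "managers": self.managers}

    @classmethod
    def deserialize(cls, data: dict) -> "ModelMetadata":
        return cls(fullname=data["fullname"], managers=data["managers"])

meta = ModelMetadata("customers.Customer", ["objects", "orders"])
# Force an actual JSON round-trip, as the mypy cache would:
roundtripped = ModelMetadata.deserialize(json.loads(json.dumps(meta.serialize())))
```

Keeping the stored form restricted to JSON-safe types (strings, lists, dicts) is what makes the incremental-cache story work.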
I guess whether we want/need to support multiple backends long term depends on what limitations come from the no-runtime-information approach. And having it as opt-in during development would be nice; that way we don't need to do everything in one huge PR, but can gradually implement transformations etc.
I've seen stuff like this in a couple of places, for example in django-mptt:

```python
class MyManager(models.Manager.from_queryset(MyQuerySet)):
    ...
```

If I remember correctly mypy is unable to get the MRO of this class, and thus also unable to detect that it's actually a manager. There might be some magic we can do here, but I'm unsure. My concern is that for these cases we'll not be able to detect some managers.
Yeah, exactly what we'd need here is a bit unclear to me, and we might be able to make it work as-is, just with a bit more hassle and a dependency graph that's not ideal (because we don't know about back references when declaring dependencies).
I imagine it could work if we spawn a separate process that loads the Django state, and communicate with that process so that we get the required information rather than the actual Django objects. Then, when we detect changes to the settings, we close that process and spawn another one (currently, trying to reload Django in the same process doesn't work because there's too much global state and caching around imports). I had a look, and it seems it would be quite the effort to remove Django-specific types from the plugin, though. But as a first step the DjangoContext class could still be in-process, and be progressively changed to no longer expose any Django-specific types. I believe this is necessary so that dmypy can see changes without being restarted every time a setting/model/installed app is changed.
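The close-and-respawn part of that lifecycle can be sketched as follows (a minimal illustration under stated assumptions: `DjangoWorker` is a hypothetical name, mtime comparison stands in for real settings-change detection, and a sleeping subprocess stands in for a worker that would actually import Django and serve queries over a pipe):

```python
import os
import subprocess
import sys

class DjangoWorker:
    """Owns a subprocess that would load Django; respawned when settings change."""

    def __init__(self, settings_path: str) -> None:
        self.settings_path = settings_path
        self._mtime = os.path.getmtime(settings_path)
        self._proc = self._spawn()

    def _spawn(self) -> "subprocess.Popen[bytes]":
        # A real worker would import the settings module and answer metadata
        # queries; this placeholder just sleeps so the sketch is self-contained.
        return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

    def ensure_fresh(self) -> bool:
        """Respawn the worker if the settings file changed; True if restarted."""
        mtime = os.path.getmtime(self.settings_path)
        if mtime == self._mtime:
            return False
        # Reloading Django in-process doesn't work, so kill and start over.
        self._proc.terminate()
        self._proc.wait()
        self._mtime = mtime
        self._proc = self._spawn()
        return True

    def close(self) -> None:
        self._proc.terminate()
        self._proc.wait()
```

A real implementation would also have to watch models and INSTALLED_APPS, not just the settings file's mtime, but the teardown/respawn shape stays the same.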
We have no immediate plans to implement this, but I decided to open this issue for discussions, to split it out from #1337.
Original post by @intgr #1337 (comment)