Deserialization of cache files is a major bottleneck in incremental mode, so any performance improvements there would be valuable.
At least @pkch has previously worked on this, and it's unclear if any significant wins are achievable without a lot of work. However, there seems to be some redundant information in the cache files that we haven't looked at eliminating yet.
Here are some random ideas:
- Use shared type objects at least for common types such as `int` and `str`.
- Don't include default attribute values for the most common AST nodes. For example, we could have `"defaults": true` to mark that we can skip processing non-default values altogether.
- Use shared dummy objects for things like empty function bodies.
- Micro-optimize deserialization of the most common node types.
- Avoid repeating information about arguments in the serialized form of FuncDefs.
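
To make two of these ideas concrete, here is a rough sketch of what sharing common type objects and a `"defaults": true` fast path could look like. Everything in it is hypothetical: `Instance`, `FuncNode`, the attribute names, and the helper functions are simplified stand-ins invented for illustration, not mypy's actual classes or cache format.

```python
from typing import Any, Dict, List

# Hypothetical, simplified stand-in for a type object (not mypy's real class).
class Instance:
    def __init__(self, fullname: str, args: List["Instance"]) -> None:
        self.fullname = fullname
        self.args = args

# Which types count as "common" is an assumption made for this sketch.
_COMMON_TYPES = ("builtins.int", "builtins.str")
_shared_instances: Dict[str, Instance] = {}

def deserialize_instance(data: Dict[str, Any]) -> Instance:
    """Return one shared object for common argument-free types."""
    fullname = data["fullname"]
    args = data.get("args", [])
    if fullname in _COMMON_TYPES and not args:
        shared = _shared_instances.get(fullname)
        if shared is None:
            shared = _shared_instances[fullname] = Instance(fullname, [])
        return shared
    return Instance(fullname, [deserialize_instance(a) for a in args])

# Hypothetical node with class-level default attribute values.
class FuncNode:
    is_property = False
    is_static = False

    def __init__(self, name: str) -> None:
        self.name = name

_FUNC_DEFAULTS = {"is_property": False, "is_static": False}

def serialize_func(node: FuncNode) -> Dict[str, Any]:
    """Write only non-default attributes, or a single "defaults" marker."""
    data: Dict[str, Any] = {"name": node.name}
    non_default = {
        key: getattr(node, key)
        for key, default in _FUNC_DEFAULTS.items()
        if getattr(node, key) != default
    }
    if non_default:
        data.update(non_default)
    else:
        data["defaults"] = True
    return data

def deserialize_func(data: Dict[str, Any]) -> FuncNode:
    node = FuncNode(data["name"])
    if data.get("defaults"):
        # Fast path: every attribute keeps its class-level default.
        return node
    for key in _FUNC_DEFAULTS:
        if key in data:
            setattr(node, key, data[key])
    return node
```

Sharing is only safe if nothing mutates the shared objects after deserialization, so that would need to be checked before trying this for real.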
Idea for how to move forward:
- Profile a type-checking run of mypy on itself and look for promising hot spots (spoiler: there probably aren't any major hot spots visible in the profile). Post results here.
- Determine which node types are the most common ones during deserialization. Also calculate cumulative figures and percentages (of all nodes). Post results here.
- Prototype various potential optimizations and benchmark them individually. Note that getting reliable benchmark results is hard because of turbo modes, background processes, etc. I may look at setting up a dedicated system for running benchmarks.
- If speedups from individual optimizations are negligible (under 0.5%) or negative, give up on them and post the negative results here. Otherwise we risk others doing the same experiment in the future.
- If the total speedup from all optimizations would be under, say, 10%, it probably doesn't make sense to land any of them, unless some optimizations are utterly trivial.
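
For the node-frequency step above, a minimal sketch along these lines might be enough to get the numbers. It assumes that cache data is JSON in which each serialized object carries a `".class"` key naming its type, and that the data files match `*.data.json` under the cache directory; both should be verified against the actual cache layout. The function names are made up for this example.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, List, Tuple

def count_classes(obj: Any, counts: Counter) -> None:
    """Recursively count JSON objects by their ".class" value (assumed key)."""
    if isinstance(obj, dict):
        cls = obj.get(".class")
        if isinstance(cls, str):
            counts[cls] += 1
        for value in obj.values():
            count_classes(value, counts)
    elif isinstance(obj, list):
        for value in obj:
            count_classes(value, counts)

def cumulative_table(counts: Counter) -> List[Tuple[str, int, float, float]]:
    """Rows of (class name, count, percent of all, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for name, n in counts.most_common():
        running += n
        rows.append((name, n, 100.0 * n / total, 100.0 * running / total))
    return rows

def count_cache_dir(root: str) -> Counter:
    """Count classes across all data files under root (assumed file pattern)."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*.data.json"):
        with open(path, encoding="utf-8") as f:
            count_classes(json.load(f), counts)
    return counts

# Tiny hand-written sample standing in for real cache data.
sample = json.loads(
    '{".class": "MypyFile", "names": {"f": {".class": "FuncDef", "type": '
    '{".class": "CallableType", "ret_type": {".class": "Instance"}, '
    '"arg_types": [{".class": "Instance"}, {".class": "Instance"}]}}}}'
)
sample_counts: Counter = Counter()
count_classes(sample, sample_counts)
for name, n, pct, cum in cumulative_table(sample_counts):
    print(f"{name:15} {n:6d} {pct:6.1f}% {cum:6.1f}%")
```

Note that this counts serialized objects on disk, which is only a proxy for where deserialization time actually goes; a profile would still be needed to confirm that the most frequent classes are also the most expensive.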
In any case, this issue isn't important enough to spend a lot of time on it. We have other plans to speed up mypy, but as they will take more effort, some quick wins here would be nice.