
Faster cache deserialization #3456

Open
@JukkaL

Deserialization of cache files is a major bottleneck in incremental mode, so any performance improvements there would be valuable.

At least @pkch has previously worked on this, and it's unclear whether any significant wins are achievable without a lot of work. However, there seems to be some redundant information in the cache files that we haven't looked at eliminating yet.

Here are some random ideas:

  • Use shared type objects at least for common types such as int and str.
  • Don't include default attribute values for the most common AST nodes. For example, we could emit "defaults": true to mark that a node has only default attribute values, so that deserialization can skip looking for non-default values altogether.
  • Use shared dummy objects for things like empty function bodies.
  • Micro-optimize deserialization of the most common node types.
  • Avoid repeating information about arguments in the serialized form of FuncDefs.
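A minimal sketch of the first two ideas. It uses hypothetical toy classes (`ToyVar`, `ToyInstance`) rather than mypy's real node and type classes, and the helper names are illustrative only. When serializing, attributes equal to their defaults are omitted. A `"defaults": true` marker lets deserialization take a fast path. Common types come from a table of shared instances instead of being allocated once per occurrence.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Dict


@dataclass
class ToyVar:
    """Hypothetical stand-in for a common AST node (not mypy's real Var)."""
    name: str
    type_name: str = "builtins.object"
    is_final: bool = False
    is_property: bool = False
    is_classvar: bool = False


# Default value of every attribute that has one (i.e. all except "name").
_DEFAULTS = {f.name: f.default for f in fields(ToyVar) if f.default is not MISSING}


def serialize(node: ToyVar) -> Dict[str, Any]:
    data: Dict[str, Any] = {"name": node.name}
    # Emit only the attributes whose value differs from the default.
    for key, default in _DEFAULTS.items():
        value = getattr(node, key)
        if value != default:
            data[key] = value
    if len(data) == 1:
        # Nothing but defaults: mark it so the reader can take a fast path.
        data["defaults"] = True
    return data


def deserialize(data: Dict[str, Any]) -> ToyVar:
    if data.get("defaults"):
        # Fast path: no need to look for non-default attributes at all.
        return ToyVar(data["name"])
    extra = {k: v for k, v in data.items() if k != "name"}
    return ToyVar(data["name"], **extra)


@dataclass(frozen=True)
class ToyInstance:
    """Hypothetical immutable type object, e.g. the type 'builtins.int'."""
    fullname: str


# One shared instance per common type instead of a fresh object per occurrence.
_SHARED_TYPES = {name: ToyInstance(name) for name in ("builtins.int", "builtins.str")}


def deserialize_type(fullname: str) -> ToyInstance:
    shared = _SHARED_TYPES.get(fullname)
    return shared if shared is not None else ToyInstance(fullname)
```

Sharing type objects like this is only safe if nothing mutates them after deserialization (for example, by setting line numbers on them). That would need to be checked against the real type classes before trying it in mypy.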

Ideas for how to move forward:

  • Profile a run that type checks mypy itself and look for promising hot spots (spoiler: there probably aren't any major hot spots visible in the profile). Post results here.
  • Determine which node types are the most common ones during deserialization. Also calculate cumulative figures and percentages (of all nodes). Post results here.
  • Prototype various potential optimizations and benchmark them individually. Note that getting reliable benchmark results is hard because of turbo modes, background processes, etc. I may look at setting up a dedicated system for running benchmarks.
  • If speedups from individual optimizations are negligible (under 0.5%) or negative, give up on them and post the negative results here. Otherwise we risk others doing the same experiment in the future.
  • If the total speedup from all optimizations would be under, say, 10%, it probably doesn't make sense to land any of them, unless some optimizations are utterly trivial.
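The following could collect the node-type statistics. It assumes that each serialized node in a cache file is a JSON object tagged with a `".class"` key. That is an assumption about the cache layout, so adjust the key if the actual format differs. The function names are hypothetical. The code counts tagged objects recursively and reports per-class and cumulative percentages of all nodes, most common first.

```python
import json
from collections import Counter
from typing import Any, Iterable, List, Tuple

# (node class, count, percent of all nodes, cumulative percent)
Row = Tuple[str, int, float, float]


def count_node_classes(obj: Any, counts: Counter) -> None:
    """Recursively count JSON objects tagged with a ".class" key."""
    if isinstance(obj, dict):
        cls = obj.get(".class")
        if isinstance(cls, str):
            counts[cls] += 1
        for value in obj.values():
            count_node_classes(value, counts)
    elif isinstance(obj, list):
        for item in obj:
            count_node_classes(item, counts)


def summarize(counts: Counter) -> List[Row]:
    """Return rows sorted by count, most common node class first."""
    total = sum(counts.values())
    rows: List[Row] = []
    cumulative = 0
    for cls, n in counts.most_common():
        cumulative += n
        rows.append((cls, n, 100.0 * n / total, 100.0 * cumulative / total))
    return rows


def summarize_files(paths: Iterable[str]) -> List[Row]:
    """Aggregate node-class counts over a set of cache files."""
    counts: Counter = Counter()
    for path in paths:
        with open(path) as f:
            count_node_classes(json.load(f), counts)
    return summarize(counts)


def print_table(rows: List[Row]) -> None:
    for cls, n, pct, cum in rows:
        print(f"{cls:30} {n:8d} {pct:6.2f}% {cum:6.2f}%")
```

This counts nodes as stored in the cache files rather than instrumenting the deserializer itself. That keeps the measurement independent of any deserialization changes being prototyped.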

In any case, this issue isn't important enough to spend a lot of time on. We have other plans to speed up mypy, but they will take more effort, so some quick wins here would be nice.
