
cache significantly slows down black due to pathlib #1950

@asottile

Describe the bug

To Reproduce

With a sufficiently large number of runs of black, the cache accumulates pretty rapidly and causes a significant slowdown compared to uncached runs:

uncached

$ rm -rf t; time XDG_CACHE_HOME=$PWD/t black t.py
reformatted t.py
All done! ✨ 🍰 ✨
1 file reformatted.

real	0m0.295s
user	0m0.237s
sys	0m0.036s

cached

$ time black t.py
All done! ✨ 🍰 ✨
1 file left unchanged.

real	0m2.115s
user	0m1.999s
sys	0m0.089s

The cache in question isn't ~that large:

$ du -hs ~/.cache/black/20.8b1/*.pickle
3.8M	/home/asottile/.cache/black/20.8b1/cache.-.88.1.0.pickle
16K	/home/asottile/.cache/black/20.8b1/Grammar3.8.5.final.0.pickle
4.0K	/home/asottile/.cache/black/20.8b1/PatternGrammar3.8.5.final.0.pickle
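To see how many entries sit behind those few megabytes, and how long the load alone takes, something like the following can help. This is only a sketch: it assumes the cache file is a single pickled dict (which appears to be how 20.8b1 stores it, but treat that as an assumption), and `describe_pickled_cache` is a hypothetical helper, not part of black.

```python
import pickle
import time
from collections import Counter
from pathlib import Path


def describe_pickled_cache(cache_file):
    """Report size, entry count, key types and load time for a pickled dict.

    Assumption: the file holds one pickled dict. Only unpickle files you
    trust, since unpickling can execute arbitrary code.
    """
    cache_file = Path(cache_file)
    start = time.perf_counter()
    with cache_file.open("rb") as f:
        data = pickle.load(f)
    elapsed = time.perf_counter() - start
    return {
        "bytes": cache_file.stat().st_size,
        "entries": len(data),
        # Count keys by type name to see whether they are pathlib objects.
        "key_types": Counter(type(key).__name__ for key in data),
        "load_seconds": elapsed,
    }
```

Pointing it at the `cache.-.88.1.0.pickle` file listed above should show whether the keys are pathlib objects and how much of the run time is spent just loading the file.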

The slowdown appears to be caused by pathlib being slow -- I've attached a pstats file and an SVG of the profile:

profile graph

This was generated using the following procedure:

python -m cProfile -o out.pstats -m black t.py
gprof2dot -z __init__:6604:patched_main out.pstats | dot -Tsvg -o out.svg  # gprof2dot from `yelp-gprof2dot`, `dot` from graphviz

out.zip (contains svg and pstats file)
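For anyone who would rather not render the graph, the same pstats file can be summarised in plain text with the standard library. A minimal sketch, assuming the `out.pstats` file produced by the procedure above; `top_cumulative` is a hypothetical helper name:

```python
import io
import pstats


def top_cumulative(pstats_path, limit=15):
    """Return a text report of the `limit` entries with the highest cumulative time."""
    out = io.StringIO()
    stats = pstats.Stats(pstats_path, stream=out)
    # strip_dirs() shortens file names; sorting by "cumulative" surfaces the
    # call chains that account for most of the wall-clock time.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return out.getvalue()
```

Calling `print(top_cumulative("out.pstats"))` prints the report; any pathlib and pickle frames near the top would correspond to what the graph shows.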

Expected behavior

The cache shouldn't make execution significantly slower.

Environment:

  • Version: 20.8b1
  • OS and Python version: Python 3.8.5 on Ubuntu 20.04

Does this bug also happen on master?

Yes: if I build up a new cache using the current default branch, it reproduces as well.

Additional context

Removing pathlib from the Cache makes the cache serialization cost ~essentially zero -- will follow up with a demonstration patch
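Until that patch is posted, here is a rough way to check the claim in isolation. It is a sketch under stated assumptions, not black's code: the entries are synthetic and merely shaped like `{file: (mtime, size)}`, and `time_unpickle` and `compare_key_types` are hypothetical names. It builds the same mapping keyed once by `Path` and once by `str`, then times unpickling each.

```python
import pickle
import time
from pathlib import Path


def time_unpickle(payload, repeats=3):
    """Return the best wall-clock time in seconds to unpickle `payload`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        pickle.loads(payload)
        best = min(best, time.perf_counter() - start)
    return best


def compare_key_types(n=50_000):
    """Time loading two equivalent mappings, one keyed by Path and one by str."""
    # Synthetic file names; the values imitate an (mtime, size) pair.
    names = [f"/tmp/project/pkg{i % 100}/module{i}.py" for i in range(n)]
    by_path = {Path(name): (0.0, i) for i, name in enumerate(names)}
    by_str = {name: (0.0, i) for i, name in enumerate(names)}
    return {
        "path_keys": time_unpickle(pickle.dumps(by_path)),
        "str_keys": time_unpickle(pickle.dumps(by_str)),
    }
```

Running `print(compare_key_types())` gives the two load times side by side. The absolute numbers will vary by machine and Python version, so only the ratio between them is of interest.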


Labels: C: performance, T: bug
