Skip to content

Conversation

f-f
Copy link
Member

@f-f f-f commented Aug 31, 2024

This is an attempt to improve the performance of lockfile parsing, which is currently super slow (see #1262), by changing the format from YAML to JSON.

We don't have benchmarks in CI, but my local testing is showing that the time spent parsing the lockfile for this project went from ~150ms to ~100ms.
Our lockfile is pretty small so I hope there are bigger gains for bigger projects - @finnhodgkin could you try this out on your project and see if the performance is any better?

@finnhodgkin
Copy link
Contributor

Sorry I only just noticed this. I'll check tomorrow.

I'm guessing this will help to lower the parsing time but probably not decoding:

image

@finnhodgkin
Copy link
Contributor

@f-f it's a definite improvement!

The decoding and parsing is a bit more coupled now so I didn't bother check how long each takes individually but the combined decode+parse took 2.319s on this branch and 3.106s on master 🎉

Given the yaml parsing took 639.543ms before, I'm guessing the parsing is now practically instant and the decoding is roughly the same (would make sense).

@f-f
Copy link
Member Author

f-f commented Sep 3, 2024

Right, so I think this branch is worth merging. I'll fix the tests.

@finnhodgkin would you be able to share the giant lockfile somewhere so I can look at it? My email is in my GitHub profile if you'd rather not share it publicly.

@garyb would you have any input on what we could do to improve JSON decoding here?

@finnhodgkin
Copy link
Contributor

@f-f I've emailed it 👍

@f-f
Copy link
Member Author

f-f commented Sep 3, 2024

@finnhodgkin thanks! 3MB of lockfile, most of which is build_plans. I guess we can do something about that 😄

@garyb
Copy link
Member

garyb commented Sep 3, 2024

@garyb would you have any input on what we could do to improve JSON decoding here?

Not really unfortunately - from looking through the codecs a bit I don't see anything out of the ordinary. The codec library definitely has some overhead, but I don't know the specific bottleneck(s) that would contribute to this, I'd have to have a poke around in a profiler to be able to say anything specific.

At a guess I would imagine record codecs are the biggest culprit though, and I'm not really sure how to squeeze much improvement out of them due to the nature of the machinery behind them. Writing them "manually" might be the only way to get really good performance.

@f-f f-f merged commit c4c9641 into master Sep 5, 2024
5 checks passed
@f-f f-f deleted the json-lockfile branch September 5, 2024 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants