sqlx in some cases more than 60 times slower than Diesel ORM (according to Diesel benchmark suite) #1481


Closed
d4h0 opened this issue Oct 8, 2021 · 5 comments

@d4h0

d4h0 commented Oct 8, 2021

Hi,

I just stumbled upon the following benchmark suite of the Diesel project:

https://github.com/diesel-rs/metrics/

In several of the benchmarks, sqlx is more than 60 times slower than Diesel.

I'm wondering if you guys are aware of these benchmarks, and whether sqlx is really so much slower in some cases, or if there is a problem with the benchmarks.

@NyxCode

NyxCode commented Oct 11, 2021

Seems like the SQLite backend is the culprit.

@abonander
Collaborator

abonander commented Oct 14, 2021

Yeah, the core issue is that SQLite does not have a non-blocking API. Reading through the docs, you might get the impression that you can make it work in a non-blocking fashion, but you lose durability guarantees if you do that (trust me, we tried it). At the end of the day it's file I/O, so you still have to go to a background thread at some point to perform the blocking calls, since there's still no true async file I/O (outside of io_uring, perhaps).

In SQLx, the SQLite driver sends all commands to a background thread to perform the operations against the database and then receives results over an async channel. This is obviously going to introduce a lot of overhead compared to calling the SQLite APIs directly but this ensures that we never block the async runtime thread, which is something Diesel doesn't deal with since it's a purely synchronous API.
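
To illustrate that pattern, here's a minimal sketch (not SQLx's actual implementation; `Command`, `spawn_worker`, and `execute` are illustrative names), using a std mpsc channel into the worker and a Tokio oneshot channel back out:

```rust
use std::sync::mpsc;
use std::thread;

use tokio::sync::oneshot;

// A command for the worker thread, carrying a one-shot channel for the reply.
struct Command {
    sql: String,
    respond_to: oneshot::Sender<Result<u64, String>>,
}

// Spawn a dedicated OS thread that owns the blocking database handle.
fn spawn_worker() -> mpsc::Sender<Command> {
    let (tx, rx) = mpsc::channel::<Command>();
    thread::spawn(move || {
        for cmd in rx {
            // The blocking SQLite call would happen here; this stand-in
            // just "executes" the statement by measuring its length.
            let result = Ok(cmd.sql.len() as u64);
            let _ = cmd.respond_to.send(result);
        }
    });
    tx
}

// The async side sends a command and awaits the reply, so it never
// blocks the runtime thread itself.
async fn execute(worker: &mpsc::Sender<Command>, sql: &str) -> Result<u64, String> {
    let (tx, rx) = oneshot::channel();
    worker
        .send(Command { sql: sql.to_owned(), respond_to: tx })
        .map_err(|e| e.to_string())?;
    rx.await.map_err(|e| e.to_string())?
}
```

Every query pays for two channel hops plus a thread handoff, which is the overhead a direct synchronous call avoids.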

For the other benchmarks, we're within 2-3x of Diesel's relative performance, which isn't bad considering the sync/async difference and the fact that we really haven't put all that much time into optimizing SQLx yet. There's a lot of low-hanging fruit there that we're waiting on language features for, namely async fn in traits; right now there are a lot of boxed trait objects going around, which is obviously not great.
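
For a sense of what that workaround looks like, here's a hedged sketch (not SQLx's actual trait definitions; `Executor` and `Conn` are illustrative):

```rust
use std::future::Future;
use std::pin::Pin;

// Without `async fn` in traits, an async trait method must return a boxed,
// type-erased future: one heap allocation plus dynamic dispatch per call.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

trait Executor {
    fn execute<'a>(&'a mut self, sql: &'a str) -> BoxFuture<'a, Result<u64, String>>;
}

struct Conn;

impl Executor for Conn {
    fn execute<'a>(&'a mut self, sql: &'a str) -> BoxFuture<'a, Result<u64, String>> {
        // `Box::pin` erases the concrete future type behind the trait object.
        Box::pin(async move { Ok(sql.len() as u64) })
    }
}
```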

It's a bit more useful to compare SQLx to the postgres crate since that uses tokio-postgres internally. However, for some reason the SQLx benchmarks use async-std as the runtime, which hasn't been as heavily optimized as Tokio and is generally designed around convenience instead of performance.

It's all apples to oranges. There are also no concrete numbers (except the performance-over-time metrics, on which we're decently consistent, again except for SQLite); it's all relative to Diesel's performance. Not great methodology, honestly. Those benchmarks are clearly designed with one intent in mind: to emphasize Diesel's performance over competing crates.

In the 0.7 release, which is blocked on GATs and async fns hitting stable, we're looking to provide a synchronous API for SQLx that can call the SQLite API functions directly. I think then it'll be possible to make a much fairer comparison.

As-is, this issue isn't really actionable so I'm going to close. It'd be interesting to see if there's any improvement when the SQLx benchmarks are switched to using a single-threaded Tokio runtime, but that's an issue to open in Diesel's repo.
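
For reference, setting up such a single-threaded runtime could look like this (a hedged sketch; the benchmark body is a placeholder):

```rust
// Drive the benchmark on a current-thread Tokio runtime instead of the
// default multi-threaded one.
fn main() {
    let rt = tokio::runtime::Builder::new_current_thread()
        .enable_all() // enable the I/O and timer drivers
        .build()
        .expect("failed to build Tokio runtime");

    rt.block_on(async {
        // run the SQLx benchmark workload here
    });
}
```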

@weiznich

@abonander Let me clarify a few things here.

It's a bit more useful to compare SQLx to the postgres crate since that uses tokio-postgres internally. However, for some reason the SQLx benchmarks use async-std as the runtime, which hasn't been as heavily optimized as Tokio and is generally designed around convenience instead of performance.

I chose the async_std crate as the executor because that was the default executor documented for sqlx at the time I wrote those benchmarks. Additionally, I tried to reach out to you (and other sqlx maintainers) for feedback about the implementation multiple times while writing those benchmarks. At least for me, it's a bit disappointing to only get this feedback now.

It's all apples to oranges. There are also no concrete numbers (except the performance-over-time metrics, on which we're decently consistent, again except for SQLite); it's all relative to Diesel's performance. Not great methodology, honestly. Those benchmarks are clearly designed with one intent in mind: to emphasize Diesel's performance over competing crates.

Those numbers are not designed to emphasize Diesel's performance over other crates. They are just a way for me (and other maintainers) to track how Diesel's performance evolves over time and how it compares to other crates providing similar functionality. It's not about being faster than x or slower than y, but about getting a general feeling for the performance characteristics of other solutions. Sometimes this indicates issues in Diesel that can be solved; sometimes it indicates issues in other crates that can be solved (for example, the mysql-simple crate saw such improvements back when I initially wrote those benchmarks). I've chosen to give times relative to Diesel, as that gives a good overview for me. Additionally, this automatically converts time differences to a log scale, which is sort of required because of the large differences. If you prefer another baseline, feel free to suggest one.
Anyway, the repository contains all recorded unprocessed results, so it's easy to add other numbers/plots there. If you have any suggestions for plotting those numbers in a better way, feel free to suggest something concrete.

As-is, this issue isn't really actionable so I'm going to close. It'd be interesting to see if there's any improvement when the SQLx benchmarks are switched to using a single-threaded Tokio runtime, but that's an issue to open in Diesel's repo.

Feel free to open a PR for this. Generally speaking I'm more than happy to share these benchmark implementations as tool to track performance between all rust database connector implementations.

@abonander
Collaborator

I've chosen to use the async_std crate as executor as that was the default executor that was documented for sqlx at the time I wrote those benchmarks.

Yeah, that's mostly a holdover from when SQLx was going to only support async_std. It's not even the default now. Our README really needs some TLC.

Still though, it's not really a fair comparison. Were you just trying to compare how each crate performs straight out of the box?

Additionally, I tried to reach out to you (and other sqlx maintainers) for feedback about the implementation multiple times while writing those benchmarks.

Yeah, sorry we didn't get back to you before. At least you finally got your feedback?

They are just a way for me (and other maintainers) to track how Diesel's performance evolves over time and how it compares to other crates providing similar functionality. It's not about being faster than x or slower than y, but about getting a general feeling for the performance characteristics of other solutions.

I guess my issue with it is that it's just a rather crude comparison, and it's hard to see the utility even for Diesel besides emphasizing its performance over other crates.

The choice to make the graphs relative to Diesel means that you're really only going to notice differences in performance in Diesel if suddenly another crate is faster than Diesel when it wasn't before, or if the other crates become better or worse all at once between benchmark runs. It inherently means the metric you're aiming to improve, and advertise, is Diesel's competitiveness with other crates.

And for the authors of the other crates, it means they have to say to themselves something like, "last month we were 2x slower than Diesel and now we're 3x slower... did we get slower or did Diesel get faster?"

The fact that the mysql crate derived some improvement out of it is sort of a red herring, because that's also a synchronous API, so its implementation and Diesel's should at least be comparable.

Additionally this automatically converts time differences to log scale, which is sort of required because of the large differences.

I don't think so? It's still a linear scale, but instead of the step size being, say, multiples of 10 μs, it's multiples of Diesel's benchmark time. The graphs would have to go 1x, 10x, 100x, or 1x, 2x, 4x, 8x, etc. to be a logarithmic scale. The SQLite graphs with the massive spikes for SQLx are still on a linear scale, just multiples of 10x.
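
To make the distinction concrete (my own restatement, not part of the original comment):

```latex
% Linear scale: equally spaced ticks differ by a constant amount c,
% e.g. 10x, 20x, 30x, ...
% Logarithmic scale: equally spaced ticks differ by a constant ratio r,
% e.g. 1x, 10x, 100x, ... (r = 10)
\[
\text{linear: } t_{k+1} - t_k = c,
\qquad
\text{logarithmic: } \frac{t_{k+1}}{t_k} = r .
\]
```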

Feel free to open a PR for this.

I really wish I had the time. I may open an issue on the repo if you don't mind though.

weiznich added a commit to weiznich/diesel that referenced this issue Nov 11, 2021
* Use the tokio runtime for sqlx/sea-orm as requested here: launchbadge/sqlx#1481 (comment)
* Update quaint to a version that uses tokio 1.0 and reenable it in the
  metrics job
@weiznich

@abonander

Yeah, that's mostly a holdover from when SQLx was going to only support async_std. It's not even the default now. Our README really needs some TLC.

Still though, it's not really a fair comparison. Were you just trying to compare how each crate performs straight out of the box?

At least for me, it was not clear that you expect the tokio backend to perform better. It's definitely not the goal of the benchmarks to artificially slow down any implementation. Because of this, I've opened diesel-rs/diesel#2954 to use tokio instead of async_std as the runtime. Feel free to comment on the implementation there.

I guess my issue with it is that it's just a rather crude comparison, and it's hard to see the utility even for Diesel besides emphasizing its performance over other crates.

The choice to make the graphs relative to Diesel means that you're really only going to notice differences in performance in Diesel if suddenly another crate is faster than Diesel when it wasn't before, or if the other crates become better or worse all at once between benchmark runs. It inherently means the metric you're aiming to improve, and advertise, is Diesel's competitiveness with other crates.

And for the authors of the other crates, it means they have to say to themselves something like, "last month we were 2x slower than Diesel and now we're 3x slower... did we get slower or did Diesel get faster?"

I did change the visualization of the recorded metrics to only show a box plot of the measured times. At least I see some value in these plots, as I can see how the crates currently compare to each other. Yes, that does not answer the question of whether something got faster or slower, but I can see that x is currently faster or slower than y, which is, at least for me, important information.

The fact that the mysql crate derived some improvement out of it is sort of a red herring, because that's also a synchronous API, so its implementation and Diesel's should at least be comparable.

OK, I feel there is a lot wrong with this statement:

  • If you are interested in why things are the way they are (faster or slower than anything else), you need to sit down with a profiler and measure where time is spent. These plots only give an indication of which workloads may or may not be interesting for such profiling. Additionally, they may give you an indication of which other implementation may be a good source of inspiration for solving specific problems related to a certain workload. That's exactly what the maintainer of the mysql crate did.
  • I do not see any reason why you shouldn't compare async and sync crates. Yes, they have different use cases, but I feel that, at least for most people, it is not clear that async does not come for free. It has a cost in terms of complexity and performance.
  • As far as I know, the postgres crate internally uses tokio-postgres by simply wrapping the corresponding calls to an async function in Runtime::block_on (see the sketch after this list). This, plus the benchmark results for postgres, seems to indicate that it is possible to achieve performance similar to a sync implementation. Based on this, I think it's at least questionable to say that some async implementation is obviously slower than a synchronous one.
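
To illustrate the wrapping pattern from the last point, here's a minimal sketch (assumed, not the postgres crate's actual source; `async_query` and `SyncClient` are illustrative names):

```rust
use tokio::runtime::Runtime;

// Stand-in for an async call such as one made through tokio-postgres.
async fn async_query(sql: &str) -> Result<u64, String> {
    Ok(sql.len() as u64)
}

// A synchronous facade that owns a runtime and blocks on each async call.
struct SyncClient {
    runtime: Runtime,
}

impl SyncClient {
    fn new() -> Result<Self, std::io::Error> {
        let runtime = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()?;
        Ok(SyncClient { runtime })
    }

    // The sync method simply drives the async one to completion.
    fn query(&self, sql: &str) -> Result<u64, String> {
        self.runtime.block_on(async_query(sql))
    }
}
```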

I don't think so? It's still a linear scale, but instead of the step size being, say, multiples of 10 μs, it's multiples of Diesel's benchmark time. The graphs would have to go 1x, 10x, 100x, or 1x, 2x, 4x, 8x, etc. to be a logarithmic scale. The SQLite graphs with the massive spikes for SQLx are still on a linear scale, just multiples of 10x.

That's correct, thanks for pointing that out.
