Remove Sync Code #667


Closed
wants to merge 9 commits into from

Conversation

jmpunkt
Contributor

@jmpunkt jmpunkt commented May 23, 2020

Overview

Removes the sync code and uses the existing async functions. There are some issues with the interface macro: generics are not working for now, since it is not possible to constrain generics with the interface macro, which needs a rewrite. Related test cases are disabled.

Changes

  • all fields must be prepended with async (ensuring the user knows we are in an async context)
  • small refactoring in the codegen crate
  • no more sync inspection and execution functions (affects, for example, iron and rocket)
  • sync functions are replaced with their async counterparts (only functions which already had one)
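The first bullet can be sketched as follows; this is a hypothetical desugared example (plain Rust, not juniper's actual macro output), with a hand-rolled `block_on` standing in for a real executor:

```rust
use std::future::Future;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

struct User {
    name: String,
}

impl User {
    // Previously a sync resolver: `fn name(&self) -> &str { &self.name }`.
    // Under the new convention the resolver is declared `async`, making the
    // async context explicit even for a plain field lookup.
    async fn name(&self) -> &str {
        &self.name
    }
}

// Minimal single-future executor so the sketch runs without an async runtime
// (a stand-in for e.g. `futures::executor::block_on`).
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        RawWaker::new(ptr::null(), &VTABLE)
    }
    fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    let user = User {
        name: "Alice".into(),
    };
    // The resolver now yields a future that must be driven to completion.
    assert_eq!(block_on(user.name()), "Alice");
}
```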

Open Questions

  • there are two rocket crates. replace the sync rocket crate with the async crate?

Related

@LegNeato
Member

Awesome! #666 overhauls the interface macro and supports generics.

@LegNeato
Member

LegNeato commented May 24, 2020

I feel like we want to keep the ability to have sync resolver functions in GraphQL Objects. Often resolvers will be simple struct field lookups, and it sucks to always have the overhead of a future in those cases. Thoughts?
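A sketch of the concern (resolver names are hypothetical): even when the body is a bare struct field lookup, an `async fn` returns a compiler-generated future that carries captured state and must be pinned and polled before the value comes out.

```rust
struct User {
    id: i32,
}

impl User {
    // Sync: a direct struct field lookup, no extra machinery.
    fn id_sync(&self) -> i32 {
        self.id
    }

    // Async: the same lookup, now hidden behind a compiler-generated
    // future that an executor has to poll.
    async fn id_async(&self) -> i32 {
        self.id
    }
}

fn main() {
    let user = User { id: 42 };
    assert_eq!(user.id_sync(), 42);

    // The unpolled future is a real value with its own captured state.
    let fut = user.id_async();
    println!("future size: {} bytes", std::mem::size_of_val(&fut));
}
```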

@tyranron
Member

@LegNeato

#666 overhauls the interface macro and supports generics.

Actually, no. It only touches unions.

My overall plan was/is:

  1. Rework unions, to asyncify them properly.
  2. Rework interfaces, to asyncify them properly.
  3. Once we have async support for everything, drop the sync code and rework the core juniper traits (GraphQLType/GraphQLTypeAsync).
  4. Rework and improve the other macros.

I feel like we want to keep the ability to have sync resolver functions in GraphQL Objects.

Agree. It would be nice to have the ability to use sync resolvers wherever possible.

  • there are two rocket crates. replace the sync rocket crate with the async crate?

I think we can continue supporting both, backing the sync version with runtime::block_on().
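A sketch of that idea, assuming a simplified `execute` entry point (names are hypothetical): the sync facade just drives the async path to completion on the current thread, the way `runtime::block_on()` would.

```rust
use std::future::Future;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Stand-in for the async execution entry point.
async fn execute(query: &str) -> String {
    format!("result for {}", query)
}

// Hypothetical sync facade for the sync integration crate: it blocks the
// current thread until the async version has finished.
fn execute_sync(query: &str) -> String {
    block_on(execute(query))
}

// Minimal `block_on`, standing in for a runtime's blocking entry point.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        RawWaker::new(ptr::null(), &VTABLE)
    }
    fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    assert_eq!(execute_sync("{ hero }"), "result for { hero }");
}
```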

@LegNeato
Member

LegNeato commented May 26, 2020

@tyranron my mistake, I just figured you would keep going on to interfaces, as they are structurally similar to unions.

@jmpunkt
Contributor Author

jmpunkt commented May 29, 2020

Yeah, sync resolvers should still be possible. I would make async the default and let the user mark a function as non-async, similar to what we did before with container. In that case the user must ensure that the operation is not long-running; long-running tasks would use a thread pool with a future handle.
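The last point could look roughly like this; a hand-rolled sketch using a plain thread and a channel as the "future handle" (a real implementation would likely use a runtime facility such as a blocking thread pool):

```rust
use std::sync::mpsc;
use std::thread;

// Offloads a long-running sync resolver body onto a worker thread; the
// receiver acts as a handle the caller can wait on later.
fn offload<T, F>(work: F) -> mpsc::Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Errors are ignored: if the receiver is gone, nobody needs
        // the result anymore.
        let _ = tx.send(work());
    });
    rx
}

fn main() {
    // A stand-in for an expensive, non-async resolver body.
    let handle = offload(|| (1..=100).sum::<i32>());
    assert_eq!(handle.recv().unwrap(), 5050);
}
```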

I think we can continue support both, backing the sync version with runtime::block_on().

Currently rocket does not support async, thus block_on is already used. The only difference is the function signature and the position of the block_on call in the call chain. So the best option would be to merge both crates into a single one?!

@tyranron
Member

@jmpunkt

So the best option would be to merge both crates into a single one?!

I'm unsure, actually. It would be nice to have some tips from rocket users in situations like this. I don't use it, so I'm not quite familiar with its ecosystem and how-to-bake-things-for-rocket.

@msdinit
Contributor

msdinit commented Jun 1, 2020

It would be nice to have some tips from rocket users in situations like this.

Personally, I would ask to keep both Rocket crates, at least for now.
The difference between them is that the sync version targets the current released version of Rocket, while the async integration crate targets Rocket's async branch directly, which is incompatible with the latest release.
Once an async version of Rocket is released (no timeline as far as I know), the sync integration could be deleted, but until then users of the stable version of Rocket are not able to use the async integration, and vice versa.

@jmpunkt
Contributor Author

jmpunkt commented Jun 5, 2020

I did not notice that rocket-async was on a different version. Therefore, we keep two crates until a Rocket release with async support exists.

I also investigated the sync resolvers problem. Allowing the user to specify sync resolvers does not make sense with the current API: all functions are packed into futures anyway. The reason for this is that the value of a custom resolver is converted into a resolvable (with IntoResolvable), and resolving the intermediate value calls the async resolver. Introducing "real" sync resolvers would require the same API as before; something like a simple lookup does not exist, since converting a field value into actual output requires additional steps anyway. However, in order to reduce the usage of futures, async move { ... } blocks were limited to where they are necessary. Field lookups are no longer wrapped in a separate async move { ... } block.
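That last optimization can be sketched like this (hypothetical resolver shapes): a plain field lookup has nothing to await, so returning an already-completed future via `std::future::ready` avoids generating an `async move` state machine for it.

```rust
use std::future::{ready, Future, Ready};
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

struct User {
    name: String,
}

// Wrapping a plain lookup in `async move` creates a generator-based future
// even though nothing is ever awaited inside it.
fn name_wrapped(user: &User) -> impl Future<Output = String> + '_ {
    async move { user.name.clone() }
}

// The same lookup as an already-completed future: no state machine needed.
fn name_ready(user: &User) -> Ready<String> {
    ready(user.name.clone())
}

// Minimal single-future executor so the sketch runs without a runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        RawWaker::new(ptr::null(), &VTABLE)
    }
    fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    let user = User {
        name: "Alice".into(),
    };
    // Both shapes resolve to the same value; only the machinery differs.
    assert_eq!(block_on(name_wrapped(&user)), "Alice");
    assert_eq!(block_on(name_ready(&user)), "Alice");
}
```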

In order to decide what to do here, we should run some benchmarks. Maybe Rust is clever enough to optimize our generated code, or maybe it does not affect performance at all.

jmpunkt added 5 commits June 6, 2020 12:07
Both functions are equal. The only difference is the length of the
name.
Instead of using `FuturesOrdered`, the futures are resolved directly
inside the function. Now it is not possible to run these futures in
parallel. However, this increases the performance (~20%) for large
queries (e.g. introspection).
@jmpunkt
Contributor Author

jmpunkt commented Jun 7, 2020

Intro

To make the benchmarks consistent, the existing criterion and bencher approaches were merged, and now only criterion is used. The following test cases are implemented in the benchmark:

  • User Flat (from existing criterion benchmark)
  • Introspection Full
  • Introspection Full except for descriptions
  • Introspection Type Name

Results

It seems that having no sync resolvers is fine. Looking at the results, the "simple" resolvers have similar performance. On the other side, the full introspection is causing issues: based on criterion, it is a 70% regression. It was possible to cut that number down from 80% by optimizing the code.

Users Flat - Introspection/Sync [324.66 us 325.07 us 325.49 us]
Users Flat - Introspection Query Full/Single Thread  [534.75 us 535.23 us 535.69 us]
Users Flat - Introspection Query Full/Threadpool  [552.21 us 553.54 us 554.81 us]

Users Flat - Instant/Sync/1 [40.110 us 40.118 us 40.126 us]
Users Flat - Instant/Single Thread/1 [38.659 us 38.699 us 38.739 us]
Users Flat - Instant/Threadpool/1 [42.152 us 42.284 us 42.412 us]

Users Flat - Instant/Sync/10  [41.850 us 41.861 us 41.872 us]
Users Flat - Instant/Single Thread/10  [41.393 us 41.422 us 41.449 us]
Users Flat - Instant/Threadpool/10 [41.249 us 41.316 us 41.386 us]

Discussion

I think the main reason for the increase is the use of recursive resolves with async functions; it does not matter whether we support sync resolvers. There is no performance regression (default parameters in criterion) for simple fields. Optimizing the actual resolve process should be the main focus of the next steps. The previous commits allow implementing various benchmark use-cases. Currently my hope is that the async functions work better with actual web frameworks: the benchmark runs in "isolation" and calls the execute method directly, so the performance regression might be neutralized when used with a fully asynchronous framework. Notice that web frameworks also have an overhead; depending on that overhead, the regression might be insignificant.

Notes

  • If someone has a large working schema in Juniper, I would look forward to benchmarking that as well.

@LegNeato
Member

LegNeato commented Jun 8, 2020

This is great work!

I'm now a little worried about full async, especially with lookaheads. If one eagerly loads the data once, then field resolutions are simple struct lookups. Without a sync option those field lookups are now futures, and as you note, having a bunch of those nested is what lowers perf.

@jmpunkt
Contributor Author

jmpunkt commented Jun 8, 2020

I would like to refactor the resolver logic into a non-recursive function. Maybe this solves the problem and allows performance similar to the current version.

Another thing I will check out is the performance under a synthetic HTTP load (maybe using warp). With this and the previous results we should have enough information to decide if full async is the way to go.

@LegNeato
Member

LegNeato commented Jun 12, 2020

The more I think about it, the more I suspect this might be the wrong direction. The async-graphql project is having performance concerns on deep queries due to so much async. They aren't set up for sync resolvers and it impacts their perf. Right now we have the best of both worlds from a user standpoint... you get to pick and choose whether your resolver is async or not, and don't really pay any overhead for sync resolvers (actually, not sure if this is still true).

Still thinking about this and would love more insight from yourself and @tyranron as well.

I'm planning to generate some synthetic schemas and queries where we can really see the perf tradeoffs in various scenarios...not sure when I will get to it though.

@jmpunkt
Contributor Author

jmpunkt commented Jun 12, 2020

Yeah, same here. The more I try different things, the more I realize that this might not be the best approach. Sadly the performance decreased further.

The benchmarks with warp showed that the performance penalties are passed straight through to the HTTP layer. I also noticed that running a profiler on async code is hard (or maybe I am missing something?). Therefore, it is not clear what exactly hurts the runtime performance (currently I am guessing the extensive boxing in the internal resolver code, not the generated code).

At the moment, I am trying to implement an API which unifies both async and sync. The internal code is wrapped into manually implemented futures rather than boxed futures; maybe this helps to increase the performance. The design would allow specifying a #[sync] resolver, rather than making everything async. The whole process is trial-and-error, thus taking some time.

I'm planning to generate some synthetic schemas and queries where we can really see the perf tradeoffs in various scenarios...not sure when I will get to it though.

This is great. Having schemas with different needs is exactly what we need for testing. One thing that came to my mind during testing is how to test a resolvable object which has multiple fields that take a long time to resolve. The async resolver should have better performance in this case, but testing this could be tricky. Resolving multiple fields in "parallel" (without multiple threads) is the motivation behind this feature, and we should consider that motivation in our final decision.
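The "parallel without threads" motivation can be illustrated with a hand-rolled two-future join (a stand-in for e.g. `futures::join!`): both sibling-field futures are polled in turn on a single thread, so a slow field does not have to finish before the other one starts making progress.

```rust
use std::future::Future;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

fn noop_waker() -> Waker {
    fn raw() -> RawWaker {
        RawWaker::new(ptr::null(), &VTABLE)
    }
    fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(raw()) }
}

// Drives two futures on one thread, interleaving their polls; a completed
// future is never polled again.
fn join2<A: Future, B: Future>(a: A, b: B) -> (A::Output, B::Output) {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let (mut a, mut b) = (Box::pin(a), Box::pin(b));
    let (mut out_a, mut out_b) = (None, None);
    loop {
        if out_a.is_none() {
            if let Poll::Ready(v) = a.as_mut().poll(&mut cx) {
                out_a = Some(v);
            }
        }
        if out_b.is_none() {
            if let Poll::Ready(v) = b.as_mut().poll(&mut cx) {
                out_b = Some(v);
            }
        }
        if let (Some(_), Some(_)) = (&out_a, &out_b) {
            return (out_a.unwrap(), out_b.unwrap());
        }
    }
}

// Stand-ins for two sibling field resolvers.
async fn field_a() -> i32 {
    1
}
async fn field_b() -> i32 {
    2
}

fn main() {
    assert_eq!(join2(field_a(), field_b()), (1, 2));
}
```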

If we cannot fix the performance in the end, we should not merge this; the limit for a regression should be 10-20%. Even if this PR is not merged, we should back-port the benchmarks and maybe other refactors which improve the performance.

@jmpunkt
Contributor Author

jmpunkt commented Jun 20, 2020

After testing parts of my idea, I came to the conclusion that it is not possible (at least with the current implementation). The problem is creating objects on the resolve path. For example, resolving a field creates an owned sub-executor (derived from the previous executor), and the output would then depend on that executor, which is owned by the current function. This is due to the requirements of the async part and its lifetimes. In theory this can be fixed by splitting the executor into multiple parts: not all parts of the executor need to be mutable, so the output of a function could depend on the immutable parts, while everything else cannot be used to create an output. It would still be possible to use the context to make a decision, but it would not be possible to return a value from, e.g., a HashMap.
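One possible shape of that split (entirely hypothetical names, not juniper's actual executor): the immutable lookup data is shared behind an `Arc`, so a resolver's output can be produced from it without borrowing from the short-lived, owned sub-executor.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical split executor: the immutable part is shared and cheap to
// clone, the mutable part is owned per resolve step.
struct SubExecutor {
    shared: Arc<HashMap<String, String>>, // immutable across the resolve path
    depth: usize,                         // owned, per-field state
}

fn resolve_field(parent: &SubExecutor, key: &str) -> Option<String> {
    // Deriving a sub-executor clones the `Arc`, not the map itself.
    let sub = SubExecutor {
        shared: Arc::clone(&parent.shared),
        depth: parent.depth + 1,
    };
    // The output is cloned out of the shared, immutable part, so it does
    // not borrow from `sub`, which is dropped when this function returns.
    sub.shared.get(key).cloned()
}

fn main() {
    let mut fields = HashMap::new();
    fields.insert("name".to_string(), "Alice".to_string());
    let root = SubExecutor {
        shared: Arc::new(fields),
        depth: 0,
    };
    assert_eq!(resolve_field(&root, "name"), Some("Alice".to_string()));
    assert_eq!(resolve_field(&root, "missing"), None);
}
```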

Since this implementation is too slow, we should stick with the current one.

The next step is to extract the useful findings and implement them for the current approach. This includes the benchmark suite, removing unnecessary boxing, and the lifetime changes. With the benchmark we can determine whether these findings are actual performance gains. This will be done in a new PR, since this one is quite old and rebasing seems unnecessary.

@jmpunkt jmpunkt closed this Jun 20, 2020