Remove Sync Code #667


Closed
wants to merge 9 commits into from

Conversation

jmpunkt
Contributor

@jmpunkt jmpunkt commented May 23, 2020

Overview

Removes the sync code and uses the existing async functions. There are some issues with the interface macro: generics are not working for now, since it is not possible to constrain generics with the interface macro, which needs a rewrite. Related test cases are disabled.

Changes

  • all fields must be prepended with async (ensuring the user knows we are in an async context)
  • small refactoring in the codegen crate
  • no more sync inspection and execution functions (affects, for example, iron and rocket)
  • sync functions are replaced with their async counterparts (only functions which already had one)
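The first bullet can be sketched as follows; this is a hypothetical desugared example (plain Rust, not juniper's actual macro output), with a hand-rolled `block_on` standing in for a real executor:

```rust
use std::future::Future;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

struct User {
    name: String,
}

impl User {
    // Previously a sync resolver: `fn name(&self) -> &str { &self.name }`.
    // Under the new convention the resolver is declared `async`, making the
    // async context explicit even for a plain field lookup.
    async fn name(&self) -> &str {
        &self.name
    }
}

// Minimal single-future executor so the sketch runs without an async runtime
// (a stand-in for e.g. `futures::executor::block_on`).
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        RawWaker::new(ptr::null(), &VTABLE)
    }
    fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    let user = User {
        name: "Alice".into(),
    };
    // The resolver now yields a future that must be driven to completion.
    assert_eq!(block_on(user.name()), "Alice");
}
```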

Open Questions

  • there are two rocket crates. replace the sync rocket crate with the async crate?

Related

@LegNeato
Member

Awesome! #666 overhauls the interface macro and supports generics.

@LegNeato
Member

LegNeato commented May 24, 2020

I feel like we want to keep the ability to have sync resolver functions in GraphQL Objects. Often resolvers will be simple struct field lookups, and it sucks to always have the overhead of a future in those cases. Thoughts?
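A sketch of the concern (resolver names are hypothetical): even when the body is a bare struct field lookup, an `async fn` returns a compiler-generated future that carries captured state and must be pinned and polled before the value comes out.

```rust
struct User {
    id: i32,
}

impl User {
    // Sync: a direct struct field lookup, no extra machinery.
    fn id_sync(&self) -> i32 {
        self.id
    }

    // Async: the same lookup, now hidden behind a compiler-generated
    // future that an executor has to poll.
    async fn id_async(&self) -> i32 {
        self.id
    }
}

fn main() {
    let user = User { id: 42 };
    assert_eq!(user.id_sync(), 42);

    // The unpolled future is a real value with its own captured state.
    let fut = user.id_async();
    println!("future size: {} bytes", std::mem::size_of_val(&fut));
}
```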

@tyranron
Member

@LegNeato

#666 overhauls the interface macro and supports generics.

Actually, no. It only touches unions.

My overall plan was/is:

  1. Rework unions, to asyncify them properly.
  2. Rework interfaces, to asyncify them properly.
  3. Once we have async support for everything, drop the sync code and rework the core juniper traits (GraphQLType/GraphQLTypeAsync).
  4. Rework and improve the other macros.

I feel like we want to keep the ability to have sync resolver functions in GraphQL Objects.

Agree. It would be nice to have the ability to use sync resolvers wherever possible.

  • there are two rocket crates. replace the sync rocket crate with the async crate?

I think we can continue supporting both, backing the sync version with runtime::block_on().
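A sketch of that idea, assuming a simplified `execute` entry point (names are hypothetical): the sync facade just drives the async path to completion on the current thread, the way `runtime::block_on()` would.

```rust
use std::future::Future;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Stand-in for the async execution entry point.
async fn execute(query: &str) -> String {
    format!("result for {}", query)
}

// Hypothetical sync facade for the sync integration crate: it blocks the
// current thread until the async version has finished.
fn execute_sync(query: &str) -> String {
    block_on(execute(query))
}

// Minimal `block_on`, standing in for a runtime's blocking entry point.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        RawWaker::new(ptr::null(), &VTABLE)
    }
    fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    assert_eq!(execute_sync("{ hero }"), "result for { hero }");
}
```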

@LegNeato
Member

LegNeato commented May 26, 2020

@tyranron my mistake, I just figured you would keep going on to interfaces, as they are structurally similar to unions.

@jmpunkt
Contributor Author

jmpunkt commented May 29, 2020

Yeah, sync resolvers should still be possible. I would make async the default and let the user mark a function as non-async, similar to what we did before with container. In that case the user must ensure that the operation is not long-running; long-running tasks would use a thread pool with a future handle.
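The last point could look roughly like this; a hand-rolled sketch using a plain thread and a channel as the "future handle" (a real implementation would likely use a runtime facility such as a blocking thread pool):

```rust
use std::sync::mpsc;
use std::thread;

// Offloads a long-running sync resolver body onto a worker thread; the
// receiver acts as a handle the caller can wait on later.
fn offload<T, F>(work: F) -> mpsc::Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Errors are ignored: if the receiver is gone, nobody needs
        // the result anymore.
        let _ = tx.send(work());
    });
    rx
}

fn main() {
    // A stand-in for an expensive, non-async resolver body.
    let handle = offload(|| (1..=100).sum::<i32>());
    assert_eq!(handle.recv().unwrap(), 5050);
}
```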

I think we can continue support both, backing the sync version with runtime::block_on().

Currently rocket does not support async, thus block_on is already used. The only difference is the function signature and the position of the block_on call in the call chain. So the best option would be to merge both crates into a single one?!

@tyranron
Member

@jmpunkt

So the best option would be to merge both crates into a single one?!

I'm unsure, actually. It would be nice to have some tips from rocket users in situations like this. I don't use it, so I'm not quite familiar with its ecosystem and how-to-bake-things-for-rocket.

@msdinit
Contributor

msdinit commented Jun 1, 2020

It would be nice to have some tips from rocket users in situations like this.

Personally, I would ask to keep both Rocket crates, at least for now.
The difference between them is that the sync version targets the current released version of Rocket, while the async integration crate targets Rocket's async branch directly, which is incompatible with the latest release.
Once an async version of Rocket is released (no timeline as far as I know), the sync integration could be deleted, but until then users of the stable version of Rocket are not able to use the async integration, and vice versa.

@jmpunkt
Contributor Author

jmpunkt commented Jun 5, 2020

I did not notice that rocket-async was on a different version. Therefore, we keep two crates until a Rocket release with async support exists.

I also investigated the sync resolvers problem. Allowing the user to specify sync resolvers does not make sense with the current API: all functions are packed into futures anyway. The reason for this is that the value of a custom resolver is converted into a resolvable (with IntoResolvable), and resolving the intermediate value calls the async resolver. Introducing "real" sync resolvers would require the same API as before; something like a simple lookup does not exist, since converting a field value into actual output requires additional steps anyway. However, in order to reduce the usage of futures, async move { ... } blocks were limited to where they are necessary. Field lookups are no longer wrapped in a separate async move { ... } block.
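That last optimization can be sketched like this (hypothetical resolver shapes): a plain field lookup has nothing to await, so returning an already-completed future via `std::future::ready` avoids generating an `async move` state machine for it.

```rust
use std::future::{ready, Future, Ready};
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

struct User {
    name: String,
}

// Wrapping a plain lookup in `async move` creates a generator-based future
// even though nothing is ever awaited inside it.
fn name_wrapped(user: &User) -> impl Future<Output = String> + '_ {
    async move { user.name.clone() }
}

// The same lookup as an already-completed future: no state machine needed.
fn name_ready(user: &User) -> Ready<String> {
    ready(user.name.clone())
}

// Minimal single-future executor so the sketch runs without a runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        RawWaker::new(ptr::null(), &VTABLE)
    }
    fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    let user = User {
        name: "Alice".into(),
    };
    // Both shapes resolve to the same value; only the machinery differs.
    assert_eq!(block_on(name_wrapped(&user)), "Alice");
    assert_eq!(block_on(name_ready(&user)), "Alice");
}
```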

In order to decide what to do here, we should run some benchmarks. Maybe Rust is clever enough to optimize our generated code, or maybe it does not affect performance at all.

jmpunkt added 5 commits June 6, 2020 12:07
Both functions are equal. The only difference is the length of the
name.
Instead of using `FuturesOrdered`, the futures are resolved directly
inside the function. Now it is not possible to run these futures in
parallel. However, this increases the performance (~20%) for large
queries (e.g. introspection).
@jmpunkt
Contributor Author

jmpunkt commented Jun 7, 2020

Intro

To make the benchmarks consistent, the existing criterion and bencher approaches were merged, and now only criterion is used. The following test cases are implemented in the benchmark:

  • User Flat (from existing criterion benchmark)
  • Introspection Full
  • Introspection Full except for descriptions
  • Introspection Type Name

Results

It seems that having no sync resolvers is fine. Looking at the results, the "simple" resolvers have similar performance. On the other side, the full introspection is causing issues: based on criterion, it is a 70% regression. It was possible to cut that number down from 80% by optimizing the code.

Users Flat - Introspection/Sync [324.66 us 325.07 us 325.49 us]
Users Flat - Introspection Query Full/Single Thread  [534.75 us 535.23 us 535.69 us]
Users Flat - Introspection Query Full/Threadpool  [552.21 us 553.54 us 554.81 us]

Users Flat - Instant/Sync/1 [40.110 us 40.118 us 40.126 us]
Users Flat - Instant/Single Thread/1 [38.659 us 38.699 us 38.739 us]
Users Flat - Instant/Threadpool/1 [42.152 us 42.284 us 42.412 us]

Users Flat - Instant/Sync/10  [41.850 us 41.861 us 41.872 us]
Users Flat - Instant/Single Thread/10  [41.393 us 41.422 us 41.449 us]
Users Flat - Instant/Threadpool/10 [41.249 us 41.316 us 41.386 us]

Discussion

I think the main reason for the increase is the use of recursive resolves with async functions; it does not matter whether we support sync resolvers. There is no performance regression (default parameters in criterion) for simple fields. Optimizing the actual resolve process should be the main focus of the next steps. The previous commits allow implementing various benchmark use-cases. Currently my hope is that the async functions work better with actual web frameworks: the benchmark runs in "isolation" and calls the execute method directly, so the performance regression might be neutralized when used with a fully asynchronous framework. Notice that web frameworks also have an overhead; depending on that overhead, the regression might be insignificant.

Notes

  • If someone has a large working schema in Juniper, I would look forward to benchmarking that as well.

@LegNeato
Member

LegNeato commented Jun 8, 2020

This is great work!

I'm now a little worried about full async, especially with lookaheads. If one eagerly loads the data once, then field resolutions are simple struct lookups. Without a sync option those field lookups are now futures, and as you note, having a bunch of those nested is what lowers perf.

@jmpunkt
Contributor Author

jmpunkt commented Jun 8, 2020

I would like to refactor the resolver logic into a non-recursive function. Maybe this solves the problem and allows performance similar to the current version.

Another thing I will check out is the performance under a synthetic HTTP load (maybe using warp). With this and the previous results we should have enough information to decide if full async is the way to go.

@LegNeato
Member

LegNeato commented Jun 12, 2020

The more I think about it, the more I suspect this might be the wrong direction. The async-graphql project is having performance concerns on deep queries due to so much async. They aren't set up for sync resolvers and it impacts their perf. Right now we have the best of both worlds from a user standpoint... you get to pick and choose whether your resolver is async or not, and don't really pay any overhead for sync resolvers (actually, not sure if this is still true).

Still thinking about this and would love more insight from yourself and @tyranron as well.

I'm planning to generate some synthetic schemas and queries where we can really see the perf tradeoffs in various scenarios...not sure when I will get to it though.

@jmpunkt
Contributor Author

jmpunkt commented Jun 12, 2020

Yeah, same here. The more I try different things, the more I realize that this might not be the best approach. Sadly the performance decreased further.

The benchmarks with warp showed that the performance penalties are passed straight through to the HTTP layer. I also noticed that running a profiler on async code is hard (or maybe I am missing something?). Therefore, it is not clear what exactly hurts the runtime performance (currently I am guessing the extensive boxing in the internal resolver code, not the generated code).

At the moment, I am trying to implement an API which unifies both async and sync. The internal code is wrapped into manually implemented futures rather than boxed futures; maybe this helps to increase the performance. The design would allow specifying a #[sync] resolver, rather than making everything async. The whole process is trial-and-error, thus taking some time.

I'm planning to generate some synthetic schemas and queries where we can really see the perf tradeoffs in various scenarios...not sure when I will get to it though.

This is great. Having schemas with different needs is exactly what we need for testing. One thing that came to my mind during testing is how to test a resolvable object which has multiple fields that take a long time to resolve. The async resolver should have better performance in this case, but testing this could be tricky. Resolving multiple fields in "parallel" (without multiple threads) is the motivation behind this feature, and we should consider that motivation in our final decision.
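The "parallel without threads" motivation can be illustrated with a hand-rolled two-future join (a stand-in for e.g. `futures::join!`): both sibling-field futures are polled in turn on a single thread, so a slow field does not have to finish before the other one starts making progress.

```rust
use std::future::Future;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

fn noop_waker() -> Waker {
    fn raw() -> RawWaker {
        RawWaker::new(ptr::null(), &VTABLE)
    }
    fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(raw()) }
}

// Drives two futures on one thread, interleaving their polls; a completed
// future is never polled again.
fn join2<A: Future, B: Future>(a: A, b: B) -> (A::Output, B::Output) {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let (mut a, mut b) = (Box::pin(a), Box::pin(b));
    let (mut out_a, mut out_b) = (None, None);
    loop {
        if out_a.is_none() {
            if let Poll::Ready(v) = a.as_mut().poll(&mut cx) {
                out_a = Some(v);
            }
        }
        if out_b.is_none() {
            if let Poll::Ready(v) = b.as_mut().poll(&mut cx) {
                out_b = Some(v);
            }
        }
        if let (Some(_), Some(_)) = (&out_a, &out_b) {
            return (out_a.unwrap(), out_b.unwrap());
        }
    }
}

// Stand-ins for two sibling field resolvers.
async fn field_a() -> i32 {
    1
}
async fn field_b() -> i32 {
    2
}

fn main() {
    assert_eq!(join2(field_a(), field_b()), (1, 2));
}
```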

If we cannot fix the performance in the end, we should not merge this; the limit for a regression should be 10-20%. Even if this PR is not merged, we should back-port the benchmarks and maybe other refactors which improve the performance.

@jmpunkt
Contributor Author

jmpunkt commented Jun 20, 2020

After testing parts of my idea, I came to the conclusion that it is not possible (at least with the current implementation). The problem is creating objects on the resolve path. For example, resolving a field creates an owned sub-executor (derived from the previous executor), and the output would then depend on that executor, which is owned by the current function. This is due to the requirements of the async part and its lifetimes. In theory this can be fixed by splitting the executor into multiple parts: not all parts of the executor need to be mutable, so the output of a function could depend on the immutable parts, while everything else cannot be used to create an output. It would still be possible to use the context to make a decision, but it would not be possible to return a value from, e.g., a HashMap.
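One possible shape of that split (entirely hypothetical names, not juniper's actual executor): the immutable lookup data is shared behind an `Arc`, so a resolver's output can be produced from it without borrowing from the short-lived, owned sub-executor.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical split executor: the immutable part is shared and cheap to
// clone, the mutable part is owned per resolve step.
struct SubExecutor {
    shared: Arc<HashMap<String, String>>, // immutable across the resolve path
    depth: usize,                         // owned, per-field state
}

fn resolve_field(parent: &SubExecutor, key: &str) -> Option<String> {
    // Deriving a sub-executor clones the `Arc`, not the map itself.
    let sub = SubExecutor {
        shared: Arc::clone(&parent.shared),
        depth: parent.depth + 1,
    };
    // The output is cloned out of the shared, immutable part, so it does
    // not borrow from `sub`, which is dropped when this function returns.
    sub.shared.get(key).cloned()
}

fn main() {
    let mut fields = HashMap::new();
    fields.insert("name".to_string(), "Alice".to_string());
    let root = SubExecutor {
        shared: Arc::new(fields),
        depth: 0,
    };
    assert_eq!(resolve_field(&root, "name"), Some("Alice".to_string()));
    assert_eq!(resolve_field(&root, "missing"), None);
}
```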

Since this implementation is too slow, we should stick with the current one.

The next step is to extract the useful findings and implement them for the current approach. This includes the benchmark suite, removing unnecessary boxing, and the lifetime changes. With the benchmark we can determine whether these findings are actual performance gains. This will be done in a new PR, since this one is quite old and rebasing seems unnecessary.

@jmpunkt jmpunkt closed this Jun 20, 2020