-
Notifications
You must be signed in to change notification settings - Fork 3.9k
stub: optimize ThreadlessExecutor used for blocking calls #5516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Replace LinkedBlockingQueue with ConcurrentLinkedQueue and explicit blocking.
@@ -639,20 +642,33 @@ public void onClose(Status status, Metadata trailers) { | |||
* Waits until there is a Runnable, then executes it and all queued Runnables after it. | |||
*/ | |||
public void waitAndDrain() throws InterruptedException { | |||
Runnable runnable = queue.take(); | |||
while (runnable != null) { | |||
Runnable runnable = poll(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mark waitAndDrain()
with @NotThreadSafe
because there must not be two concurrent callers of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, the class is only for internal use in an SPSC context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible thing you could try is to make this SPSC. I have a POC (originally for SerializingExecutor) here: https://github.com/grpc/grpc-java/pull/3778/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carl-mastrangelo sure, I remember seeing that before, could be worth a try here too. I thought (possibly mistakenly) this would be a simpler change just to circumvent LinkedBlockingQueue
which was the main goal.
including interruption handling fix
Thanks @dapengzhang0 @carl-mastrangelo, have addressed the comments, PTAL |
LGTM |
@njhill Can you include your before and after JMH numbers for the commit? We typically include them when making performance optimizations. |
@carl-mastrangelo I thought I had observed a bigger difference in the non-direct case in other runs, when the system was noisier. I know it's not a huge delta but there's a couple more similar changes I have in mind which cumulatively add up to maybe ~15% (to be confirmed!) Before:
After:
This is with the following changes to default config:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@njhill merged, thanks! |
Thanks a lot for your PR @njhill |
The
ThreadlessExecutor
currently used for blocking calls usesLinkedBlockingQueue
which is relatively heavy both in terms of allocations and synchronization overhead (e.g. when compared toConcurrentLinkedQueue
). It accounts for ~10% of allocations and ~5% of allocated bytes per-call in theTransportBenchmark
when using in-process transport with stats and tracing disabled.Changing to use a
ConcurrentLinkedQueue
results in a ~5% speedup of that benchmark.