Skip to content
This repository was archived by the owner on Dec 18, 2018. It is now read-only.

ConcurrentStack -> ConcurrentQueue #390

Merged
merged 1 commit into from
Nov 16, 2015
Merged

Conversation

benaadams
Copy link
Contributor

@NickCraver
Copy link
Contributor

+1 - ConcurrentQueue should be (and is measured to be) faster at higher mixed workloads and we have a 1:1 here.

Reference for the curious: http://blogs.msdn.com/b/pfxteam/archive/2010/04/26/9997562.aspx

@benaadams
Copy link
Contributor Author

ConcurrentBag might work here; it operates in an almost uncontended way, but its a memory pig and allocates lots of small objects; so would probably be bad in the GC stakes :(

@davidfowl
Copy link
Member

/cc @stephentoub

@benaadams
Copy link
Contributor Author

As an aside we switched from ConcurrentBag to ConcurrentQueue for our pools for the reduced GC; which outweighed the in principle perf boost of the implementation.

@lodejard
Copy link
Contributor

Isn't it better to provide the most-recently-returned block, rather than provide the least-recently-used block, when someone asks for memory? That's why it was a stack instead of a queue in the first place

@benaadams
Copy link
Contributor Author

The issue is caused by contention by both readers and writers on the _head with a ConcurrentStack so when it hits high throughput the Interlocked fails and it moves to burning cpu with PushCore and TryPopCore in cpu spins - though it will always make forward progress.

With ConcurrentQueue it enqueues to tail and dequeues from head which are different locations so writes only contend with writes and reads only contend with reads.

Also ConcurrentStack allocates individual Nodes for every push vs ConcurrentQueue using array chunks and reusing them; so allocating less.

@benaadams
Copy link
Contributor Author

Though something to measure (perhaps after the other changes to up throughput)

@NickCraver saw an increase from 931,289 RPS to 1,191,674 RPS when this was added on top of the other changes.

@rynowak
Copy link
Member

rynowak commented Nov 16, 2015

If contention on the memory pool is having this much of an impact on perf, it might be worth investigating if round-robin distributing to separate pools is an improvement.

@benaadams
Copy link
Contributor Author

@rynowak I push them out so each KestrelThread has their own pool in this PR https://github.com/aspnet/KestrelHttpServer/pull/343/files#diff-2ee816fea70408fbc7f21874eccb9211R35 also so they can be disposed; but round-robining could be interesting.

@lodejard
Copy link
Contributor

Okay, reducing push/pull contention sounds like a reasonable cause for the change.

Doesn't each thread already have it's own pool2?

@lodejard
Copy link
Contributor

Well - each uv loop more than each thread. The fact that some of the work is done on worker threads means the pool isn't truly isolated to the loop

@benaadams
Copy link
Contributor Author

Doesn't each thread already have it's own pool2?

PRs not been merged yet ;-)

davidfowl added a commit that referenced this pull request Nov 16, 2015
ConcurrentStack -> ConcurrentQueue
@davidfowl davidfowl merged commit 5774cf9 into aspnet:dev Nov 16, 2015
@benaadams benaadams deleted the stack2queue branch November 16, 2015 23:37
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants