This repository was archived by the owner on Dec 18, 2018. It is now read-only.

Simplify SocketInput, remove lock, only use pooled blocks #525

Merged
merged 2 commits into aspnet:dev from benaadams:socket-input on Jan 8, 2016

Conversation

benaadams
Contributor

From #519

Partial #516

Complete();
}

public void AbortAwaiting()
Contributor

nit: move this back after ConsumingComplete so there's less impact on the diff.

@benaadams
Contributor Author

@CesarBS yes the idea came from something @halter73 pointed out in the SocketOutput

IncomingStart and IncomingComplete are only called by the libuv thread; IncomingData is only called when it's a wrapped stream (so libuv won't be calling the other two methods); so the Incoming methods are already synchronized with each other, as only one thing uses them.

Likewise the Consuming methods have only one thing using them (single producer, single consumer).

In the crossover between the two:

Consuming only modifies _head.
Incoming only modifies _tail + _pinned, except on the very first read, when _head == null and it sets _head to point at the start of the data.

If ConsumingStart read null for _head, then ConsumingComplete doesn't change _head:

if (!consumed.IsDefault)
{
    returnStart = _head;
    returnEnd = consumed.Block;
    _head = consumed.Block;
    _head.Start = consumed.Index;
}

So there isn't really any crossover with the data, so the lock isn't needed.

The timing between the Incoming and Consuming sides is controlled by the Interlocked.CompareExchange(ref _awaitableState... and the ManualResetEventSlim _manualResetEvent, as before.
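
To illustrate the ownership split described above, here is a minimal, self-contained sketch of a single-producer/single-consumer block list. The Block type, member names, and methods are hypothetical stand-ins for Kestrel's pooled memory blocks and the Incoming*/Consuming* methods, not the actual SocketInput code:

// Hypothetical, simplified sketch -- not the real SocketInput.
class Block
{
    public Block Next;
    public int Start;
    public int End;
    public byte[] Data = new byte[4096];
}

class SingleProducerSingleConsumerList
{
    Block _head; // advanced only by the consumer (ConsumingComplete)
    Block _tail; // advanced only by the producer (libuv thread)

    // Producer side: runs only on the libuv thread.
    public void Append(Block block)
    {
        if (_head == null)
        {
            // The single crossover: on the very first write the producer
            // points _head at the start of the data; after that it never
            // touches _head again.
            _head = block;
        }
        else
        {
            _tail.Next = block;
        }
        _tail = block;
    }

    // Consumer side: runs only on the single reading thread.
    public void Advance(Block consumedBlock, int consumedIndex)
    {
        // Mirrors the ConsumingComplete snippet above: only _head moves.
        _head = consumedBlock;
        _head.Start = consumedIndex;
    }
}

Because the producer only ever writes _tail (and _head exactly once, before any data is visible to the consumer) and the consumer only ever writes _head, the two sides never write the same field concurrently; the remaining ordering and visibility concerns are handled by the interlocked awaitable state mentioned above.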

@benaadams
Contributor Author

Rebased so I can make another PR based on it.

@halter73
Member

👍 It does seem very similar to SocketOutput (which shouldn't be too surprising), and I think you're right that the lock was unnecessary.

I wouldn't mind a second opinion from @lodejard though.

@cesarblum
Contributor

I love this PR because I think it'll make my life easier when I get to work on #304 😁

@benaadams
Contributor Author

Still can't work out why the GetDateHeaderValue_ReturnsUpdatedValueAfterIdle test periodically fails.

@halter73
Member

halter73 commented Jan 7, 2016

@lodejard is worried about removing the locks, but I think it's safe (assuming there aren't concurrent reads) and there is a nice perf gain. Here are my numbers:

dev:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     8.18ms   35.18ms 494.60ms   93.79%
    Req/Sec    36.07k     2.78k   69.91k    87.91%
  11552580 requests in 10.10s, 1.42GB read
Requests/sec: 1143851.17
Transfer/sec:    143.99MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.70ms   18.91ms 342.23ms   94.38%
    Req/Sec    35.99k     2.53k   63.87k    90.36%
  11525358 requests in 10.10s, 1.42GB read
  Socket errors: connect 0, read 0, write 529, timeout 0
Requests/sec: 1141167.60
Transfer/sec:    143.66MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.81ms    5.60ms 177.49ms   95.18%
    Req/Sec    35.88k     1.86k   53.53k    86.22%
  11512022 requests in 10.10s, 1.42GB read
Requests/sec: 1139994.68
Transfer/sec:    143.51MB

benaadams/socket-input-merged:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.70ms    6.02ms 386.69ms   93.37%
    Req/Sec    36.70k     1.48k   49.61k    78.44%
  11790897 requests in 10.10s, 1.45GB read
Requests/sec: 1167464.67
Transfer/sec:    146.97MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.46ms    3.77ms 265.56ms   94.96%
    Req/Sec    36.45k     2.12k   57.27k    83.90%
  11695314 requests in 10.10s, 1.44GB read
Requests/sec: 1158028.82
Transfer/sec:    145.78MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.44ms    7.98ms 294.07ms   94.29%
    Req/Sec    36.71k     2.20k   63.16k    84.17%
  11771692 requests in 10.10s, 1.45GB read
Requests/sec: 1165591.15
Transfer/sec:    146.73MB

benaadams/socket-input-merged-lock (all your changes with the _sync lock re-added):

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.55ms   14.87ms 277.05ms   93.21%
    Req/Sec    35.74k     2.29k   57.51k    85.31%
  11454930 requests in 10.10s, 1.41GB read
Requests/sec: 1134174.28
Transfer/sec:    142.78MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.13ms   12.74ms 273.82ms   95.53%
    Req/Sec    35.79k     2.02k   55.83k    82.63%
  11484589 requests in 10.10s, 1.41GB read
Requests/sec: 1137144.99
Transfer/sec:    143.15MB


Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.92ms   12.20ms 266.46ms   96.73%
    Req/Sec    35.75k     2.26k   63.15k    87.27%
  11458948 requests in 10.10s, 1.41GB read
Requests/sec: 1134531.78
Transfer/sec:    142.82MB

shalter/socket-input-safe-consume:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.03ms    6.70ms 262.35ms   96.15%
    Req/Sec    36.27k     2.13k   52.20k    87.72%
  11642067 requests in 10.10s, 1.43GB read
Requests/sec: 1152997.37
Transfer/sec:    145.15MB


Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.58ms   19.06ms 253.21ms   97.02%
    Req/Sec    36.47k     2.23k   60.40k    90.38%
  11691776 requests in 10.10s, 1.44GB read
Requests/sec: 1157669.21
Transfer/sec:    145.73MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.89ms    7.11ms 158.40ms   97.67%
    Req/Sec    36.61k     2.17k   64.40k    91.24%
  11730125 requests in 10.10s, 1.44GB read
Requests/sec: 1161390.24
Transfer/sec:    146.20MB

I added a small change on top of yours to throw for concurrent non-async reads. This should actually catch potential bugs that weren't caught before.
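
For illustration, a minimal sketch of how a guard like that could be built with an interlocked flag; the class and member names here are assumptions for the sketch, not the actual change:

using System;
using System.Threading;

class ReadGuard
{
    // 0 = no read in progress, 1 = read in progress (hypothetical field)
    private int _readInProgress;

    public void BeginRead()
    {
        // If another read is already in flight, fail loudly instead of
        // silently racing the consumer-side state.
        if (Interlocked.CompareExchange(ref _readInProgress, 1, 0) != 0)
        {
            throw new InvalidOperationException("Concurrent reads are not supported.");
        }
    }

    public void EndRead()
    {
        Interlocked.Exchange(ref _readInProgress, 0);
    }
}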

And, as shown above, it doesn't seem too detrimental to perf. At least it's not blocking the libuv thread.

@benaadams What do you think?

@benaadams
Contributor Author

Your change is good and is better than the lock, as it works across ConsumingStart/ConsumingComplete and will surface bugs in user code - which the lock didn't cover.

The Interlocked also acts as a memory barrier at the CPU level (as does a lock). It could be argued that it's overly defensive, as nowhere is it ever suggested that an instance socket read is a thread-safe operation; but it's potentially a public port, RTM is approaching - safety first - and, more importantly, I clearly managed to do it at some point in our application with the way we were using websockets ;)

The strong synchronisation is still covered by the interlocked awaitable states, and perhaps over-covered by the addition of the ManualResetEventSlim (async all the things!); however, I never saw it show up as an item of interest in perf testing.

I'd go with your addition; revisit post-RTM if everything gets so fast it starts showing up as a hot spot?
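
As a rough sketch of that handshake (names and structure are assumptions, simplified from what SocketInput actually does): the producer flips the awaitable state with an interlocked exchange and sets the ManualResetEventSlim for blocking waiters, while the consumer registers its continuation with a compare-exchange; both interlocked operations double as full memory barriers.

using System;
using System.Threading;

class AwaitableSignal
{
    private static readonly Action _completedSentinel = () => { };
    private static readonly Action _notCompletedSentinel = () => { };

    private Action _awaitableState = _notCompletedSentinel;
    private readonly ManualResetEventSlim _manualResetEvent = new ManualResetEventSlim(false);

    // Producer (libuv thread): data has arrived, complete the awaitable.
    public void Complete()
    {
        var previous = Interlocked.Exchange(ref _awaitableState, _completedSentinel);
        _manualResetEvent.Set(); // releases any blocking (non-async) waiter

        if (previous != _completedSentinel && previous != _notCompletedSentinel)
        {
            previous(); // run the continuation the consumer registered
        }
    }

    // Consumer: register a continuation; if already completed, run it now.
    public void OnCompleted(Action continuation)
    {
        var previous = Interlocked.CompareExchange(
            ref _awaitableState, continuation, _notCompletedSentinel);

        if (previous == _completedSentinel)
        {
            continuation();
        }
    }
}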

@benaadams
Contributor Author

LGTM (with 7e9dd9e). Looking to see if it will trigger exceptions on the issues I experienced earlier in the application lifetime.

@benaadams
Contributor Author

Though maybe add some tests for the double read ;-)

@benaadams
Contributor Author

In testing I don't trigger overlapped consumes, but I have managed to get confirmation of a previous related fix:

Microsoft.AspNet.Server.Kestrel[0]
Connection shutdown abnormally
System.IO.IOException: Concurrent reads are not supported. ---> System.InvalidOperationException: Concurrent reads are not supported.
--- End of inner exception stack trace ---
at Microsoft.AspNet.Server.Kestrel.Http.SocketInput.GetResult()
at Microsoft.AspNet.Server.Kestrel.Http.Frame`1.d__3.MoveNext()

(i.e. it just closes the connection and doesn't get more upset)

@benaadams
Contributor Author

Also found where it was previously going wrong: somewhere in my, or the middleware's, handling of the websockets close - which is a weird area anyway.

@halter73 halter73 merged commit cf77efc into aspnet:dev Jan 8, 2016
@halter73 halter73 mentioned this pull request Jan 9, 2016
@benaadams benaadams deleted the socket-input branch May 10, 2016 02:50