Resetable CancellationTokenSource registrations #4694

benaadams · 2015-11-19T23:49:35Z

CancellationTokenSource is quite a heavyweight object and its not normally cancelled; however it can't be pooled or reused because its registrations cannot be cleared.

Its composed of the following objects:

ManualResetEvent m_kernelEvent
SparselyPopulatedArray<CancellationCallbackInfo>[] m_registeredCallbacksLists
int m_state
int m_threadIDExecutingCallbacks
bool m_disposed
CancellationTokenRegistration [] m_linkingRegistrations
CancellationCallbackInfo m_executingCallback
Timer m_timer

A ClearRegistrations() or Reset() function would be very helpful.

The text was updated successfully, but these errors were encountered:

davidfowl · 2015-11-20T02:56:49Z

/cc @stephentoub

stephentoub · 2015-11-20T03:01:49Z

A ClearRegistrations() or Reset() function would be very helpful.

One of the key aspects of CTS / CancellationToken is that it's one-directional, only ever transitioning from non-canceled to canceled, never the reverse. Lots of code depends on this, otherwise it's extremely prone to race conditions. It also seems particularly dangerous to "clear registrations"... if someone's still registered, why is it safe to ignore those registrations? If you know it's safe, then you're also typically in a position to unregister because you were the one to register, at which point you can choose to pool/reuse the CTS if it's not yet been canceled.

If you have suggestions for making the implementation more lightweight under the existing surface area and wanted to submit PRs for that, we'd be very happy to review. I don't believe we should add the proposed surface area, however.

benaadams · 2015-11-20T03:54:09Z

The use case wouldn't need to undo the cancel; but remove the registrations, i.e. it is still a one way cancel.

The token is passed on to external code so the owner of the CancellationSource doesn't have access to the CancellationTokenRegistration objects.

Use case:

Connection has a CancellationSource; handles many Requests which invoke user request handling code; when the connection is disconnected or the request is aborted the CancellationSource is triggered to cancel anything registered during the request.

However, if the connection hasn't been terminated; at the end of the request all the request and user code registrations need to be dumped as they are now out of scope and the CancellationSource prepared for the next request. The current way to do this is to dispose the CancellationSource and create another one.

In Kestrel's Frame.cs the Reset function disposes the CancellationToken and it is then recreated on the next loop ready for registrations.

The issue with this is that the CancellationSource is one of the highest allocations that happens in the loop.

So am looking for a lower GC pressure way of being able use a CancellationToken for a longer scoped object (Connection); but with a resetting scope for the registrations (Requests).

stephentoub · 2015-11-20T03:58:34Z

The issue with this is that the CancellationSource is one of the highest allocations that happens in the loop.

How much of that is the CTS itself, and how much are the allocations associated with the registrations?

at the end of the request all the request and user code registrations need to be dumped as they are now out of scope and the CancellationSource prepared for the next request.

At that point, why is there anything still registered with the CTS? Doesn't that imply someone isn't disposing of their CancellationTokenRegistration?

benaadams · 2015-11-20T04:04:25Z

How much of that is the CTS itself, and how much are the allocations associated with the registrations?

In the plaintext techEmpower performance benchmark; its all the CTS as there are no registrations.

At that point, why is there anything still registered with the CTS?

Its library code; can't enforce the user of the library not to make mistakes; and try to gracefully handle them rather than continuously build up references for over every request.

However, would you recommendation be to keep using the same CTS and assume that the registrations are correctly disposed?

stephentoub · 2015-11-20T04:10:45Z

would you recommendation be to keep using the same CTS and assume that the registrations are correctly disposed?

No, I see what you're saying.

At the same time, I'm struggling with the notion of simply dropping registrations. You've given out a CancellationToken, and code has registered with that CancellationToken. At some point the thing that provided the CancellationToken says "that's it, I'm done"... if anything is still listening, e.g. someone kicked off a cancelable timer or had some pending database request that can now be canceled and so on, shouldn't the CTS be Cancel'd rather than dropped? And once it's canceled, it's transitioned from non-canceled to canceled, and it's no longer reusable.

benaadams · 2015-11-20T04:18:54Z

Because its not cancelled, it completed. So the CancellationToken is for an out of band situation where it needs to be cancelled mid processing.

I think the standard approach would be for the connection to have its own CancellationTokenSource which is long lived and then use CreateLinkedTokenSource for each request with a new CancellationTokenSource which it can then dispose; but that's just frightening in number of allocations (e.g. when doing 1M+ requests per second)

stephentoub · 2015-11-20T04:26:18Z

Because its not cancelled, it completed. So the CancellationToken is for an out of band situation where it needs to be cancelled mid processing.

What's the mechanism by which work I've kicked off gets canceled when the request completes then? Or the assumption is "just don't do that"? This is obviously up to will of whoever is handling out the CT, but to me, it makes perfect sense to say "I'm completing now, my job is done, if anyone out there related to my work is still going, you can cancel now because I'm outta here." So I'm not quite on board with the argument of "because it's not canceled, it completed".

With regards to clearing registrations, let's hypothetically say we decided that was a good idea and it was something we wanted to do. CancellationTokens are a thin wrapper for a CancellationTokenSource, with their only field being a reference back to the CTS. How do you propose that be handled? Someone who was given a CT and polls it will still be referencing the original CTS, which might now be used to represent a completely unrelated piece of work. We could add an Int64 field to the CTS that represents a version number, add an Int64 field to a CT that represents a version number, and say the connection between the two is only valid if those two numbers match, but now both the struct and the class are 8 bytes bigger, and every operation on the CT needs to do additional work to validate that its source is still valid. Did you have something else in mind?

benaadams · 2015-11-20T05:07:35Z

What's the mechanism by which work I've kicked off gets canceled when the request completes then?

Its a serial pipeline so the request only completes when the work is done; however if the network connection is dropped, so there is nothing to receive the result produced at the end of the pipeline; then the pipeline needs to be stopped (if someone chooses to register for cancellation; or use the cancellation token in someway - which they may not do).

let's hypothetically say ...

Sounds quite good 👍 An extra 8 bytes for the lifetime of the connection is nothing compared to the 72 bytes per request, which for 4k requests is 288000 bytes (below)

For 2M requests a second its 144 MB per second or 8.64 GB a minute that needs to be GC'd just for the CancellationTokenSource. The push for cutting memory is that GC is currently the main activity of the server (shown below); and that's with a lot of pooling already going on.

Could have different flavours of CancellationTokenSource, ones that don't have the timer for example; or even a CancellationTokenSourceSlim that doesn't have much but has protected or public properties so it could be implemented in different ways via derived classes.

Though it would need a function that creates the CancellationToken's since its constructor is currently internal and only takes a CancellationTokenSource?

stephentoub · 2015-11-20T05:09:31Z

As noted earlier, if the concern is about the size of CTS, I'd be happy to look at proposals for making the implementation better. For example, if we really believe that the most common case in general (and not just on one specific benchmark) is no registrations, then we could consider putting all of the CTS state into a lazy-allocating subobject. Or maybe there are ways of eliminating some fields or consolidating them, etc.

stephentoub · 2015-11-20T05:13:38Z

An extra 8 bytes for the lifetime of the connection is nothing compared to the 72 bytes per request, which for 4k requests is 288000 bytes (below)

You're focused on one very specific benchmark. There are many other use cases. And increasing the size of this object by 10% should not be taken lightly when the concern to begin with is its size. I'm also not entirely sure my off-the-cuff comment would even work, e.g.with the synchronization necessary to prevent such a token from registering with its old CTS.

Its a serial pipeline so the request only completes when the work is done

How is the serial nature enforced? You're saying it's not possible in ASP.NET to schedule concurrent operations? Or you're talking about this specific benchmark?

benaadams · 2015-11-20T05:29:33Z

The CTS isn't particularly big in the scheme of things; its the multiplication of its size that the main problem; because currently the only safe approach it to ditch it each time. Even a single new object (12 bytes?) done on every request is 24 MB/s of garbage.

You're saying is not possible in ASP.NET to schedule concurrent operations?

Obviously its dotnet so if you wanted to spin up a background thread or split of multiple tasks there's nothing stopping you; so you can go quite happily go "off piste"; have background activities pruning a cache, posting stats etc

For the request pipeline specifically, its a pluggable where each component is called, then it calls the next item in the pipeline; so it has access to the before and after state. I believe the expectation is that outside that context the component shouldn't be trying to mess with the request or holding onto references to it.

You could kick something off on the way in and then finish it on the way out; or have multiple components at different parts of the pipeline communicate with each other; by altering the request; for example authorisation and authentication do this. But essentially when the request has been sent out across the network that's when its complete and nothing further can be done with it.

stephentoub · 2015-11-20T05:42:06Z

The CTS isn't particularly big in the scheme of things; its the multiplication of its size that the main problem;

How many objects / how much memory is currently allocated per request? What percentage is the CTS? I of course completely understand the goal of minimizing allocations, but the numbers you're citing are difficult to reason about without context.

And do you have a proposal for how to actually make it all work, assuming we did decide to expose such a reuse operation? It would need to be bulletproof with regards to the previously handed out token not being able to poll, register with, etc. the reused CTS, and it would need to not add to the size of the CTS in a meaningful way for scenarios that do create lots of instances in cases where the CTS object is one of the larger contributors to allocations, such as creating a cancelable task.

benaadams · 2015-11-20T06:30:04Z

How many objects / how much memory is currently allocated per request? What percentage is the CTS?

The in "aspnet/dev" allocations as a ratio look like this (well higher as its currently using
_requestAbortCts = CancellationTokenSource.CreateLinkedTokenSource(_disconnectCts.Token);)

There are PRs hopefully to take the last two lines to more or less zero; at which point it becomes 36% of this part of pipeline. The other allocator is the input header's value strings (the names/keys don't allocate) - but I believe that's also being looked at.

The streams I'm not sure about; they mostly exist to support the stopped/aborted state for the current request so you can't continue writing to it after; and for replacement, filters, default, encyption etc Though they could probably be combined.

benaadams · 2015-11-20T06:42:03Z

And do you have a proposal for how to actually make it all work

Not entirely; though a derived CTS might be safer; or multiple derived types, timed, single use, etc and I like your versioning idea; but that would probably change contracts; and I take your point.

where the CTS object is one of the larger contributors to allocations, such as creating a cancelable task.

Needs more thought; would be good if that could also be reduced; obviously increasing the load here would be very bad; as once you've tried async you can't go back... but there is a lot going on in CTS; and though I use them a lot; other than plumming them in I don't think I've ever knowingly cancelled one.

stephentoub · 2015-11-20T17:03:53Z

Thanks for pushing on this, @benaadams. It's impressive to see how much you and others have been able to drive down those allocations, and it's good to understand the costs here in perspective.

I will take a look at CTS today to see if any easy wins jump out at me.

benaadams · 2015-11-20T17:17:14Z

I've been wondering on this and the Registrations should be disposed; so if the common use case was to pass them though to other library code; for example a sql exec task; then that library should be disposing of the registrations; and hopefully if the user was actually registering things - they should know enough to dispose them - at which point the same CTS could be used for the whole connection?

Though I don't know how the aspnet team would feel about it as it is potential bleed though and a memory leak. On the other hand it is a case of not disposing of your disposables....

/cc @davidfowl @halter73 @Tratcher

Tratcher · 2015-11-20T17:47:31Z

Since this will be exposed to arbitrary 3rd party components we can't require them to always un-register, it would be un-enforceable and lead to memory leaks. Really it's a per-request token that you are trying to re-use for the next request on the same connection because you know it didn't fire yet and everyone should be done with it.

stephentoub · 2015-11-20T21:01:15Z

@benaadams, I just reread what you wrote about sizes, and noticed that you said "of this part of pipeline". I don't know enough about the system to understand what this part of the pipeline is and whether that's actually used on its own, or whether it's never used on its own and this is just a piece of a larger pie that's always used. Can you clarify? How does this impact the entire request/response for example, assuming that's the pipeline being discussed?

benaadams · 2015-11-20T21:13:48Z

@stephentoub its most of all; was just expressing it wasn't everything so there are extra allocations in the request input for example with header strings which are the highest allocators. I did an experiment in this PR aspnet/KestrelHttpServer#411 which also gives the breakdown of the top 5 types in current aspnet/dev

benaadams · 2015-11-20T21:16:35Z

Obviously when it comes to user code all bets are off as that will undoubtedly be the highest allocator; but the load the library presents is the maximal possible optimization level they can get to - so its about it having the lightest load possible.

benaadams · 2015-11-21T04:44:10Z

@stephentoub looking at the whole app/pipeline after a few of the non-conflicting PRs have been merged across 4k requests

Items 1,2,3 have stuff in the works, taking this CTS to the second most allocated type; from its current 5th place.

benaadams · 2015-11-21T04:52:45Z

And the previous second of code for allocations now looks like for 4k requests

stephentoub · 2015-11-21T13:41:06Z

How about a different approach, one that lazily-allocates the CTS only when it's actually needed:
stephentoub/KestrelHttpServer@c7d7f0e

I don't know what perf test you're running, so I've not evaluated the perf impact of this and whether it provides the desired help or not. If it does, I can submit this as a PR once your changes go in, as I've branched off of your PR at aspnet/KestrelHttpServer#408.

benaadams · 2015-11-21T14:23:37Z

Heh, that's a really good idea! Can't see the wood for the trees...

Merged it into the current PR, thank you!

@stephentoub Should this issue be closed?

stephentoub · 2015-11-21T14:57:00Z

Glad it was helpful. Yes, let's close this. We can still explore ways to reduce the size of a CTS further (it's now 64 bytes on 64-bit), but I don't see a good way to safely enable the kind of reuse you're describing, and with my commit, it sounds like the targeted use case no longer exists, or at least is significantly diminished.

benaadams · 2015-11-21T15:32:09Z

Re-run the tests and this completely clears up the allocations; thank you again.

benaadams · 2015-11-21T15:57:23Z

And I'm getting good sustained throughput

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    11.50ms   20.49ms 405.61ms   93.01%
    Req/Sec    44.57k     8.62k  101.79k    93.77%
  170312239 requests in 2.00m, 20.94GB read
Requests/sec: 1418228.35
Transfer/sec:    178.53MB

1.4M requests per second

roji · 2020-02-13T13:06:33Z

Does it make sense to revisit this (along the lines of ManualResetValueTaskSourceCore)? My specific use case is waiting on a Channel with a timeout, which only seems to be possible via a cancellation token, and I'd like to avoid incurring an allocation each time. The timeout must be applied on every read so I can't see laziness being a solution here (or have I miss some other option?)

stephentoub · 2020-02-13T14:53:22Z

I'd like to avoid incurring an allocation each time

Can you not just reset the timeout and reuse the same CTS as long as it hasn't timed out / been canceled? An extra allocation when things actually time out should be much less impactful.

roji · 2020-02-13T16:11:47Z

@stephentoub yeah, that's what I'm doing - but the timeout here is possibly a frequent event, meaning the allocation would still be frequent. Am also wondering more in general, now that we allow reset for ValueTask, if there's a fundamental reason not to allow the same for cancellation tokens (via a different advanced type, with the understanding that it's the user's responsibility to make sure everything is properly managed).

stephentoub · 2020-02-13T16:12:37Z

if there's a fundamental reason not to allow the same for cancellation tokens (via a different advanced type, with the understanding that it's the user's responsibility to make sure everything is properly managed

You mean introducing an entirely new type that's not CancellationToken?

roji · 2020-02-13T16:14:02Z

I was imagining a new type that's not CancellationTokenSource, but that would be able to manage CancellationToken - but I may have a bad mental model of everything here. Otherwise an advanced, appropriately-named method on CancellationTokenSource?

stephentoub · 2020-02-13T16:27:21Z

CancellationToken has very clear semantics: once IsCancellationRequested is true, it'll never be false. And implementations rely on that. That was not the case with ValueTask<T>. We introduced IValueTaskSource<T> so soon after the original type that we felt it was acceptable to retcon the behavioral changes... and even with that short gap we've seen it result in fallout we're still dealing with. CancellationToken has been around for way, way longer.

roji · 2020-02-13T16:31:56Z

Thanks for explaining. A new type that's not CancellationToken indeed seems like a non-starter.

davidfowl · 2020-02-13T16:37:16Z

😢 This makes me sad.

Since resetting a timeout on a CancellationTokenSource is a pretty common thing to do and that'll probably never be supported out of the box (dotnet/runtime#4694), put the uglyness of resetting the timeout in one place so that we don't have to repeat it all the time.

…3215) Add a wrapper for CancellationTokenSource with a resettable timeout Since resetting a timeout on a CancellationTokenSource is a pretty common thing to do and that'll probably never be supported out of the box (dotnet/runtime#4694), put the uglyness of resetting the timeout in one place so that we don't have to repeat it all the time.

stephentoub closed this as completed Nov 21, 2015

msftgits transferred this issue from dotnet/coreclr Jan 30, 2020

stephentoub mentioned this issue Aug 20, 2020

A cheaper linked cancellation token source? #40670

Closed

Brar mentioned this issue Oct 5, 2020

Add a wrapper for CancellationTokenSource with a resettable timeout npgsql/npgsql#3215

Merged

ghost locked as resolved and limited conversation to collaborators Jan 4, 2021

Resetable CancellationTokenSource registrations #4694

Resetable CancellationTokenSource registrations #4694

Comments

benaadams commented Nov 19, 2015

davidfowl commented Nov 20, 2015

Uh oh!

stephentoub commented Nov 20, 2015

Uh oh!

benaadams commented Nov 20, 2015

Uh oh!

stephentoub commented Nov 20, 2015

Uh oh!

benaadams commented Nov 20, 2015

Uh oh!

stephentoub commented Nov 20, 2015

Uh oh!

benaadams commented Nov 20, 2015

Uh oh!

stephentoub commented Nov 20, 2015

Uh oh!

benaadams commented Nov 20, 2015

Uh oh!

stephentoub commented Nov 20, 2015

Uh oh!

stephentoub commented Nov 20, 2015

Uh oh!

benaadams commented Nov 20, 2015

Uh oh!

stephentoub commented Nov 20, 2015

Uh oh!

benaadams commented Nov 20, 2015

Uh oh!

benaadams commented Nov 20, 2015

Uh oh!

stephentoub commented Nov 20, 2015

Uh oh!

benaadams commented Nov 20, 2015

Uh oh!

Tratcher commented Nov 20, 2015

Uh oh!

stephentoub commented Nov 20, 2015

Uh oh!

benaadams commented Nov 20, 2015

Uh oh!

benaadams commented Nov 20, 2015

Uh oh!

benaadams commented Nov 21, 2015

Uh oh!

benaadams commented Nov 21, 2015

Uh oh!

stephentoub commented Nov 21, 2015

Uh oh!

benaadams commented Nov 21, 2015

Uh oh!

stephentoub commented Nov 21, 2015

Uh oh!

benaadams commented Nov 21, 2015

Uh oh!

benaadams commented Nov 21, 2015

Uh oh!

roji commented Feb 13, 2020

Uh oh!

stephentoub commented Feb 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roji commented Feb 13, 2020

Uh oh!

stephentoub commented Feb 13, 2020

Uh oh!

roji commented Feb 13, 2020

Uh oh!

stephentoub commented Feb 13, 2020

Uh oh!

roji commented Feb 13, 2020

Uh oh!

davidfowl commented Feb 13, 2020

Uh oh!

stephentoub commented Feb 13, 2020 •

edited

Loading