[SCTP] Association lock contention #363
Conversation
Codecov Report
Base: 60.18% // Head: 59.99% // Decreases project coverage by -0.19%.
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master     #363      +/-   ##
==========================================
- Coverage   60.18%   59.99%   -0.19%
==========================================
  Files         503      504       +1
  Lines       47983    48073      +90
  Branches    12509    12527      +18
==========================================
- Hits        28879    28842      -37
- Misses       9913    10009      +96
- Partials     9191     9222      +31
```
☔ View full report at Codecov.
I really don't get how tokio schedules things... I wanted to make the sending a third long-running loop, but that just results in the three loops occupying the same thread again. I would really like a cleaner solution than spawning a new task this frequently; maybe someone with more tokio knowledge can help here. But for now this seems to be an improvement performance-wise :)
Converted to draft. I just tried this out at home, and here tokio's scheduling is even worse with this PR. I'll need to figure out a way to get these loops to cooperate better; this is apparently not a real solution :(
…single write_loop instance
I don't really get why this test fails now. I can get it to succeed by limiting the semaphore to 1 permit, which is essentially the same as just awaiting the spawned task. Maybe the test setup doesn't drive these tasks as fast as one would wish, and that's why the association ends up in an unexpected state? The failure isn't always on the same check, so something non-deterministic is happening...

println debugging to the rescue: the spawned tasks run concurrently/in parallel, so this PR introduces random reordering of the packets, and that is what breaks this test. Which raises the question: should it break the test? I'll need to look into how reordering is supposed to affect the SCTP state machine and protocol parameters. If reordering is fine, we could limit the semaphore to 1 permit in the test scenario and give it up to X permits in real scenarios to maximize performance.
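For illustration, here is a minimal sketch of the permit idea under discussion (function names and the `Vec<Vec<u8>>` payload type are placeholders, not the actual PR code): with a single permit the sends are effectively serialized, so packet order is preserved; more permits allow parallelism at the cost of ordering.

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

/// Hypothetical sketch: cap how many marshal-and-send tasks may run at once.
/// A 1-permit semaphore effectively serializes the sends (no reordering);
/// more permits trade packet ordering for throughput.
async fn spawn_send(sem: Arc<Semaphore>, raw_packets: Vec<Vec<u8>>) {
    // Acquire the permit *before* spawning so the write loop gets
    // backpressure instead of piling up an unbounded number of tasks.
    let permit = sem.acquire_owned().await.expect("semaphore closed");
    tokio::spawn(async move {
        let _permit = permit; // released when this task finishes
        for raw in raw_packets {
            // marshalling and the actual socket write would happen here
            let _ = raw;
        }
    });
}
```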
```diff
     //log::debug!("[{}] gather_outbound begin", name);
-    let (raw_packets, mut ok) = {
+    let (packets, continue_loop) = {
         let mut ai = association_internal.lock().await;
```
I feel like there has to be a way for us to remove this lock completely... we just need a bounded channel or lock-free queue to add and remove items. No need to lock the entire association.
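A minimal sketch of that suggestion, assuming a tokio mpsc channel (the capacity, names, and `Vec<u8>` payload are arbitrary, not the crate's actual API): producers push finished packets into the channel instead of taking the association lock, and one write loop drains it.

```rust
use tokio::sync::mpsc;

/// Hypothetical sketch: a single write loop drains the channel and writes
/// to the transport, so senders never need the association lock.
async fn write_loop(mut rx: mpsc::Receiver<Vec<u8>>) {
    while let Some(raw) = rx.recv().await {
        // write `raw` to the underlying transport here
        let _ = raw;
    }
}

async fn setup() -> mpsc::Sender<Vec<u8>> {
    // The capacity here is arbitrary; SCTP's own congestion/flow control
    // already bounds how much data can be outstanding at once.
    let (tx, rx) = mpsc::channel::<Vec<u8>>(64);
    tokio::spawn(write_loop(rx));
    tx // producers clone `tx` and call `tx.send(raw).await`
}
```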
I agree, and it doesn't even need to be bounded: the number of packets that can sit in this channel is naturally limited by the number of bytes allowed to be in flight by the normal SCTP mechanisms.
I just didn't want to change the architecture too much.
👍
As discussed in #360, the lock on the internal association is heavily contended in high send-bandwidth situations. This PR achieves two things:
This in itself does not really increase the bandwidth, but with improvements to the marshalling code itself, gains can now be had that were previously (at least partially) blocked by the lock contention.
It should also improve situations where both sides send a lot of data, because then the write and read loops are both very busy and fight for the lock even more.
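As a rough illustration of the approach (a sketch only: `AssociationInternal` is stubbed out here, and the return shape of `gather_outbound` is assumed from the reviewed diff, not taken from the crate), the lock is held just long enough to gather the outbound packets, and marshalling and sending happen after it is released so the read loop can make progress.

```rust
use std::sync::Arc;
use tokio::sync::Mutex;

// Stand-in for the real AssociationInternal; only what the sketch needs.
struct AssociationInternal;

impl AssociationInternal {
    // Assumed shape, mirroring the (packets, continue_loop) destructuring
    // in the reviewed diff: queued packets plus a "keep looping" flag.
    fn gather_outbound(&mut self) -> (Vec<Vec<u8>>, bool) {
        (Vec::new(), true)
    }
}

async fn write_once(association_internal: Arc<Mutex<AssociationInternal>>) -> bool {
    // Hold the association lock only while gathering outbound packets...
    let (packets, continue_loop) = {
        let mut ai = association_internal.lock().await;
        ai.gather_outbound()
    }; // ...and drop it here, before marshalling and sending.

    for raw in packets {
        // marshalling and the socket write no longer block the read loop
        let _ = raw;
    }
    continue_loop
}
```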