Skip to content

Significantly improve performance of ShellStream's Expect methods #1207

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Feb 13, 2024

Conversation

jscarle
Copy link
Contributor

@jscarle jscarle commented Oct 13, 2023

I'm glad to see that the project is alive again! This is an updated PR based off the original PR #793 as the original issue still exists and it is a very serious performance issue.

Expect's performance degrades quickly as the size of the _incoming Queue grows. The amount of work that needs to be done by Regex grows with each byte added. Using a Regex pattern to detect a Bash prompt, I ran into an issue while Expecting the prompt whilst doing a large yum update on a Linux server. The resulting queue size jumped into the megabyte region and running a Regex match against the _incoming queue brought the process to a crawl. What would normally take about 3 minutes on a bash shell, was taking hours. After more than 2 hours, I cancelled the process and started debugging the issue with JetBrains' dotTrace. 85% of the process execution time was spent on the Regex Match.

I added a parallel _expect Queue and a _expectSize parameter to the ShellStream to allow a synchronous buffer to run along side of the _incoming queue, but with a limited capacity equivalent to _expectSize. As a default overload for CreateShellStream, if the parameter is omitted, it uses the number of columns as a default _expectSize. This allows for a running windows for Regex to check its Expect pattern and that windows remains small independent of the actual size of the _incoming queue. In my tests, this completely eliminated the slow down caused by the ever increasing size of the _incoming queue. It allows for performance on par with being directly connected as a human against the bash shell.

Whilst working on this, I noticed an additional issue that was simple to resolve given the now available parallel _expect queue.

Considering the default Encoding is UTF-8, there Regex Match Index does not necessarily correspond to the actual byte position within the UTF-8 string as some characters can be double byte encoded, which affects the Index returned by Match. ASCII does not support double byte encoding, so for the purpose of Expect, it makes more sense to match against an ASCII encoding of the string instead of a UTF-8 encoding since the _incoming Queue is obviously encoding agnostic.

Running a seperate Expect queue allows to Match against an ASCII version for byte position fidelity whilst conserving the proper encoding when returning the string from Expect.

Here is a dotNetFiddle that demonstrates the Match Index position issue with UTF-8 encoding (pulled from real-world result that I debugged and encoded as a byte array for dotNetFiddle): https://dotnetfiddle.net/JM80ea

@drieseng
Copy link
Member

@jscarle, I haven't had time to go through you changes, but can you create a separate PR for the bug fix? Make sure to add a unit or integration test for it too.

@jscarle jscarle reopened this Nov 1, 2023
@jscarle
Copy link
Contributor Author

jscarle commented Nov 1, 2023

@drieseng This is the "minimum" change needed to fix the issue. It requires running a parallel "expect queue" to lower the memory overhead of the shell stream.

Copy link
Collaborator

@WojciechNagorski WojciechNagorski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reporting this problem, because the problem exists.

I've checked this PR and I think there is a simple way to improve this.

@jscarle
Copy link
Contributor Author

jscarle commented Dec 11, 2023

No changes have been brought to this PR as I still believe that the only way to solve the performance issue with Expect without changing the API signature of the methods is to introduce a rolling buffer that runs in parallel with the incoming queue and to use that rolling buffer as the source of the expect verification.

@jscarle
Copy link
Contributor Author

jscarle commented Dec 22, 2023

@WojciechNagorski Can you review my comments please?

@WojciechNagorski
Copy link
Collaborator

Yes I can. I know, you are right there is the huge performance problem. However, I need to find more time to delve deeper into this.

@jscarle
Copy link
Contributor Author

jscarle commented Jan 19, 2024

@WojciechNagorski The changes I've proposed can be merged as is since they both a) fix the performance issue and b) do not change any of the public APIs. This would allow everyone to benefit from the huge performance gain this brings and at a later time, if you find a better approach when you have more time, then it could be refactored. At least, this would immediately solve a huge pain point.

@WojciechNagorski
Copy link
Collaborator

I'm not sure. I need to find time but you know I'm doing it in my free time.

@Rob-Hague
Copy link
Collaborator

Sorry @jscarle that this is arduous. I think it's partly explained by ShellStream having many preexisting problems (inefficient being just one) and little (useful) test coverage. That creates a high bar for the motivation to touch it (that's been my feeling anyway), and especially to increase the complexity and potentially change the behaviour without the testing situation changing.

I've opened #1313 to add some (failing) tests. I'll try and fix them separately, and then we can revisit this improvement with more confidence?

@jscarle
Copy link
Contributor Author

jscarle commented Feb 10, 2024

I deeply understand the challenges faced by this being an open source project to which you both volunteer your free time, and that this is a complex portion of the code to which everyone is hesitant to modify.

However, I would like to reiterate that the issue with Expect is so deep that this makes its completely unusable in any long running scenario as the time to process the buffer raises exponentially with its length.

In my real world usage, I used ShellStream to update Linux virtual machines. When the base image was fairly recent, running a yum -y update would take 5 minutes. When the image was a few months old, it would take up to an hour. After a year, the update would take up to 2 hours. I would later find out that running the same deployment steps, which I had automated, by hand would only take 20 minutes. That 1:40 difference was caused by Expect. That I how I found this issue. The code I submitted in this PR has been running in production for several years now, which is why I have such confidence in this fix.

The changes made in this PR were done with great care to solve only this issue, and nothing else. No refactoring or other improvements were done. They were also done in a way as to not do any changes to any of the public APIs, thus no breaking changes.

Internally all that is really happening is that a parallel buffer is run that is used only for the Expect. This buffer is fixed in size and rolls over to keep it concise. This eliminates all peformance issues of Expect whilst not changing any of the current behavior of ShellStream.

@Rob-Hague
Copy link
Collaborator

Understood. I can definitely believe ShellStream is the bottleneck in a process. And I agree with this direction in order to avoid exponential regex matching. It's good to know you've been running this for a while in production.

@WojciechNagorski
Copy link
Collaborator

This PR does not compile after merge with master.

@jscarle
Copy link
Contributor Author

jscarle commented Feb 11, 2024

Build has been fixed and all tests are passing.

@WojciechNagorski
Copy link
Collaborator

It looks better!
last thought:
The ExpectSize may be different depending on the regex which may change every time the Expect() method is called. The expectSize value suits me better as an optional parameter for the Expect() method.

What do you think? @jscarle @Rob-Hague

@jscarle
Copy link
Contributor Author

jscarle commented Feb 11, 2024

It looks better! last thought: The ExpectSize may be different depending on the regex which may change every time the Expect() method is called. The expectSize value suits me better as an optional parameter for the Expect() method.

What do you think? @jscarle @Rob-Hague

The expectSize has nothing to do with the actual Expect. It's a rolling buffer that's parallel.

@jscarle
Copy link
Contributor Author

jscarle commented Feb 11, 2024

I made the expectSize to be 2 x the buffer size by default, and then I added a LargeExpect test to make sure this works with large 1k type expect strings.

@jscarle
Copy link
Contributor Author

jscarle commented Feb 11, 2024

@WojciechNagorski I understand your desire to want to adjust the expectSize dynamically, but it would be very difficult to do and likely be quite instable and error prone.

The performance issue with Expect is that it runs against an ever growing _incoming queue. Running a parrallel _incoming queue (which is named _expect), we can control the subset of the _incoming queue against which Expect runs.

Setting to a default of 2 x bufferSize (which is 1024 by default), means we always evaluate Expect against the last 2048 bytes of the _incoming queue.

@jscarle jscarle force-pushed the develop branch 2 times, most recently from 92dd143 to e266ac5 Compare February 12, 2024 00:31
@jscarle
Copy link
Contributor Author

jscarle commented Feb 12, 2024

@WojciechNagorski @Rob-Hague

I added guard clauses to the constructor of ShellStream to prevent values less than 1 for the bufferSize and expectSize. There was a connection test that set the bufferSize to 0 which caused the test to hang indefinitely. We may want to consider adding guard clauses for the other values, but this could be done later.

Since I've addressed the encoding issue and large Expect, I believe this is now ready to merge.

@jscarle jscarle changed the title Significantly improved performance of ShellStream's Expect methods Significantly improve performance of ShellStream's Expect methods Feb 12, 2024
@WojciechNagorski
Copy link
Collaborator

@Rob-Hague I'm waiting for your opinion.

@Rob-Hague
Copy link
Collaborator

Thanks. My concern was that we are going from processing data repeatedly to potentially not processing some data at all, but I understand that's probably an edge case, so I don't think we have to worry about it for now.

I would also have a slight preference for a default parameter on the Expect(Regex) methods. I think it would give more flexibility in the implementation, if one just specifies the length of the match they expect (with the Expect(string) we know it already) and we can decide how much data we need to look at from there.

But I see how that could be more difficult in the current implementation. On the other hand, it should be easier with some changes I have locally. And on the other other hand, I don't want to hold up this PR.

So in summary I think it's fine as is.

@jscarle
Copy link
Contributor Author

jscarle commented Feb 12, 2024

I can work on a second iteration after this PR is merged in order to explore setting the expectSize at the method level. At least with this PR, we can get the fix out to everyone now and we can improve the API later.


for (var i = 0; i < charCount && _incoming.Count > 0; i++)
// Remove processed items from the queue
for (var i = 0; i < returnLength && _incoming.Count > 0; i++)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually going to remove what it should from both _incoming and _expect?

If _incoming looks like: aaaaabbbbbcccccZ
and _expect (and so matchText) looks like: cccZ
and we are expecting Z
then returnText == "cccZ" and returnLength == 4

Then we are going to dequeue aaaa from _incoming and nothing from _expect (because _incoming.Count > _expect.Count + 4). So we end up with:

_incoming looks like: abbbbbcccccZ
and _expect still looks like: cccZ

Have I got that right? Is that expected? (it feels wrong)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to add a test to try to replicate this and make sure its accounted for. I have a feeling that the dequeueing may be slightly off since for things to happen as you've mentioned, data would have to accumulate and be read in different ways within the same workflow.

@WojciechNagorski
Copy link
Collaborator

I can work on a second iteration after this PR is merged in order to explore setting the expectSize at the method level. At least with this PR, we can get the fix out to everyone now and we can improve the API later.

We cannot add parameters to the public API and then remove them in the next release.

When passing a parameter to the Expect method, the most difficult thing will be to efficiently prepare a subset of bytes from _incoming. Queue does not support AsSpan() I think, However, it's worth checking it now before merging. But maybe at this point we could create _expect and copy what we need there.

@jscarle
Copy link
Contributor Author

jscarle commented Feb 13, 2024

I can work on a second iteration after this PR is merged in order to explore setting the expectSize at the method level. At least with this PR, we can get the fix out to everyone now and we can improve the API later.

We cannot add parameters to the public API and then remove them in the next release.

We could remove the public overloads that allow expectSize to be set since it now defaults to 2 x bufferSize anyway, it may be moot.

@WojciechNagorski
Copy link
Collaborator

Sorry, you're right.

@WojciechNagorski
Copy link
Collaborator

This issue has been fixed in the 2024.0.0 version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants