Skip to content

Significantly improved performance and fixed bug with ShellStream's Expect methods. #793

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jscarle
Copy link
Contributor

@jscarle jscarle commented Mar 6, 2021

Expect's performance degrades quickly as the size of the _incoming Queue grows. The amount of work that needs to be done by Regex grows with each byte added. Using a Regex pattern to detect a Bash prompt, I ran into an issue while Expecting the prompt whilst doing a large yum update on a Linux server. The resulting queue size jumped into the megabyte region and running a Regex match against the _incoming queue brought the process to a crawl. What would normally take about 3 minutes on a bash shell, was taking hours. After more than 2 hours, I cancelled the process and started debugging the issue with JetBrains' dotTrace. 85% of the process execution time was spent on the Regex Match.

I added a parallel _expect Queue and a _expectSize parameter to the ShellStream to allow a synchronous buffer to run along side of the _incoming queue, but with a limited capacity equivalent to _expectSize. As a default overload for CreateShellStream, if the parameter is omitted, it uses the number of columns as a default _expectSize. This allows for a running windows for Regex to check its Expect pattern and that windows remains small independent of the actual size of the _incoming queue. In my tests, this completely eliminated the slow down caused by the ever increasing size of the _incoming queue. It allows for performance on par with being directly connected as a human against the bash shell.

Whilst working on this, I noticed an additional issue that was simple to resolve given the now available parallel _expect queue.

Considering the default Encoding is UTF-8, there Regex Match Index does not necessarily correspond to the actual byte position within the UTF-8 string as some characters can be double byte encoded, which affects the Index returned by Match. ASCII does not support double byte encoding, so for the purpose of Expect, it makes more sense to match against an ASCII encoding of the string instead of a UTF-8 encoding since the _incoming Queue is obviously encoding agnostic.

Running a seperate Expect queue allows to Match against an ASCII version for byte position fidelity whilst conserving the proper encoding when returning the string from Expect.

Here is a dotNetFiddle that demonstrates the Match Index position issue with UTF-8 encoding (pulled from real-world result that I debugged and encoded as a byte array for dotNetFiddle): https://dotnetfiddle.net/JM80ea

@jscarle jscarle changed the title Significantly improved performance and fixed bug. Significantly improved performance and fixed bug with ShellStream. Mar 6, 2021
@jscarle jscarle changed the title Significantly improved performance and fixed bug with ShellStream. Significantly improved performance and fixed bug with ShellStream's Expect methods. Mar 6, 2021
@jscarle
Copy link
Contributor Author

jscarle commented Mar 6, 2021

Seems that the unit tests that failed in appveyor were already broken when I forked the repository.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant