Significantly improved performance and fixed bug with ShellStream's Expect methods. #793
Expect's performance degrades quickly as the size of the _incoming Queue grows, because the amount of work Regex must do grows with each byte added. I ran into this while using a Regex pattern to Expect a Bash prompt during a large yum update on a Linux server. The queue size jumped into the megabyte range, and running a Regex match against the _incoming queue brought the process to a crawl. What would normally take about 3 minutes in a bash shell was taking hours. After more than 2 hours, I cancelled the process and started debugging the issue with JetBrains' dotTrace: 85% of the process execution time was spent in the Regex Match.
I added a parallel _expect Queue and an _expectSize parameter to the ShellStream, so that a second buffer is filled in step with the _incoming queue but with its capacity limited to _expectSize. A default overload of CreateShellStream uses the number of columns as the _expectSize when the parameter is omitted. This gives Regex a running window in which to check its Expect pattern, and that window remains small regardless of the actual size of the _incoming queue. In my tests, this completely eliminated the slowdown caused by the ever-increasing size of the _incoming queue. Performance is now on par with a human working directly in the bash shell.
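For reviewers who want the idea without reading the diff, here is a minimal sketch of the sliding-window approach described above. It is not the code in this PR: `BoundedExpectBuffer`, `Append`, and `IsMatch` are hypothetical names chosen for illustration, and only `_incoming`, `_expect`, and `_expectSize` mirror the fields mentioned in the description.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the buffering idea; not SSH.NET's actual ShellStream.
public sealed class BoundedExpectBuffer
{
    private readonly Queue<byte> _incoming = new Queue<byte>(); // full, unbounded history
    private readonly Queue<byte> _expect = new Queue<byte>();   // sliding window used for matching
    private readonly int _expectSize;

    public BoundedExpectBuffer(int expectSize)
    {
        if (expectSize <= 0) throw new ArgumentOutOfRangeException(nameof(expectSize));
        _expectSize = expectSize;
    }

    public int IncomingCount => _incoming.Count;
    public int ExpectCount => _expect.Count;

    // Every received byte goes into both queues; the window drops its oldest
    // byte once it holds more than _expectSize bytes.
    public void Append(byte[] data)
    {
        foreach (var b in data)
        {
            _incoming.Enqueue(b);
            _expect.Enqueue(b);
            if (_expect.Count > _expectSize) _expect.Dequeue();
        }
    }

    // The regex only ever sees the bounded window, so the cost of a match
    // attempt does not grow with the size of _incoming.
    public bool IsMatch(Regex pattern)
    {
        var window = Encoding.ASCII.GetString(_expect.ToArray());
        return pattern.IsMatch(window);
    }
}
```

With a window of 80, appending 100,000 bytes of output followed by a prompt such as `[root@host ~]# ` leaves 100,015 bytes in `_incoming` but only 80 in `_expect`, and a prompt pattern like `\]# $` still matches. The trade-off in this sketch is that a pattern needing more than `_expectSize` bytes of context can never match.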
Whilst working on this, I noticed an additional issue that was simple to resolve given the now available parallel _expect queue.
Since the default Encoding is UTF-8, the Regex Match Index does not necessarily correspond to the actual byte position within the UTF-8 data: some characters are encoded as more than one byte, so the character Index returned by Match can be smaller than the byte offset. ASCII has no multi-byte characters, so for the purpose of Expect it makes more sense to match against an ASCII decoding of the bytes rather than a UTF-8 decoding, since the _incoming Queue itself is encoding agnostic.
Running a separate Expect queue makes it possible to Match against an ASCII version for byte-position fidelity while still using the proper encoding for the string returned from Expect.
Here is a dotNetFiddle that demonstrates the Match Index position issue with UTF-8 encoding (pulled from a real-world result that I debugged and encoded as a byte array for dotNetFiddle): https://dotnetfiddle.net/JM80ea
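The index mismatch can also be reproduced with a tiny synthetic input. The sketch below is an illustration only, separate from the fiddle and from the PR code; `MatchIndexDemo`, `Utf8Index`, and `AsciiIndex` are hypothetical helper names. The input bytes are `0xC3 0xA9` (the UTF-8 encoding of `é`) followed by `0x24` (`$`), so the `$` sits at byte offset 2.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helpers comparing Match.Index under two decodings of the same bytes.
public static class MatchIndexDemo
{
    // Index of the first match when the bytes are decoded as UTF-8.
    // A multi-byte character collapses to fewer chars, shifting the index.
    public static int Utf8Index(byte[] data, string pattern) =>
        Regex.Match(Encoding.UTF8.GetString(data), pattern).Index;

    // Index of the first match when the bytes are decoded as ASCII.
    // Each byte above 0x7F becomes a single '?', so char index == byte offset.
    public static int AsciiIndex(byte[] data, string pattern) =>
        Regex.Match(Encoding.ASCII.GetString(data), pattern).Index;

    public static void Main()
    {
        var data = new byte[] { 0xC3, 0xA9, 0x24 }; // "é$" in UTF-8
        Console.WriteLine(Utf8Index(data, @"\$"));  // 1
        Console.WriteLine(AsciiIndex(data, @"\$")); // 2
    }
}
```

Only the ASCII-decoded index (2) can be used directly as a byte count when dequeuing from a byte queue. Using the UTF-8 index (1) would cut the data one byte short.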