Skip to content

async parser handler support#3222

Merged
jerch merged 30 commits intoxtermjs:masterfrom
jerch:async_handlers
Mar 17, 2021
Merged

async parser handler support#3222
jerch merged 30 commits intoxtermjs:masterfrom
jerch:async_handlers

Conversation

@jerch
Copy link
Copy Markdown
Member

@jerch jerch commented Jan 20, 2021

Not being able to use async functions in parser handlers is an unfortunate limitation of the current parser implementation. This PR tries to come up with a solution to enable async support while keeping the parser performant.

TODO:

  • async parser support for:
    • CSI handler
      • basic impl
      • test cases
    • ESC handler
      • basic impl
      • test cases
    • OSC handler
      • basic impl
      • test cases
    • DCS handler
      • basic impl
      • test cases
  • async/sync mixed test cases on inputhandler
  • high level API tests with async handlers
  • deprecate/remove writeSync (refactor core tests) - needs note in release doc
  • proper reset handling on parser
  • simplify/cleanup handler interface mess of the parser (see cleanup parser handler interface #3223)
  • cleanup async codebase
  • update parser hooks docs --> created update parser hooks description regarding async handlers xtermjs.org#136

Shall fix #3218.

@jerch jerch self-assigned this Jan 20, 2021
@jerch
Copy link
Copy Markdown
Member Author

jerch commented Jan 20, 2021

The early commit can be tested in the browser by marking SGR as async - simply replace:

public charAttributes(params: IParams): void {

with

  public async charAttributes(params: IParams): Promise<void> {

(Note that our local tests will not run anymore with that, as they heavily rely on writeSync.) Fixed with 4a1bdad.

@jerch
Copy link
Copy Markdown
Member Author

jerch commented Jan 22, 2021

The last commit enables to test async handlers in our inputhandler transparently, which helps to spot regressions while messing around with random async handlers and testing the perf behavior.

The next thingy to reshape it is writeSync, after that the core tests should also work with async handlers.

@jerch
Copy link
Copy Markdown
Member Author

jerch commented Jan 22, 2021

@Tyriar The last commit deprecates writeSync as it wont work with async handlers anymore.

Reminder: this should be noted later on in release notes in case someone relies on it.

@jerch
Copy link
Copy Markdown
Member Author

jerch commented Jan 23, 2021

Done with the basic parser preparations. After some stack handling cleanup the async test case (ls -lR /usr/lib with SGR marked as async) takes only slightly longer (~1.4 times or 29 vs. 21 MB/s), guess this wont get any better.

Next steps:

  • Implement parser tests for CSI and ESC mixing all sort of sync and async handlers.
  • Move on to OSC and DCS async support. As written earlier, this is much more involved, as these handlers are fed by 3 methods from the parser via separate subparsers (start | put | end). Imho start can always be treated as sync, much in the sense of a constructor. put is more tricky - from my initial perspective to get proper flow control from async handlers in general, async support in put would be crucial here as huge amount of data may arrive. Not sure yet to go that path, a simpler impl would just treat the whole sequence as being blocked on the end call (which again needs a fixed mem limit during put to avoid OOM and some bailout logic for async workers).

@jerch jerch marked this pull request as ready for review February 25, 2021 18:18
@jerch
Copy link
Copy Markdown
Member Author

jerch commented Feb 25, 2021

@Tyriar @mofux
The PR is ready for a first review.

Copy link
Copy Markdown
Member

@Tyriar Tyriar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, no concerns about stability/correctness considering the amount of testing coverage/changes.

* be used anymore. If you need blocking semantics on data input consider
* `write` with a callback instead.
*
* @deprecated Unreliable, will be removed soon.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this will take a little while to adopt in vscode, it's used currently in the local echo and reconnect features. It's also used in a bunch of tests which I think should be harmless to stay.

Copy link
Copy Markdown
Member Author

@jerch jerch Feb 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we also had many tests with writeSync, but those were easy replaced by a writeP implementation:

public writeP(data: string | Uint8Array): Promise<void> {
return new Promise(r => this.write(data, r));
}

I used the same methodP idea for testing the methods along the async callstack:

public async parseP(data: string | Uint8Array): Promise<void> {
let result: Promise<boolean> | void;
let prev: boolean | undefined;
while (result = this.parse(data, prev)) {
prev = await result;
}
}

For the deeper calls it is a bit more cumbersome to get done right, thus my warnings all over the place not to call those methods directly anymore. Note that it always was dangerous to directly inject chunks at those lower levels while Terminal.write still holds data (will mess up parser states even in sync mode), but now with async it is even more dangerous as it might break the proper continuation. Lets hope that ppl dont dismantle the callstack and call the lower methods directly. Also I dont know of any TS way to prevent that, is there an access pattern like friend in C++?

Final note on writeSync - I did not remove it yet, because it will keep working as before as long as there are no async handlers hooked into the parser. It even keeps working with async handlers if pending data does not trigger any async action (thats what I meant with promise islands - you can still sail in sync water normally at high speed, but now there are promise shallow regions where the boat can crash if not used with writeP). Initially I thought about marking async handlers statically, so the parser would have a "map" of promise regions, but that turned out as not being helpful, as it cannot be decided prehand without actually parsing the data. Thus I went with the simpler handling of "whenever a promise is returned, it is an async handler" (effectively sailing on "runtime sight").

* Note: Never call this directly for a running terminal instance in production.
* Always use `Terminal.write`, which provides in-band blocking and correct exection order.
*/
public parse(data: string | Uint8Array, promiseResult?: boolean): void | Promise<boolean> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work, and if so does it match the valid return types?

public parse(data: string | Uint8Array): void | Promise<boolean>;
public parse(data: string | Uint8Array, promiseResult: true): Promise<boolean>;
private parse(data: string | Uint8Array, promiseResult?: boolean): void | Promise<boolean> {
}

Copy link
Copy Markdown
Member Author

@jerch jerch Feb 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I dont see much gain from variant 1. and 2. beside easier dealing with a terminal breakdown during tests. Main reason - the proper continuation is placed at WriteBuffer level in then, which InputHandler.parse is not aware of, thus the right execution order cannot be continued from InputHandler.parse directly anymore. I'd favour a more restrictive access pattern here (like friend in C++), but dont know how to do something like that in TS.

In theory during a normal terminal run with valid data in the write buffer the callstack of Terminal.write --> InputHandler.parse --> Parser.parse --> (Osc|DcsParser.end) --> async handler must not be "sidecalled", whenever an async handler got paused, as it has to be resumed from the very top. I dont like the manual stack unwinding done here, but it is the only way I found to keep sync code fast.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I didn't explain enough. I think the above would give a compile error if parse(data, false) was called, and it would return Promise<boolean> when parse(data, true) is called (instead of void | Promise<boolean>). Just a little thing that could potentially catch bugs by indicating what is valid.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I fully understand your idea - do you want to conclude from provided arguments the return type, like narrowing it down? That is not possible, as the fact, whether a promise gets returned, depends on these two rules:

  • an async handler is registered for a sequence XY (and will not be skipped by an earlier handler)
  • AND data contains sequence XY

It never depends on the arguments of parse, all these variants can return void | Promise<boolean>:

parse(data: string | Uint8Array): void | Promise<boolean>;
parse(data: string | Uint8Array, promiseResult: true): Promise<boolean>;
parse(data: string | Uint8Array, promiseResult: false): Promise<boolean>;

// which reduces to
parse(data: string | Uint8Array, promiseResult?: boolean): void | Promise<boolean>;

In fact the promise return value and the result argument are responsible for the two-sided handling of the stalled callstack as pausing/resuming:

  • promise return value
    • ascending path of stalled callstack (on pause)
    • stack progress is saved at each call level individually
    • fast unwinding by direct return jumps
    • promise bubbles up to top level to enable micro- or macrotask exection
  • micro- or macrotask exection, eventually our promise gets fullfilled and we grab its return value
  • promiseReturn:
    • descending path of stalled callstack (on resume)
    • rewind stack progress from stack saves on each level
    • bubbles down to occurrence of promise creation, continues innermost resume with return value (indicating whether next handler should be probed as well)
    • back to sync processing

The only assumption that could be made is for two consecutive parse calls:

  • call returns a promise - then next call has to be called with the awaited result of that promise in promiseResult for proper continuation and execution order (we are between pause/resume)
  • parse calls finishs without returning a promise - we are in sync mode, nothing to wait for before we can process the next chunk

Technically the whole handling is a quite explicit way of coroutines as one could write them in C (where it can be expressed even more elegant with static vars 🙈). The different call requirements for sync vs. async is an unfortunate side effect of this explicit coroutine implementation. But as noted earlier, it was the fastest I was able to find (under the assumption of only few to none async handlers ever being registered). True that the full promisified call chain has much simpler call signatures, but it also runs 4 times slower.

... Just a little thing that could potentially catch bugs by indicating what is valid.

Since we can only derive the proper call signature at runtime by knowing the result of the last call, imho TS's static type system cannot help here much.

Sorry for the lengthy explanation, there is still a high chance I misses your idea here.

Comment thread src/common/InputHandler.ts Outdated
Comment thread src/common/input/WriteBuffer.ts Outdated
Comment thread src/common/input/WriteBuffer.ts Outdated
Comment thread src/common/input/WriteBuffer.ts Outdated
Comment thread src/common/parser/DcsParser.ts Outdated
Comment thread src/common/parser/DcsParser.ts Outdated
Comment thread src/common/parser/EscapeSequenceParser.ts Outdated
Comment thread typings/xterm.d.ts
- reorder exception throwing to be sync to band position
- document WriteBuffer._innerWrite
- remove any declarations
- remove conditional assigments
- use faster return value branching in all parsers
- remove dead code in benchmark
@jerch
Copy link
Copy Markdown
Member Author

jerch commented Feb 27, 2021

@Tyriar Thx for the review, guess I addressed most points. Also if you have an idea how to apply a more restrictive access pattern like friend in C++ I am all ears.

@jerch
Copy link
Copy Markdown
Member Author

jerch commented Feb 28, 2021

@Tyriar I changed the continuation of faulty handlers to false to give the default handler a chance to still get called afterwards. I think thats the better default behavior, as most ppl will hook on already handled sequences, thus the sequence should keep bubbling to our default handler, even if the custom async handler error'ed out.

Some open questions:

Do we need a second timeout with a hard rejection?
Currently we have a timeout warning after 5s, that fires only once. Imho this is enough for a careful dev to spot limitations of his handler implementation and to ensure himself, that the promise always gets resolved/rejected in a timely fashion. Not so for a sloppy dev - the code might have dangling promise endpoints due to improper branch testing and such. Here the terminal would block forever, if such an endpoint got triggered.
To avoid that hard blocking scenario in production environments later on we could place a second timeout on a pending promise and let it fail hard/reject after that second timeout. This would keep the terminal interactive/responsive, although the terminal state might be messed up, if the custom handler already altered something prior promise creation (or in the executor itself). But it kinda raises "realtime" questions about guaranteed execution times.

Better async stack introspection needed?
It is somewhat tricky to get better introspection information on the blocking handler, mainly because of the scattered stack saves and the way we register custom handlers through anonymous in-between functions. The scattered stack is not a big deal, it could be collected by a sync descent into the callstack, still the information is somewhat useless without proper marking of the handler function itself. If we want better introspection here, we would have to store more information along the registering path.

I dont see both aspects as showstoppers of the actual PR and suggest to implement them in later PRs if we encounter issues around blocking async handlers and need more failstate handling to overcome them.

@Tyriar
Copy link
Copy Markdown
Member

Tyriar commented Mar 15, 2021

Also if you have an idea how to apply a more restrictive access pattern like friend in C++ I am all ears.

I normally stick "friend classes" in the same file and don't export them. Closest I've come up with for TS.

@Tyriar
Copy link
Copy Markdown
Member

Tyriar commented Mar 15, 2021

Do we need a second timeout with a hard rejection?

The warning is good, recovering is probably a good idea too so the terminal doesn't just break because of some addon.

Better async stack introspection needed?

👍 to defer, not sure we need this but I haven't tried to implement an async handler.

@Tyriar Tyriar added this to the 4.12.0 milestone Mar 15, 2021
Copy link
Copy Markdown
Member

@Tyriar Tyriar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm 🚀

Comment thread src/common/InputHandler.ts Outdated
Comment thread src/common/InputHandler.ts Outdated
* Note: Never call this directly for a running terminal instance in production.
* Always use `Terminal.write`, which provides in-band blocking and correct exection order.
*/
public parse(data: string | Uint8Array, promiseResult?: boolean): void | Promise<boolean> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I didn't explain enough. I think the above would give a compile error if parse(data, false) was called, and it would return Promise<boolean> when parse(data, true) is called (instead of void | Promise<boolean>). Just a little thing that could potentially catch bugs by indicating what is valid.

Comment thread src/common/parser/DcsParser.ts Outdated
Comment thread src/common/parser/OscParser.ts Outdated
@jerch
Copy link
Copy Markdown
Member Author

jerch commented Mar 17, 2021

@Tyriar I pressed the merge button 😅. Since you mentioned above the explicit usage of writeSync in vscode - plz drop me a note if you encounter issues with a promise based replacement there or need something faster than simple "everything is handled by promise turnover".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

method to propagate backpressure from parser handler

2 participants