asyncio: Improve support Python 3.8 on Windows #2815

bdarnell · 2020-02-22T23:39:15Z

This commit removes the need for applications to work around the
backwards-incompatible change to the default event loop. Instead,
Tornado will detect the use of the windows proactor event loop and
start a selector event loop in a separate thread.

Closes #2804

minrk · 2020-02-27T10:47:08Z

As a Jupyter maintainer, I'd love to see this in a release, so anything I can do to help it along, let me know.

I opened bdarnell#2 against this branch, which gets things passing for me on non-Windows, but it looks like there are still a couple thread-related kinks when this actually gets used.

We've been pretty slow to roll out the application-level workaround for #2804 because we have so many applications. I think your points about this being a pain at the application level and not having a good library-level solution are being borne out. Notebook itself and ipykernel were fixed relatively promptly, but there are still stragglers like QtConsole, and there isn't a good way for us in Jupyter to have a single global fix that wouldn't have the same problems as tornado itself setting the event loop at import/start/etc.

minrk · 2020-02-27T11:10:00Z

The set_wakeup_fd error (ValueError: set_wakeup_fd only works in main thread) is a bug in Python 3.8.0, fixed in 3.8.1 https://bugs.python.org/issue34679 (commit), but appveyor still only has 3.8.0 on the Windows images.

minrk · 2020-02-27T13:30:53Z

Another test failure is in ioloop test test_handler_callback_file_object.

The test failure is because the fds list is expected to be [socket_obj, socket_fileno], but some runs (not every time), events occur more than once, e.g. [socket_obj, socket_obj, socket_fileno, socket_fileno, socket_fileno]. I believe this is caused by races in the thread synchronization:

When a read event fires:

[selector] calls _real_loop.call_soon_threadsafe to schedule read handler
[selector] still has read events waiting, next iteration schedules read handler again
[main] calls wrapped callback first time, closes sock
[main] calls wrapped callback second time

The _run_on_selector uses concurrent.futures to make a blocking call from real loop -> selector. I think to preserve these handlers rigorously, we need the callback to be similarly symmetrical with _run_on_main called frmo the selector, instead of just the async call_soon_threadsafe so that the selector is blocked until the actual callback finishes firing in the main loop.

bdarnell · 2020-02-28T02:21:16Z

Excellent, thanks for jumping in and testing!

That test_handler_callback_file_object is a tricky one - wonder why it didn't show up on my machine? Anyway, a blocking _run_on_main won't work here - it would deadlock if the callback needed to call back to the selector thread to add or remove a handler. A short delay (like you have in your patch) would get this test passing (at least most of the time) but doesn't reliably solve the problem.

Other partial solutions include

Declaring spurious wakeups to be legal. Semantically I think this would be fine for most production code, although tests like this one are sensitive to it. Plus unless there is some sort of delay on the selector thread it could lead to unbounded memory growth by adding more callbacks to the main thread's queue.
Call select.select before running a callback to ensure the desired condition still holds. Same problem of unbounded memory growth, but it lets us reject the spurious callbacks a little faster and without visible changes to application code behavior
Unregister file descriptors from the selector before running their callbacks, and re-register them once they're finished. This is bad for performance but I think it would work OK.

I think we need to control the select loop itself instead of just using the SelectorEventLoop black box API. This probably means writing directly to the lower-level selector interface (or select.select? Since this is windows-only it wouldn't benefit from the stateful selector objects and I think this interface may turn out a little simpler), which I just said I didn't want to do in https://bugs.python.org/issue37373#msg362658. Ah well, at least socket.socketpair looks like it'll make the implementation simpler this time around.

minrk · 2020-02-28T11:01:13Z

That test_handler_callback_file_object is a tricky one - wonder why it didn't show up on my machine?

I think it makes sense since it’s a race - sometimes main will process the event before selector’s next tick, in which case it’ll pass, sometimes it won’t. It fails less than half of the time for me, but pretty often still. I saw everywhere from 0 to 7 extra events when I added some prints to debug it.

The delay I added wasn’t a generic sleep to help win the race more often; a future is used to prevent duplicate calls to the “real” callback, while duplicate firing in the selector results in a sleep until the “real” callback finishes firing.

I’m only working part time, so it’s a bit of a flyby contribution, but if you haven’t had a chance by the time I get some time next week, I’ll look at the select approach.

bdarnell · 2020-07-25T18:03:08Z

I've reworked this to fix the race; all the tests are now passing on windows. It's still failing in CI, but only because there's a bunch of log spam. Most of that is coming from asyncio itself so I may just have to disable the log checks on windows.

@minrk If you'd like to take another look, pay attention to the issues behind #2719. That test needed some tweaking and I'm not entirely sure whether the test was at fault or if the bug is resurfacing with the new event loop.

Without this try/finally, if this test ever fails, errors can be reported in a confusing way.

Closing the file descriptor without removing the corresponding handler is technically incorrect, although the default IOLoops don't have a problem with it.

This commit removes the need for applications to work around the backwards-incompatible change to the default event loop. Instead, Tornado will detect the use of the windows proactor event loop and start a selector event loop in a separate thread. Closes tornadoweb#2804

Running a whole event loop on the other thread leads to tricky synchronization problems. Instead, keep as much as possible on the main thread, and call out to a second thread only for the blocking select system call itself.

Use this on windows due to a log spam issue in asyncio.

Restarting the event loop to "cleanly" shut down a coroutine introduces other problems (mainly manifesting as errors logged while running tornado.test.gen_test). Replace the coroutine with a pair of callbacks so we don't need to do anything special to shut down without logging warnings.

The just-released version 0.3.0 is incompatible with our older pinned version of sphinx.

bdarnell · 2020-09-02T16:07:26Z

This is getting more or less ready. I've submitted a fix to cpython to eliminate the "self pipe" logging (we'll still need to run without the log assertion until that fix comes out in python 3.8.6). Testing locally with that build I found another unrelated source of log spam which I fixed by moving away from coroutines in the event loop itself (it's hard to shut these things down cleanly when they're a part of the event loop).

Now things are looking good except for one failure I just saw on appveyor that I haven't been able to reproduce either locally or retrying on appveyor. It's not completely clear what's going on because appveyor is running python with the log spam issue (and it's just running 3.8.0 instead of 3.8.5, although I don't see anything in the changelog that's obviously relevant).

stonebig · 2020-09-20T15:17:55Z

Any hope for jupyterlab-3 final ? Around mid october

bdarnell · 2020-09-20T18:38:13Z

Yeah, that should be doable.

I've re-run CI a bunch of times without seeing that mysterious failure again, so I'll just chalk it up to a fluke and merge this. Then I think there are just a couple of open PRs I want to evaluate and maybe merge before starting the release process for 6.1.

ploxiln mentioned this pull request Apr 9, 2020

NotImplementedError when calling HTTPServer.listen in Python 3.8.1 and Tornado 6.0.x #2804

Closed

bdarnell added asyncio windows labels May 3, 2020

bdarnell force-pushed the thread-selector branch 3 times, most recently from 71ba9e3 to 0304c8c Compare July 7, 2020 02:00

bdarnell force-pushed the thread-selector branch 6 times, most recently from c0a4fc9 to c789c43 Compare July 25, 2020 17:49

stonebig mentioned this pull request Jul 26, 2020

release 2020-03 follow-up winpython/winpython#852

Closed

16 tasks

stonebig mentioned this pull request Aug 7, 2020

Support Tornado>=6 spotify/luigi#2949

Closed

bdarnell force-pushed the thread-selector branch 2 times, most recently from 0a0b7e3 to eebd582 Compare August 7, 2020 17:18

bdarnell force-pushed the thread-selector branch from fc05303 to 244888e Compare September 2, 2020 01:34

bdarnell and others added 7 commits September 2, 2020 11:10

httpclient_test: Improve error reporting

dcb23ee

Without this try/finally, if this test ever fails, errors can be reported in a confusing way.

iostream_test: Improve cleanup

2c357af

Closing the file descriptor without removing the corresponding handler is technically incorrect, although the default IOLoops don't have a problem with it.

test: Add missing level to ExpectLog call

7d7b4dd

asyncio: Rework AddThreadSelectorEventLoop

5cfe2fc

Running a whole event loop on the other thread leads to tricky synchronization problems. Instead, keep as much as possible on the main thread, and call out to a second thread only for the blocking select system call itself.

test: Add an option to disable assertion that logs are empty

52c45c0

Use this on windows due to a log spam issue in asyncio.

docs: Pin version of sphinxcontrib-asyncio

9602c6a

The just-released version 0.3.0 is incompatible with our older pinned version of sphinx.

bdarnell force-pushed the thread-selector branch from 244888e to 9602c6a Compare September 2, 2020 15:10

bdarnell merged commit 94f155d into tornadoweb:master Sep 20, 2020

omasoud mentioned this pull request Nov 6, 2020

"not implemented" message with jupyter notebook under python-3.8.0a4 (due to switch to "proactor" loop on windows) jupyter/notebook#4613

Closed

minrk mentioned this pull request Apr 23, 2021

Using zmq.asyncio, getting NotImplementedError when creating a REQ socket zeromq/pyzmq#1521

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

asyncio: Improve support Python 3.8 on Windows #2815

asyncio: Improve support Python 3.8 on Windows #2815

Uh oh!

bdarnell commented Feb 22, 2020

Uh oh!

minrk commented Feb 27, 2020

Uh oh!

minrk commented Feb 27, 2020 •

edited

Loading

Uh oh!

minrk commented Feb 27, 2020

Uh oh!

bdarnell commented Feb 28, 2020

Uh oh!

minrk commented Feb 28, 2020

Uh oh!

bdarnell commented Jul 25, 2020

Uh oh!

bdarnell commented Sep 2, 2020

Uh oh!

stonebig commented Sep 20, 2020

Uh oh!

bdarnell commented Sep 20, 2020

Uh oh!

Uh oh!

asyncio: Improve support Python 3.8 on Windows #2815

asyncio: Improve support Python 3.8 on Windows #2815

Uh oh!

Conversation

bdarnell commented Feb 22, 2020

Uh oh!

minrk commented Feb 27, 2020

Uh oh!

minrk commented Feb 27, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

minrk commented Feb 27, 2020

Uh oh!

bdarnell commented Feb 28, 2020

Uh oh!

minrk commented Feb 28, 2020

Uh oh!

bdarnell commented Jul 25, 2020

Uh oh!

bdarnell commented Sep 2, 2020

Uh oh!

stonebig commented Sep 20, 2020

Uh oh!

bdarnell commented Sep 20, 2020

Uh oh!

Uh oh!

minrk commented Feb 27, 2020 •

edited

Loading