-
Notifications
You must be signed in to change notification settings - Fork 122
Add async I/O methods [WIP] #749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Changed the base branch to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tl;dr: +0
I don't have any strong objections to the way things are currently laid out. It feels a little awkward to have all of the RuntimeError
checks inside the save
method, but if that's how it has to be, so it goes.
I'll mention a couple of possible other implementations that popped to mind; my understanding is pretty low for the Python async ecosystem so these are probably Very Wrong™:
- Go "async-first" and make
save
(and friends) async, and use "blocking" to describe non-async methods (e.g.read_blocking
). This is probably a bad idea because PySTAC is so foundational and we don't want people to have to go async to just use it. - Use two different classes for Async and Sync objects, e.g.
AsyncItem
. I think this probably breaks a lot of the type inference stuff used to detect item types? - Is there any way to "store" asyncio information on
StacIO
to avoid all thosetry/excepts
? Probably not...just something that I thought of while reading tokio's tutorial on bridging w/ sync code: https://tokio.rs/tokio/topics/bridging.
I've never used asyncio, so I don't have a lot to offer in review. |
Thanks @gadomski!
I agree. I was going to abstract this into a separate
I thought about this, too, and was wary for the same reasons. I suspect that the majority of our users are not as familiar with async in Python and might be at least intimidated by seeing the methods they are used to using switched to async.
I had toyed with this idea a bit, too. I think the general recommendation for async programming is to put sync and async I/O operations in different classes, but as you mentioned that might cause problems with some of our STAC object type sniffing. It also would make it a little trickier to take advantage of async operations within synchronous methods (like here) for parallelizing operations. That said, I'm open to other ways of structuring this. One thought would be to have an
I had originally structured the |
This is an interesting idea. It looks like their approach is to wrap the async functions to create blocking versions, which is different from the httpx approach of having completely separate async and blocking code. It might be worth pursuing this approach to see if it simplifies the code at all. |
When I was updating Planet's stactools plugin I noticed that stactools already depends on aiohttp. Maybe it would be worth considering using that instead of httpx? So that a user's stactools install doesn't contain more web clients than it needs. |
Related Issue(s)
Description
Implements asynchronous I/O operations by adding
async
equivalents to the followingStacIO
methods:read_text
/read_text_async
(abstract methods)write_text
/write_text_async
(abstract methods)read_json
/read_json_async
save_json
/save_json_async
read_stac_object
/read_stac_object_async
This PR also creates a new
DefaultStacIOAsync
class with default implementations of the async methods usingaiofiles
(for local file I/O) andhttpx
(for reading via HTTP). In order to take advantage of these async methods, users must install PySTAC using theaiofiles
andhttpx
extras or create their own implementations of the async abstract methods using their libraries of choice (e.g.aiohttp
).I added a
Catalog.save_async
method that takes advantage of these new async I/O methods as an example of how me might implement async I/O throughout the code base. Feedback would be appreciated on the approach of having sync methods call their equivalent async method, and on the naming conventions used.There is still quite a bit of work to do on this before we can consider the linked issues to be closed, but I would like feedback on the changes to the
Catalog
andStacIO
APIs before I do too much more work on the rest of the code. I've added some additional TODOs in the PR checklist.This PR represents a breaking change, as it requires developers to implement the abstract
read_text_async
andwrite_text_async
methods in any customStacIO
implementations. However, users who are using the default implementation should not need to make any changes.PR Checklist
pre-commit run --all-files
)scripts/test
)StacIO
methods calling async methods vs. using separately maintained synchronous code.Catalog.walk
)