
Remove Arrow support from obspec.List #13

Closed
@kylebarron

Description


Obstore supports returning an Arrow RecordBatch for each chunk in obstore.list and an Arrow Table from obstore.list_with_delimiter.

I would like this to be an obstore implementation detail instead of a requirement of all obspec implementations.

I had hoped to remove the return_arrow keyword from obspec's list methods while still letting obstore's implementation add a return_arrow keyword, as long as it defaults to False and returns a list[ObjectMeta] by default. However, it looks like this doesn't pass Pylance type checking.


See what I tried in #14.
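A minimal sketch of the pattern described above, assuming simplified stand-ins: ObjectMeta, List, ArrowCapableStore, ArrowBatchStandIn, and total_size are hypothetical and are not obspec's or obstore's actual definitions. To keep it short, list returns a single sequence rather than a stream of chunks. The protocol has no return_arrow keyword; the implementation adds one as a keyword-only argument defaulting to False, with typing.overload so that a plain call still returns ObjectMeta items. The code runs, but it only shows runtime behavior. Whether a type checker accepts ArrowCapableStore where a List is expected is the question this issue raises, and running the code says nothing about that.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass
from typing import Literal, Protocol, overload


@dataclass
class ObjectMeta:
    # Hypothetical, simplified stand-in for obspec's ObjectMeta.
    path: str
    size: int


class ArrowBatchStandIn:
    # Hypothetical placeholder for an Arrow RecordBatch. A real one would
    # expose the Arrow PyCapsule Interface rather than a plain dict.
    def __init__(self, columns: dict[str, list]) -> None:
        self.columns = columns


class List(Protocol):
    # Protocol sketch with no return_arrow keyword at all.
    def list(self, prefix: str | None = None) -> Sequence[ObjectMeta]: ...


class ArrowCapableStore:
    # Hypothetical store that adds return_arrow on top of the protocol.
    # It defaults to False, so a plain call still returns ObjectMeta items.
    def __init__(self, objects: Sequence[ObjectMeta]) -> None:
        self._objects = tuple(objects)

    @overload
    def list(
        self, prefix: str | None = None, *, return_arrow: Literal[False] = False
    ) -> Sequence[ObjectMeta]: ...
    @overload
    def list(
        self, prefix: str | None = None, *, return_arrow: Literal[True]
    ) -> ArrowBatchStandIn: ...
    def list(self, prefix=None, *, return_arrow=False):
        # Filter by prefix, then pick the output shape.
        matches = [m for m in self._objects if prefix is None or m.path.startswith(prefix)]
        if return_arrow:
            # Column-oriented output: one list per field.
            return ArrowBatchStandIn(
                {"path": [m.path for m in matches], "size": [m.size for m in matches]}
            )
        # Default: row-oriented list of ObjectMeta.
        return matches


def total_size(store: List, prefix: str | None = None) -> int:
    # Generic code that relies only on the protocol's signature.
    return sum(m.size for m in store.list(prefix))
```

Generic callers such as total_size never pass return_arrow, so at runtime they only ever see the default list of ObjectMeta.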

That said, since obspec's list is defined in terms of the Arrow PyCapsule Interface, setting return_arrow=True allows for very generic programming. The result could be consumed by pandas, Polars, DuckDB, pyarrow, nanoarrow, arro3, or anything else that supports the protocol. (One wrinkle: list requires something that implements the ArrowArray interface, which I'm not sure pandas or Polars define, since they only have a concept of a multi-chunk data structure.)
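The distinction in the parenthetical can be made concrete with typing.Protocol. The Arrow PyCapsule Interface defines __arrow_c_array__ for a single array or record batch and __arrow_c_stream__ for a stream of chunks; the two protocol classes below mirror those dunder methods. ChunkLike, TableLike, and export_kinds are hypothetical illustrations, not real library types, and the isinstance checks only test that the methods exist, not that they return valid capsules. A type that only exposes a stream satisfies the second protocol but not the first, which is the gap the wrinkle describes.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ArrowArrayExportable(Protocol):
    # Single-chunk export: returns a (schema capsule, array capsule) pair.
    def __arrow_c_array__(
        self, requested_schema: object | None = None
    ) -> tuple[object, object]: ...


@runtime_checkable
class ArrowStreamExportable(Protocol):
    # Multi-chunk export: returns an ArrowArrayStream capsule.
    def __arrow_c_stream__(self, requested_schema: object | None = None) -> object: ...


def export_kinds(obj: Any) -> str:
    # Report which PyCapsule entry points an object exposes (method presence only).
    kinds = []
    if isinstance(obj, ArrowArrayExportable):
        kinds.append("array")
    if isinstance(obj, ArrowStreamExportable):
        kinds.append("stream")
    return "+".join(kinds) or "none"


class ChunkLike:
    # Hypothetical single-chunk object; returns stub values, illustration only.
    def __arrow_c_array__(self, requested_schema=None):
        return (None, None)


class TableLike:
    # Hypothetical multi-chunk object that can only export a stream.
    def __arrow_c_stream__(self, requested_schema=None):
        return None


print(export_kinds(ChunkLike()))  # array
print(export_kinds(TableLike()))  # stream
```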
