Feature requests #195

@martindurant


It would be great if:

  • you could select which columns to read from parquet or, even better, select from anywhere in the schema hierarchy for more deeply structured datasets
  • you could read row-group X from a parquet dataset; this would allow distributing the work across threads or even a cluster. Of course, the reader would need to reveal how many row-groups the dataset contains
  • some kind of to_buffers method existed to expose the internal buffers of an arrow structure, in the order defined in the arrow docs, along with the corresponding from_buffers
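
To make the last point concrete, here is a rough pure-Python sketch of what a to_buffers / from_buffers round trip could look like for a utf8 array. The function names and signatures are hypothetical and are not an existing API of this library. The buffer order (validity bitmap, int32 offsets, data) follows the Arrow columnar format for variable-size binary layouts; the sketch skips the padding and alignment that real Arrow buffers have, and always writes a validity bitmap even though the format allows omitting it when there are no nulls.

```python
import struct


def to_buffers(values):
    """Hypothetical helper: encode a list of str/None as [validity, offsets, data]."""
    n = len(values)
    validity = bytearray((n + 7) // 8)  # one bit per slot, 1 = valid
    offsets = [0]
    data = bytearray()
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)  # least-significant-bit numbering
            data += v.encode("utf-8")
        offsets.append(len(data))  # a null slot repeats the previous offset
    # int32 little-endian offsets, n + 1 entries
    return [bytes(validity), struct.pack(f"<{n + 1}i", *offsets), bytes(data)]


def from_buffers(length, buffers):
    """Hypothetical inverse: rebuild the list of str/None from the three buffers."""
    validity, offsets_buf, data = buffers
    offsets = struct.unpack(f"<{length + 1}i", offsets_buf)
    out = []
    for i in range(length):
        if (validity[i // 8] >> (i % 8)) & 1:
            out.append(data[offsets[i]:offsets[i + 1]].decode("utf-8"))
        else:
            out.append(None)
    return out


values = ["ab", None, "cde"]
buffers = to_buffers(values)
roundtrip = from_buffers(len(values), buffers)
```

With raw buffers like these exposed, a library such as awkward could wrap the memory directly instead of going through a heavier intermediate object.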

Doing all of this would essentially deliver what is envisaged in dask/fastparquet#931: getting what we really need out of arrow without the cruft. It would interoperate nicely with awkward, for example.

Other nice-to-haves (and I realise you wish to keep the scope as small as possible):

  • parquet filter
  • str and dt compute functions
