Working with partitions #30

@remche

We are working with a catalog whose source spans multiple files, e.g.:

plugins:
  source:
    - module: intake_parquet
sources:
  test:
    description: Short example parquet data
    driver: parquet
    args:
      urlpath: 
        - s3://bucket/path/file.parquet
        - s3://bucket/path/file2.parquet
        - s3://bucket/path/file3.parquet
      storage_options:
        anon: True
        client_kwargs:
          endpoint_url: https://example.com

1. With only two entries, discover() works and we can call read_partition(0) and read_partition(1), but a full read() fails with `ValueError: storage_options passed with buffer, or non-supported URL`, probably because ParquetSource.read() does not handle a list in `urlpath`.
2. With more than two entries, discover() fails with a `KeyError`.
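Until read() handles a list of paths, a possible workaround for the two-file case is to read each partition separately and concatenate the results. The sketch below assumes the source exposes intake's standard `discover()`, `npartitions` and `read_partition(i)` members, and that each partition comes back as a pandas DataFrame (one per file). `read_all_partitions` is a hypothetical helper, not part of intake or intake_parquet, and it does not help with the `KeyError` case, since that fails already in discover().

```python
import pandas as pd


def read_all_partitions(source):
    """Hypothetical workaround: read every partition and concatenate them.

    Assumes `source` is an intake-style data source whose partitions are
    pandas DataFrames and whose discover() call succeeds.
    """
    source.discover()  # populates source.npartitions
    frames = [source.read_partition(i) for i in range(source.npartitions)]
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame()
    # Renumber the index so rows from different files do not collide
    return pd.concat(frames, ignore_index=True)
```

With the catalog above saved locally, usage would look like `df = read_all_partitions(intake.open_catalog("catalog.yml").test)`, where the file name is an assumption. Note that this loads every file into memory at once.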

Thanks for maintaining this intake plugin!