to_geodataframe error: All arrays must be of the same length #76

@floriandeboissieu

The items of an ItemCollection do not always share the same property keys. For example:

import pystac_client
import stac_geoparquet

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
)

search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[6.5425, 47.9044, 6.5548, 47.9091],
    datetime="2024-07-20/2024-08-11",
    query={"eo:cloud_cover": {"lt": 30.}},
    sortby="datetime",
)

coll = search.item_collection()

print(set(coll[0].properties.keys()).symmetric_difference(coll[1].properties.keys()))
# {'s2:dark_features_percentage'} # this property is missing in coll[1:3], due to a different processing baseline (05.11 instead of 05.10)

records = coll.to_dict()["features"]
stac_geoparquet.to_geodataframe(records)
# *** ValueError: All arrays must be of the same length

In stac-geoparquet <= 3.2, the GeoDataFrame was built from a list of dicts, which introduced NaN wherever a property was missing. Since commit #fb798f4 (included in version 4.0+), the GeoDataFrame is built from a dict of lists (for speed, I suppose), so a property missing from any item makes the operation fail at L177 with the error: All arrays must be of the same length
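To make the failure mode concrete, here is a minimal sketch that does not depend on any library. It is not the actual stac-geoparquet code: `to_columns` is a hypothetical helper that pivots records into a dict of lists the naive way, appending only the keys each record has, and the property values are made up for illustration.

```python
from collections import defaultdict


def to_columns(records):
    """Hypothetical helper: pivot a list of dicts into a dict of lists."""
    columns = defaultdict(list)
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return dict(columns)


# Made-up values; only the shape of the data matters here.
records = [
    {"datetime": "2024-07-20", "s2:dark_features_percentage": 0.1},
    {"datetime": "2024-07-25"},  # property missing, as in coll[1:3]
]

columns = to_columns(records)
lengths = {key: len(values) for key, values in columns.items()}
print(lengths)  # the two columns end up with different lengths
```

Columns of unequal length cannot be assembled into a table, which is what the ValueError reports. A list of dicts does not have this problem because each row is aligned by key, so a missing key can be filled with NaN.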

As an ItemCollection cannot guarantee that all properties are shared by all items (or am I wrong about that?):

  • wouldn't it be wise to drop properties that are not shared (e.g. properties whose list is shorter than the others) or to fill in the missing values?
  • or is it intended that this be left to the user?
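Until that is settled, the "fill the missing values" option can be sketched on the user side. `fill_missing_properties` below is a hypothetical helper, not part of stac-geoparquet; it assumes each record is a STAC item dict with a "properties" mapping, and the sample items are made up.

```python
def fill_missing_properties(records, fill_value=None):
    """Return copies of item dicts whose "properties" all share the same keys."""
    # Collect the union of property keys across all items.
    all_keys = set()
    for record in records:
        all_keys.update(record.get("properties", {}))

    filled = []
    for record in records:
        properties = dict(record.get("properties", {}))
        for key in all_keys:
            # Keep existing values; add fill_value only where the key is absent.
            properties.setdefault(key, fill_value)
        filled.append({**record, "properties": properties})
    return filled


# Made-up items for illustration.
items = [
    {"id": "a", "properties": {"datetime": "2024-07-20", "s2:dark_features_percentage": 0.1}},
    {"id": "b", "properties": {"datetime": "2024-07-25"}},
]
filled = fill_missing_properties(items)
print(filled[1]["properties"])
```

With the example from this issue, the call would be `fill_missing_properties(coll.to_dict()["features"])` before passing the result to `stac_geoparquet.to_geodataframe`. I have not verified that this is sufficient in 4.0+, since keys outside "properties" could differ between items too. The input records are left unmodified.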
