Trouble loading complex geojson values #1493

@stretch4x4

Expected behaviour and actual behaviour.

I want to read a GeoJSON file containing fields with complex data types (e.g. lists and dicts), with the flexibility that a field's type may not be consistent across features. For example, one feature has a string in field A and the next feature has a list in field A.

With default settings this works poorly: depending on the order in which the types are seen and what they contain, the reported field type and the actual formatting of the values vary widely.

This improves if I use ARRAY_AS_STRING="YES": all my feature values are returned as expected, with a predictable field type.
The only remaining issue is when a column contains a mixture of string values and complex values.

Say I have feature 1: {"test_field_1": "abc"} and feature 2: {"test_field_1": ["x", "y"]}.
If feature 1 comes first in the GeoJSON features and feature 2 second, the field is reported as a string field and both values are cast to strings.
That is OK for the first feature ('abc', <class 'str'>), but less ideal for the second ('[ "x", "y" ]', <class 'str'>).

If the features appear in the GeoJSON in the opposite order, a JSONDecodeError is thrown when it tries to json.loads the string value in the second feature: Expecting value: line 1 column 1 (char 0)
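
The error text matches what the standard library's json.loads raises for a bare, unquoted string. The sketch below is independent of Fiona and assumes that the raw value abc is what reaches the decoder; try_decode is a name made up for this illustration.

```python
import json


def try_decode(text):
    """Return (True, value) on success or (False, error message) on failure."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as exc:
        return False, str(exc)


# A serialized list decodes fine.
print(try_decode('[ "x", "y" ]'))
# A bare string is not valid JSON, so decoding fails at the first character.
print(try_decode("abc"))
```

The second call reports the same "Expecting value" message as the traceback, which is consistent with the plain string from the second feature being handed to json.loads once the field has been typed as JSON.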

Reproducing the problem.

See the attached pytest:

Click to show the test or see attached file
   import json
   from copy import deepcopy
   
   import fiona
   import pytest
   
   
   @pytest.mark.parametrize("include_json_feature", (0, 1, None))
   @pytest.mark.parametrize(
       ("raw_data", "python_type", "fiona_type"),
       (
           ([1.0, 2.0, 3.0], list, "json"),
           (["a", "b", "c"], list, "json"),
           ([None, None, None], list, "json"),
           ({"key1": 123, "key2": "abc"}, dict, "json"),
           ("abc", str, "str"),
           (72, int, "int32"),
           (135.89, float, "float"),
           (True, bool, "bool"),
       ),
   )
   def test_read_layer_with_complex_type(tmp_path, raw_data, python_type, fiona_type, include_json_feature):
       expected = deepcopy(raw_data)
       filename = tmp_path.joinpath("complex_layer.geojson")
       complex_layer_json = {
           "type": "FeatureCollection",
           "features": [
               {
                   "type": "Feature",
                   "properties": {"test_field_1": raw_data, "test_field_2": "TBD"},
                   "geometry": {"type": "Point", "coordinates": [144.9621854144882, -37.81416134693996]},
               }
           ],
       }
   
       if include_json_feature is not None:
           # insert a feature we know is complex before or after our feature
           complex_layer_json["features"].insert(
               include_json_feature,
               {
                   "type": "Feature",
                   "properties": {"test_field_1": ["x", "y"], "test_field_2": "TBD"},
                   "geometry": {"type": "Point", "coordinates": [144.8806593050943, -38.072339738912845]},
               },
           )
   
       print("geojson:\n", json.dumps(complex_layer_json))  # used to validate the file is valid
   
       with open(filename, "w") as outfile:
           json.dump(complex_layer_json, outfile, indent=4)
   
       with fiona.Env(), fiona.open(filename, "r", ARRAY_AS_STRING="YES") as source:
           features = [
               (value, type(value), source.schema["properties"].get(field))
               for f in source
               for field, value in f["properties"].items()
               if field == "test_field_1"
           ]
   
       expected_features = [(expected, python_type, "json" if include_json_feature is not None else fiona_type)]
   
       if include_json_feature is not None:
           expected_features.insert(include_json_feature, (["x", "y"], list, "json"))
   
       assert features == expected_features
   

test_fiona.py.txt

Attempted solution.

I tried to figure out how to pass in a custom JSONDecoder or an object_hook, but I could not find a parameter that passes either one far enough in to reach the place where json.loads is called.
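
For illustration, the kind of tolerant decoding I had in mind looks roughly like the sketch below. loads_or_raw is a hypothetical helper, not a Fiona API, and it assumes the value has already been returned to the caller as a string.

```python
import json


def loads_or_raw(text):
    """Decode text as JSON only if it looks like an array or object.

    Hypothetical helper, not part of Fiona: anything else, including
    text that fails to parse, is returned unchanged.
    """
    if not isinstance(text, str):
        return text
    stripped = text.strip()
    # Only attempt decoding for values that look like a JSON array or object,
    # so plain strings such as "abc" or "123" keep their original type.
    if not stripped.startswith(("[", "{")):
        return text
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        return text


print(loads_or_raw('[ "x", "y" ]'))
print(loads_or_raw("abc"))
```

This only helps in the ordering where the field is reported as a string and every value comes back as a str. It does nothing for the opposite ordering, where the exception is raised before any value reaches my code. A genuine string that happens to be valid JSON starting with [ or { would also be decoded, so it is a heuristic rather than a faithful round trip.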

Versions.

OS: Linux ----- 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
(Ubuntu 22.04.3 LTS)
Fiona: fiona==1.10.1
GDAL: 3.9.2
