Description
Expected behaviour and actual behaviour.
I want to be able to read a GeoJSON file containing fields with complex data types (e.g. lists and dicts), with the flexibility that the data might not be of a consistent type. E.g., one feature has a string in field A and the next feature has a list in field A.
With default settings, this works poorly: depending on the order in which the types are seen and what they contain, the reported field type and the actual formatting of the values vary widely.
This improves if I use ARRAY_AS_STRING="YES": all my feature values are returned as expected, with a predictable field type.
The only issue is when I have a mixture of string values and complex values in the same column.
Say I have feature 1: {"test_field_1": "abc"} and feature 2: {"test_field_1": ["x", "y"]}.
If feature 1 is first in the GeoJSON features and feature 2 is second, the field is reported as a string field and both values are cast to a string. That is OK for the first feature ('abc', <class 'str'>), but less ideal for the second ('[ "x", "y" ]', <class 'str'>).
If the features appear in the GeoJSON in the opposite order, a JSONDecodeError is thrown when it tries to json.loads the string value in the second feature: Expecting value: line 1 column 1 (char 0).
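To illustrate the asymmetry described above, here is a minimal sketch using only the standard library. It does not call Fiona at all; it assumes, as the traceback suggests, that the field value reaches json.loads as plain text. The variable names are illustrative only.

```python
import json

# Text values as they might arrive once the field has been typed as JSON.
# These literals are taken from the two example features above.
list_text = '[ "x", "y" ]'
plain_text = "abc"

# A serialized list is valid JSON, so it decodes back to a Python list.
decoded = json.loads(list_text)

# A bare word is not valid JSON (a JSON string would need its own quotes),
# so decoding it fails at the very first character.
try:
    json.loads(plain_text)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = str(exc)

print(decoded)
print(error_message)
```

The bare string fails because JSON requires string literals to be wrapped in double quotes, which the original property value no longer is once it has been extracted.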
Reproducing the problem.
See the attached pytest:
import json
from copy import deepcopy

import fiona
import pytest


@pytest.mark.parametrize("include_json_feature", (0, 1, None))
@pytest.mark.parametrize(
    ("raw_data", "python_type", "fiona_type"),
    (
        ([1.0, 2.0, 3.0], list, "json"),
        (["a", "b", "c"], list, "json"),
        ([None, None, None], list, "json"),
        ({"key1": 123, "key2": "abc"}, dict, "json"),
        ("abc", str, "str"),
        (72, int, "int32"),
        (135.89, float, "float"),
        (True, bool, "bool"),
    ),
)
def test_read_layer_with_complex_type(tmp_path, raw_data, python_type, fiona_type, include_json_feature):
    expected = deepcopy(raw_data)
    filename = tmp_path.joinpath("complex_layer.geojson")
    complex_layer_json = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"test_field_1": raw_data, "test_field_2": "TBD"},
                "geometry": {"type": "Point", "coordinates": [144.9621854144882, -37.81416134693996]},
            }
        ],
    }
    if include_json_feature is not None:
        # insert a feature we know is complex before or after our feature
        complex_layer_json["features"].insert(
            include_json_feature,
            {
                "type": "Feature",
                "properties": {"test_field_1": ["x", "y"], "test_field_2": "TBD"},
                "geometry": {"type": "Point", "coordinates": [144.8806593050943, -38.072339738912845]},
            },
        )
    print("geojson:\n", json.dumps(complex_layer_json))  # used to validate the file is valid
    with open(filename, "w") as outfile:
        json.dump(complex_layer_json, outfile, indent=4)

    with fiona.Env(), fiona.open(filename, "r", ARRAY_AS_STRING="YES") as source:
        features = [
            (value, type(value), source.schema["properties"].get(field))
            for f in source
            for field, value in f["properties"].items()
            if field == "test_field_1"
        ]

    expected_features = [(expected, python_type, "json" if include_json_feature is not None else fiona_type)]
    if include_json_feature is not None:
        expected_features.insert(include_json_feature, (["x", "y"], list, "json"))
    assert features == expected_features
Attempted solution.
I was trying to figure out how I could pass in a custom JSONDecoder or an object_hook, but I was unable to find a parameter that allowed me to pass either far enough in to reach the location where json.loads is being called.
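For reference, this is roughly the kind of decoder I had in mind. It is only a sketch of the idea with the standard library: LenientDecoder is a hypothetical name, and as noted I found no Fiona parameter that would accept it, so the example calls json.loads directly.

```python
import json


class LenientDecoder(json.JSONDecoder):
    """Hypothetical decoder that returns the raw text when it is not valid JSON."""

    def decode(self, s, *args, **kwargs):
        try:
            return super().decode(s, *args, **kwargs)
        except json.JSONDecodeError:
            # Not JSON (for example a plain string such as abc): keep it unchanged.
            return s


# Mixed column values: a serialized list, a plain string, a serialized dict.
raw_values = ['[ "x", "y" ]', "abc", '{"key1": 123}']
values = [json.loads(text, cls=LenientDecoder) for text in raw_values]
print(values)
```

One caveat with this approach: a plain string that happens to be valid JSON, such as 123 or true, would be decoded to an int or bool rather than kept as a string, so it is not a complete answer for mixed columns.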
Versions.
OS: Linux ----- 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
(Ubuntu 22.04.3 LTS)
Fiona: fiona==1.10.1
GDAL: 3.9.2