bug: Nullable struct with non-nullable children causes ArrowInvalid: cast #11939

@ssethuraman-ft

What happened?

If the source table schema has a nullable struct column whose child fields are non-nullable, calling to_pyarrow() fails with:

ArrowInvalid: field 'X' of type T has nulls. Can't cast to non-nullable field 'X' of type T

I think the failure occurs for rows where the parent struct is NULL: the downstream conversion materializes the child fields of those rows as NULLs, which conflicts with the non-nullable child schema.
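To make the suspected mechanism concrete, here is a minimal plain-Python model of it. It is not ibis or pyarrow code, and the helper names (`flatten_children`, `naive_violations`, `parent_aware_violations`) are hypothetical. It assumes a NULL parent is flattened into a None in every child column, and compares a check that ignores the parent's validity with one that respects it.

```python
# Plain-Python model of the suspected mechanism; not ibis or pyarrow code.
# All function names here are hypothetical and exist only for illustration.
rows = [
    {"street": "Main St", "number": 10},
    None,  # the parent struct itself is NULL
    {"street": "High St", "number": 42},
]
required_children = ["street", "number"]


def flatten_children(rows, names):
    """Build one column per child; a NULL parent yields None in every child."""
    return {
        name: [None if row is None else row[name] for row in rows]
        for name in names
    }


def naive_violations(rows, names):
    """Flag every None in a required child column, ignoring parent validity."""
    columns = flatten_children(rows, names)
    return [
        (name, i)
        for name in names
        for i, value in enumerate(columns[name])
        if value is None
    ]


def parent_aware_violations(rows, names):
    """Flag a None child only when the parent struct itself is present."""
    return [
        (name, i)
        for name in names
        for i, row in enumerate(rows)
        if row is not None and row[name] is None
    ]


print(naive_violations(rows, required_children))
print(parent_aware_violations(rows, required_children))
```

Under this model the naive check reports both children of row 1 as violations, while the parent-aware check reports none, which matches the hypothesis that the error comes from nulls that exist only because the parent is NULL.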

Code to reproduce the error

import ibis
from ibis.expr import datatypes as dt

# Define the struct type:
# - The struct itself is nullable 
# - Its children are required (non-nullable) string and int64
address_type = dt.Struct({
    "street": "!string",  # required child
    "number": "!int",     # required child
})

# Build the table schema with the nullable struct column
schema = ibis.schema({
    "id": "int",
    "address": address_type,
})

# Execute the memtable with the polars backend.
# `schema` is already an ibis Schema, so it does not need to be wrapped again.
ibis.set_backend("polars")

data = [
    {"id": 1, "address": {"street": "Main St", "number": 10}},
    {"id": 2, "address": None},
    {"id": 3, "address": {"street": "High St", "number": 42}},
]

# Create a memtable with the schema
table = ibis.memtable(data, schema=schema)

# Show the inferred schema (for verification)
print(table.schema())

# This fails with ArrowInvalid: field has nulls. Can't cast to non-nullable string type
print(table.select("address").to_pyarrow())

What version of ibis are you using?

10.3.0

What backend(s) are you using, if any?

BigQuery

Relevant log output

ibis.Schema {
  id       int64
  address  struct<street: !string, number: !int64>
}

Traceback (most recent call last):
  File "/opt/projects/scratch/ibisbug.py", line 39, in <module>
    print(table.select("address").to_pyarrow())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/projects/scratch/venv/lib/python3.11/site-packages/ibis/expr/types/core.py", line 579, in to_pyarrow
    return self._find_backend(use_default=True).to_pyarrow(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/projects/scratch/venv/lib/python3.11/site-packages/ibis/backends/polars/__init__.py", line 547, in to_pyarrow
    result = self._to_pyarrow_table(expr, params=params, limit=limit, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/projects/scratch/venv/lib/python3.11/site-packages/ibis/backends/polars/__init__.py", line 536, in _to_pyarrow_table
    return PyArrowData.convert_table(df.to_arrow(), expr.as_table().schema())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/projects/scratch/venv/lib/python3.11/site-packages/ibis/formats/pyarrow.py", line 332, in convert_table
    arrays = [
             ^
  File "/opt/projects/scratch/venv/lib/python3.11/site-packages/ibis/formats/pyarrow.py", line 333, in <listcomp>
    cls.convert_column(table[name], dtype) for name, dtype in schema.items()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/projects/scratch/venv/lib/python3.11/site-packages/ibis/formats/pyarrow.py", line 323, in convert_column
    return column.cast(desired_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 597, in pyarrow.lib.ChunkedArray.cast
  File "/opt/projects/scratch/venv/lib/python3.11/site-packages/pyarrow/compute.py", line 412, in cast
    return call_function("cast", [arr], options, memory_pool)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_compute.pyx", line 604, in pyarrow._compute.call_function
  File "pyarrow/_compute.pyx", line 399, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: field 'street' of type large_string has nulls. Can't cast to non-nullable field 'street' of type string

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees: No one assigned
Labels: bug (Incorrect behavior inside of ibis)
Type: No type
Status: backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests