
Fixed-length subtype recarray + shards="auto" crashes #3546

@ilan-gold


Zarr version

3.1.4.dev29+gfc8e8ad1a

Numcodecs version

0.16.3

Python Version

3.12.3

Operating System

macOS-15.1-arm64-arm-64bit

Installation

uv pip

Description

The reproducer below fails when the array is created with `shards="auto"`, but works otherwise.

Here is the traceback:

```
/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/array.py:4674: ZarrUserWarning: Automatic shard shape inference is experimental and may change without notice.
  shard_shape_parsed, chunk_shape_parsed = _auto_partition(
/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/dtype/npy/structured.py:318: UnstableSpecificationWarning: The data type (Structured(fields=(('PyvCr', FixedLengthUTF32(length=4, endianness='little')), ('UWJNo', FixedLengthUTF32(length=4, endianness='little'))))) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
  v3_unstable_dtype_warning(self)
/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/dtype/npy/string.py:249: UnstableSpecificationWarning: The data type (FixedLengthUTF32(length=4, endianness='little')) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
  v3_unstable_dtype_warning(self)
Traceback (most recent call last):
  File "/Users/ilangold/Projects/Theis/anndata/new_tester.py", line 61, in <module>
    f[...] = arr
    ~^^^^^
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/array.py", line 2966, in __setitem__
    self.set_basic_selection(cast("BasicSelection", pure_selection), value, fields=fields)
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/array.py", line 3200, in set_basic_selection
    sync(self._async_array._set_selection(indexer, value, fields=fields, prototype=prototype))
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/sync.py", line 159, in sync
    raise return_result
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/sync.py", line 119, in _runner
    return await coro
           ^^^^^^^^^^
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/array.py", line 1735, in _set_selection
    await self.codec_pipeline.write(
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/codec_pipeline.py", line 486, in write
    await concurrent_map(
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/common.py", line 100, in concurrent_map
    return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/common.py", line 98, in run
    return await func(*item)
           ^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/codec_pipeline.py", line 352, in write_batch
    await self.encode_partial_batch(
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/codec_pipeline.py", line 247, in encode_partial_batch
    await self.array_bytes_codec.encode_partial(batch_info)
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/abc/codec.py", line 265, in encode_partial
    await concurrent_map(
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/common.py", line 100, in concurrent_map
    return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/common.py", line 98, in run
    return await func(*item)
           ^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/codecs/sharding.py", line 603, in _encode_partial_single
    chunks_per_shard = self._get_chunks_per_shard(shard_spec)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 3, in __hash__
TypeError: unhashable type: 'writeable void-scalar'
```
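The final frames point to a likely cause, inferred from the traceback rather than confirmed in the zarr source: `File "<string>", line 3, in __hash__` is what a `dataclasses`-generated `__hash__` looks like, so `_get_chunks_per_shard(shard_spec)` appears to hash `shard_spec` (for example through a cache), and one of its fields is presumably the structured fill value, an `np.void` scalar. NumPy refuses to hash writeable void scalars. The sketch below reproduces that failure with NumPy and the standard library only; `Spec` and `chunks_per_shard` are hypothetical stand-ins, not zarr's actual classes.

```python
from dataclasses import dataclass
from functools import lru_cache

import numpy as np


@dataclass(frozen=True)
class Spec:
    # Hypothetical stand-in for a frozen spec object that carries a fill value.
    # frozen=True (with the default eq=True) makes dataclasses generate a
    # __hash__ that hashes a tuple of all fields, including fill_value.
    shape: tuple[int, ...]
    fill_value: object


@lru_cache
def chunks_per_shard(spec: Spec, chunk_shape: tuple[int, ...]) -> tuple[int, ...]:
    # lru_cache hashes its arguments, so Spec.__hash__ runs before this body.
    return tuple(s // c for s, c in zip(spec.shape, chunk_shape))


dtype = np.dtype([("btHIM", "<U4"), ("HLuXc", "<U4")])
# Indexing a writeable structured array yields a writeable np.void scalar.
void_fill = np.zeros(1, dtype=dtype)[0]

# A plain integer fill value hashes fine.
ok_result = chunks_per_shard(Spec(shape=(10,), fill_value=0), (5,))
print(ok_result)  # (2,)

# A writeable np.void fill value makes the generated __hash__ fail.
error_message = None
try:
    chunks_per_shard(Spec(shape=(10,), fill_value=void_fill), (5,))
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

With the integer fill value the cached call returns `(2,)`; with the `np.void` fill value it raises a `TypeError` about an unhashable void scalar before the function body ever runs, matching the last line of the traceback.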

Steps to reproduce

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
#   "numpy",
# ]
# ///
#
# This script automatically installs the development branch of zarr to check for issues
from __future__ import annotations

import numpy as np
import zarr

zarr.print_debug_info()

# Record array with two fixed-length Unicode fields ('<U4': up to 4 characters each)
arr = np.rec.array(
    [
        ('sQF', 'SQC'), ('XVut', 'XNsc'), ('HBz', 'xRL'),
        ('fuf', 'pyld'), ('Osuh', 'tRF'), ('PIpC', 'zzN'),
        ('YDyZ', 'MlJ'), ('RnG', 'PdF'), ('AHQ', 'uSc'),
        ('sRh', 'spmy'),
    ],
    dtype=[('btHIM', '<U4'), ('HLuXc', '<U4')],
)

g = zarr.open("foo.zarr", mode="w")
# shards="auto" is what triggers the crash
f = g.create_array("rec", shape=arr.shape, dtype=arr.dtype, shards="auto")
f[...] = arr  # raises TypeError: unhashable type: 'writeable void-scalar'
```
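Assuming the crash comes from hashing a structured `np.void` fill value (the traceback ends in a generated `__hash__` raising `unhashable type: 'writeable void-scalar'`), one possible direction is to normalize such values into a hashable form before they end up on a hashed object. The sketch below assumes a flat structured dtype like the reproducer's; `hashable_fill_value` is a hypothetical helper, not part of zarr.

```python
import numpy as np


def hashable_fill_value(value: object) -> object:
    # Hypothetical helper: turn a structured np.void scalar into a plain
    # tuple of Python field values (hashable); pass other values through.
    if isinstance(value, np.void):
        return value.item()
    return value


dtype = np.dtype([("btHIM", "<U4"), ("HLuXc", "<U4")])
arr = np.array([("sQF", "SQC"), ("XVut", "XNsc")], dtype=dtype)

fill = arr[0]  # writeable np.void scalar: hash(fill) raises TypeError
normalized = hashable_fill_value(fill)
key = hash(normalized)  # succeeds, unlike hash(fill)
print(normalized)  # ('sQF', 'SQC')

# The tuple converts back into a structured scalar with the same dtype and values.
restored = np.array(normalized, dtype=dtype)[()]
restored_matches = bool(restored == fill)
print(restored_matches)  # True
```

Excluding the fill value from the hash, or caching only on the shape tuple, would be alternatives; which one fits depends on how zarr uses the hashed spec, which this issue does not show.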

Additional output

No response


Labels: bug (Potential issues with the zarr-python library)
