No such file or directory when replacing an array #2892

Open
csparker247 opened this issue Mar 5, 2025 · 6 comments
Labels
bug Potential issues with the zarr-python library

Comments

@csparker247

Zarr version

3.0.4

Numcodecs version

0.15.1

Python Version

3.12.8

Operating System

Mac

Installation

using pip to a venv

Description

I'm trying to replace (or create if it doesn't exist) an array that's inside an existing, local zarr group. Since the array I want to add may or may not be the same size as the existing array, I'm trying to delete the existing entry and create a new array fresh using del. Yes, I know I can check the existing shape to avoid an unnecessary deletion. Similar to #2334, the call to del raises a FileNotFoundError:

Traceback (most recent call last):
  File "/Volumes/REDACTED/test.py", line 13, in <module>
    del root['labels']
        ~~~~^^^^^^^^^^
  File "/Users/REDACTED/venv/lib/python3.12/site-packages/zarr/core/group.py", line 1900, in __delitem__
    self._sync(self._async_group.delitem(key))
  File "/Users/REDACTED/venv/lib/python3.12/site-packages/zarr/core/sync.py", line 208, in _sync
    return sync(
           ^^^^^
  File "/Users/REDACTED/venv/lib/python3.12/site-packages/zarr/core/sync.py", line 163, in sync
    raise return_result
  File "/Users/REDACTED/venv/lib/python3.12/site-packages/zarr/core/sync.py", line 119, in _runner
    return await coro
           ^^^^^^^^^^
  File "/Users/REDACTED/venv/lib/python3.12/site-packages/zarr/core/group.py", line 755, in delitem
    await store_path.delete_dir()
  File "/Users/REDACTED/venv/lib/python3.12/site-packages/zarr/storage/_common.py", line 161, in delete_dir
    await self.store.delete_dir(self.path)
  File "/Users/REDACTED/venv/lib/python3.12/site-packages/zarr/storage/_local.py", line 216, in delete_dir
    shutil.rmtree(path)
  File "/opt/homebrew/Cellar/[email protected]/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/shutil.py", line 759, in rmtree
    _rmtree_safe_fd(stack, onexc)
  File "/opt/homebrew/Cellar/[email protected]/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/shutil.py", line 703, in _rmtree_safe_fd
    onexc(func, path, err)
  File "/opt/homebrew/Cellar/[email protected]/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/shutil.py", line 700, in _rmtree_safe_fd
    onexc(os.unlink, fullname, err)
  File "/opt/homebrew/Cellar/[email protected]/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/shutil.py", line 698, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
FileNotFoundError: [Errno 2] No such file or directory: 'foo/01234/labels/c/1/0'

If I try to run my script multiple times after this failure, I get the following:

  1. Prints UserWarning: Object at labels is not recognized as a component of a Zarr hierarchy. but succeeds.
  2. Fails with the FileNotFound error.
  3. UserWarning...
  4. UserWarning...
  5. (and all attempts after this) Succeeds.

This leads me to believe that there is some race condition with the OS or filesystem, and eventually something starts caching the correct operations so that the deletion succeeds every time.
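One way to see what a failed deletion left behind is to inspect the array directory directly. The sketch below is only an illustration and not part of zarr-python: `describe_node_dir` is a hypothetical helper, and it assumes the Zarr v3 on-disk layout, where a node's metadata lives in a `zarr.json` file next to the `c/` chunk directory seen in the traceback. A directory that still holds chunk files but has no `zarr.json` would be consistent with the UserWarning above.

```python
from pathlib import Path


def describe_node_dir(node_dir):
    """Summarize what is left on disk for one array or group directory.

    Hypothetical helper; assumes Zarr v3 metadata is stored in 'zarr.json'.
    """
    node = Path(node_dir)
    if not node.exists():
        return {"exists": False, "has_metadata": False, "leftover_files": 0}

    # Count every regular file below the node, then separate out the metadata.
    files = [p for p in node.rglob("*") if p.is_file()]
    has_metadata = (node / "zarr.json").is_file()
    return {
        "exists": True,
        "has_metadata": has_metadata,
        "leftover_files": len(files) - int(has_metadata),
    }
```

Calling `describe_node_dir("foo/01234/labels")` between runs of the script would show whether each attempt starts from a complete array, a half-removed one, or nothing at all.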

Steps to reproduce

If I run the following script with my full zarr, I get the errors as described. Unfortunately, if I try to create a new, minimal dataset from scratch with only one entry, this script does not reproduce the error.

import zarr
import numpy as np

# create a replacement array
labels = np.ones((47, 512, 512), np.uint8)

# open the zarr and select a sub-group
z = zarr.open('foo', mode='r+')
root = z['01234']

# delete the labels array
if 'labels' in root.keys():
    del root['labels']

# create and assign the new labels array
a = root.create_array('labels',
                      dtype=labels.dtype,
                      shape=labels.shape,
                      chunks=(12, 64, 64),
                      shards=(24, 512, 512),
                      config={'write_empty_chunks': False})
a[...] = labels

Additional output

No response

@csparker247 added the bug label (Potential issues with the zarr-python library) Mar 5, 2025
@d-v-b
Contributor

d-v-b commented Mar 5, 2025

Hi @csparker247, thanks for the bug report. It's possible that this was previously reported here. In any case, we definitely need to fix this. I don't have the bandwidth for it now, but hopefully another member of the Python dev team can look into it.

@csparker247
Author

Thanks. It doesn't look exactly like the same thing, but it certainly seems to be related behavior. I wasn't aware of the overwrite parameter, either, so now I'm curious if using it will give me the same errors...

@brokkoli71
Member

I'll have a look at this

@brokkoli71
Member

@csparker247 Unfortunately I was not able to reproduce your error. I created the following array before running your script:

import numpy as np
import zarr

g = zarr.create_group("foo")
g.create_group("01234").create_array("labels", shape=(512, 512, 512), chunks=(64, 64, 64), dtype=np.uint8)
g["01234"]["labels"][...] = np.ones((512, 512, 512), np.uint8)

This caused no problems. Could you provide more information about the creation and size of your array? Are you running the script multiple times in parallel, so that the race condition comes from trying to delete the array multiple times?

@csparker247
Author

> This caused no problems. Could you provide more information about the creation/size of your array. Are you running the script multiple times in parallel, so that the race condition comes from trying to delete the array multiple times?

My dataset is a zarr group containing 3120 sub-groups representing 3D datasets, some of which are labeled. Each sub-group has a data array, and some of these additionally have a labels array. The data and labels arrays vary in length along the first axis (usually 30-60), but the last two axes are always (512, 512):

- foo
  - 00000
    - data (shape: [45, 512, 512] dtype: uint16)
    - labels (shape: [45, 512, 512] dtype: uint8)
  - ...
  - 03119
    - data (shape: [57, 512, 512] dtype: uint16)
    - labels (shape: [57, 512, 512] dtype: uint8)

I detected an issue with my label generation that affected the shape of the labels array, so I needed to delete the old labels array rather than just changing its value, hence the logic of the example script.

I was not running this in parallel. I simply ran the script on the command line, received the error, hit the up arrow to get the previous command, ran it again, received the error, etc.

@brokkoli71
Member

brokkoli71 commented Apr 25, 2025

@csparker247
I think this error occurs because your filesystem is busy removing the large files, and when you run the script that triggers shutil.rmtree again, it tries to remove the same files again simultaneously. That causes the FileNotFound exception and also the UserWarning if some files have already been removed but others have not.
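The mechanism suggested here can be illustrated without zarr. The sketch below is an assumption-laden simulation, not what `shutil.rmtree` or zarr-python does internally: `unlink_listed_entries` is a hypothetical function that takes a snapshot of a directory listing, optionally lets a "competing" deletion remove one entry first, and then tries to unlink everything in the snapshot. Any entry that vanished in between raises `FileNotFoundError`, which is the same error class as in the traceback.

```python
import os


def unlink_listed_entries(directory, vanish_first=False):
    """Unlink every file listed in `directory`; return names that were already gone.

    Hypothetical demo: `vanish_first` simulates another deleter winning the race.
    """
    names = sorted(os.listdir(directory))  # snapshot of the listing
    if vanish_first and names:
        # Simulate a competing process removing an entry after the snapshot.
        os.unlink(os.path.join(directory, names[0]))

    already_gone = []
    for name in names:
        try:
            os.unlink(os.path.join(directory, name))
        except FileNotFoundError:
            # The entry was listed but no longer exists when we reach it.
            already_gone.append(name)
    return already_gone
```

If a single, sequential run of the real script still fails, this kind of interleaving would need a second actor (another process, a sync client, or the filesystem itself), which is worth ruling out.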

Please try running your delete script once when you are sure that your filesystem has finished modifying or deleting files.

  • If this throws a FileNotFound exception, please see if the following script throws the same exception (I tried to mimic your data as closely as possible, but no errors occurred on my machine).
  • If not, it might be a good idea to throw a warning instead of a FileNotFound exception in case files are not found in shutil.rmtree. I'll create a PR for this.
import numpy as np
import zarr

path = ...

# recreate your array
g = zarr.create_group(path)
g.create_group("03119")
g["03119"].create_array("labels", shape=(57, 512, 512), chunks=(12, 64, 64), shards=(24, 512, 512), dtype=np.uint8)
g["03119"]["labels"][...] = np.full((57, 512, 512), 3, np.uint8)
g["03119"].create_array("data", shape=(57, 512, 512), chunks=(12, 64, 64), shards=(24, 512, 512), dtype=np.uint16)
# fill the data array (the original line wrote uint16 values into "labels" a second time)
g["03119"]["data"][...] = np.full((57, 512, 512), 3, np.uint16)

# create a replacement array
labels = np.ones((47, 1, 512), np.uint8)

# open the zarr and select a sub-group
z = zarr.open(path, mode='r+')
root = z['03119']

# delete the labels array
if 'labels' in root.keys():
    del root['labels']

# create and assign the new labels array
a = root.create_array('labels',
                      dtype=labels.dtype,
                      shape=labels.shape,
                      chunks=(12, 64, 64),
                      shards=(24, 512, 512),
                      config={'write_empty_chunks': False})
a[...] = labels
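The second suggestion above, warning instead of raising when files are already missing during `shutil.rmtree`, could look roughly like the sketch below. This is not the proposed PR and not zarr-python code: `rmtree_warn_on_missing` and its `retries` parameter are hypothetical, and retrying after a `FileNotFoundError` is an assumption about what a tolerant deletion should do.

```python
import shutil
import warnings
from pathlib import Path


def rmtree_warn_on_missing(path, retries=3):
    """Remove a directory tree, warning (not raising) when entries are already gone.

    Hypothetical helper. Returns True if the tree no longer exists afterwards.
    """
    target = Path(path)
    for _ in range(retries):
        try:
            shutil.rmtree(target)
            return True
        except FileNotFoundError as exc:
            # Something (or someone) removed this entry before we reached it.
            warnings.warn(
                f"{exc.filename!r} was already missing during removal",
                stacklevel=2,
            )
            if not target.exists():
                return True
            # Otherwise part of the tree is still there: try again.
    return not target.exists()
```

A caller could use the return value to decide whether it is safe to recreate the array, rather than relying on the absence of an exception.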
