Skip to content

open...() functions should accept pathlib.Path #261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
alimanfoo opened this issue May 10, 2018 · 14 comments
Closed

open...() functions should accept pathlib.Path #261

alimanfoo opened this issue May 10, 2018 · 14 comments

Comments

@alimanfoo
Copy link
Member

Minimal, reproducible code sample, a copy-pastable example if possible

from pathlib import Path
store = Path('/path/to/store')
zarr.open_group(store)

...gives:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-4a4272aec66c> in <module>()
----> 1 codon_position = zarr.open_group(output_dir / 'codon_position.zarr.zip')
      2 codon_position.tree()

~/github/wtsi-team112/vector-ops/deps/conda/envs/vector-ops/lib/python3.6/site-packages/zarr/hierarchy.py in open_group(store, mode, cache_attrs, synchronizer, path)
   1117 
   1118     elif mode == 'a':
-> 1119         if contains_array(store, path=path):
   1120             err_contains_array(path)
   1121         if not contains_group(store, path=path):

~/github/wtsi-team112/vector-ops/deps/conda/envs/vector-ops/lib/python3.6/site-packages/zarr/storage.py in contains_array(store, path)
     70     prefix = _path_to_prefix(path)
     71     key = prefix + array_meta_key
---> 72     return key in store
     73 
     74 

TypeError: argument of type 'PosixPath' is not iterable

Problem description

Functions that accept a string path for the store argument should also accept Path object.

Version and installation information

Please provide the following:

  • Value of zarr.__version__: 2.2.0
  • Version of Python interpreter: 3.6
@alimanfoo alimanfoo added this to the v2.3 milestone Oct 19, 2018
@jakirkham
Copy link
Member

Imagine @martindurant has thoughts on this one. 😉

@martindurant
Copy link
Member

Yeah...
Having filenames as a possibility is a fairly natural convenience, but the real input for zarr functionality is a key-value mapper over some file-system.

Maybe I'm just too old for pathlib, but wouldn't it be easier for the user to coerce the object to str in the line passing it to zarr rather than zarr having to check for another condition?

@nehz
Copy link

nehz commented Aug 12, 2019

Passing pathlib paths around is becoming more common, at least the error message should be something more sane I think (e.g TypeError)

@martindurant
Copy link
Member

Note that fsspec does support Path objects now, except they don't really make sense for anything non-local. From zarr's point of view, a Path should probably be assumed to be local, and so invoke the directory-store directly rather than going via fsspec.

@alimanfoo alimanfoo removed this from the v2.4 milestone Jun 26, 2020
@LunarLanding
Copy link

Most software I've been using accepts path objects. Would a minimal pull request be accepted?

@hungyiwu
Copy link

hungyiwu commented Aug 4, 2020

I ran into this issue but got a less intuitive error message:

minimal example

from pathlib import Path
import zarr
p = Path("./test.zarr")
zarr.open(p, mode="w")

error message

Traceback (most recent call last):
  File "run.py", line 4, in <module>
    zarr.open(p, mode="w")
  File "/home/hungyiwu/Desktop/zarr_report/demo/lib/python3.8/site-packages/zarr/convenience.py", line 84, in open
    return open_group(store, mode=mode, **kwargs)
  File "/home/hungyiwu/Desktop/zarr_report/demo/lib/python3.8/site-packages/zarr/hierarchy.py", line 1149, in open_group
    init_group(store, overwrite=True, path=path, chunk_store=chunk_store)
  File "/home/hungyiwu/Desktop/zarr_report/demo/lib/python3.8/site-packages/zarr/storage.py", line 441, in init_group
    _init_group_metadata(store=store, overwrite=overwrite, path=path,
  File "/home/hungyiwu/Desktop/zarr_report/demo/lib/python3.8/site-packages/zarr/storage.py", line 450, in _init_group_metadata
    rmdir(store, path)
  File "/home/hungyiwu/Desktop/zarr_report/demo/lib/python3.8/site-packages/zarr/storage.py", line 118, in rmdir
    store.rmdir(path)
TypeError: rmdir() takes 1 positional argument but 2 were given

pip freeze

asciitree==0.3.3
fasteners==0.15
monotonic==1.5
numcodecs==0.6.4
numpy==1.19.1
six==1.15.0
zarr==2.4.0

@martindurant
Copy link
Member

It's not obvious to me that accepting Path is useful (but I don't use pathlib myself). It is important to note that the path supplied could, in principle, be a remote or other complex path, of the type handles by fsspec, which cannot be represented by Path. The first line of each open functin would then read if isinstance(path, Path):, which seems needless to me, when we can simply put the onus on the calling code.

@hungyiwu
Copy link

hungyiwu commented Aug 4, 2020

@martindurant Thank you for the clarification! I fully understand that accepting a Path object is not strictly necessary. I'd love to have at least an informative error message as the current one is not informative to me at all. I realized this the hard way by recalling some libraries I used had this issue (unclear error message and turned out solved by casting Path object to str) and the searched the issue tab, which led me here. I believe an informative error message will greatly reduce the debugging time for people like me.

What do you think?

@martindurant
Copy link
Member

I see things like

store : MutableMapping or string, optional
    Store or path to directory in file system or name of zip file.

In general, if you submit arguments with unexpected types to any function in python, strange things might happen. I would think it inappropriate to check the types of the inputs, which would have to be done in every location. Of course, the caveat still applies: I don't use pathlib, so I don't see the benefit (my interior model must predate it).

@hungyiwu
Copy link

hungyiwu commented Aug 4, 2020

@martindurant I see your point. I kind of assume that when a path argument accepts str it automatically accepts Path object, as is my experience with many other libraries. I think you have a point and I'll close my PR accordingly.

@Carreau
Copy link
Contributor

Carreau commented Aug 4, 2020

I think it could make sens that if we receive a Path, then it should be opened with the directory store. Pathlib is more and more common, and I would strongly recommend it when you know you are working with Path inside libraries; I've personally started to convert most of the libraries I use that take string to work with Path internally, and I'm much happier.

We should also consider that Path might be coming from higher up, like someone may be using xarray.open(path), and get an error in zarr. Not supporting Path, mean that each of the downstream consumer need to make sure they convert things to str.

@martindurant
Copy link
Member

I xarray were to support Path (do they?), I would expect that to be done with

if isinstance(path, Path):
    path = str(path)

and we would just be repeating this.

@clbarnes clbarnes mentioned this issue Jun 4, 2021
6 tasks
@clbarnes
Copy link
Contributor

clbarnes commented Sep 1, 2021

Basic support was released in 2.9.0: https://zarr.readthedocs.io/en/stable/release.html#enhancements

I think this can be closed unless there's desire to base more of the internals on Paths.

@joshmoore
Copy link
Member

Thanks, @clbarnes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants