Advanced indexing #172
Some examples of usage, and some performance benchmarking, are in this notebook.

Regarding the API, I have added support for orthogonal indexing (a.k.a. outer indexing) via __getitem__ and __setitem__. Two possible ways to go: (1) keep this as-is, i.e., implement orthogonal indexing via __getitem__ and __setitem__; or (2) keep the behaviour of __getitem__ and __setitem__ consistent with numpy fancy indexing, and expose orthogonal indexing via a separate API.

Regarding performance, results from some simple benchmarks look quite promising. Performance obviously depends on how many items are being selected, i.e., how dense or sparse the selection is. For relatively dense selections (~50% of items), indexing with a boolean array is within a factor of 2 of the speed of the same operation on a plain numpy array, which seems decent given that zarr has to do the extra work of managing and decompressing chunks. For relatively sparse selections (~0.01% of items) we are about 10 times slower than numpy, but almost all the time is being spent in Array._decode_chunk, which is where decompression happens, so I think this proves the overhead from processing the array selection is minimal compared with the time required for decompressing chunks, even when using a very fast compressor (Blosc with LZ4, multithreaded).

I also did a quick performance comparison with h5py, which isn't really fair as h5py was using a slower compressor (gzip level 1). However, FWIW, with the sparse boolean array zarr is ~4X faster than h5py, and with the dense boolean array h5py performance is pathological, taking longer than 1 minute to complete, so zarr wins big there, taking <1 second.

Comments on API and implementation very welcome. cc @shoyer, @mrocklin, @jakirkham, @FrancescAlted.
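For reference, a quick numpy-only illustration of the difference between fancy indexing and orthogonal indexing (np.ix_ is used here just to emulate orthogonal semantics; this is not zarr code):

import numpy as np

x = np.arange(9).reshape(3, 3)

# fancy indexing: the index arrays are broadcast together and select
# individual points, here (0, 1) and (2, 2)
x[[0, 2], [1, 2]]            # array([1, 8])

# orthogonal indexing: each array selects independently along its own
# dimension, giving all combinations of rows 0,2 and columns 1,2
x[np.ix_([0, 2], [1, 2])]    # array([[1, 2],
                             #        [7, 8]])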
Very cool to see this!
Watch out: NumPy considers even scalars to be indexing arrays:
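In [15]: x = np.zeros((1, 2, 3))

In [16]: x[0, :, [0, 1, 2]].shape
Out[16]: (3, 2)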
(This is my favorite NumPy indexing edge case.) I don't really have an opinion here on (1) vs (2), as long as it is clearly documented and you don't try to do both outer/orthogonal and vectorized/broadcasting indexing in the same API. NetCDF4-Python only does outer indexing and that works fine for it. I would be just as happy to use a special .oindex property for this.
This might actually be easier than you think. @mrocklin wrote a version of this for dask that might be a good reference point: https://github.com/dask/dask/blob/7113a3c9bf335f2fe58989760af7b671d940e92f/dask/array/core.py#L3024
Good work! Maybe I'm looking at the benchmarks incorrectly, but I only see zarr being 4x faster (not 10x) than h5py.
For what it's worth, I think zarr might benefit from the forthcoming introduction of dictionary support for zstd inside Blosc2. The nice thing about dictionaries is that you can make your data blocks ridiculously small (apparently down to 1 KB) but still get good compression ratios and, more importantly, very fast decompression speed. This should reduce the latency quite a bit when you have to decompress a whole block to get just one (or a few) values out of it.
Ouch. OK, maybe option (2) should be: allow only slices and/or ints in __getitem__/__setitem__; implement orthogonal indexing via .oindex[] in this PR; implement point selection via .vindex[] in a future PR.
Sorry, yes, my mistake, 4X faster.
Very interesting, thanks!
@mrocklin regarding API, how would/should this play with da.from_array(fancy=True/False)? If fancy=True, what does dask assume about the API?
I believe that setting fancy=False causes dask to avoid indexing the underlying array with anything other than slices and integers (i.e., no indexing with lists or arrays). It has been a while since then though, so I may be misremembering things. The relevant docstring is in da.from_array.
It sounds like you do support these things, so presumably people loading dask arrays from zarr arrays should set fancy=True.
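For reference, a sketch of how that flag would be passed when wrapping a zarr array with dask (shapes and chunks here are illustrative):

import dask.array as da
import zarr

z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='f8')
d = da.from_array(z, chunks=z.chunks, fancy=True)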
Thanks @mrocklin. Currently in this PR zarr does not implement fancy indexing the same as numpy, but rather implements orthogonal indexing. So I was concerned dask may get unexpected results if fancy=True assumes numpy fancy indexing, depending on what indexes are passed through.

Actually, just looking at @shoyer's favourite edge case, it looks like dask __getitem__ behaviour does something different from numpy fancy indexing anyway, before even worrying about zarr interaction. E.g.:

In [17]: x = np.arange(6).reshape(1, 2, 3)

In [18]: d = da.from_array(x, chunks=(1, 2, 3))

In [19]: x[0, :, [0, 1, 2]]
Out[19]:
array([[0, 3],
       [1, 4],
       [2, 5]])

In [20]: d[0, :, [0, 1, 2]].compute()
Out[20]:
array([[0, 1, 2],
       [3, 4, 5]])

So I guess there are a couple of separate questions: (a) it looks like dask.array __getitem__ does not follow numpy fancy indexing behaviour here, at least in some cases; is that deviation intentional? (b) what exactly does fancy=True mean in terms of the expected behaviour of __getitem__ on the array being wrapped?
cc @benjeffery
I think I prefer option (2): allow only slices and/or ints in __getitem__/__setitem__.
Just to note, the way we currently disambiguate vectorized/orthogonal indexing internally in xarray is that we use dedicated classes to store each type of indexer. This way, indexing can go through the same code paths but still dispatch to appropriate backend-specific methods (e.g., dask vs numpy vs netCDF4 vs zarr).
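A minimal sketch of that pattern (names are illustrative; xarray's real classes live in xarray.core.indexing, and the .oindex/.vindex accessors shown are the ones discussed in this thread):

class BasicIndexer:
    def __init__(self, key):
        self.key = key  # tuple of ints/slices

class OuterIndexer:
    def __init__(self, key):
        self.key = key  # tuple of ints/slices/1d int arrays

class VectorizedIndexer:
    def __init__(self, key):
        self.key = key  # tuple of broadcastable int arrays

def zarr_getitem(zarr_array, indexer):
    # backend adapter: dispatch on the indexer class rather than trying
    # to guess the intended semantics from a raw key
    if isinstance(indexer, BasicIndexer):
        return zarr_array[indexer.key]
    elif isinstance(indexer, OuterIndexer):
        return zarr_array.oindex[indexer.key]
    elif isinstance(indexer, VectorizedIndexer):
        return zarr_array.vindex[indexer.key]
    raise TypeError('unexpected indexer type: %r' % type(indexer))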
Yes, I think that this came up when we were hammering out slicing. I think that it was intentionally decided to deviate from NumPy's behavior. I wouldn't be surprised if @shoyer was the one to make this call actually. My memory here is a bit hazy.
I don't recall agreeing to intentionally deviate from numpy for dask. But I did notice this recently and hadn't gotten around to filing a bug yet. Dask is at least in good company here: h5py also gets this wrong.
OK, my mistake. I must be misremembering things.
Hm, this is tricky. For my part, there are three indexing use cases which I think are worth considering.

The first is orthogonal indexing with any combination of int, slice, ellipsis, 1d int array or 1d bool array. That's the one I need most often, and it is currently implemented in this PR via __getitem__.

The second is point selection with any combination of int and 1d int array but no slice or ellipsis. The indexers can then be broadcast to provide fully specified coordinate arrays, and the output is always 1d. I think this is essentially what is implemented in dask via .vindex and in xarray via .isel_points(), and I think I can see how this could be done efficiently in zarr.

The third is point selection with a single boolean array, where the boolean indexer has the same shape as the array being indexed (a generalization of x[x > 0]). This maps to point selection via integer arrays by doing np.nonzero() on the boolean indexer array.
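For concreteness, the second and third use cases in plain numpy (reference semantics only, not zarr code):

import numpy as np

x = np.arange(12).reshape(3, 4)

# second use case: int and 1d int array indexers broadcast together
# and the output is 1d -- the scalar 0 broadcasts against [1, 3]
x[0, [1, 3]]         # array([1, 3])
x[[0, 2], [1, 3]]    # array([ 1, 11])

# third use case: a bool mask with the same shape as x is equivalent
# to point selection with the mask's nonzero coordinates
mask = x > 8
np.array_equal(x[mask], x[np.nonzero(mask)])    # True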
Any other type of point selection/vectorized indexing (e.g., including slices, int arrays with more than one dimension, mixtures of int and bool arrays, ...) I have never needed and have trouble understanding, mainly because dimensions can get moved around in a way that I don't fully understand.

In this PR I think I'll change __getitem__ to be limited to basic indexing with int, slice and ellipsis only. I may also expose this functionality via a couple of methods, something like get_basic_selection(selection, out=None) and set_basic_selection(selection, value), where selection is a fully specified tuple of ints and/or slices (i.e., no ellipsis, all dims are indexed). This is partly for clarity, but also because get_basic_selection() could accept an "out" param, which would allow the selected data to be loaded directly into an array given by the user, which could be numpy or another zarr array.

I think I'll then expose orthogonal indexing via .oindex[] as has been proposed for numpy. I may also implement this via methods, e.g., get_orthogonal_selection(selection, out=None) and set_orthogonal_selection(selection, value), again for clarity and flexibility.

I will probably leave point selection for future work. However, I think I would start by implementing methods separately targeting the second and third use cases above, e.g., get_point_selection_int(selection, out=None), where "selection" is a fully specified tuple of ints and/or 1d int arrays, and get_point_selection_bool(selection, out=None), where "selection" is a single bool array with the same shape as the indexed array. This would at least simplify implementation and make it a bit clearer what subset of point selection indexing is being implemented. Both of these could then be used to provide implementations for a subset of the proposed functionality for .vindex[] in numpy, if that goes forward.
A better name for "get_point_selection_bool" could be "get_mask_selection", and "get_point_selection_int" could be better named "get_coordinate_selection".
I've pushed some new work on this, here's a synopsis.

Vectorized (inner) indexing

I've added support for vectorized indexing using coordinate arrays (a.k.a. point selection), which actually wasn't too hard to do. This functionality is available via .vindex[]. Vectorized indexing using a Boolean mask array is also supported via .vindex[]. More complicated vectorized indexing scenarios, e.g., mixing coordinate or mask arrays with slices, are currently not supported.

The indexing coordinates do not have to be sorted in any particular order. Zarr shuffles the coordinates so they are grouped by their corresponding chunk, so that each chunk is processed once only.

Orthogonal (outer) indexing

Orthogonal indexing is supported via .oindex[]. Integer arrays do not need to be sorted. Zarr shuffles the index values so they are grouped by their corresponding chunk, so that each chunk is processed once only.

Slice with step > 1

Slices with step > 1 are now supported.

Open questions

What functionality should be available via __getitem__ and __setitem__? For 1D arrays there is no ambiguity in how to process advanced selections, i.e., there is no difference between vectorized and orthogonal indexing. So for convenience, if a Zarr array is 1D, __getitem__ currently accepts advanced selections and processes them via orthogonal indexing. For multi-dimensional arrays things are more complex, and so __getitem__ is currently restricted to basic selections only. If anyone feels this isn't a good way to go, happy to discuss.

Benchmarks

Performance seems reasonable in all cases; I can't see any obvious ways to improve. More examples and some benchmarking data are in this notebook.
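To make the above concrete, a quick sketch of the new API (outputs assume this example data; see the notebook for fuller examples):

import numpy as np
import zarr

z = zarr.array(np.arange(100).reshape(10, 10), chunks=(5, 5))

# vectorized (inner) indexing with coordinate arrays: selects the
# individual points (0, 1) and (2, 3)
z.vindex[[0, 2], [1, 3]]    # array([ 1, 23])

# vectorized indexing with a Boolean mask of the same shape as z
mask = np.zeros(z.shape, dtype=bool)
mask[0, 1] = mask[2, 3] = True
z.vindex[mask]              # array([ 1, 23])

# orthogonal (outer) indexing: rows 0,2 crossed with columns 1,3
z.oindex[[0, 2], [1, 3]]    # array([[ 1,  3],
                            #        [21, 23]])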
This is looking very nice! Fancy indexing support might give zarr a decisive edge over HDF5 :).
zarr/core.py:

elif len(self._shape) == 1:
    # safe to do "fancy" indexing, no ambiguity
    return self.get_orthogonal_selection(selection)
You can do vectorized indexing on 1D arrays, too, e.g.,
In [22]: a = np.arange(4)
In [23]: a[a.reshape(2, 2)]
Out[23]:
array([[0, 1],
       [2, 3]])
More generally, I agree that it's unambiguous for 1D, but given the focus of zarr on N-dimensions I would be reluctant to add this shortcut. The special case feels like more trouble than it's worth.
Thanks, yep I think you're probably right. I've added support for vectorized indexing with multi-dimensional coordinate arrays, but have limited __getitem__ to basic selections only.
zarr/core.py:

if isinstance(out, np.ndarray) and \
        not self._filters and \
        ((self._order == 'C' and dest.flags.c_contiguous) or
         (self._order == 'F' and dest.flags.f_contiguous)):
note: PEP8 suggests using extra parentheses rather than explicit \
for line continuation. I think it looks a little cleaner, too.
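For instance, the condition above could be written as:

if (isinstance(out, np.ndarray) and
        not self._filters and
        ((self._order == 'C' and dest.flags.c_contiguous) or
         (self._order == 'F' and dest.flags.f_contiguous))):
    ...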
zarr/indexing.py:

def is_integer(x):
    return isinstance(x, numbers.Integral)
Make sure this catches numpy's signed and unsigned integer types -- missing those has led to issues in dask and xarray.
I've checked this, looks OK:
In [5]: for t in int, np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint16, np.uint32, np.uint64:
   ...:     print(t, isinstance(t(42), numbers.Integral))
   ...:
<class 'int'> True
<class 'numpy.int8'> True
<class 'numpy.int16'> True
<class 'numpy.int32'> True
<class 'numpy.int64'> True
<class 'numpy.uint8'> True
<class 'numpy.uint16'> True
<class 'numpy.uint32'> True
<class 'numpy.uint64'> True
zarr/indexing.py:

def slice_to_range(s):
    return range(s.start, s.stop, 1 if s.step is None else s.step)
Use slice.indices()
instead to get start/stop/step (this is especially important for tricky cases like negative steps). You'll also need the size of the array dimension.
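For example, slice.indices(length) normalizes start/stop/step against the dimension length, handling None and negative values; the helper above could use it like this (a sketch):

def slice_to_range(s, dim_len):
    # s.indices() returns a normalized (start, stop, step) triple
    return range(*s.indices(dim_len))

list(slice_to_range(slice(None, None, -2), 6))    # [5, 3, 1]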
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, didn't know about that, nice.
zarr/indexing.py:

def oindex(a, selection):
    """Implementation of orthogonal indexing with slices and ints."""
    drop_axes = tuple([i for i, s in enumerate(selection) if isinstance(s, int)])
again, be careful assuming that all integer selections are native Python ints.
zarr/indexing.py:

# validation
if not is_coordinate_selection(selection, array):
    # TODO refactor error messages for consistency
    raise IndexError('invalid coordinate selection')
It would be good to add an informative error message here about slices, because assuredly somebody is going to try that. (For what it's worth, I agree that it's a good choice not to support them!)
Agreed, I'll do some work on error messages when implementation has settled.
zarr/indexing.py:

for dim_sel, dim_len in zip(selection, array.shape):

    # check number of dimensions, only support indexing with 1d array
    if len(dim_sel.shape) > 1:
I'm not sure I'm reading this right, but does this mean you only support vectorized indexing with 1D arrays?
Vectorized indexing with >1D arrays should be pretty easy and can be quite useful. You just need to flatten the indices after broadcasting and unflatten the result.
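The trick in plain numpy, using only 1d indexing machinery (illustrative):

import numpy as np

a = np.arange(4)
ix = np.array([[0, 1], [2, 3]])

# flatten the (broadcast) index array and select with 1d machinery...
flat = a[ix.reshape(-1)]
# ...then unflatten the result to the index array's shape
result = flat.reshape(ix.shape)
np.array_equal(result, a[ix])    # True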
Thank you for this tip, I finally get coordinate indexing (at least without slices)! I've added support for multi-dimensional coordinate arrays.
assert_array_equal(a[0], z[0])
assert_array_equal(a[-1], z[-1])
assert_array_equal(a[:, 0], z[:, 0])
assert_array_equal(a[:, -1], z[:, -1])
eq(a[0, 0], z[0, 0])
eq(a[-1, -1], z[-1, -1])
I would strongly recommend adding some short-form cases for vectorized indexing (i.e., with .vindex). You have partial test coverage for this already, but there are so many indexing edge cases that it's a good idea to write them in the most succinct way possible.
I've added some tests to cover these cases. Still a bit more coverage needed.
zarr/tests/test_indexing.py:

slice(50, 150, 1),
slice(50, 150, 10),
slice(50, 150, 100),
]
What about negative steps? At the least, those should give an appropriate error.
Negative steps are supported, I've added tests to confirm.
Thank you @shoyer for the hugely useful feedback. Here's a summary of the latest pushes: support has been added for coordinate indexing with multi-dimensional arrays; for arrays with a structured dtype, fields can now be selected via all of the selection methods; and I've also simplified parts of the implementation. The examples and benchmarks notebook has been updated for the above changes.
Just to mention I've reworked the implementation of slices with step > 1. Test coverage is also back up, and I think I'm done with the main implementation work, so I'll work on docs and improving error messages before merging.
Nice job. After having a look at your benchmarks, I see that _chunk_getitem is usually the most consuming function (cumtime-wise) in your profiles, so I am wondering if that could be improved somehow. I see that your chunksizes are typically between 256 KB and 1 MB, but the benchmark page does not show the blocksize for every chunk, which is the important parameter when you try to get a handful of values out of a chunk (only a block or a few need to be decompressed). You could get the blocksize by using the blosc_get_blocksize() call, and you can explicitly set it using blosc_set_blocksize() (if you don't call it, an automatic blocksize is used). You may want to add support for these functions in zarr and try a smaller blocksize to see how it would affect your current figures.

Also, I see that np.argsort() sometimes shows up first in time usage. I am wondering if you could make use of a handy keysort (https://github.com/PyTables/PyTables/blob/6782047b9223897fd59ff4967d71d7fdfb474f16/tables/indexesextension.pyx#L147) that I did many years ago. keysort() takes two arrays as arguments, sorting the first one in place and also the second, following the order of the first, in one shot. This requires fewer temporaries and hence is quite a bit more efficient than an np.argsort followed by an indexing operation. It has been in production in PyTables for years, so it should be safe enough.
Thanks Francesc. In fact the blocksize parameter is exposed for the numcodecs.Blosc compressor, so this could be tuned. But the main thing for the indexing work is to know that the compressor is the limiting factor. The Blosc compressor is already extremely fast, and although it could probably be tuned even further, it is already very useful to know that my indexing implementation is not getting in the way performance-wise, at least for most of the indexing operations.

(Btw when I was running the benchmarks yesterday I could actually hear my computer audibly fizzing, only Blosc can make it do that :-)

Thank you, actually I did do some trawling of the internet to see if there was any way to accelerate the argsort, and saw a mention that you had done something like this in pytables. I think it would be great to explore this; I noticed that sorting an array in place takes less than half the time of an argsort followed by indexing when just using numpy. I'm a bit hesitant to include cython code in zarr as currently zarr is pure Python, and there are some benefits to not having any cython code to build. What would be cool is if the pytables keysort implementation was available as a standalone package, then zarr could depend on it. But I know you're super-busy so don't want to ask you to do any more :-)
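For illustration, here is the numpy pattern under discussion: an argsort followed by fancy indexing, which a keysort would fuse into a single in-place pass (a sketch, not zarr's actual code):

import numpy as np

chunk_ids = np.array([2, 0, 1, 0, 2])
coords = np.array([10, 11, 12, 13, 14])

# argsort + fancy indexing: allocates the permutation array plus a
# reordered copy of each array
order = np.argsort(chunk_ids)
chunk_ids_sorted = chunk_ids[order]
coords_sorted = coords[order]    # coords now grouped by chunk

# keysort(chunk_ids, coords) would instead sort chunk_ids in place and
# apply the same reordering to coords in one shot, with fewer temporaries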
Ha ha, this probably has to do with the SIMD support in blosc shuffle/bitshuffle, which makes CPUs consume quite a bit more energy. Add multithreading to the equation and, yeah, I can imagine you could fry something on top of your CPU while you are at it :)
OK, I think I am done here. There is a new tutorial section on advanced indexing. Error messages have been improved. I'll let the dust settle for a few days.
After rebasing, I hit this unicode weirdness on Windows:

>>> import numpy as np
>>> v = np.array('xxx', dtype='U3')[()]
>>> v
'xxx'
>>> a = np.empty(10, dtype='U3')
>>> a[:] = v
>>> a[0] == v
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)

I think this is related to the default locale of cp1252 on my Windows VM, but I don't really understand what's happening. In any case I've pushed a simple workaround.
Alright, merging.
This PR adds support for indexing Zarr arrays with Boolean or integer arrays. Resolves #78. Also adds support for selecting fields from structured arrays (resolves #112). Also resolves #89 and resolves #93.