Advanced indexing #78
See also #77
So I'm trying to think of the best way to reformat indices of this sort into an easier-to-work-with form, per our earlier discussion. I have code in another library that would be pretty easy to move into kenjutsu. This code converts a list of indices into a bool array, and it would be pretty easy to also handle arrays of indices. It works with arbitrary numbers of dimensions, but could certainly be used in 1D-only cases. I also have some code that could be refactored out that takes an unordered list of indices and orders it, but also provides an additional set of indices to map the ordered indices back into the intended ordering. It works pretty nicely with NumPy arrays and could be handy here. It doesn't handle arrays of indices, but I could look into that. Does any of this sound interesting? Do you have some other ideas about how you would like to work with these cases?
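For concreteness, here is a minimal NumPy sketch of the "order the indices, then map back" idea, assuming the goal is to fetch data in sorted index order and afterwards restore the order the user asked for. `order_indices` is a hypothetical helper for illustration, not the kenjutsu API.

```python
import numpy as np

def order_indices(indices):
    """Hypothetical helper: sort an index list and return (sorted_indices, inverse)
    such that sorted_indices[inverse] reproduces the original request order."""
    indices = np.asarray(indices, dtype=np.intp)
    # Positions that would sort the request; stable so duplicates keep their order.
    order = np.argsort(indices, kind="stable")
    sorted_indices = indices[order]
    # Invert the permutation: inverse[order[k]] = k.
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)
    return sorted_indices, inverse

data = np.arange(10) * 10
s, inv = order_indices([7, 2, 5])
fetched = data[s]          # read in increasing index order: [20, 50, 70]
restored = fetched[inv]    # back in the requested order: [70, 20, 50]
```

The fetch step only ever sees increasing indices, so the reordering can happen as a cheap post-processing step on the already-extracted values.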
So I was thinking that I would convert a list or array of indices into a
boolean array internally. The internal logic that figures out which chunks
are in range for the selection, and then iterates through the chunks
extracting the necessary pieces of data, would then only need to consider
boolean arrays (as well as slices and single indices). Relevant code is in
Array.__getitem__() and Array._chunk_getitem(). I think that should be
reasonably straightforward to implement.
I don't think it's worth implementing support for indexing multiple
dimensions with boolean or index arrays (i.e., two or more of the args
passed into Array.__getitem__() are boolean or index arrays) because this
gets really complicated, both to implement and from the user point of view.
As a user, I find the numpy way to index rows and columns simultaneously
in a 2D array really painful. I actually find the pandas-style .iloc
indexing semantics much nicer for 2D arrays and would consider implementing
them before doing the full numpy fancy indexing thing. But that could be a
separate piece of work anyway.
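To make the contrast concrete, here is a small plain-NumPy sketch (not Zarr) of the two semantics. NumPy pairs index arrays element-wise, while the `.iloc`-style behaviour selects every listed row crossed with every listed column, which NumPy can express with `np.ix_`.

```python
import numpy as np

a = np.arange(20).reshape(4, 5)
rows, cols = [0, 2], [1, 3]

# NumPy fancy indexing pairs the index arrays element-wise: a[0, 1] and a[2, 3].
pointwise = a[rows, cols]        # array([ 1, 13])

# Orthogonal ("outer") selection, like pandas df.iloc[rows, cols]:
# each listed row crossed with each listed column.
outer = a[np.ix_(rows, cols)]    # array([[ 1,  3], [11, 13]])
```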
Re handling an unordered list of indices, if that could work as a
post-processing step after retrieving all data out of chunks then it sounds
worth exploring. The main thing is that __getitem__ works chunk-by-chunk
through the underlying data, and so we need to avoid anything that requires
visiting the same chunk more than once or visiting chunks in a funny order.
FWIW, as a first step I'd also be happy to raise an error if the user
provides a list of indices that contains duplicates and/or is out of order.
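A first-step check along those lines might look like the sketch below. `check_increasing_indices` is a hypothetical name, and it assumes non-negative indices (negative indices are not normalised here).

```python
import numpy as np

def check_increasing_indices(indices):
    """Hypothetical first-step check: return indices as a 1-D integer array, or raise
    IndexError on duplicates or out-of-order values. Assumes non-negative indices."""
    indices = np.asarray(indices)
    if indices.ndim != 1:
        raise IndexError("expected a 1-D list or array of indices")
    if indices.size == 0:
        return indices.astype(np.intp)
    if not np.issubdtype(indices.dtype, np.integer):
        raise IndexError("indices must be integers")
    diffs = np.diff(indices)
    if np.any(diffs == 0):
        raise IndexError("duplicate indices are not supported")
    if np.any(diffs < 0):
        raise IndexError("indices must be given in increasing order")
    return indices
```

Rejecting these cases up front keeps the chunk iteration simple: every chunk is visited at most once and in order.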
This all sounds interesting and I'm very up for pursuing. However I should
say that the code we are touching here is the only bit of Zarr that is at
all complicated and required some mental effort to get right, so it would
probably be a good idea to break the problem down and proceed incrementally
in small steps, starting with the simplest cases.
On Thursday, December 1, 2016, jakirkham wrote:
So I'm trying to think of the best way to reformat indices of this sort
into an easier to work with form per our earlier discussion
<https://github.com/alimanfoo/zarr/issues/93#issuecomment-264121815>.
I have other code library that would be pretty easy to move into kenjutsu.
This code converts a list of indices into a bool array. Though it would
be pretty easy to also handle arrays of indices too. This works with
arbitrary numbers of dimensions, but certainly could be used in only 1D
cases.
Also I have some code that could be refactored out that takes an unordered
list of indices and orders it, but also provides an additional set of
indices to map the ordered indices back into the intended ordering. It
works pretty nicely with NumPy arrays and could be handy here. Doesn't
handle arrays of indices, but could look into that.
Does any of this sound interesting? Do you have some other ideas about how
you would like to work with these cases?
--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: [email protected]
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
Thanks for all the info. I've been giving this some thought and playing with some things on my end. Let me see if I can get something more concrete for us to discuss.
Awesome.
On Fri, Dec 9, 2016 at 3:46 PM, jakirkham wrote:
Thanks for all the info. I've been giving this some thought and playing
with some things on my end. Let me see if I can get something more concrete
for us to discuss.
What about breaking a selection that has multiple indices into a set of slices, each selecting a single index, and iterating over them? I worked up something like this in PR ( jakirkham/kenjutsu#52 ), which I'm starting to play with, and it seems to do alright.
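One possible reading of this idea, sketched in plain NumPy: turn each index into a one-element slice, read the slices one at a time, and stitch the pieces together. `indices_to_slices` is a hypothetical helper, not the code in the kenjutsu PR, and it assumes non-negative indices (for `-1`, `slice(-1, 0)` would select nothing).

```python
import numpy as np

def indices_to_slices(indices):
    """Hypothetical helper: one single-element slice per (non-negative) index."""
    return [slice(i, i + 1) for i in indices]

data = np.arange(100, 110)
idx = [1, 4, 5, 8]
# Read each one-element slice separately, then concatenate the pieces.
pieces = [data[s] for s in indices_to_slices(idx)]
result = np.concatenate(pieces)   # same values as data[idx]
```

Since every piece is an ordinary slice, the existing slice-handling path could in principle serve each one; adjacent indices such as 4 and 5 could also be merged into a single slice, though that refinement isn't discussed here.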
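The 0.4.0 version of kenjutsu (now on PyPI and conda-forge) should provide you a way to solve the list of indices in one dimension problem.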
Thank you, I'll take a look asap.
On Sat, Feb 18, 2017 at 4:10 PM, jakirkham wrote:
The 0.4.0 version of kenjutsu (now on PyPI and conda-forge) should
provide you a way to solve the list of indices in one dimension problem.
So PR ( https://github.com/alimanfoo/zarr/pull/116 ) implements this functionality using …
It would be fairly straightforward to add support for indexing a Zarr array with a boolean array or list/array of indices, provided that (1) only one dimension has a fancy index, and (2) if a list/array of indices, indices are given in increasing order.
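Under those two restrictions, the chunk-by-chunk read stays simple because sorted indices fall into one contiguous run per chunk. The sketch below illustrates this for a 1-D array held as an in-memory list of equal-length chunks; `getitem_sorted_indices` is a hypothetical name, list indexing stands in for loading a chunk, and it is not Zarr's `Array.__getitem__` / `Array._chunk_getitem`.

```python
import numpy as np

def getitem_sorted_indices(chunks, chunk_len, indices):
    """Hypothetical sketch: select strictly increasing, non-negative `indices` from a
    1-D array stored as a list of equal-length chunks, visiting each chunk at most once."""
    indices = np.asarray(indices, dtype=np.intp)
    out = np.empty(indices.size, dtype=chunks[0].dtype)
    if indices.size == 0:
        return out
    chunk_ids = indices // chunk_len  # chunk each index falls in
    # Sorted input means each chunk's indices form one contiguous run in `indices`.
    boundaries = np.flatnonzero(np.diff(chunk_ids)) + 1
    starts = np.concatenate(([0], boundaries))
    stops = np.concatenate((boundaries, [indices.size]))
    for start, stop in zip(starts, stops):
        cid = chunk_ids[start]
        chunk = chunks[cid]  # stand-in for loading/decompressing one chunk
        local = indices[start:stop] - cid * chunk_len
        out[start:stop] = chunk[local]
    return out

data = np.arange(23) * 2
chunks = [data[i:i + 5] for i in range(0, 23, 5)]   # last chunk is shorter
selected = getitem_sorted_indices(chunks, 5, [0, 3, 7, 8, 21])
```

Here chunks 0, 1 and 4 are each visited once and chunks 2 and 3 are never touched.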