So I was thinking that I would convert a list or array of indices into a boolean array internally. The internal logic that figures out which chunks are in range for the selection, and then iterates through those chunks extracting the necessary pieces of data, would then only need to consider boolean arrays (as well as slices and single indices). The relevant code is in Array.__getitem__() and Array._chunk_getitem(). I think that should be reasonably straightforward to implement.

I don't think it's worth implementing support for indexing multiple dimensions with boolean or index arrays (i.e., two or more of the args passed into Array.__getitem__() are boolean or index arrays), because this gets really complicated, both to implement and from the user's point of view. As a user, I find the numpy way of indexing rows and columns simultaneously in a 2D array really painful. I actually find the pandas-style .iloc indexing semantics much nicer for 2D arrays and would consider implementing them before doing the full numpy fancy indexing thing. But that could be a separate piece of work anyway.

Re handling an unordered list of indices: if that could work as a post-processing step after retrieving all data out of chunks, then it sounds worth exploring. The main thing is that __getitem__ works chunk-by-chunk through the underlying data, so we need to avoid anything that requires visiting the same chunk more than once or visiting chunks out of order. FWIW I'd also be happy, at least as a first step, to raise an error if the user provides a list of indices that contains duplicates and/or is out of order.

This all sounds interesting and I'm very up for pursuing it. However, I should say that the code we are touching here is the only bit of Zarr that is at all complicated and required some mental effort to get right, so it would probably be a good idea to break the problem down and proceed incrementally in small steps, starting with the simplest cases.
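To make the index-to-boolean idea concrete, here is a rough sketch for the 1D case, using plain NumPy. The names indices_to_mask and chunk_selections are hypothetical helpers made up for illustration; they are not part of Zarr and are not how Array.__getitem__() or Array._chunk_getitem() are written. The sketch assumes the "first step" behaviour described above, i.e. it raises an error for duplicate or out-of-order indices.

```python
import numpy as np


def indices_to_mask(indices, length):
    """Hypothetical helper: turn a 1D list/array of integer indices into a
    boolean mask of the given length. Not part of the Zarr API."""
    idx = np.asarray(indices, dtype=np.intp)
    if idx.ndim != 1:
        raise IndexError("only 1D index arrays are supported in this sketch")
    # Map negative indices onto their positive equivalents.
    idx = np.where(idx < 0, idx + length, idx)
    if idx.size and (idx.min() < 0 or idx.max() >= length):
        raise IndexError("index out of bounds")
    # Strictly increasing means no duplicates and already in order.
    if np.any(np.diff(idx) <= 0):
        raise ValueError("indices must be strictly increasing")
    mask = np.zeros(length, dtype=bool)
    mask[idx] = True
    return mask


def chunk_selections(mask, chunk_len):
    """Hypothetical helper: yield (chunk_index, sub_mask) for each 1D chunk
    that contains at least one selected item, visiting chunks in order."""
    for start in range(0, len(mask), chunk_len):
        sub = mask[start:start + chunk_len]
        if sub.any():
            yield start // chunk_len, sub


mask = indices_to_mask([1, 4, 7], length=10)
touched = [chunk for chunk, _ in chunk_selections(mask, chunk_len=4)]
```

With a mask in hand, each chunk is visited at most once and in storage order, and chunks with no selected items are skipped entirely.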

…

On Thursday, December 1, 2016, jakirkham ***@***.***> wrote:

So I'm trying to think of the best way to reformat indices of this sort into an easier-to-work-with form, per our earlier discussion <https://github.com/alimanfoo/zarr/issues/93#issuecomment-264121815>. I have code in another library that would be pretty easy to move into kenjutsu. This code converts a list of indices into a bool array, though it would be pretty easy to handle arrays of indices too. It works with arbitrary numbers of dimensions, but certainly could be used in only 1D cases.

Also, I have some code that could be refactored out that takes an unordered list of indices and orders it, but also provides an additional set of indices to map the ordered indices back into the intended ordering. It works pretty nicely with NumPy arrays and could be handy here. It doesn't handle arrays of indices, but I could look into that.

Does any of this sound interesting? Do you have some other ideas about how you would like to work with these cases?
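The ordering idea in the quoted message could look roughly like this with NumPy. This is only a sketch under assumptions: sort_with_inverse is a hypothetical name, this is not the kenjutsu code being referred to, and a plain NumPy array stands in for the data that would be read out of chunks.

```python
import numpy as np


def sort_with_inverse(indices):
    """Hypothetical helper: return the indices in sorted order, plus the
    positions needed to restore the caller's original ordering."""
    idx = np.asarray(indices, dtype=np.intp)
    order = np.argsort(idx, kind="stable")
    sorted_idx = idx[order]
    # inverse[i] is where the i-th requested index ended up after sorting.
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)
    return sorted_idx, inverse


data = np.arange(10) * 10  # stand-in for data stored in chunks
requested = [7, 2, 5]

sorted_idx, inverse = sort_with_inverse(requested)
retrieved = data[sorted_idx]   # read in storage order
result = retrieved[inverse]    # post-process back into the requested order
```

Here the reordering is purely a post-processing step on the retrieved data. If duplicates also need collapsing before retrieval, np.unique(idx, return_inverse=True) gives the sorted unique indices and an inverse mapping in one call.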

-- Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health <http://cggh.org> The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721

Advanced indexing #78
