Skip to content

Referencing a panel with an object causes an inf loop in 0.15.2 #9140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
echu79 opened this issue Dec 23, 2014 · 7 comments · Fixed by #9143
Closed

Referencing a panel with an object causes an inf loop in 0.15.2 #9140

echu79 opened this issue Dec 23, 2014 · 7 comments · Fixed by #9143
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@echu79
Copy link

echu79 commented Dec 23, 2014

The snippet of code works fine in 0.15.0. Not sure about 0.15.1 as I did not try it on that version. However on 0.15.2 it causes an infinite loop

import pandas
import numpy

class TestObject:
        def __str__(self):
                return "TestObject"

obj = TestObject()

wp = pandas.Panel(numpy.random.randn(1,5,4), items=[obj],
        major_axis = pandas.date_range('1/1/2000', periods=5),
        minor_axis=['A', 'B', 'C', 'D'])


print(wp)

print(wp[obj])
@jreback
Copy link
Contributor

jreback commented Dec 24, 2014

so I changed this is #8710. Guess the semantics of the scalar and not list-like are slightly different w.r.t. to an object-like.

Not really sure what guarantees exists for this (e.g. it doesn't necessarily have to be stringifiable, but should be hashable (which this example object is not), though maybe even that is too strict.

Any thoughts on guarantees on indexables?
cc @dalejung
@shoyer
cc @immerrr

(to be clear we accept things like: scalars, numpy arrays (or the correct ndim), lists, indexes, slices, boolean arrays, int indexers, and prob a few others I am forgetting).

fixed here:
#9143

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Dec 24, 2014
@jreback jreback added this to the 0.16.0 milestone Dec 24, 2014
@shoyer
Copy link
Member

shoyer commented Dec 24, 2014

Actually, this object is hashable (Python defaults to implementing __hash__ based on identity for objects that don't implement __eq__):

In [4]: hash(TestObject())
Out[4]: 274153109

I don't know if we can make general guarantees on how we handle non-standard indexers, but generally we should be able to handle hashable items that are not sequences unambiguously. If something is both hashable and a sequence (e.g., tuple-like or an immutable ndarray), well, then all bets are off.

@immerrr
Copy link
Contributor

immerrr commented Dec 24, 2014

Scalar lookup key (i.e. single label) restrictions are imposed by lookup implementation, in a world of static typing one would define an interface for all scalar lookup keys to implement. AFAIR, IndexEngine uses hashtable, so it requires labels to be hashable and equality-comparable, but it can also fall back to searchsorted, so it should also require them to be orderable. If those are missed, things will blow up.

There's also a subtle restriction that I often miss coming from C/C++: lookup keys should also be immutable (in C there's a const specifier to ensure that). It will not blow up (raise exceptions), but changing an element after it has been put into a hash table may break that table's integrity and cause all kinds of undefined behaviours.

Obviously, integer and boolean values fit this definition of scalar lookup keys (you can have a hash table of ints or bools, bool lookup is weird, but possible), so there's no need to state their support explicitly.

After scalar lookup key is defined, slices, lists, arrays, etc of those are simply a containers of such keys (let's think of slice as a tuple with special meaning for now) and should not add any other restrictions.

User-defined types make it a bit complicated creating the ambiguity of whether a type is a container or a scalar key, it would be nice to probably register abstract classes so that users can resolve the ambiguity, or do some auto-detection and add a wrapper type so that users can override it. In any case, pandas has lived without this for ages interpreting all unknown types as sequences, so maybe it's not a big deal.

@shoyer
Copy link
Member

shoyer commented Dec 24, 2014

One minor elaboration on @immerrr's excellent comment: according to best practices, mutable objects should not be hashable in Python, so hashable should imply immutable. That said, this being Python, nothing guarantees this, but if users are using pathological classes like this it's their own fault if anything breaks.

@immerrr
Copy link
Contributor

immerrr commented Dec 24, 2014

@shoyer, I can't believe I have missed that immutability of hash value during the lifetime of an object is a prerequisite for that object to be "hashable" according to https://docs.python.org/2/glossary.html :)

@dalejung
Copy link
Contributor

The issue with default user-defined hashes is that they are based off of id and won't survive pickling. It's easy for them not to match from a random reload. I suppose we could require that user objects define __hash__ and assume those hashes are stable.

I'm ambivalent about the ambigious case. As long as it's consistently defaulted one way and we provide an override.

@jreback
Copy link
Contributor

jreback commented Dec 24, 2014

thanks for the comments. If anyone would like to make a little writeup for the internals doc would be great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants