Referencing a panel with an object causes an inf loop in 0.15.2 #9140

echu79 · 2014-12-23T18:09:11Z

The snippet of code works fine in 0.15.0. Not sure about 0.15.1 as I did not try it on that version. However on 0.15.2 it causes an infinite loop

import pandas
import numpy

class TestObject:
        def __str__(self):
                return "TestObject"

obj = TestObject()

wp = pandas.Panel(numpy.random.randn(1,5,4), items=[obj],
        major_axis = pandas.date_range('1/1/2000', periods=5),
        minor_axis=['A', 'B', 'C', 'D'])


print(wp)

print(wp[obj])

The text was updated successfully, but these errors were encountered:

jreback · 2014-12-24T02:07:51Z

so I changed this is #8710. Guess the semantics of the scalar and not list-like are slightly different w.r.t. to an object-like.

Not really sure what guarantees exists for this (e.g. it doesn't necessarily have to be stringifiable, but should be hashable (which this example object is not), though maybe even that is too strict.

Any thoughts on guarantees on indexables?
cc @dalejung
@shoyer
cc @immerrr

(to be clear we accept things like: scalars, numpy arrays (or the correct ndim), lists, indexes, slices, boolean arrays, int indexers, and prob a few others I am forgetting).

fixed here:
#9143

shoyer · 2014-12-24T06:44:39Z

Actually, this object is hashable (Python defaults to implementing __hash__ based on identity for objects that don't implement __eq__):

In [4]: hash(TestObject())
Out[4]: 274153109

I don't know if we can make general guarantees on how we handle non-standard indexers, but generally we should be able to handle hashable items that are not sequences unambiguously. If something is both hashable and a sequence (e.g., tuple-like or an immutable ndarray), well, then all bets are off.

immerrr · 2014-12-24T07:05:26Z

Scalar lookup key (i.e. single label) restrictions are imposed by lookup implementation, in a world of static typing one would define an interface for all scalar lookup keys to implement. AFAIR, IndexEngine uses hashtable, so it requires labels to be hashable and equality-comparable, but it can also fall back to searchsorted, so it should also require them to be orderable. If those are missed, things will blow up.

There's also a subtle restriction that I often miss coming from C/C++: lookup keys should also be immutable (in C there's a const specifier to ensure that). It will not blow up (raise exceptions), but changing an element after it has been put into a hash table may break that table's integrity and cause all kinds of undefined behaviours.

Obviously, integer and boolean values fit this definition of scalar lookup keys (you can have a hash table of ints or bools, bool lookup is weird, but possible), so there's no need to state their support explicitly.

After scalar lookup key is defined, slices, lists, arrays, etc of those are simply a containers of such keys (let's think of slice as a tuple with special meaning for now) and should not add any other restrictions.

User-defined types make it a bit complicated creating the ambiguity of whether a type is a container or a scalar key, it would be nice to probably register abstract classes so that users can resolve the ambiguity, or do some auto-detection and add a wrapper type so that users can override it. In any case, pandas has lived without this for ages interpreting all unknown types as sequences, so maybe it's not a big deal.

shoyer · 2014-12-24T07:15:39Z

One minor elaboration on @immerrr's excellent comment: according to best practices, mutable objects should not be hashable in Python, so hashable should imply immutable. That said, this being Python, nothing guarantees this, but if users are using pathological classes like this it's their own fault if anything breaks.

immerrr · 2014-12-24T07:25:22Z

@shoyer, I can't believe I have missed that immutability of hash value during the lifetime of an object is a prerequisite for that object to be "hashable" according to https://docs.python.org/2/glossary.html :)

dalejung · 2014-12-24T08:22:53Z

The issue with default user-defined hashes is that they are based off of id and won't survive pickling. It's easy for them not to match from a random reload. I suppose we could require that user objects define __hash__ and assume those hashes are stable.

I'm ambivalent about the ambigious case. As long as it's consistently defaulted one way and we provide an override.

jreback · 2014-12-24T15:50:07Z

thanks for the comments. If anyone would like to make a little writeup for the internals doc would be great!

dalejung mentioned this issue Dec 24, 2014

Object indexes on the items/info axis of Panel #9142

Closed

jreback mentioned this issue Dec 24, 2014

BUG: Bug in Panel indexing with an object-like (GH9140) #9143

Merged

jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Dec 24, 2014

jreback added this to the 0.16.0 milestone Dec 24, 2014

jreback closed this as completed in #9143 Dec 24, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Referencing a panel with an object causes an inf loop in 0.15.2 #9140

Referencing a panel with an object causes an inf loop in 0.15.2 #9140

echu79 commented Dec 23, 2014

jreback commented Dec 24, 2014

shoyer commented Dec 24, 2014

immerrr commented Dec 24, 2014

shoyer commented Dec 24, 2014

immerrr commented Dec 24, 2014

dalejung commented Dec 24, 2014

jreback commented Dec 24, 2014

Referencing a panel with an object causes an inf loop in 0.15.2 #9140

Referencing a panel with an object causes an inf loop in 0.15.2 #9140

Comments

echu79 commented Dec 23, 2014

jreback commented Dec 24, 2014

shoyer commented Dec 24, 2014

immerrr commented Dec 24, 2014

shoyer commented Dec 24, 2014

immerrr commented Dec 24, 2014

dalejung commented Dec 24, 2014

jreback commented Dec 24, 2014