API: allow Series comparison ops to align before comparison (GH1134) #6860

jreback · 2014-04-10T13:22:02Z

reordered comparisons

In [1]:     s1 = Series(index=["A", "B", "C"], data=[1,2,3])

In [2]:      s1
Out[2]: 
A    1
B    2
C    3
dtype: int64

In [3]:      s2 = Series(index=["C", "B", "A"], data=[3,2,1])

In [4]:      s2
Out[4]: 
C    3
B    2
A    1
dtype: int64

In [5]:      s1 == s2
Out[5]: 
A    True
B    True
C    True
dtype: bool

Here s3 has no value for label A, so A is missing (nan) after alignment, and every comparison against it comes back False

In [6]:      s3 = Series(index=["C", "B"], data=[3,2])

In [7]:      s3
Out[7]: 
C    3
B    2
dtype: int64

In [8]:      s1 == s3
Out[8]: 
A    False
B     True
C     True
dtype: bool

In [9]: s1>s3
Out[9]: 
A    False
B    False
C    False
dtype: bool

In [10]: s1<s3
Out[10]: 
A    False
B    False
C    False
dtype: bool

jreback · 2014-04-10T13:24:17Z

IIRC we discussed this ad nauseam before. It's more 'correct' for the missing values to return nan (so the resulting Series is not boolean but object), which then requires filling before doing indexing. So we are effectively filling in False here (when there are nans).

Furthermore, a reordered Series is really NOT equal.
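To make the trade-off in the comment above concrete, here is a minimal sketch (not the code in this PR) of the two options: compare after alignment and let missing labels fall out as False, or mask them so they stay missing in an object Series. It assumes a recent pandas where `Series.align` and `Series.isna` are available; the variable names are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["A", "B", "C"])
s3 = pd.Series([3, 2], index=["C", "B"])

# Align both Series on the union of their labels; a label absent from
# one side becomes NaN on that side.
left, right = s1.align(s3, join="outer")

# Option 1: plain elementwise comparison on the aligned Series.
# NaN never compares equal, so the missing label "A" ends up False.
filled = left == right

# Option 2: keep missing labels missing. The result has object dtype,
# so it needs an explicit fill before it can be used as a boolean mask.
missing = left.isna() | right.isna()
masked = filled.astype(object).where(~missing)

print(filled)
print(masked)
```

The second form is closer to the 'correct' answer described above, at the cost of a non-boolean result.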

jreback · 2014-04-12T00:44:38Z

cc @Komnomnomnom

This is from test_json/test_ujson/testSeries

This test fails with this PR. Before, the values DID compare correctly because the indexes were not aligned. Aligning causes the test to fail, as nothing matches up: one index is Int64, the other is object.

This DOES look correct though, as deserializing does not guarantee that something that looks like a numerical index is actually numerical (except for DatetimeIndex, and we have a separate kw arg for that), right?

In [1]: s = Series([10, 20, 30, 40, 50, 60], name="series", index=[6,7,8,9,10,15])

In [2]: s.sort()

In [3]: import pandas.json as ujson

In [4]: outp = Series(ujson.decode(ujson.encode(s)))

In [6]: outp.sort()

In [7]: outp
Out[7]: 
6     10
7     20
8     30
9     40
10    50
15    60
dtype: int64

In [8]: outp.index
Out[8]: Index([u'6', u'7', u'8', u'9', u'10', u'15'], dtype='object')


Komnomnomnom · 2014-04-12T02:35:57Z

Yeah, the problem with JSON is that keys must be strings, so when you read them back you really have no idea what type they were without doing some guesswork (which is what read_json does after calling decode / loads).

The lower level 'decode' method which this is testing gives you a string index back, which didn't matter during comparison before as there was no alignment happening. Your fix looks good, although it would work just as well I think to change the test Series to have a string index from the start e.g.

In [20]: s = Series([10, 20, 30, 40, 50, 60], name="series", index=[str(s) for s in [6,7,8,9,10,15]])
In [26]: Series(ujson.decode(ujson.encode(s))).index
Out[26]: Index([u'10', u'15', u'6', u'7', u'8', u'9'], dtype='object')
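The key-type loss described above can be reproduced without pandas. The sketch below uses the standard library `json` module rather than `pandas.json`, and `restore_int_keys` is a hypothetical helper that only illustrates the kind of guesswork involved; it is not how `read_json` is implemented.

```python
import json

# Integer labels, like the index of the test Series in this thread.
original = {6: 10, 7: 20, 8: 30}

# JSON object keys must be strings, so the integer keys are coerced
# on the way out and come back as strings.
decoded = json.loads(json.dumps(original))


def restore_int_keys(mapping):
    """Hypothetical guesswork: convert keys to int only if all of them parse."""
    try:
        return {int(key): value for key, value in mapping.items()}
    except ValueError:
        # At least one key is not an integer string; leave the keys alone.
        return dict(mapping)


restored = restore_int_keys(decoded)

print(decoded)   # keys are strings
print(restored)  # keys are integers again
```

Without a step like this, an aligned comparison between the original and the round-tripped data finds no labels in common.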

jorisvandenbossche · 2014-04-12T09:14:27Z

@jreback Just wondering, but it would also be an option to only let Series.eq (and the other methods) do this flexible comparison with alignment, and let the == non-flexible.

Because with this change you have df == df being non-flexible (not aligning, and demanding identical indices), while s == s is flexible and does align. Isn't that also a confusing inconsistency? Or is there a good reason for it?
Admittedly it is also confusing that s + s does align and s == s does not (but the same holds for a DataFrame, and I would rather keep consistency within one operator).

cpcloud · 2014-04-12T12:58:52Z

+1 on @jorisvandenbossche's suggestion: named methods flexible, corresponding syntax is not. I personally find the unaligned error a useful sanity check.
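A short sketch of the split suggested above: the named method aligns, while the operator refuses differently labeled operands. This reflects my understanding of how recent pandas releases behave (`Series.eq` aligns on labels, `==` raises `ValueError` when the indexes are not identical); it is not the behavior of the code in this PR.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["A", "B", "C"])
s2 = pd.Series([3, 2, 1], index=["C", "B", "A"])

# Flexible method: operands are aligned on their labels first,
# so the reordered Series compares equal label by label.
aligned = s1.eq(s2)

# Operator: assumed to act as a sanity check and raise when the
# two indexes are not identical (here they differ in order).
try:
    s1 == s2
    operator_raised = False
except ValueError:
    operator_raised = True

print(aligned)
print(operator_raised)
```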

jreback · 2014-04-28T00:45:17Z

going to bump; can work on in next version

jreback added API Design labels Apr 10, 2014

jreback added this to the 0.14.0 milestone Apr 10, 2014

API: allow Series comparison ops to align before comparison (GH1134)

216b8e5

TST: clean test_usjon/testSeries tests for Series non-alignable comparisons

492b6cf

jreback modified the milestones: 0.15.0, 0.14.0 Apr 28, 2014

jreback mentioned this pull request May 29, 2014

pandas.Series.__eq__ is broken for series with different index #1134

Closed

jreback closed this Aug 5, 2014

jorisvandenbossche mentioned this pull request Sep 12, 2014

compare two series objects ignores index #8257

Closed
