Empty subtypes of Index return their type, rather than Index #10599

Closed
wants to merge 0 commits into from

max-sixty
Contributor

Resolves #10596

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type Compat pandas objects compatability with Numpy or Python functions labels Jul 16, 2015
@jreback jreback added this to the 0.17.0 milestone Jul 16, 2015
@@ -938,6 +938,10 @@ def test_difference(self):
self.assertEqual(len(result), 0)
self.assertEqual(result.name, first.name)

# empty difference for subtypes
result = self.periodIndex.difference(self.periodIndex)
self.assertIsInstance(result, pd.PeriodIndex)
Contributor

use tm.assert_index_equal(...) here (and directly construct the expected)
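
A minimal sketch of what this suggestion could look like, assuming a recent pandas where `pandas.testing.assert_index_equal` is available (the PR itself would use `tm.assert_index_equal` from the test utilities). The index name `"when"` and the date range are illustrative, not taken from the PR.

```python
import pandas as pd
from pandas.testing import assert_index_equal

# A small daily PeriodIndex with a name, so metadata propagation is visible.
idx = pd.period_range("2015-01-01", periods=3, freq="D", name="when")

# The difference of an index with itself is empty.
result = idx.difference(idx)

# Construct the expected value directly instead of only checking the type.
expected = pd.PeriodIndex([], freq="D", name="when")

# Compares class, dtype (which carries the freq for periods), names and values.
assert_index_equal(result, expected)
```

Building `expected` by hand keeps the test independent of the code path under test, which is presumably why the reviewer asks for it.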

# GH 10596 - empty difference retains index's type

result = idx.difference(idx)
self.assertIsInstance(result, type(idx))
Contributor

so the way to do this generically is to define a create_empty method, analogous to create_index, for each subclass, which returns an empty version of what create_index returns.
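
One way the generic approach might be sketched, under stated assumptions: `create_index` and `create_empty` are the method names from the comment above, while `BaseIndexCase`, `PeriodCase`, `IntCase` and `check_empty_difference` are hypothetical names invented for this illustration and are not the layout of the pandas test suite.

```python
import pandas as pd
from pandas.testing import assert_index_equal


class BaseIndexCase:
    """Hypothetical test base; each subclass supplies the index under test."""

    def create_index(self):
        raise NotImplementedError

    def create_empty(self):
        # Empty counterpart of create_index(): same class, dtype and name.
        raise NotImplementedError

    def check_empty_difference(self):
        idx = self.create_index()
        result = idx.difference(idx)
        # Full comparison, so lost metadata fails the check as well.
        assert_index_equal(result, self.create_empty())


class PeriodCase(BaseIndexCase):
    def create_index(self):
        return pd.period_range("2015-01-01", periods=5, freq="D", name="p")

    def create_empty(self):
        return pd.PeriodIndex([], freq="D", name="p")


class IntCase(BaseIndexCase):
    def create_index(self):
        return pd.Index([1, 2, 3], dtype="int64", name="i")

    def create_empty(self):
        return pd.Index([], dtype="int64", name="i")


# The same check runs unchanged for every index type.
for case in (PeriodCase(), IntCase()):
    case.check_empty_difference()
```

The shared check lives once in the base class, so adding a new index type only requires the two factory methods.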

@max-sixty
Contributor Author

@jreback I'm having some issues with comparing empty Indexes. For Timedelta, Datetime & Categorical, I get the equivalent of this:

AssertionError: TimedeltaIndex([], dtype='timedelta64[ns]', freq='D') != TimedeltaIndex([], dtype='timedelta64[ns]', freq='D')

I'd love to take a look at the wider problem in the future; in the meantime is there a way of getting this fix in? Is there a strong reason you don't like the type test rather than the direct comparison?

@jreback
Contributor

jreback commented Jul 17, 2015

no, use idx1.equals(idx2)

@jreback
Contributor

jreback commented Jul 17, 2015

just asserting the type papers over the metadata propagation, so it just creates a new bug, rather than fixing the existing one.
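
To illustrate the concern, here is a small sketch (not code from the PR) in which a result keeps its class but loses its name. It assumes a recent pandas; the name `"when"` is illustrative. In current pandas, `Index.equals` compares elements and does not look at names, so `assert_index_equal` is the stricter of the two checks.

```python
import pandas as pd
from pandas.testing import assert_index_equal

expected = pd.PeriodIndex([], freq="D", name="when")

# Simulate a buggy result: correct class and freq, but the name was dropped.
result = pd.PeriodIndex([], freq="D")

# A type-only assertion cannot see the missing name.
type_check_passes = isinstance(result, type(expected))

# Index.equals compares the elements, not the name.
equals_passes = result.equals(expected)

# assert_index_equal also compares names, so it flags the difference.
try:
    assert_index_equal(result, expected)
    full_check_passes = True
except AssertionError:
    full_check_passes = False
```

Here the first two checks pass while the full comparison fails, which is the kind of silently lost metadata the comment warns about.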

@max-sixty max-sixty force-pushed the master branch 3 times, most recently from 767f716 to d1e478f Compare July 17, 2015 23:41
@max-sixty
Contributor Author

Great @jreback, that should be good to go. Let me know if I've missed anything. Cheers

@jreback
Contributor

jreback commented Jul 17, 2015

pls add a release note in whatsnew/v0.17.0

squash to a single commit

ping when green.

@max-sixty max-sixty force-pushed the master branch 3 times, most recently from d64c90b to beab262 Compare July 18, 2015 22:41
@max-sixty
Contributor Author

I have a PyTables test failure - does anyone have any idea what this might be? I've dug around, but I'm not getting anywhere fast, and I have zero PyTables experience and no installation.

ERROR: test_to_hdf_with_object_column_names (pandas.io.tests.test_pytables.TestHDFStore)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/io/tests/test_pytables.py", line 4678, in test_to_hdf_with_object_column_names
    df.to_hdf(path, 'df', format='table', data_columns=True)
  File "/home/travis/build/pydata/pandas/pandas/core/generic.py", line 920, in to_hdf
    return pytables.to_hdf(path_or_buf, key, self, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/io/pytables.py", line 269, in to_hdf
    f(store)
  File "/home/travis/build/pydata/pandas/pandas/io/pytables.py", line 264, in <lambda>
    f = lambda store: store.put(key, value, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/io/pytables.py", line 826, in put
    self._write_to_group(key, value, append=append, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/io/pytables.py", line 1275, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/io/pytables.py", line 3798, in write
    **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/io/pytables.py", line 3404, in create_axes
    axis=axis
  File "/home/travis/build/pydata/pandas/pandas/core/frame.py", line 2522, in reindex_axis
    fill_value=fill_value)
  File "/home/travis/build/pydata/pandas/pandas/core/generic.py", line 1852, in reindex_axis
    limit=limit)
  File "/home/travis/build/pydata/pandas/pandas/core/index.py", line 3062, in reindex
    if not is_categorical_dtype(target) and not target.is_unique:
  File "properties.pyx", line 34, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:39442)
  File "/home/travis/build/pydata/pandas/pandas/core/index.py", line 744, in is_unique
    return self._engine.is_unique
  File "index.pyx", line 213, in pandas.index.IndexEngine.is_unique.__get__ (pandas/index.c:4392)
  File "index.pyx", line 248, in pandas.index.IndexEngine._do_unique_check (pandas/index.c:4874)
  File "index.pyx", line 261, in pandas.index.IndexEngine._ensure_mapping_populated (pandas/index.c:5047)
  File "index.pyx", line 267, in pandas.index.IndexEngine.initialize (pandas/index.c:5135)
  File "hashtable.pyx", line 703, in pandas.hashtable.PyObjectHashTable.map_locations (pandas/hashtable.c:11242)
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'double'

@jreback
Contributor

jreback commented Jul 19, 2015

what are the columns on the df?
(this is the point of the test - you can only have string columns, not numbers and such)

@max-sixty
Contributor Author

The columns are either strings or categoricals - that's the line this fails on.
If you can cut through to what this issue is, that'd be awesome. Otherwise I can try to install PyTables and debug from the bottom.

        types_should_run = [ tm.makeStringIndex, tm.makeCategoricalIndex ]
...
        for index in types_should_run:
            df = DataFrame(np.random.randn(10, 2), columns=index(2))
            with ensure_clean_path(self.path) as path:
                df.to_hdf(path, 'df', format='table', data_columns=True)
                result = pd.read_hdf(path, 'df', where="index = [{0}]".format(df.index[0]))
                assert(len(result))

@@ -344,7 +344,7 @@ Bug Fixes
- Bug in ``ExcelReader`` when worksheet is empty (:issue:`6403`)
- Bug in ``Table.select_column`` where name is not preserved (:issue:`10392`)
- Bug in ``offsets.generate_range`` where ``start`` and ``end`` have finer precision than ``offset`` (:issue:`9907`)

- Bug in ``Subclasses of Index with no values returned Index objects rather than their own classes, in some cases`` (:issue:`10596`)
Contributor

use the double backticks around Index only here (otherwise you are quoting the entire string).
