
BUG: Fix .to_excel() for MultiIndex containing a NaN value #13511 #13551

Merged
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.19.0.txt
@@ -767,3 +767,5 @@ Bug Fixes
- Bug where ``pd.read_gbq()`` could throw ``ImportError: No module named discovery`` as a result of a naming conflict with another python package called apiclient (:issue:`13454`)
- Bug in ``Index.union`` returns an incorrect result with a named empty index (:issue:`13432`)
- Bugs in ``Index.difference`` and ``DataFrame.join`` raise in Python3 when using mixed-integer indexes (:issue:`13432`, :issue:`12814`)

- Bug in ``.to_excel()`` when DataFrame contains a MultiIndex which contains a label with a NaN value (:issue:`13511`)
6 changes: 5 additions & 1 deletion pandas/formats/format.py
@@ -1839,7 +1839,11 @@ def _format_hierarchical_rows(self):
                for spans, levels, labels in zip(level_lengths,
                                                 self.df.index.levels,
                                                 self.df.index.labels):
                    values = levels.take(labels)

                    values = levels.take(labels,
                                         allow_fill=levels._can_hold_na,
Contributor

this doesn't need the allow_fill argument.

Member

jorisvandenbossche, Jul 5, 2016

@jreback that value is True by default, and once you pass a fill_value, you get an error for integer levels. So the options are either passing allow_fill as above, or passing fill_value conditionally, as this PR did initially.
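
The behaviour described here can be reproduced on a bare `Index`. The sketch below is a minimal illustration, assuming a recent pandas in which `Index.take` accepts `allow_fill` and `fill_value`. The variable names are made up for the example, and the exact error text may differ between versions.

```python
import numpy as np
import pandas as pd

# -1 is the code pandas uses for a missing label in a MultiIndex level.
codes = np.array([0, -1, 1])

obj_level = pd.Index(["a", "b"])  # can hold NaN
int_level = pd.Index([1, 3])      # cannot hold NaN

# With a fill_value, -1 is replaced by the level's NA value.
filled = obj_level.take(codes, fill_value=True)

# The same call on an integer level raises, because allow_fill defaults to True.
try:
    int_level.take(codes, fill_value=True)
    int_error = None
except ValueError as exc:
    int_error = str(exc)

# Switching allow_fill off makes -1 an ordinary "last element" index again.
wrapped = int_level.take(codes, allow_fill=False)
```

Here `fill_value=True` acts only as a switch: the value actually written is the level's own NA value, not `True`.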

Contributor

It doesn't change the test result with my change, so there is not enough testing here to catch things.

Member

If I leave out the allow_fill=levels._can_hold_na, I get:

In [1]: df = DataFrame({'x': np.random.uniform(0, 1, 3), 'y': np.random.uniform(0, 1, 3)})

In [2]: df['A'] = [1,1,3]

In [3]: df['B'] = ['a', np.nan, 'b']

In [4]: df = df.set_index(['A', 'B'])

In [5]: df
Out[5]:
              x         y
A B
1 a    0.335609  0.521433
  NaN  0.531680  0.265201
3 b    0.910320  0.520158

In [6]: df.to_excel('test.xlsx')
---------------------------------------------------------------------------
ValueError: Unable to fill values because Int64Index cannot contain NA

But the problem is that the test frame only contains floats. @mpuels can you update the test to also include an integer index case?
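
To see why the integer level matters, it may help to look at how the frame above is stored. This sketch assumes pandas 0.24 or later, where the `labels` attribute used in the diff is called `codes`. The float values are placeholders rather than the random ones shown above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0.1, 0.2, 0.3],
                   "A": [1, 1, 3],
                   "B": ["a", np.nan, "b"]}).set_index(["A", "B"])

# Each level stores its unique values plus integer codes into them.
a_level, b_level = df.index.levels
a_codes, b_codes = df.index.codes

# The NaN in 'B' is not stored in the level; it shows up as code -1.
# 'A' has no missing labels, but it is an integer level, so a take()
# that asks for NA filling is rejected for it.
print(list(a_level), list(a_codes))
print(list(b_level), list(b_codes))
```

The formatter loops over every level of the index, so a fix that works for the level holding the NaN still has to cope with the integer level next to it.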

Contributor Author

@jorisvandenbossche I'm sorry, but I can't quite follow. When you say

> But the problem is that the test frame only contains floats. @mpuels can you update the test to also include an integer index case?

do you mean that the MultiIndex of the DataFrame in my test case only contains floats, and that I should construct a test case where each level of the MultiIndex is of integer type?

Member

yes indeed, that is what I meant!

Member

(because the reason you have to specify allow_fill is otherwise not tested)

Contributor Author

@jorisvandenbossche So you claim that if I substitute

values = levels.take(labels,
                     allow_fill=levels._can_hold_na,
                     fill_value=True)

with

values = levels.take(labels,
                     fill_value=True)

that no test will detect the change, i.e. that all tests pass? I ran the tests by entering

$ nosetests pandas/io/tests/test_excel.py:Openpyxl20Tests

against the PR which contains allow_fill=levels._can_hold_na and then applied the aforementioned substitution and ran the tests again. In the first case the output is

......SS...........................
----------------------------------------------------------------------
Ran 35 tests in 2.322s

OK (SKIP=2)

and in the second case it is

......SS...............EEE.........
======================================================================
ERROR: test_to_excel_multiindex (pandas.io.tests.test_excel.Openpyxl20Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mpuels/progs/pandas-mpuels/pandas/io/tests/test_excel.py", line 1776, in wrapped
    orig_method(self, *args, **kwargs)
  File "/home/mpuels/progs/pandas-mpuels/pandas/io/tests/test_excel.py", line 1321, in test_to_excel_multiindex
    frame.to_excel(path, 'test1', header=False)
  File "/home/mpuels/progs/pandas-mpuels/pandas/core/frame.py", line 1431, in to_excel
    startrow=startrow, startcol=startcol)
  File "/home/mpuels/progs/pandas-mpuels/pandas/io/excel.py", line 875, in write_cells
    for cell in cells:
  File "/home/mpuels/progs/pandas-mpuels/pandas/formats/format.py", line 1986, in get_formatted_cells
    self._format_body()):
  File "/home/mpuels/progs/pandas-mpuels/pandas/formats/format.py", line 1957, in _format_hierarchical_rows
    fill_value=True)
  File "/home/mpuels/progs/pandas-mpuels/pandas/indexes/base.py", line 1438, in take
    raise ValueError(msg.format(self.__class__.__name__))
ValueError: Unable to fill values because Int64Index cannot contain NA

======================================================================
ERROR: test_to_excel_multiindex_cols (pandas.io.tests.test_excel.Openpyxl20Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mpuels/progs/pandas-mpuels/pandas/io/tests/test_excel.py", line 1776, in wrapped
    orig_method(self, *args, **kwargs)
  File "/home/mpuels/progs/pandas-mpuels/pandas/io/tests/test_excel.py", line 1366, in test_to_excel_multiindex_cols
    frame.to_excel(path, 'test1', merge_cells=self.merge_cells)
  File "/home/mpuels/progs/pandas-mpuels/pandas/core/frame.py", line 1431, in to_excel
    startrow=startrow, startcol=startcol)
  File "/home/mpuels/progs/pandas-mpuels/pandas/io/excel.py", line 875, in write_cells
    for cell in cells:
  File "/home/mpuels/progs/pandas-mpuels/pandas/formats/format.py", line 1986, in get_formatted_cells
    self._format_body()):
  File "/home/mpuels/progs/pandas-mpuels/pandas/formats/format.py", line 1957, in _format_hierarchical_rows
    fill_value=True)
  File "/home/mpuels/progs/pandas-mpuels/pandas/indexes/base.py", line 1438, in take
    raise ValueError(msg.format(self.__class__.__name__))
ValueError: Unable to fill values because Int64Index cannot contain NA

======================================================================
ERROR: test_to_excel_multiindex_dates (pandas.io.tests.test_excel.Openpyxl20Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mpuels/progs/pandas-mpuels/pandas/io/tests/test_excel.py", line 1776, in wrapped
    orig_method(self, *args, **kwargs)
  File "/home/mpuels/progs/pandas-mpuels/pandas/io/tests/test_excel.py", line 1387, in test_to_excel_multiindex_dates
    tsframe.to_excel(path, 'test1', merge_cells=self.merge_cells)
  File "/home/mpuels/progs/pandas-mpuels/pandas/core/frame.py", line 1431, in to_excel
    startrow=startrow, startcol=startcol)
  File "/home/mpuels/progs/pandas-mpuels/pandas/io/excel.py", line 875, in write_cells
    for cell in cells:
  File "/home/mpuels/progs/pandas-mpuels/pandas/formats/format.py", line 1986, in get_formatted_cells
    self._format_body()):
  File "/home/mpuels/progs/pandas-mpuels/pandas/formats/format.py", line 1957, in _format_hierarchical_rows
    fill_value=True)
  File "/home/mpuels/progs/pandas-mpuels/pandas/indexes/base.py", line 1438, in take
    raise ValueError(msg.format(self.__class__.__name__))
ValueError: Unable to fill values because Int64Index cannot contain NA

----------------------------------------------------------------------
Ran 35 tests in 2.163s

FAILED (SKIP=2, errors=3)

So the test I constructed for this PR did not fail, but three existing tests did. Should I nonetheless construct another test case where the index contains integers? Or did I miss anything? Thanks for your patience!

Member

I didn't claim that very strongly :-)

I only ran the specific test added in this PR, and it is not caught there, but as you show it is certainly caught by the other tests, so OK then!
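
The outcome of this thread can be sketched without an Excel round trip. `level_values` below is a hypothetical helper, not part of pandas. It approximates the private `levels._can_hold_na` check with a dtype test, an assumption that only covers plain integer, unsigned and boolean levels.

```python
import pandas as pd


def level_values(level, codes):
    """Hypothetical helper mirroring the patched take() call."""
    # Assumption: int, uint and bool levels cannot hold NaN.
    can_hold_na = level.dtype.kind not in "iub"
    return level.take(codes, allow_fill=can_hold_na, fill_value=True)


# Same index shape as the PR's new test: 'A' has a missing label,
# 'B' is an integer level.
frame = pd.DataFrame({"A": [None, 2, 3],
                      "B": [10, 20, 30],
                      "C": [0.1, 0.2, 0.3]}).set_index(["A", "B"])

expanded = [level_values(level, codes)
            for level, codes in zip(frame.index.levels, frame.index.codes)]
```

With the guard in place, the float level gets NaN for its missing label and the integer level is expanded without raising.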

                                         fill_value=True)

                    for i in spans:
                        if spans[i] > 1:
                            yield ExcelCell(self.rowcounter + i, gcolidx,
14 changes: 14 additions & 0 deletions pandas/io/tests/test_excel.py
@@ -1328,6 +1328,20 @@ def test_to_excel_multiindex(self):
parse_dates=False)
tm.assert_frame_equal(frame, df)

    # GH13511
    def test_to_excel_multiindex_nan_label(self):
        _skip_if_no_xlrd()

        frame = pd.DataFrame({'A': [None, 2, 3],
                              'B': [10, 20, 30],
                              'C': np.random.sample(3)})
        frame = frame.set_index(['A', 'B'])

        with ensure_clean(self.ext) as path:
            frame.to_excel(path, merge_cells=self.merge_cells)
            df = read_excel(path, index_col=[0, 1])
            tm.assert_frame_equal(frame, df)

    # Test for Issue 11328. If column indices are integers, make
    # sure they are handled correctly for either setting of
    # merge_cells