Commit e9ff779

ENH: Support MultiIndex columns in parquet pandas-dev#34777
1. Update check to handle MultiIndex columns for parquet format
2. Edit whatsnew entry.
3. Add test for writing MultiIndex columns with string column names
1 parent c974259 commit e9ff779

3 files changed: +17 -4 lines changed

doc/source/whatsnew/v1.2.0.rst

Lines changed: 1 addition & 1 deletion
@@ -297,7 +297,7 @@ I/O
 - :meth:`to_csv` did not support zip compression for binary file object not having a filename (:issue: `35058`)
 - :meth:`to_csv` and :meth:`read_csv` did not honor `compression` and `encoding` for path-like objects that are internally converted to file-like objects (:issue:`35677`, :issue:`26124`, and :issue:`32392`)
 - :meth:`to_picke` and :meth:`read_pickle` did not support compression for file-objects (:issue:`26237`, :issue:`29054`, and :issue:`29570`)
-- :meth:`to_parquet` did not support MultiIndex for columns in parquet format (:issue:`34777`)
+- :meth:`to_parquet` did not support :class:`MultiIndex` for columns in parquet format (:issue:`34777`)
 
 Plotting
 ^^^^^^^^

pandas/io/parquet.py

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@
 from pandas.compat._optional import import_optional_dependency
 from pandas.errors import AbstractMethodError
 
-from pandas import DataFrame, get_option
+from pandas import DataFrame, MultiIndex, get_option
 
 from pandas.io.common import get_filepath_or_buffer, is_fsspec_url, stringify_path
 
@@ -54,7 +54,7 @@ def validate_dataframe(df: DataFrame):
         raise ValueError("to_parquet only supports IO with DataFrames")
 
     # must have value column names for all index levels (strings only)
-    if df.columns.nlevels > 1:
+    if isinstance(df.columns, MultiIndex):
         if not all(
             x.inferred_type in {"string", "empty"} for x in df.columns.levels
         ):
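
The two conditions in the hunk above differ for a `MultiIndex` that has only one level: `nlevels > 1` is false for it, while the `isinstance` check is true. The following is a minimal sketch of that difference, not the pandas implementation. The helper `column_labels_are_strings` is hypothetical, and its fallback for flat columns is an assumption, since the diff does not show that branch.

```python
import numpy as np
import pandas as pd


def column_labels_are_strings(df: pd.DataFrame) -> bool:
    """Hypothetical helper (not part of pandas) mirroring the check in the diff."""
    if isinstance(df.columns, pd.MultiIndex):
        # Every level of the column MultiIndex must hold strings (or be empty).
        return all(
            level.inferred_type in {"string", "empty"}
            for level in df.columns.levels
        )
    # Assumed fallback for flat columns; this branch is not shown in the diff.
    return df.columns.inferred_type in {"string", "empty"}


# A MultiIndex with a single level: the old and new conditions disagree here.
single_level = pd.MultiIndex.from_arrays([["a", "b"]])
df_single = pd.DataFrame(np.zeros((2, 2)), columns=single_level)
print("nlevels > 1:", df_single.columns.nlevels > 1)
print("isinstance MultiIndex:", isinstance(df_single.columns, pd.MultiIndex))

# Mixed string/integer levels, as in test_write_column_multiindex.
mixed = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)])
df_mixed = pd.DataFrame(np.zeros((2, 3)), columns=mixed)

# String-only levels, as in the newly added test.
strings = pd.MultiIndex.from_tuples([("bar", "one"), ("bar", "two"), ("baz", "one")])
df_strings = pd.DataFrame(np.zeros((2, 3)), columns=strings)

print("mixed levels accepted:", column_labels_are_strings(df_mixed))
print("string levels accepted:", column_labels_are_strings(df_strings))
```

The sketch only inspects column labels, so it needs neither pyarrow nor fastparquet.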

pandas/tests/io/test_parquet.py

Lines changed: 14 additions & 1 deletion
@@ -410,11 +410,24 @@ def test_write_multiindex(self, pa):
         check_round_trip(df, engine)
 
     def test_write_column_multiindex(self, engine):
-        # column multi-index
+        # Not able to write column multi-indexes with non-string column names.
         mi_columns = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)])
         df = pd.DataFrame(np.random.randn(4, 3), columns=mi_columns)
         self.check_error_on_write(df, engine, ValueError)
 
+    def test_write_column_multiindex_string(self, pa):
+        # Not supported in fastparquet as of 0.1.3 or older pyarrow version
+        engine = pa
+
+        # Write column multi-indexes with string column names
+        arrays = [
+            ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
+            ["one", "two", "one", "two", "one", "two", "one", "two"],
+        ]
+        df = pd.DataFrame(np.random.randn(8, 8), columns=arrays)
+
+        check_round_trip(df, engine)
+
     def test_multiindex_with_columns(self, pa):
         engine = pa
         dates = pd.date_range("01-Jan-2018", "01-Dec-2018", freq="MS")
