Standardize pandas metadata for table schema and parquet #19261

Open
TomAugspurger opened this issue Jan 16, 2018 · 1 comment
Labels
API - Consistency (Internal Consistency of API/Behavior), Enhancement, IO JSON (read_json, to_json, json_normalize), IO Parquet (parquet, feather)

Comments

@TomAugspurger
Contributor

TomAugspurger commented Jan 16, 2018

Both to_json(orient='table') and to_parquet write some metadata about the dataframe. It'd be nice to standardize these.

Discrepancies

  • Default index names ('index' in the Table Schema output vs. names of the form '__index_level_0__', '__index_level_1__', ... in the parquet metadata)
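To make the discrepancy concrete, here is a minimal sketch that inspects the metadata each writer produces for a frame with an unnamed default index. The parquet side goes through pyarrow and is skipped if pyarrow is not installed; passing `preserve_index=True` is an assumption made here so that the index is stored as a column rather than as range metadata.

```python
import json

import pandas as pd

df = pd.DataFrame({"a": [1, 2]})  # unnamed default index

# Table Schema side: to_json(orient="table") embeds a "schema" entry.
table_schema = json.loads(df.to_json(orient="table"))["schema"]
index_field = table_schema["fields"][0]["name"]
primary_key = table_schema["primaryKey"]
print(index_field, primary_key)  # index ['index']

# Parquet side: pyarrow stores a JSON blob under the b"pandas" schema key.
try:
    import pyarrow as pa
except ImportError:  # pyarrow is optional for this sketch
    pa = None

if pa is not None:
    # preserve_index=True forces the index to be written as a column.
    table = pa.Table.from_pandas(df, preserve_index=True)
    pandas_meta = json.loads(table.schema.metadata[b"pandas"])
    print(pandas_meta["index_columns"])
```

The two outputs describe the same unnamed index under different placeholder names, which is the mismatch this issue is about.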

We should also add a function for parsing the metadata, and perhaps constructing an empty DataFrame with the correct index, types, and column names based off the metadata.
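A rough sketch of what such a helper could look like for the Table Schema flavor of the metadata. The function name `empty_frame_from_table_schema` and the type mapping are hypothetical, not an existing pandas API, and the mapping only covers a few basic types.

```python
import pandas as pd

# Hypothetical mapping from Table Schema field types to pandas dtypes.
_TYPE_MAP = {
    "integer": "int64",
    "number": "float64",
    "boolean": "bool",
    "string": "object",
    "datetime": "datetime64[ns]",
}


def empty_frame_from_table_schema(schema):
    """Build a zero-row DataFrame whose columns, dtypes, and index
    follow a Table Schema dict (hypothetical helper, not a pandas API)."""
    data = {
        field["name"]: pd.Series([], dtype=_TYPE_MAP.get(field["type"], "object"))
        for field in schema["fields"]
    }
    frame = pd.DataFrame(data)
    index_names = schema.get("primaryKey", [])
    if index_names:
        # Fields listed in primaryKey become the index levels.
        frame = frame.set_index(list(index_names))
    return frame


schema = {
    "fields": [
        {"name": "index", "type": "integer"},
        {"name": "a", "type": "number"},
        {"name": "b", "type": "string"},
    ],
    "primaryKey": ["index"],
}
empty = empty_frame_from_table_schema(schema)
print(empty.dtypes)
```

A standardized metadata format would let one helper like this serve both the JSON and parquet readers instead of each format needing its own parser.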

@TomAugspurger TomAugspurger added this to the Next Major Release milestone Jan 16, 2018
@TomAugspurger TomAugspurger added IO Data IO issues that don't fit into a more specific label IO JSON read_json, to_json, json_normalize Difficulty Intermediate IO Parquet parquet, feather labels Jan 16, 2018
@WillAyd
Member

WillAyd commented Jan 23, 2018

So are you thinking of porting some of the code from arrow over into pandas to accomplish this? Haven't used parquet before but I see that the below point in the pandas_compat module is where the index names you mentioned are appearing.

https://github.com/apache/arrow/blob/422efd9635ea6f249adec7e1fda4834f6ac46cc4/python/pyarrow/pandas_compat.py#L182

@jbrockmendel jbrockmendel removed the IO Data IO issues that don't fit into a more specific label label Dec 1, 2019
@mroeschke mroeschke added API - Consistency Internal Consistency of API/Behavior Enhancement labels Apr 19, 2020
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
4 participants