
np.nan rows being dropped from HDFStore frame_tables using append #3318


Closed
gravesee opened this issue Apr 11, 2013 · 4 comments
Labels
Dtype Conversions Unexpected or buggy dtype conversions IO Data IO issues that don't fit into a more specific label

@gravesee

When storing a DataFrame using append, np.nan rows are dropped from the stored frame_table:

import numpy as np
import pandas as pd

test = pd.Series(range(401125), dtype=np.float64)
idx = np.random.permutation(test.index)[:50000]
test[idx] = np.nan

# check that there are indeed np.nan in the series
test.isnull().value_counts()
False    351125
True      50000

# create a store
tmp = pd.HDFStore('tmp.h5', 'w')

# add as a DataFrame using put
tmp.put('df_put', pd.DataFrame(test))

tmp.df_put.squeeze().isnull().value_counts()
False    351125
True      50000

# add as a DataFrame using append
tmp.append('df_append', pd.DataFrame(test))

# the 50000 np.nan rows are missing from the appended table
tmp.df_append.squeeze().isnull().value_counts()
False    351125

@jreback
Contributor

jreback commented Apr 11, 2013

see notes & caveats http://pandas.pydata.org/pandas-docs/dev/io.html#notes-caveats, 2nd one down

hard to fix and usually not a big deal

@gravesee
Author

Gah, sorry man. I even referenced the docs before posting this issue. I'm really trying to get a viable workflow going and I'm very close. As for how to handle the np.nans being dropped, what do you think about storing the index in a StoreManager class? When an object is requested from the store, it is reindexed using the original, complete index. Something like this StoreManager class is what I'm working on. Thanks again for your help and sorry for the noob questions.
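
A minimal sketch of the reindexing idea described above. The StoreManager class, its append and get methods, and the key name are hypothetical, not pandas API. To keep the example self-contained, a plain dict stands in for the HDFStore and dropna(how="all") imitates the store dropping all-NaN rows; it also assumes each key is written only once.

```python
import numpy as np
import pandas as pd


class StoreManager:
    """Hypothetical wrapper that remembers each object's complete index."""

    def __init__(self):
        self._tables = {}   # stand-in for the on-disk store (assumption)
        self._indexes = {}  # original, complete index for each key

    def append(self, key, frame):
        # Remember the full index before anything is dropped.
        self._indexes[key] = frame.index
        # Imitate the store dropping rows in which every column is NaN.
        self._tables[key] = frame.dropna(how="all")

    def get(self, key):
        # Reindexing brings the dropped rows back as all-NaN rows.
        return self._tables[key].reindex(self._indexes[key])


values = pd.Series(np.arange(10, dtype=np.float64))
values.iloc[[2, 5, 7]] = np.nan
frame = pd.DataFrame({"x": values})

manager = StoreManager()
manager.append("df_append", frame)
restored = manager.get("df_append")
```

With a real store, the saved index would itself need to be persisted somewhere, since it cannot be recovered from the appended table alone.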

@jreback
Contributor

jreback commented Apr 11, 2013

For your workflow that is probably appropriate. N.b. I only drop all-NaN rows, which is normally not a problem in a frame (as every field has to be NaN in a particular row). I originally did this for panels. It's not hard to fix, but I probably won't get to it till 0.12. Maybe we can incorporate your StoreManager for that purpose.
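
To illustrate what "all-NaN rows" means here, the following sketch uses plain pandas (no store involved) with a made-up two-column frame. It uses dropna(how="all") as an analogue of the behaviour described, not as the store's actual implementation: only a row that is NaN in every column qualifies, which is why a single-column frame like the one in the report loses every NaN row.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, 4.0],
        "b": [np.nan, np.nan, 3.0, 4.0],
    }
)

# Only row 1 is NaN in every column; rows 0 and 2 are only partly NaN.
all_nan_rows = frame.isnull().all(axis=1)

# Analogue of the drop: rows that are partly NaN survive.
kept = frame.dropna(how="all")
```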

@jreback
Contributor

jreback commented Sep 20, 2013

closed by #4625

@jreback jreback closed this as completed Sep 20, 2013