BUG: reshape of categorical via unstack/to_panel #8704
Comments
When I drop into the debugger, I see the following:
TypeError: data type not understood
> /Users/drpugh/anaconda/lib/python2.7/site-packages/pandas/core/reshape.py(1208)block2d_to_blocknd()
   1207     if mask.all():
-> 1208         pvalues = np.empty(panel_shape, dtype=values.dtype)
   1209     else:
ipdb> values
[NaN, NaN, NaN, NaN, NaN, ..., Extrapolated, Extrapolated, Extrapolated, Extrapolated, Extrapolated]
Length: 10354
Categories (3, object): [Benchmark < Extrapolated < Interpolated]
ipdb> values.dtype
category
ipdb>
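The failing line hands the pandas category dtype directly to NumPy, which has no matching dtype. A minimal sketch of the same failure outside to_panel, assuming a recent pandas and NumPy (the exact wording of the error message differs between versions, and the sample values are illustrative):

```python
import numpy as np
import pandas as pd

# A small categorical, similar in spirit to the `values` shown in the debugger.
values = pd.Categorical(["Benchmark", None, "Extrapolated"])

try:
    # Same call pattern as the line flagged in reshape.py: NumPy cannot
    # interpret pandas' "category" dtype as one of its own dtypes.
    np.empty((2, 3), dtype=values.dtype)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

Any code path that allocates a plain NumPy array from `values.dtype` will hit this error as soon as a categorical column is involved.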
You will need to show the construction of your frame (in code) and df.dtypes.
Categorical is a new type in 0.15, but you have to use it explicitly, so it is puzzling why you have it in this expression (you didn't mention using it). It is probably a bug that to_panel() doesn't work with Categorical, but as I said, you have to actually convert your data in the first place.
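For reference, a quick way to see which columns of a frame are categorical is to look at df.dtypes or filter with select_dtypes. This is a small sketch on made-up data; the `rgdp` and `status` columns and all values are hypothetical and are not taken from the reporter's data set:

```python
import pandas as pd

# Toy frame; column names other than countrycode/year are illustrative only.
df = pd.DataFrame({
    "countrycode": ["AAA", "AAA", "BBB", "BBB"],
    "year": [2000, 2001, 2000, 2001],
    "rgdp": [1.0, 1.1, 2.0, 2.2],
    "status": pd.Categorical(
        ["Benchmark", "Extrapolated", "Interpolated", "Benchmark"]
    ),
})

# Per-column dtypes; categorical columns are reported as "category".
print(df.dtypes)

# Names of the categorical columns only.
categorical_columns = df.select_dtypes(include="category").columns.tolist()
print(categorical_columns)
```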
When creating my frame I am not explicitly making use of the categorical type. I have written a function that takes some Stata .dta files as inputs and creates a panel from them:
    try:
        pwt_raw_data = pd.read_stata('pwt' + str(version) + '.dta')
        dep_rates_raw_data = pd.read_stata('depreciation_rates.dta')
    except IOError:
        _download_pwt_data(base_url, version)
        pwt_raw_data = pd.read_stata('pwt' + str(version) + '.dta')
        dep_rates_raw_data = pd.read_stata('depreciation_rates.dta')
    # merge the data
    pwt_merged_data = pd.merge(pwt_raw_data, dep_rates_raw_data, how='outer',
                               on=['countrycode', 'year'])
    # create the hierarchical index
    pwt_merged_data.year = pd.to_datetime(pwt_raw_data.year, format='%Y')
    pwt_merged_data.set_index(['countrycode', 'year'], inplace=True)
    # coerce into a panel
    pwt_panel_data = pwt_merged_data.to_panel()
    return pwt_panel_data
The dataframe called
As you suspected, three of the variables have been cast as categorical. I wonder if this assignment is being done by the
I am working on a fix for this kind of reshaping, though support for mixed-type data in a Panel is very limited ATM. You are almost certainly better off keeping it in a multi-indexed frame. Is there a reason you need a Panel? As a work-around ATM, you can simply do:
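One possible interim approach, not necessarily the one the maintainer had in mind, is to convert the categorical columns back to plain object columns before reshaping. The sketch below uses made-up data (the `rgdp` and `status` columns are hypothetical) and reshapes with unstack, the other path named in the issue title, since Panel has since been removed from pandas:

```python
import pandas as pd

# Toy multi-indexed frame with one categorical column (illustrative data).
df = pd.DataFrame({
    "countrycode": ["AAA", "AAA", "BBB", "BBB"],
    "year": [2000, 2001, 2000, 2001],
    "rgdp": [1.0, 1.1, 2.0, 2.2],
    "status": pd.Categorical(
        ["Benchmark", "Extrapolated", "Interpolated", "Benchmark"]
    ),
}).set_index(["countrycode", "year"])

# Replace every categorical column with its object-dtype equivalent.
plain = df.copy()
for name in plain.select_dtypes(include="category").columns:
    plain[name] = plain[name].astype(object)

# Reshape: one row per country, one column per (variable, year) pair.
wide = plain.unstack("year")
print(wide)
```

The cost is that the category ordering and the compact integer coding are lost, so this only makes sense when the reshaped result matters more than the categorical metadata.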
I suppose no data set needs to be a Panel. The data set being loaded is the Penn World Tables data set, which is widely used to study economic growth across countries over time. It seems to be a natural use case for a Panel object.
OK, we'll try to fix this. There is a 'technical' issue, so I'm not sure we can get it into 0.15.1, but we'll see. My point was that the manipulation tools are currently much better for multi-indexed frames than for Panels. And if the data is not too dense, a multi-indexed frame is more compact in representation. But Panels do have their uses. Thanks for the report.
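To illustrate the point about multi-indexed frames, here is a small sketch of common panel-style operations done directly on a frame indexed by countrycode and year. The data and the `rgdp` column are invented for the example:

```python
import pandas as pd

# Toy country/year frame in long (multi-indexed) form; values are made up.
frame = pd.DataFrame({
    "countrycode": ["AAA", "AAA", "BBB", "BBB"],
    "year": pd.to_datetime(["2000", "2001", "2000", "2001"], format="%Y"),
    "rgdp": [1.0, 1.1, 2.0, 2.2],
}).set_index(["countrycode", "year"])

# All years for one country.
one_country = frame.loc["AAA"]

# One year across all countries (a cross-section on the second index level).
one_year = frame.xs(pd.Timestamp("2001-01-01"), level="year")

# Per-country growth rates without leaving the long format.
growth = frame.groupby(level="countrycode")["rgdp"].pct_change()

print(one_country)
print(one_year)
print(growth)
```

Each of these slices a different "axis" of the panel, which covers much of what a three-dimensional container would be used for.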
I have a hierarchical pandas.DataFrame that looks as follows... I would like to turn data into a pandas.Panel object. Previously I would do this using the to_panel method without issue. However, after upgrading to pandas 0.15, this approach no longer works... Has there been an API change? I could not find anything in the release notes to suggest that the above would no longer work.