Error setting multiple columns via hierarchical indexing in DataFrame #2295

Closed
bluefir opened this issue Nov 19, 2012 · 4 comments


bluefir commented Nov 19, 2012

I encountered the following:

for i in xrange(len(reg_results)):
    name = names[i]
    reg_result = reg_results[i]
    betas[name] = reg_result._beta_raw

AssertionError Traceback (most recent call last)
in <module>()
5 print(betas[name].shape)
6 print(reg_result._beta_raw.shape)
----> 7 betas[name] = reg_result._beta_raw

C:\Python27\lib\site-packages\pandas\core\frame.pyc in __setitem__(self, key, value)
1802 else:
1803 # set column
-> 1804 self._set_item(key, value)
1805
1806 def _boolean_set(self, key, value):

C:\Python27\lib\site-packages\pandas\core\frame.pyc in _set_item(self, key, value)
1842 """
1843 value = self._sanitize_column(key, value)
-> 1844 NDFrame._set_item(self, key, value)
1845
1846 def insert(self, loc, column, value):

C:\Python27\lib\site-packages\pandas\core\generic.pyc in _set_item(self, key, value)
491 if len(key) != self.columns.nlevels:
492 key += ('',)*(self.columns.nlevels - len(key))
--> 493 self._data.set(key, value)
494
495 try:

C:\Python27\lib\site-packages\pandas\core\internals.pyc in set(self, item, value)
882 if value.ndim == self.ndim - 1:
883 value = value.reshape((1,) + value.shape)
--> 884 assert(value.shape[1:] == self.shape[1:])
885 if item in self.items:
886 i, block = self._find_block(item)

AssertionError:

Data shapes seem to be equal:

print(betas[name].shape)
print(reg_result._beta_raw.shape)
betas[name].shape == reg_result._beta_raw.shape

(1257, 3)
(1257L, 3L)
True
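Equal shapes can still trip the assertion. Below is a minimal sketch of the check at `internals.py` lines 882-884. It assumes, as the `self.shape[1:]` comparison suggests, that the block manager stores its data transposed, with shape `(number of columns, number of rows)`. Under that assumption a single 1-D column of length 1257 is reshaped to `(1, 1257)` and passes. A `(1257, 3)` block is instead compared on its trailing dimension, `(3,)` against `(1257,)`, and fails. The 9-column frame size is hypothetical.

```python
import numpy as np

# Hypothetical frame size: 1257 dates and 9 columns (3 versions x 3 factors).
n_rows, n_cols = 1257, 9

# Assumption: the block manager's shape is (items, rows), i.e. the
# transpose of the DataFrame's (rows, columns) shape.
manager_shape = (n_cols, n_rows)
manager_ndim = len(manager_shape)


def shapes_compatible(value):
    """Mimic the shape check shown at internals.py lines 882-884."""
    if value.ndim == manager_ndim - 1:
        # A 1-D column is promoted to a single "item" row: (1, n_rows).
        value = value.reshape((1,) + value.shape)
    return value.shape[1:] == manager_shape[1:]


single_column = np.zeros(n_rows)   # shape (1257,): one column
block = np.zeros((n_rows, 3))      # shape (1257, 3): three columns, untransposed

print(shapes_compatible(single_column))  # True
print(shapes_compatible(block))          # False: (3,) != (1257,)
```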

The workaround that works is the following:

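for i in xrange(len(reg_results)):
    name = names[i]
    reg_result = reg_results[i]
    # use a separate index for the inner loop so it does not shadow i
    for j in xrange(len(data_columns)):
        column = data_columns[j]
        # assign one (name, column) pair at a time: a 1-D slice per column
        betas[name, column] = reg_result._beta_raw[:, j]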

Is this the way to go or did I encounter a bug?
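Here is a self-contained sketch of the column-by-column workaround that should run on a current pandas release. The dates, the `results` dict and its arrays are hypothetical stand-ins for `names`, `reg_results` and `reg_result._beta_raw`. The point is that each assignment targets one full `(version, factor)` tuple key and so receives a 1-D array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the issue's objects.
dates = pd.Index([19971231, 19980102, 19980105], name="date")
versions = ["version1", "version2"]
factors = ["factor1", "factor2", "factor3"]
betas = pd.DataFrame(
    index=dates,
    columns=pd.MultiIndex.from_product([versions, factors]),
    dtype=float,
)

# One (n_dates, n_factors) array per version, playing the role of
# reg_result._beta_raw for each regression.
rng = np.random.default_rng(0)
results = {v: rng.standard_normal((len(dates), len(factors))) for v in versions}

# Workaround: assign each (version, factor) column separately, so every
# assignment receives a 1-D array of length len(dates).
for version, beta_raw in results.items():
    for j, factor in enumerate(factors):
        betas[(version, factor)] = beta_raw[:, j]
```

Iterating with `zip`/`enumerate` also avoids reusing one loop index for both loops.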

wesm (Member) commented Nov 20, 2012

It might be a bug; can you post a standalone reproduction of the issue?

bluefir (Author) commented Nov 21, 2012

import numpy as np
import pandas as pd
from pandas import Index, MultiIndex, Series, DataFrame

n_dates = 1000
n_securities = 2000
n_factors = 3
n_versions = 3

dates = pd.date_range('1997-12-31', periods=n_dates, freq='B')
dates = Index(map(lambda x: x.year * 10000 + x.month * 100 + x.day, dates))

secid_min = int('10000000', 16)
secid_max = int('F0000000', 16)
step = (secid_max - secid_min) // (n_securities - 1)
security_ids = map(lambda x: hex(x)[2:10].upper(), range(secid_min, secid_max + 1, step))

data_index = MultiIndex(levels=[dates.values, security_ids],
    labels=[[i for i in xrange(n_dates) for _ in xrange(n_securities)], range(n_securities) * n_dates],
    names=['date', 'security_id'])
n_data = len(data_index)

factors = Index(['factor{}'.format(i) for i in xrange(1, n_factors + 1)])
versions = ['version{}'.format(i) for i in xrange(1, n_versions + 1)]
beta_columns = MultiIndex(levels=[versions, factors.values],
    labels=[[i for i in xrange(n_versions) for _ in xrange(n_factors)], range(n_factors) * n_versions])
betas = DataFrame(index=dates, columns=beta_columns)

for version in versions:
    y = Series(np.random.randn(n_data), index=data_index)
    x = DataFrame(np.random.randn(n_data, n_factors), index=data_index, columns=factors)
    reg_result = pd.fama_macbeth(y=y, x=x, intercept=False)
    betas[version] = reg_result._beta_raw

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\IPython\core\interactiveshell.py", line 2431, in safe_execfile
    py3compat.execfile(fname,*where)
  File "C:\Python27\lib\site-packages\IPython\utils\py3compat.py", line 171, in execfile
    exec compile(scripttext, filename, 'exec') in glob, loc
  File "D:\BlueFir\develop\python\AlphaModel\TestShapes.py", line 35, in <module>
    betas[version] = reg_result._beta_raw
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1804, in __setitem__
    self._set_item(key, value)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1844, in _set_item
    NDFrame._set_item(self, key, value)
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 493, in _set_item
    self._data.set(key, value)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 884, in set
    assert(value.shape[1:] == self.shape[1:])
AssertionError
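The reproduction builds its `MultiIndex` objects from explicit `levels` and `labels`. In current pandas the `labels` argument is named `codes`. When the index is a full Cartesian product of its levels, `MultiIndex.from_product` produces the same index more compactly. Below is a sketch for the `beta_columns` index. It assumes a pandas version that accepts `codes` and replaces the Python 2 `xrange` and list arithmetic with Python 3 equivalents.

```python
import pandas as pd

n_versions, n_factors = 3, 3
versions = ["version{}".format(i) for i in range(1, n_versions + 1)]
factors = ["factor{}".format(i) for i in range(1, n_factors + 1)]

# The repro's explicit construction, with labels= renamed to codes= and the
# Python 2 list arithmetic written with list(range(...)).
explicit = pd.MultiIndex(
    levels=[versions, factors],
    codes=[
        [i for i in range(n_versions) for _ in range(n_factors)],
        list(range(n_factors)) * n_versions,
    ],
)

# The same column index as a Cartesian product of the two levels.
product = pd.MultiIndex.from_product([versions, factors])
```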

wesm (Member) commented Nov 21, 2012

Assigning multiple columns like that doesn't work yet, unfortunately. Will see what I can do.
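For comparison, here is a sketch of a one-statement assignment on a current pandas release. It has not been checked against the change in 5d6e7c8 or any specific version. The raw array is first wrapped in a `DataFrame` whose columns are the second-level labels, then assigned under the top-level key, which should fill all three sub-columns. The data are hypothetical, and `beta_raw` stands in for `reg_result._beta_raw`. The bare 2-D ndarray from the traceback is not used here. Labelling the columns lets pandas align them with the sub-columns under `"version1"`.

```python
import numpy as np
import pandas as pd

dates = pd.Index([19971231, 19980102, 19980105], name="date")
versions = ["version1", "version2"]
factors = ["factor1", "factor2", "factor3"]
betas = pd.DataFrame(
    index=dates,
    columns=pd.MultiIndex.from_product([versions, factors]),
    dtype=float,
)

# Hypothetical stand-in for reg_result._beta_raw: one row per date,
# one column per factor.
beta_raw = np.arange(9.0).reshape(len(dates), len(factors))

# Label the raw array with the second-level column names, then assign the
# whole block under the top-level key in a single statement.
betas["version1"] = pd.DataFrame(beta_raw, index=dates, columns=factors)
```

The `"version2"` block is untouched and stays all-NaN.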

bluefir (Author) commented Nov 21, 2012

As long as it works for one column, it's not a big deal since you can always iterate through columns. The AssertionError message was confusing though. Thanks for the explanation.

wesm closed this as completed in 5d6e7c8 on Nov 24, 2012