SparseDataFrame does not appear to follow the sparse semantics of DataFrame's from_dict method. It stores each dict as a single element, rather than treating the dict's key-value pairs as the sparse content of one row, as DataFrame does.
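To make the expected semantics concrete, here is a minimal sketch on a small hypothetical input (the `docs` dict and its keys are made up for illustration and are not taken from the report). It only shows how DataFrame.from_dict with orient="index" spreads each inner dict across one row, which is the behavior the description compares against. It does not reproduce the SparseDataFrame result.

```python
import pandas as pd

# Hypothetical per-row dicts: each key is a column, absent keys are missing.
docs = {
    0: {"apple": 1, "pear": 2},
    1: {"kiwi": 3},
}

# Each inner dict becomes one row; columns are the union of all keys.
df = pd.DataFrame.from_dict(docs, orient="index")

print(df.shape)                      # 2 rows, 3 columns
print(df.loc[1].dropna().to_dict())  # only 'kiwi' is present in row 1
```

Entries that a row's dict does not mention come out as NaN, so each dict acts as the sparse content of its row.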
Depending on my availability when this gets a response, I'm happy to contribute a fix or feature for this, given fair guidance on where to touch and what to look out for 😃
Also, the following workaround runs slower than one would hope, given that the input is already sparse when it reaches SparseDataFrame, though this may come down to my specific scipy usage below.
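from scipy.sparse import dok_matrix, coo_matrix
from pandas import SparseDataFrame
# a: one entry per document, each a sequence of (column_index, value) tuples
# Flatten a into parallel row-index, column-index, and value sequences
%time i, j, data = zip(*((i, t[0], t[1]) for i, row in enumerate(a) for t in row))
# Build a COO matrix of shape (num_of_docs, lexicon_size) from the triplets
%time m = coo_matrix((data, (i, j)), shape=(num_of_docs, lexicon_size))
# Wrap the matrix in a SparseDataFrame and fill missing entries with 0
%time sdf = SparseDataFrame(m).fillna(0)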
CPU times: user 80 ms, sys: 4 ms, total: 84 ms
Wall time: 84.5 ms
CPU times: user 28 ms, sys: 4 ms, total: 32 ms
Wall time: 28.5 ms
CPU times: user 41.9 s, sys: 132 ms, total: 42 s
Wall time: 41.8 s
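For readers following the workaround, here is a minimal standard-library sketch of its first step, flattening per-document (column, value) pairs into the row, column, and value sequences that a COO matrix is built from. The `rows` data is hypothetical, and the guard for an empty input is an addition that is not in the snippet above.

```python
# Hypothetical input: one list of (column_index, value) pairs per document.
rows = [
    [(0, 2.0), (3, 1.0)],
    [],                    # a document with no entries contributes nothing
    [(1, 5.0)],
]

# Flatten into (row, column, value) triples, one per stored entry.
triples = [(r, col, val) for r, row in enumerate(rows) for col, val in row]

# Transpose into three parallel tuples; fall back to empty tuples when
# there are no entries, since zip(*[]) yields nothing to unpack.
row_idx, col_idx, values = zip(*triples) if triples else ((), (), ())

print(row_idx)  # (0, 0, 2)
print(col_idx)  # (0, 3, 1)
print(values)   # (2.0, 1.0, 5.0)
```

The three tuples are in the form the workaround passes to coo_matrix as (data, (i, j)). In the timings above this step and the matrix construction are cheap, and nearly all of the time is spent in the SparseDataFrame call.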