sparse=True option for from_dataframe and from_series #3210
Fixes pydata#3206

Example usage:

In [3]: import pandas as pd
   ...: import numpy as np
   ...: import xarray
   ...: df = pd.DataFrame({
   ...:     'w': range(10),
   ...:     'x': list('abcdefghij'),
   ...:     'y': np.arange(0, 100, 10),
   ...:     'z': np.ones(10),
   ...: }).set_index(['w', 'x', 'y'])
   ...:

In [4]: ds = xarray.Dataset.from_dataframe(df, sparse=True)

In [5]: ds.z.data
Out[5]: <COO: shape=(10, 10, 10), dtype=float64, nnz=10, fill_value=nan>
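To see why the result above has shape (10, 10, 10) but only nnz=10, here is a rough pure-Python sketch that counts the cells a dense unstacking of the same index would allocate. It does not use xarray or sparse, and the variable names are illustrative only, not part of the PR.

```python
from math import prod

# Same index values as the example DataFrame: one row per (w, x, y) triple.
w = list(range(10))
x = list("abcdefghij")
y = list(range(0, 100, 10))
index = list(zip(w, x, y))

# Unstacking creates one dimension per index level,
# sized by the number of unique values in that level.
shape = tuple(len(set(level)) for level in zip(*index))

dense_cells = prod(shape)  # cells a dense array would have to allocate
nnz = len(index)           # cells that actually hold data

print(shape, dense_cells, nnz)
```

Only the cells listed in the index hold data; a dense array would fill every other cell with NaN, which is the overhead that sparse=True avoids.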
Amazing! This is going to allow way more usage of xarray for sparse representations of data currently confined to pandas.
It would be great if someone could review this. Maybe @crusaderky?
Thanks for the review!
@crusaderky any other comments?
temp_name = "__temporary_name"
df = pd.DataFrame({temp_name: series})
ds = Dataset.from_dataframe(df, sparse=sparse)
result = cast(DataArray, ds[temp_name])
I think I can get rid of the cast by changing Dataset.__getitem__. Let's merge this one first though.
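For context on the cast discussed here: typing.cast does nothing at runtime and only tells the type checker which type to assume. The sketch below uses hypothetical stand-in classes (FakeDataset and FakeDataArray are not xarray types) and assumes, for illustration, that __getitem__ is annotated as returning a union.

```python
from collections.abc import Mapping
from typing import Any, Dict, Union, cast


class FakeDataArray:
    """Hypothetical stand-in for xarray.DataArray."""

    def __init__(self, name: str) -> None:
        self.name = name


class FakeDataset:
    """Hypothetical stand-in for xarray.Dataset."""

    def __init__(self, variables: Dict[str, FakeDataArray]) -> None:
        self.variables = variables

    def __getitem__(self, key: Any) -> Union["FakeDataset", FakeDataArray]:
        if isinstance(key, Mapping):
            # Mapping keys are treated as an indexer and return a dataset.
            return FakeDataset(dict(self.variables))
        # Any other key looks up a single variable.
        return self.variables[key]


ds = FakeDataset({"__temporary_name": FakeDataArray("__temporary_name")})
item = ds["__temporary_name"]

# cast() returns its argument unchanged; it only narrows the static type
# from the union to FakeDataArray for the type checker.
result = cast(FakeDataArray, item)
print(result is item, result.name)
```

Because the union return type is what forces the cast, a more precise signature for __getitem__ would make it unnecessary.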
... or not. Unlike @functools.singledispatch, @typing.overload does not allow for a more specific signature followed by a more generic one :(
@overload
def __getitem__(self, key: Mapping) -> "Dataset":
    ...
@overload
def __getitem__(self, key: Hashable) -> "DataArray":
    ...
@overload
def __getitem__(self, key: Any) -> "Dataset":
    ...
def __getitem__(self, key):
mypy complains:
xarray/core/dataset.py:1218: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
xarray/core/dataset.py:1222: error: Overloaded function signatures 2 and 3 overlap with incompatible return types
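For comparison, the runtime behaviour of functools.singledispatch can be sketched as follows: a registration for a specific type takes precedence over the generic fallback, which is the "specific first, generic second" pattern that the overloads above cannot express to mypy. The function name result_kind and its string return values are hypothetical, chosen only to mirror the return types in the overloads.

```python
from collections.abc import Mapping
from functools import singledispatch


@singledispatch
def result_kind(key: object) -> str:
    # Generic fallback: used for any key type without a registration.
    return "DataArray"


@result_kind.register(Mapping)
def _(key: Mapping) -> str:
    # More specific registration: chosen for dicts and other mappings.
    return "Dataset"


print(result_kind({"x": 0}), result_kind("z"))
```

Note that singledispatch resolves the implementation at call time from the argument's class, whereas @typing.overload only describes signatures to the static type checker, so the two are not interchangeable.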
Yeah, I tried something very similar. This is why I wrote the "TODO" note above mentioning python/mypy#7328.
Looks good to me
Great!
black . && mypy . && flake8
whats-new.rst for all changes and api.rst for new API