
Rename Dataset.vars -> data_vars and remove deprecated aliases #325


Merged
merged 5 commits into from Feb 19, 2015
Changes from all commits
4 changes: 2 additions & 2 deletions README.rst
@@ -3,7 +3,7 @@ xray: N-D labeled arrays and datasets in Python

.. image:: https://travis-ci.org/xray/xray.svg?branch=master
:target: https://travis-ci.org/xray/xray
.. image:: http://img.shields.io/pypi/v/xray.svg?style=flat
.. image:: https://badge.fury.io/py/xray.svg
:target: https://pypi.python.org/pypi/xray/

**xray** is an open source project and Python package that aims to bring the
@@ -108,4 +108,4 @@ See the License for the specific language governing permissions and
limitations under the License.

xray includes portions of pandas. The license for pandas is included in the
LICENSES directory.
licenses directory.
2 changes: 1 addition & 1 deletion doc/api.rst
@@ -37,7 +37,7 @@ Attributes
:toctree: generated/

Dataset.dims
Dataset.vars
Dataset.data_vars
Dataset.coords
Dataset.attrs

65 changes: 39 additions & 26 deletions doc/data-structures.rst
@@ -192,44 +192,47 @@ from the `netCDF`__ file format.
__ http://www.unidata.ucar.edu/software/netcdf/

In addition to the dict-like interface of the dataset itself, which can be used
to access any array in a dataset, datasets have four key properties:
to access any variable in a dataset, datasets have four key properties:

- ``dims``: a dictionary mapping from dimension names to the fixed length of
each dimension (e.g., ``{'x': 6, 'y': 6, 'time': 8}``)
- ``vars``: a dict-like container of arrays (`variables`)
- ``coords``: another dict-like container of arrays intended to label points
used in ``vars`` (e.g., 1-dimensional arrays of numbers, datetime objects or
strings)
- ``data_vars``: a dict-like container of DataArrays corresponding to variables
- ``coords``: another dict-like container of DataArrays intended to label points
used in ``data_vars`` (e.g., 1-dimensional arrays of numbers, datetime
objects or strings)
- ``attrs``: an ``OrderedDict`` to hold arbitrary metadata
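These four containers can be sketched with plain Python dicts (an illustrative toy model, not the xray classes themselves):

```python
from collections import OrderedDict

import numpy as np

# Toy stand-ins for the four key Dataset properties described above.
dims = {'x': 2, 'time': 3}
data_vars = {'temperature': np.array([[11.2, 12.0, 12.4],
                                      [10.9, 11.7, 12.1]])}
coords = {'x': np.array([10, 20]),
          'time': np.array(['2015-02-17', '2015-02-18', '2015-02-19'])}
attrs = OrderedDict(title='toy weather example')

# Dict-like access on a Dataset supplies variables from either category:
def lookup(name):
    return data_vars[name] if name in data_vars else coords[name]
```

Here ``lookup`` mimics how ``ds[name]`` finds both data and coordinate variables.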

The distinction between whether an array falls in variables or coordinates is
**mostly semantic**: coordinates are intended for constant/fixed/independent
quantities, unlike the varying/measured/dependent quantities that belong in
variables. Dictionary like access on a dataset will supply arrays found in
either category. However, the distinction does have important implications for
indexing and computation.
The distinction between whether a variable falls in data or coordinates
(borrowed from `CF conventions`_) is mostly semantic, and you can probably get
away with ignoring it if you like: dictionary like access on a dataset will
supply variables found in either category. However, xray does make use of the
distinction for indexing and computations. Coordinates indicate
constant/fixed/independent quantities, unlike the varying/measured/dependent
quantities that belong in data.

.. _CF conventions: http://cfconventions.org/

Here is an example of how we might structure a dataset for a weather forecast:

.. image:: _static/dataset-diagram.png

In this example, it would be natural to call ``temperature`` and
``precipitation`` "variables" and all the other arrays "coordinates" because
they label the points along the dimensions. (see [1]_ for more background on
this example).
``precipitation`` "data variables" and all the other arrays "coordinate
variables" because they label the points along the dimensions. (see [1]_ for
more background on this example).

.. _dataarray constructor:

Creating a Dataset
~~~~~~~~~~~~~~~~~~

To make an :py:class:`~xray.Dataset` from scratch, supply dictionaries for any
variables coordinates and attributes you would like to insert into the
variables, coordinates and attributes you would like to insert into the
dataset.

For the ``vars`` and ``coords`` arguments, keys should be the name of the
variable or coordinate, and values should be scalars, 1d arrays or tuples of
the form ``(dims, data[, attrs])`` sufficient to label each array:
variable and values should be scalars, 1d arrays or tuples of the form
``(dims, data[, attrs])`` sufficient to label each array:

- ``dims`` should be a sequence of strings.
- ``data`` should be a numpy.ndarray (or array-like object) that has a
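A hypothetical helper (not xray's actual implementation) can sketch how these shorthand forms reduce to a common ``(dims, data, attrs)`` triple:

```python
import numpy as np

def normalize_variable(value):
    # Coerce the shorthand forms described above into (dims, data, attrs).
    # Hypothetical helper for illustration only; dim-name defaults are assumed.
    if isinstance(value, tuple):
        dims = value[0]
        data = np.asarray(value[1])
        attrs = value[2] if len(value) > 2 else {}
    else:
        data = np.asarray(value)
        dims = () if data.ndim == 0 else ('dim_0',)  # assumed default name
        attrs = {}
    return dims, data, attrs
```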
@@ -292,15 +295,15 @@ values given by :py:class:`xray.DataArray` objects:

ds['temperature']

The valid keys include each listed coordinate and variable.
The valid keys include each listed coordinate and data variable.

Variables and coordinates are also contained separately in the
:py:attr:`~xray.Dataset.vars` and :py:attr:`~xray.Dataset.coords`
Data and coordinate variables are also contained separately in the
:py:attr:`~xray.Dataset.data_vars` and :py:attr:`~xray.Dataset.coords`
dictionary-like attributes:

.. ipython:: python

ds.vars
ds.data_vars
ds.coords

Finally, like data arrays, datasets also store arbitrary metadata in the form
@@ -317,6 +320,16 @@ xray does not enforce any restrictions on attributes, but serialization to
some file formats may fail if you use objects that are not strings, numbers
or :py:class:`numpy.ndarray` objects.

As a useful shortcut, you can use attribute style access for reading (but not
setting) variables and attributes:

.. ipython:: python

ds.temperature

This is particularly useful in an exploratory context, because you can
tab-complete these variable names with tools like IPython.
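The mechanism can be sketched in plain Python (a simplified, hypothetical version of what xray does internally, not its real class):

```python
class ToyDataset(object):
    # Minimal sketch: read-only attribute-style access over variables.
    def __init__(self, variables):
        self._variables = variables

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so real
        # attributes and methods still take precedence.
        try:
            return self._variables[name]
        except KeyError:
            raise AttributeError(name)

    def __dir__(self):
        # Include variable names so IPython can tab-complete them.
        return sorted(set(dir(type(self)) + list(self._variables)))

ds = ToyDataset({'temperature': [11.2, 12.0]})
```

Because ``__getattr__`` only fires on failed lookups, attribute access stays read-only and cannot shadow real methods.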

Dictionary like methods
~~~~~~~~~~~~~~~~~~~~~~~

@@ -381,7 +394,7 @@ Another useful option is the ability to rename the variables in a dataset:
Coordinates
-----------

Coordinates are ancillary arrays stored for ``DataArray`` and ``Dataset``
Coordinates are ancillary variables stored for ``DataArray`` and ``Dataset``
objects in the ``coords`` attribute:

.. ipython:: python
@@ -421,12 +434,12 @@ dimension and whose values are ``Index`` objects:

ds.indexes
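Such a mapping can be mimicked directly with pandas (illustrative only, not xray's internals):

```python
import pandas as pd

# One pandas Index per dimension, as Dataset.indexes would expose them.
indexes = {'x': pd.Index([10, 20, 30]),
           'time': pd.date_range('2015-02-17', periods=3)}

# Index objects support fast label-based lookup:
position = indexes['x'].get_loc(20)  # integer position of label 20 along 'x'
```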

Switching between coordinates and variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Switching between data and coordinate variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To entirely add or remove coordinate arrays, you can use dictionary like
syntax, as shown above. To convert back and forth between coordinates and
variables, use the the :py:meth:`~xray.Dataset.set_coords` and
syntax, as shown above. To convert back and forth between data and
coordinates, use the :py:meth:`~xray.Dataset.set_coords` and
:py:meth:`~xray.Dataset.reset_coords` methods:

.. ipython:: python
2 changes: 1 addition & 1 deletion doc/examples/weather-data.rst
@@ -36,7 +36,7 @@ Examine a dataset with pandas_ and seaborn_

@savefig examples_pairplot.png
sns.pairplot(ds[['tmin', 'tmax', 'time.month']].to_dataframe(),
vars=ds.vars, hue='time.month')
vars=ds.data_vars, hue='time.month')


Probability of freeze by calendar month
18 changes: 16 additions & 2 deletions doc/whats-new.rst
@@ -16,9 +16,23 @@ Highlights
~~~~~~~~~~

- Automatic alignment of index labels in arithmetic, dataset construction and
merging.
- Aggregation operations skip missing values by default.
merging. TODO: finish documenting.
- Aggregation operations now skip missing values by default:

.. ipython:: python

DataArray([1, 2, np.nan, 3]).mean()

You can turn this behavior off by supplying the keyword argument
``skipna=False``.
- You will need to update your code if you have been ignoring deprecation
warnings: methods and attributes that were deprecated in xray v0.3 or earlier
have gone away.
- Lots of bug fixes.
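The skip-missing default above can be reproduced with plain NumPy (shown with numpy directly rather than xray, for illustration):

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 3.0])

skipped = np.nanmean(values)    # missing values ignored: (1 + 2 + 3) / 3
propagated = np.mean(values)    # missing values propagate: nan
```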

Enhancements
~~~~~~~~~~~~

- Support for reindexing with a fill method. This will be especially useful with
pandas 0.16, which will support a fill method of ``'nearest'``.
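In pandas terms (a pandas sketch; the eventual xray signature may differ), nearest-neighbour reindexing looks like:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=[0, 5, 10])

# Fill each requested label from the nearest existing label
# (method='nearest' requires pandas >= 0.16 and a monotonic index).
result = s.reindex([1, 4, 9], method='nearest')
```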

4 changes: 2 additions & 2 deletions xray/conventions.py
@@ -742,7 +742,7 @@ def decode_cf(obj, concat_characters=True, mask_and_scale=True,
from .backends.common import AbstractDataStore

if isinstance(obj, Dataset):
vars = obj._arrays
vars = obj._variables
attrs = obj.attrs
extra_coords = set(obj.coords)
file_obj = obj._file_obj
@@ -855,7 +855,7 @@ def encode_dataset_coordinates(dataset):
attrs : dict
"""
non_dim_coord_names = set(dataset.coords) - set(dataset.dims)
return _encode_coordinates(dataset._arrays, dataset.attrs,
return _encode_coordinates(dataset._variables, dataset.attrs,
non_dim_coord_names=non_dim_coord_names)


1 change: 0 additions & 1 deletion xray/core/alignment.py
@@ -1,6 +1,5 @@
import functools
import operator
import warnings
from collections import defaultdict

import numpy as np
8 changes: 4 additions & 4 deletions xray/core/coordinates.py
@@ -51,7 +51,7 @@ def __getitem__(self, key):

def __iter__(self):
# needs to be in the same order as the dataset variables
for k in self._dataset._arrays:
for k in self._dataset._variables:
if k in self._names:
yield k

@@ -84,7 +84,7 @@ def to_index(self, ordered_dims=None):
"""
if ordered_dims is None:
ordered_dims = self.dims
indexes = [self._dataset._arrays[k].to_index() for k in ordered_dims]
indexes = [self._dataset._variables[k].to_index() for k in ordered_dims]
return pd.MultiIndex.from_product(indexes, names=list(ordered_dims))

def _merge_validate(self, other):
@@ -96,7 +96,7 @@
promote_dims = {}
for k in self:
if k in other:
self_var = self._dataset._arrays[k]
self_var = self._dataset._variables[k]
other_var = other[k].variable
if not self_var.broadcast_equals(other_var):
if k in self.dims and k in other.dims:
@@ -182,7 +182,7 @@ def __init__(self, dataarray):
def __setitem__(self, key, value):
with self._dataarray._set_new_dataset() as ds:
ds.coords[key] = value
bad_dims = [d for d in ds._arrays[key].dims
bad_dims = [d for d in ds._variables[key].dims
if d not in self.dims]
if bad_dims:
raise ValueError('DataArray does not include all coordinate '