Commit 747962d
Merge remote-tracking branch 'upstream/master' into indexes/dataarray

* upstream/master:
  Added fill_value for unstack (pydata#3541)
  Add DatasetGroupBy.quantile (pydata#3527)
  ensure rename does not change index type (pydata#3532)
  Leave empty slot when not using accessors
  interpolate_na: Add max_gap support. (pydata#3302)
  units & deprecation merge (pydata#3530)
  Fix set_index when an existing dimension becomes a level (pydata#3520)
  add Variable._replace (pydata#3528)
  Tests for module-level functions with units (pydata#3493)
  Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502)
  FUNDING.yml (pydata#3523)
  Allow appending datetime & boolean variables to zarr stores (pydata#3504)
  warn if dim is passed to rolling operations. (pydata#3513)
  Deprecate allow_lazy (pydata#3435)
  Recursive tokenization (pydata#3515)

2 parents aefa5e3 + 56c16e4

23 files changed: +1687 −192 lines

.github/FUNDING.yml

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
+github: numfocus
+custom: http://numfocus.org/donate-to-xarray

doc/computation.rst

Lines changed: 3 additions & 0 deletions

@@ -95,6 +95,9 @@ for filling missing values via 1D interpolation.
 Note that xarray slightly diverges from the pandas ``interpolate`` syntax by
 providing the ``use_coordinate`` keyword which facilitates a clear specification
 of which values to use as the index in the interpolation.
+xarray also provides the ``max_gap`` keyword argument to limit the interpolation to
+data gaps of length ``max_gap`` or smaller. See :py:meth:`~xarray.DataArray.interpolate_na`
+for more.
 
 Aggregation
 ===========
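As a sketch of the semantics the added paragraph describes (plain NumPy, with a hypothetical `interpolate_na_1d` helper, not xarray's actual implementation): NaNs are filled by linear interpolation, but runs of NaNs whose coordinate span exceeds ``max_gap`` are left unfilled.

```python
import numpy as np

def interpolate_na_1d(y, x, max_gap=None):
    """Fill NaNs in ``y`` by linear interpolation against coordinate ``x``.

    If ``max_gap`` is given, an interior run of NaNs is only filled when the
    coordinate distance between the valid points bounding it is at most
    ``max_gap`` (the gap-length definition used by ``interpolate_na``).
    Illustrative sketch only.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    nans = np.isnan(y)
    valid = ~nans
    out = y.copy()
    out[nans] = np.interp(x[nans], x[valid], y[valid])
    if max_gap is not None:
        idx = np.flatnonzero(valid)
        for left, right in zip(idx[:-1], idx[1:]):
            # Re-mask interior gaps whose coordinate span exceeds max_gap.
            if right - left > 1 and (x[right] - x[left]) > max_gap:
                out[left + 1 : right] = np.nan
    return out

y = [0.0, np.nan, 2.0, np.nan, np.nan, 5.0]
filled = interpolate_na_1d(y, np.arange(6), max_gap=2)
# The single-NaN gap (span 2) is filled; the two-NaN gap (span 3) is not.
```

With ``max_gap=None`` this degenerates to plain ``np.interp`` over the missing points, which mirrors the default ``method='linear'`` behaviour.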

doc/conf.py

Lines changed: 6 additions & 5 deletions

@@ -340,9 +340,10 @@
 # Example configuration for intersphinx: refer to the Python standard library.
 intersphinx_mapping = {
     "python": ("https://docs.python.org/3/", None),
-    "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
-    "iris": ("http://scitools.org.uk/iris/docs/latest/", None),
-    "numpy": ("https://docs.scipy.org/doc/numpy/", None),
-    "numba": ("https://numba.pydata.org/numba-doc/latest/", None),
-    "matplotlib": ("https://matplotlib.org/", None),
+    "pandas": ("https://pandas.pydata.org/pandas-docs/stable", None),
+    "iris": ("https://scitools.org.uk/iris/docs/latest", None),
+    "numpy": ("https://docs.scipy.org/doc/numpy", None),
+    "scipy": ("https://docs.scipy.org/doc/scipy/reference", None),
+    "numba": ("https://numba.pydata.org/numba-doc/latest", None),
+    "matplotlib": ("https://matplotlib.org", None),
 }

doc/whats-new.rst

Lines changed: 31 additions & 2 deletions

@@ -38,6 +38,13 @@ Breaking changes
 
 New Features
 ~~~~~~~~~~~~
+
+- Added the ``fill_value`` option to :py:meth:`~xarray.DataArray.unstack` and
+  :py:meth:`~xarray.Dataset.unstack` (:issue:`3518`).
+  By `Keisuke Fujii <https://github.com/fujiisoup>`_.
+- Added the ``max_gap`` kwarg to :py:meth:`~xarray.DataArray.interpolate_na` and
+  :py:meth:`~xarray.Dataset.interpolate_na`. This controls the maximum size of the data
+  gap that will be filled by interpolation. By `Deepak Cherian <https://github.com/dcherian>`_.
 - :py:meth:`Dataset.drop_sel` & :py:meth:`DataArray.drop_sel` have been added for dropping labels.
   :py:meth:`Dataset.drop_vars` & :py:meth:`DataArray.drop_vars` have been added for
   dropping variables (including coordinates). The existing ``drop`` methods remain as a backward compatible
@@ -73,12 +80,22 @@ New Features
   for xarray objects. Note that xarray objects with a dask.array backend already used
   deterministic hashing in previous releases; this change implements it when whole
   xarray objects are embedded in a dask graph, e.g. when :meth:`DataArray.map` is
-  invoked. (:issue:`3378`, :pull:`3446`)
+  invoked. (:issue:`3378`, :pull:`3446`, :pull:`3515`)
   By `Deepak Cherian <https://github.com/dcherian>`_ and
   `Guido Imperiale <https://github.com/crusaderky>`_.
+- Add the documented-but-missing :py:meth:`xarray.core.groupby.DatasetGroupBy.quantile`.
+  (:issue:`3525`, :pull:`3527`). By `Justus Magin <https://github.com/keewis>`_.
 
 Bug fixes
 ~~~~~~~~~
+- Ensure an index of type ``CFTimeIndex`` is not converted to a ``DatetimeIndex`` when
+  calling :py:meth:`Dataset.rename` (also :py:meth:`Dataset.rename_dims`
+  and :py:meth:`xr.Dataset.rename_vars`). By `Mathias Hauser <https://github.com/mathause>`_
+  (:issue:`3522`).
+- Fix a bug in `set_index` in case that an existing dimension becomes a level variable of MultiIndex. (:pull:`3520`)
+  By `Keisuke Fujii <https://github.com/fujiisoup>`_.
+- Harmonize `_FillValue`, `missing_value` during encoding and decoding steps. (:pull:`3502`)
+  By `Anderson Banihirwe <https://github.com/andersy005>`_.
 - Fix regression introduced in v0.14.0 that would cause a crash if dask is installed
   but cloudpickle isn't (:issue:`3401`) by `Rhys Doyle <https://github.com/rdoyle45>`_
 - Fix grouping over variables with NaNs. (:issue:`2383`, :pull:`3406`).
@@ -88,9 +105,14 @@ Bug fixes
   By `Deepak Cherian <https://github.com/dcherian>`_.
 - Sync with cftime by removing `dayofwk=-1` for cftime>=1.0.4.
   By `Anderson Banihirwe <https://github.com/andersy005>`_.
+- Rolling reduction operations no longer compute dask arrays by default. (:issue:`3161`).
+  In addition, the ``allow_lazy`` kwarg to ``reduce`` is deprecated.
+  By `Deepak Cherian <https://github.com/dcherian>`_.
 - Fix :py:meth:`xarray.core.groupby.DataArrayGroupBy.reduce` and
   :py:meth:`xarray.core.groupby.DatasetGroupBy.reduce` when reducing over multiple dimensions.
   (:issue:`3402`). By `Deepak Cherian <https://github.com/dcherian/>`_
+- Allow appending datetime and bool data variables to zarr stores.
+  (:issue:`3480`). By `Akihiro Matsukawa <https://github.com/amatsukawa/>`_.
 
 Documentation
 ~~~~~~~~~~~~~
@@ -111,7 +133,8 @@ Internal Changes
 ~~~~~~~~~~~~~~~~
 
 - Added integration tests against `pint <https://pint.readthedocs.io/>`_.
-  (:pull:`3238`, :pull:`3447`, :pull:`3508`) by `Justus Magin <https://github.com/keewis>`_.
+  (:pull:`3238`, :pull:`3447`, :pull:`3493`, :pull:`3508`)
+  by `Justus Magin <https://github.com/keewis>`_.
 
 .. note::
 
@@ -130,6 +153,9 @@ Internal Changes
 - Enable type checking on default sentinel values (:pull:`3472`)
   By `Maximilian Roos <https://github.com/max-sixty>`_
 
+- Add :py:meth:`Variable._replace` for simpler replacing of a subset of attributes (:pull:`3472`)
+  By `Maximilian Roos <https://github.com/max-sixty>`_
+
 .. _whats-new.0.14.0:
 
 v0.14.0 (14 Oct 2019)
@@ -217,6 +243,9 @@ Bug fixes
   By `Deepak Cherian <https://github.com/dcherian>`_.
 - Fix error in concatenating unlabeled dimensions (:pull:`3362`).
   By `Deepak Cherian <https://github.com/dcherian/>`_.
+- Warn if the ``dim`` kwarg is passed to rolling operations. This is redundant since a dimension is
+  specified when the :py:class:`DatasetRolling` or :py:class:`DataArrayRolling` object is created.
+  (:pull:`3362`). By `Deepak Cherian <https://github.com/dcherian/>`_.
 
 Documentation
 ~~~~~~~~~~~~~
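The new ``fill_value`` option for ``unstack`` parallels the behaviour of pandas' ``Series.unstack``; a minimal pandas illustration of what the option controls (a missing label combination gets ``fill_value`` instead of NaN):

```python
import pandas as pd

# A stacked series with one missing (d0, d1) combination: ("b", "y").
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.MultiIndex.from_tuples(
        [("a", "x"), ("a", "y"), ("b", "x")], names=["d0", "d1"]
    ),
)

# Without fill_value the hole would become NaN; with it, the given value.
filled = s.unstack("d1", fill_value=0.0)
print(filled.loc["b", "y"])  # 0.0
```

xarray's ``unstack(fill_value=...)`` applies the same idea per data variable when a MultiIndex dimension is expanded into its levels.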

xarray/backends/api.py

Lines changed: 5 additions & 2 deletions

@@ -1234,15 +1234,18 @@ def _validate_datatypes_for_zarr_append(dataset):
     def check_dtype(var):
         if (
             not np.issubdtype(var.dtype, np.number)
+            and not np.issubdtype(var.dtype, np.datetime64)
+            and not np.issubdtype(var.dtype, np.bool)
             and not coding.strings.is_unicode_dtype(var.dtype)
             and not var.dtype == object
         ):
             # and not re.match('^bytes[1-9]+$', var.dtype.name)):
             raise ValueError(
                 "Invalid dtype for data variable: {} "
                 "dtype must be a subtype of number, "
-                "a fixed sized string, a fixed size "
-                "unicode string or an object".format(var)
+                "datetime, bool, a fixed sized string, "
+                "a fixed size unicode string or an "
+                "object".format(var)
             )
 
     for k in dataset.data_vars.values():
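The extended check can be exercised with plain NumPy. A standalone sketch (using ``np.bool_``, since the bare ``np.bool`` alias in the patch was later deprecated in NumPy, and a hypothetical `zarr_append_dtype_ok` name):

```python
import numpy as np

def zarr_append_dtype_ok(dtype):
    """Mirror of the patched predicate: numbers, datetimes, bools,
    fixed-size (unicode) strings, and object dtypes are appendable."""
    dtype = np.dtype(dtype)
    return (
        np.issubdtype(dtype, np.number)
        or np.issubdtype(dtype, np.datetime64)
        or np.issubdtype(dtype, np.bool_)
        or dtype.kind in ("S", "U")  # fixed-size byte/unicode strings
        or dtype == object
    )

print(zarr_append_dtype_ok("datetime64[ns]"))  # True
print(zarr_append_dtype_ok(bool))              # True
```

The two new ``issubdtype`` clauses are exactly what lets datetime and bool variables through where they previously raised the ``ValueError``.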

xarray/coding/variables.py

Lines changed: 10 additions & 4 deletions

@@ -8,7 +8,6 @@
 
 from ..core import dtypes, duck_array_ops, indexing
 from ..core.pycompat import dask_array_type
-from ..core.utils import equivalent
 from ..core.variable import Variable
 
 
@@ -152,18 +151,25 @@ def encode(self, variable, name=None):
         fv = encoding.get("_FillValue")
         mv = encoding.get("missing_value")
 
-        if fv is not None and mv is not None and not equivalent(fv, mv):
+        if (
+            fv is not None
+            and mv is not None
+            and not duck_array_ops.allclose_or_equiv(fv, mv)
+        ):
             raise ValueError(
-                "Variable {!r} has multiple fill values {}. "
-                "Cannot encode data. ".format(name, [fv, mv])
+                f"Variable {name!r} has conflicting _FillValue ({fv}) and missing_value ({mv}). Cannot encode data."
             )
 
         if fv is not None:
+            # Ensure _FillValue is cast to same dtype as data's
+            encoding["_FillValue"] = data.dtype.type(fv)
             fill_value = pop_to(encoding, attrs, "_FillValue", name=name)
             if not pd.isnull(fill_value):
                 data = duck_array_ops.fillna(data, fill_value)
 
         if mv is not None:
+            # Ensure missing_value is cast to same dtype as data's
+            encoding["missing_value"] = data.dtype.type(mv)
             fill_value = pop_to(encoding, attrs, "missing_value", name=name)
             if not pd.isnull(fill_value) and fv is None:
                 data = duck_array_ops.fillna(data, fill_value)
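The switch from ``equivalent`` to ``allclose_or_equiv`` makes the conflict check tolerant of tiny floating-point differences between the two attributes. A rough standalone approximation for scalar numeric fill values, using ``np.allclose`` in place of xarray's internal helper (hypothetical `check_fill_values` name):

```python
import numpy as np

def check_fill_values(name, fv, mv):
    """Sketch of the harmonization rule: if both _FillValue and
    missing_value are set they must agree (within float tolerance),
    otherwise encoding fails. Not xarray's actual code, which also
    handles array-valued and non-numeric fill values."""
    if fv is not None and mv is not None and not np.allclose(fv, mv):
        raise ValueError(
            f"Variable {name!r} has conflicting _FillValue ({fv}) "
            f"and missing_value ({mv}). Cannot encode data."
        )
    # Return the single agreed-upon value; the encoder then casts it
    # to the data's dtype, as the added data.dtype.type(...) lines do.
    return fv if fv is not None else mv

print(check_fill_values("t", -9999.0, -9999.0))  # -9999.0
```

The ``data.dtype.type(...)`` casts added in the diff guarantee that the attribute written to disk has the same dtype as the variable's data, avoiding mismatches on decode.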

xarray/core/common.py

Lines changed: 4 additions & 13 deletions

@@ -43,14 +43,12 @@ def _reduce_method(cls, func: Callable, include_skipna: bool, numeric_only: bool
         if include_skipna:
 
             def wrapped_func(self, dim=None, axis=None, skipna=None, **kwargs):
-                return self.reduce(
-                    func, dim, axis, skipna=skipna, allow_lazy=True, **kwargs
-                )
+                return self.reduce(func, dim, axis, skipna=skipna, **kwargs)
 
         else:
 
             def wrapped_func(self, dim=None, axis=None, **kwargs):  # type: ignore
-                return self.reduce(func, dim, axis, allow_lazy=True, **kwargs)
+                return self.reduce(func, dim, axis, **kwargs)
 
         return wrapped_func
 
@@ -83,20 +81,13 @@ def _reduce_method(cls, func: Callable, include_skipna: bool, numeric_only: bool
 
             def wrapped_func(self, dim=None, skipna=None, **kwargs):
                 return self.reduce(
-                    func,
-                    dim,
-                    skipna=skipna,
-                    numeric_only=numeric_only,
-                    allow_lazy=True,
-                    **kwargs,
+                    func, dim, skipna=skipna, numeric_only=numeric_only, **kwargs
                 )
 
         else:
 
             def wrapped_func(self, dim=None, **kwargs):  # type: ignore
-                return self.reduce(
-                    func, dim, numeric_only=numeric_only, allow_lazy=True, **kwargs
-                )
+                return self.reduce(func, dim, numeric_only=numeric_only, **kwargs)
 
         return wrapped_func
 

xarray/core/dataarray.py

Lines changed: 57 additions & 27 deletions

@@ -48,7 +48,7 @@
     assert_coordinate_consistent,
     remap_label_indexers,
 )
-from .dataset import Dataset, merge_indexes, split_indexes
+from .dataset import Dataset, split_indexes
 from .formatting import format_item
 from .indexes import Indexes, copy_indexes, default_indexes
 from .merge import PANDAS_TYPES, _extract_indexes_from_coords
@@ -249,14 +249,14 @@ class DataArray(AbstractArray, DataWithCoords):
         Dictionary for holding arbitrary metadata.
     """
 
-    _accessors: Optional[Dict[str, Any]]  # noqa
+    _cache: Dict[str, Any]
     _coords: Dict[Any, Variable]
     _indexes: Optional[Dict[Hashable, pd.Index]]
    _name: Optional[Hashable]
     _variable: Variable
 
     __slots__ = (
-        "_accessors",
+        "_cache",
         "_coords",
         "_file_obj",
         "_indexes",
@@ -376,7 +376,6 @@ def __init__(
         assert isinstance(coords, dict)
         self._coords = coords
         self._name = name
-        self._accessors = None
 
         # TODO(shoyer): document this argument, once it becomes part of the
         # public interface.
@@ -772,7 +771,9 @@ def reset_coords(
         return dataset
 
     def __dask_tokenize__(self):
-        return (type(self), self._variable, self._coords, self._name)
+        from dask.base import normalize_token
+
+        return normalize_token((type(self), self._variable, self._coords, self._name))
 
     def __dask_graph__(self):
         return self._to_temp_dataset().__dask_graph__()
@@ -1617,10 +1618,10 @@ def set_index(
         --------
         DataArray.reset_index
         """
-        _check_inplace(inplace)
-        indexes = either_dict_or_kwargs(indexes, indexes_kwargs, "set_index")
-        coords, _ = merge_indexes(indexes, self._coords, set(), append=append)
-        return self._replace(coords=coords)
+        ds = self._to_temp_dataset().set_index(
+            indexes, append=append, inplace=inplace, **indexes_kwargs
+        )
+        return self._from_temp_dataset(ds)
 
     def reset_index(
         self,
@@ -1743,7 +1744,9 @@ def stack(
         return self._from_temp_dataset(ds)
 
     def unstack(
-        self, dim: Union[Hashable, Sequence[Hashable], None] = None
+        self,
+        dim: Union[Hashable, Sequence[Hashable], None] = None,
+        fill_value: Any = dtypes.NA,
     ) -> "DataArray":
         """
         Unstack existing dimensions corresponding to MultiIndexes into
@@ -1756,6 +1759,7 @@ def unstack(
         dim : hashable or sequence of hashable, optional
             Dimension(s) over which to unstack. By default unstacks all
             MultiIndexes.
+        fill_value: value to be filled. By default, np.nan
 
         Returns
         -------
@@ -1787,7 +1791,7 @@ def unstack(
         --------
         DataArray.stack
         """
-        ds = self._to_temp_dataset().unstack(dim)
+        ds = self._to_temp_dataset().unstack(dim, fill_value)
         return self._from_temp_dataset(ds)
 
     def to_unstacked_dataset(self, dim, level=0):
@@ -2034,44 +2038,69 @@ def fillna(self, value: Any) -> "DataArray":
 
     def interpolate_na(
         self,
-        dim=None,
+        dim: Hashable = None,
         method: str = "linear",
         limit: int = None,
         use_coordinate: Union[bool, str] = True,
+        max_gap: Union[int, float, str, pd.Timedelta, np.timedelta64] = None,
         **kwargs: Any,
     ) -> "DataArray":
-        """Interpolate values according to different methods.
+        """Fill in NaNs by interpolating according to different methods.
 
         Parameters
         ----------
         dim : str
             Specifies the dimension along which to interpolate.
-        method : {'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
-                  'polynomial', 'barycentric', 'krog', 'pchip',
-                  'spline', 'akima'}, optional
+        method : str, optional
             String indicating which method to use for interpolation:
 
             - 'linear': linear interpolation (Default). Additional keyword
-              arguments are passed to ``numpy.interp``
-            - 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
-              'polynomial': are passed to ``scipy.interpolate.interp1d``. If
-              method=='polynomial', the ``order`` keyword argument must also be
+              arguments are passed to :py:func:`numpy.interp`
+            - 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'polynomial':
+              are passed to :py:func:`scipy.interpolate.interp1d`. If
+              ``method='polynomial'``, the ``order`` keyword argument must also be
               provided.
-            - 'barycentric', 'krog', 'pchip', 'spline', and `akima`: use their
-              respective``scipy.interpolate`` classes.
-        use_coordinate : boolean or str, default True
+            - 'barycentric', 'krog', 'pchip', 'spline', 'akima': use their
+              respective :py:class:`scipy.interpolate` classes.
+        use_coordinate : bool, str, default True
             Specifies which index to use as the x values in the interpolation
             formulated as `y = f(x)`. If False, values are treated as if
-            eqaully-spaced along `dim`. If True, the IndexVariable `dim` is
-            used. If use_coordinate is a string, it specifies the name of a
+            eqaully-spaced along ``dim``. If True, the IndexVariable `dim` is
+            used. If ``use_coordinate`` is a string, it specifies the name of a
            coordinate variariable to use as the index.
         limit : int, default None
             Maximum number of consecutive NaNs to fill. Must be greater than 0
-            or None for no limit.
+            or None for no limit. This filling is done regardless of the size of
+            the gap in the data. To only interpolate over gaps less than a given length,
+            see ``max_gap``.
+        max_gap: int, float, str, pandas.Timedelta, numpy.timedelta64, default None.
+            Maximum size of gap, a continuous sequence of NaNs, that will be filled.
+            Use None for no limit. When interpolating along a datetime64 dimension
+            and ``use_coordinate=True``, ``max_gap`` can be one of the following:
+
+            - a string that is valid input for pandas.to_timedelta
+            - a :py:class:`numpy.timedelta64` object
+            - a :py:class:`pandas.Timedelta` object
+            Otherwise, ``max_gap`` must be an int or a float. Use of ``max_gap`` with unlabeled
+            dimensions has not been implemented yet. Gap length is defined as the difference
+            between coordinate values at the first data point after a gap and the last value
+            before a gap. For gaps at the beginning (end), gap length is defined as the difference
+            between coordinate values at the first (last) valid data point and the first (last) NaN.
+            For example, consider::
+
+                <xarray.DataArray (x: 9)>
+                array([nan, nan, nan, 1., nan, nan, 4., nan, nan])
+                Coordinates:
+                  * x        (x) int64 0 1 2 3 4 5 6 7 8
+
+            The gap lengths are 3-0 = 3; 6-3 = 3; and 8-6 = 2 respectively
+        kwargs : dict, optional
+            parameters passed verbatim to the underlying interpolation function
 
         Returns
         -------
-        DataArray
+        interpolated: DataArray
+            Filled in DataArray.
 
         See also
         --------
@@ -2086,6 +2115,7 @@ def interpolate_na(
             method=method,
             limit=limit,
             use_coordinate=use_coordinate,
+            max_gap=max_gap,
             **kwargs,
         )
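The gap-length arithmetic in the new docstring example can be checked directly. A sketch under the stated definition (with a hypothetical `gap_lengths` helper, not part of xarray):

```python
import numpy as np

def gap_lengths(data, coord):
    """Gap lengths as defined in the ``max_gap`` docs (assumes at least
    one valid point): an interior gap spans from the last valid point
    before it to the first valid point after it; gaps at the boundaries
    span to the first/last coordinate value."""
    valid = np.flatnonzero(~np.isnan(data))
    gaps = []
    if valid[0] > 0:  # leading gap
        gaps.append(coord[valid[0]] - coord[0])
    for left, right in zip(valid[:-1], valid[1:]):
        if right - left > 1:  # interior gap
            gaps.append(coord[right] - coord[left])
    if valid[-1] < len(data) - 1:  # trailing gap
        gaps.append(coord[-1] - coord[valid[-1]])
    return gaps

data = np.array([np.nan, np.nan, np.nan, 1.0, np.nan, np.nan, 4.0, np.nan, np.nan])
coord = np.arange(9)
print([int(g) for g in gap_lengths(data, coord)])  # [3, 3, 2]
```

This reproduces the docstring's 3-0 = 3, 6-3 = 3, and 8-6 = 2, and explains why a ``max_gap`` of 2 would fill only the trailing gap in that example.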