Skip to content

DataArray.rolling() does not preserve chunksizes in some cases #2531

Closed
@cchwala

Description

@cchwala

This issue was found and discussed in the related issue #2514

I open a separate issue for clarity.

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
import xarray as xr

t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
bar = np.sin(np.arange(len(t)))
baz = np.cos(np.arange(len(t)))

da_test = xr.DataArray(data=np.stack([bar, baz]),
                       coords={'time': t,
                               'sensor': ['one', 'two']},
                       dims=('sensor', 'time'))

print(da_test.chunk({'time': 100}).rolling(time=60).mean().chunks)

print(da_test.chunk({'time': 100}).rolling(time=60).count().chunks)
Output for `mean`: ((2,), (745,))
Output for `count`: ((2,), (100, 100, 100, 100, 100, 100, 100, 45))
Desired Output: ((2,), (100, 100, 100, 100, 100, 100, 100, 45))

Problem description

DataArray.rolling() does not preserve the chunksizes, apparently depending on the applied method.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 2.7.15.final.0 python-bits: 64 OS: Darwin OS-release: 16.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: de_DE.UTF-8 LOCALE: None.None

xarray: 0.10.9
pandas: 0.23.3
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.4.1
h5netcdf: 0.5.0
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.19.4
distributed: 1.23.3
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 38.5.2
pip: 9.0.1
conda: 4.5.11
pytest: 3.4.2
IPython: 5.5.0
sphinx: None

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions