Description
Code Sample, a copy-pastable example if possible
From @RayPalmerTech in pydata/bottleneck#186:
import numpy as np
import pandas as pd
import time
import bottleneck as bn
import xarray
import matplotlib.pyplot as plt
N = 30000200 # Number of datapoints
Fs = 30000 # sample rate
T=1/Fs # sample period
duration = N/Fs # duration in s
t = np.arange(0,duration,T) # time vector
DATA = np.random.randn(N,)+5*np.sin(2*np.pi*0.01*t) # Example noisy sine data and window size
w = 330000
def using_bottleneck_mean(data,width):
    return bn.move_mean(a=data,window=width,min_count = 1)
def using_pandas_rolling_mean(data,width):
    return np.asarray(pd.DataFrame(data).rolling(window=width,center=True,min_periods=1).mean()).ravel()
def using_xarray_mean(data,width):
    return xarray.DataArray(data,dims='x').rolling(x=width,min_periods=1, center=True).mean()
start=time.time()
A = using_bottleneck_mean(DATA,w)
print('Bottleneck: ', time.time()-start, 's')
start=time.time()
B = using_pandas_rolling_mean(DATA,w)
print('Pandas: ',time.time()-start,'s')
start=time.time()
C = using_xarray_mean(DATA,w)
print('Xarray: ',time.time()-start,'s')
This results in:
Bottleneck: 0.0867006778717041 s
Pandas: 0.563546895980835 s
Xarray: 25.133142709732056 s
Somehow xarray is about 45 times slower than pandas and almost 300 times slower than bottleneck, even though it's using bottleneck under the hood!
Problem description
Profiling shows that the majority of time is spent in xarray.core.rolling.DataArrayRolling._setup_windows. Monkey-patching that method with a dummy rectifies the issue:
xarray.core.rolling.DataArrayRolling._setup_windows = lambda *args: None
Now we obtain:
Bottleneck: 0.06775331497192383 s
Pandas: 0.48262882232666016 s
Xarray: 0.1723031997680664 s
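The one-line patch above stays in effect for the rest of the session, and anything that relies on the precomputed windows would presumably stop working while it is active. For a quick experiment, a scoped patch that is undone automatically may be safer. The sketch below uses `unittest.mock.patch.object` on a hypothetical stand-in class (`Rolling` is not xarray's class; only the method name `_setup_windows` is borrowed from the report), so it runs without xarray installed:

```python
from unittest import mock


class Rolling:
    """Hypothetical stand-in for a class with an expensive private setup step."""

    def __init__(self, n):
        self.n = n
        self.windows = None
        self._setup_windows()  # eager setup, as in the reported problem

    def _setup_windows(self):
        # Placeholder for the costly per-element window construction.
        self.windows = [(i, i + 1) for i in range(self.n)]


# Inside the block the setup step is a no-op; on exit the original is restored.
with mock.patch.object(Rolling, "_setup_windows", lambda *args: None):
    patched = Rolling(3)

restored = Rolling(3)

print(patched.windows)   # None
print(restored.windows)  # [(0, 1), (1, 2), (2, 3)]
```

The same pattern should apply to the real class as long as the private method exists under that name in the installed version, which is an assumption worth checking before relying on it.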
The solution is to set up the windows lazily (in __iter__) instead of in the constructor.
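To illustrate the idea, here is a minimal sketch of lazy window setup. It is not xarray's actual implementation: `LazyRolling`, its `windows_built` counter, and the trailing (non-centered) window are all assumptions made for the example. The constructor only stores its arguments, the windows are generated inside `__iter__`, and `mean()` takes a fast path that never touches them.

```python
from itertools import accumulate


class LazyRolling:
    """Hypothetical rolling helper; windows are only built when iterated."""

    def __init__(self, data, window):
        # Cheap constructor: no per-element work happens here.
        self.data = list(data)
        self.window = window
        self.windows_built = 0  # counts windows materialised so far

    def __iter__(self):
        # Trailing windows are produced on demand, one per element.
        for stop in range(1, len(self.data) + 1):
            start = max(0, stop - self.window)
            self.windows_built += 1
            yield self.data[start:stop]

    def mean(self):
        # Fast path via cumulative sums; equivalent to min_periods=1.
        csum = [0.0, *accumulate(self.data)]
        result = []
        for stop in range(1, len(self.data) + 1):
            start = max(0, stop - self.window)
            result.append((csum[stop] - csum[start]) / (stop - start))
        return result


rolling = LazyRolling([1.0, 2.0, 3.0, 4.0], window=2)
fast = rolling.mean()                            # no windows are built
built_after_mean = rolling.windows_built
slow = [sum(w) / len(w) for w in rolling]        # windows are built only now

print(fast)              # [1.0, 1.5, 2.5, 3.5]
print(built_after_mean)  # 0
```

With this layout, callers who only use the aggregated result never pay for window construction, while explicit iteration still works and costs the same as before.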
Output of xr.show_versions()
xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.2
scipy: 0.19.1
netCDF4: None
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.7.1
setuptools: 36.2.7
pip: 9.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None