Description
When creating, for example, a Dataset from multiple DataArrays that are supposed to share the same grid but are not exactly aligned (as is often the case with floating-point coordinates), we usually end up with undesirable NaNs inserted in the dataset.
For instance, consider the following data arrays that are not exactly aligned:
import xarray as xr
da1 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], coords=[[0, 1, 2], [0, 1, 2]], dims=['x', 'y']).rename('da1')
da2 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=['x', 'y']).rename('da2')
da1.plot.imshow()
da2.plot.imshow()
They show gaps when combined in a data set:
ds = xr.Dataset({'da1': da1, 'da2': da2})
ds['da1'].plot.imshow()
ds['da2'].plot.imshow()
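The gaps can also be checked numerically, without plotting. The following is a minimal sketch reusing the arrays above; it assumes that Dataset combines the coordinates with an outer join, so each dimension ends up with the 6-point union of the two 3-point coordinates:

```python
import xarray as xr

data = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
da1 = xr.DataArray(data, coords=[[0, 1, 2], [0, 1, 2]], dims=['x', 'y']).rename('da1')
da2 = xr.DataArray(data, coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=['x', 'y']).rename('da2')

ds = xr.Dataset({'da1': da1, 'da2': da2})

# Each dimension now holds the union of both coordinates
print(ds['x'].values)
print(ds['da1'].shape)

# Only the 9 original values of da1 survive; the rest of the grid is NaN
n_valid = int(ds['da1'].notnull().sum())
n_nan = int(ds['da1'].isnull().sum())
print(n_valid, n_nan)
```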
I think this situation is frequent enough that a function to re-align all the data arrays together would be useful. There is a reindex_like method, which accepts a tolerance, but calling it successively on every data array, like so:
da1r = da1.reindex_like(da2, method='nearest', tolerance=0.2)
da2r = da2.reindex_like(da1r, method='nearest', tolerance=0.2)
would put everything on the grid of a single array (here that of da2), so only the overlapping part of the coordinates keeps its data, rather than covering their union. What I would like is a function like the following:
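To see what the successive reindex_like calls produce, one can inspect the resulting coordinates. This is a small check under the same example arrays, not a general statement about every input:

```python
import numpy as np
import xarray as xr

data = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
da1 = xr.DataArray(data, coords=[[0, 1, 2], [0, 1, 2]], dims=['x', 'y']).rename('da1')
da2 = xr.DataArray(data, coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=['x', 'y']).rename('da2')

da1r = da1.reindex_like(da2, method='nearest', tolerance=0.2)
da2r = da2.reindex_like(da1r, method='nearest', tolerance=0.2)

# Both results carry da2's coordinates; x=0 and y=0 from da1 are gone
print(da1r['x'].values)
print(da2r['x'].values)

# da1 only has neighbours within the tolerance for 1.1 and 2.1,
# so the row and column at 3.1 are filled with NaN
n_valid_da1r = int(da1r.notnull().sum())
print(n_valid_da1r)
```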
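import numpy as np
from functools import reduce

def reindex_all(arrays, dims, tolerance):
    """Reindex all arrays onto the union of their coordinates, merging values closer than tolerance."""
    coords = {}
    for dim in dims:
        # Sorted union of this dimension's coordinates across all arrays
        coord = reduce(np.union1d, [array[dim].values for array in arrays[1:]], arrays[0][dim].values)
        # Drop each value that lies within tolerance of the next one
        diff = coord[:-1] - coord[1:]
        keep = np.abs(diff) > tolerance
        coords[dim] = np.append(coord[:-1][keep], coord[-1])
    # Snap every array onto the merged coordinates
    reindexed = [array.reindex(coords, method='nearest', tolerance=tolerance) for array in arrays]
    return reindexed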
da1r, da2r = reindex_all([da1, da2], ['x', 'y'], 0.2)
dsr = xr.Dataset({'da1': da1r, 'da2': da2r})
dsr['da1'].plot.imshow()
dsr['da2'].plot.imshow()
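The coordinate-merging step inside reindex_all can be looked at in isolation. The sketch below uses only NumPy; merge_close is a hypothetical helper name introduced here for illustration, applying the same rule as above (drop a value when the next one lies within the tolerance):

```python
import numpy as np

def merge_close(values, tolerance):
    """Hypothetical helper: collapse sorted values that lie within `tolerance` of their successor."""
    values = np.asarray(values, dtype=float)
    # Distance from each value to the next one
    diff = np.abs(np.diff(values))
    # Keep a value only if its successor is farther away than the tolerance
    keep = diff > tolerance
    # The last value has no successor and is always kept
    return np.append(values[:-1][keep], values[-1])

# x coordinates of da1 and da2 from the example
union = np.union1d([0, 1, 2], [1.1, 2.1, 3.1])
merged = merge_close(union, 0.2)
print(union)
print(merged)

# A chain of close values collapses to its last element only
chain = merge_close([1.0, 1.15, 1.3], 0.2)
print(chain)
```

For the example, the six-point union collapses to four points, which is the grid both arrays are reindexed onto. The last case hints at a limitation of this simple rule: in a chain of close values only the final one is kept, and it may then lie farther than the tolerance from the first, so an implementation would probably need to handle such chains more carefully.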
I have not found anything equivalent. If you think this is worthwhile, I could try to send a PR implementing such a feature.