feat: reindex multiple DataArrays #4756

Open
@davidbrochart

When e.g. creating a Dataset from multiple DataArrays that are supposed to share the same grid, but are not exactly aligned (as is often the case with floating point coordinates), we usually end up with undesirable NaNs inserted in the data set.
For instance, consider the following data arrays that are not exactly aligned:

import xarray as xr

da1 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], coords=[[0, 1, 2], [0, 1, 2]], dims=['x', 'y']).rename('da1')
da2 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=['x', 'y']).rename('da2')
da1.plot.imshow()
da2.plot.imshow()

[Images: imshow plots of da1 and da2]
Combining them in a Dataset aligns them on the union of their coordinates, so each array shows NaN gaps:

ds = xr.Dataset({'da1': da1, 'da2': da2})
ds['da1'].plot.imshow()
ds['da2'].plot.imshow()

[Images: imshow plots of ds['da1'] and ds['da2']]
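The gaps can also be checked numerically instead of visually. The following is a minimal sketch that rebuilds the two arrays from above and counts the NaNs introduced by the alignment; the variable names `x_values` and `n_missing` are only illustrative.

```python
import xarray as xr

data = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
da1 = xr.DataArray(data, coords=[[0, 1, 2], [0, 1, 2]], dims=['x', 'y'], name='da1')
da2 = xr.DataArray(data, coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=['x', 'y'], name='da2')

# Building the Dataset aligns both arrays on the union of their coordinates.
ds = xr.Dataset({'da1': da1, 'da2': da2})

# The x coordinate now holds every distinct value from both arrays.
x_values = ds['x'].values.tolist()

# Each array keeps its 9 original values on a 6 x 6 grid; the rest is NaN.
n_missing = {name: int(ds[name].isnull().sum()) for name in ('da1', 'da2')}

print(x_values)
print(n_missing)
```

Nearly aligned values such as 1 and 1.1 end up as separate grid points, which is where the NaNs come from.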
I think this situation is frequent enough that a function to re-align all the data arrays together would be useful. There is a reindex_like method, which accepts a tolerance, but calling it successively on every data array, like so:

da1r = da1.reindex_like(da2, method='nearest', tolerance=0.2)
da2r = da2.reindex_like(da1r, method='nearest', tolerance=0.2)

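would leave every array on the coordinates of the first reference (here da2), dropping the points of da1 at coordinate 0, rather than putting them on the union of all coordinates. What I would like is a function like the following: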

import numpy as np
from functools import reduce

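def reindex_all(arrays, dims, tolerance):
    # Build one shared coordinate per dimension, then snap every array onto it.
    coords = {}
    for dim in dims:
        # Sorted union of this dimension's coordinate values across all arrays.
        coord = reduce(np.union1d, [array[dim] for array in arrays[1:]], arrays[0][dim])
        # Drop each value within tolerance of its successor; the last value is always kept.
        keep = np.abs(np.diff(coord)) > tolerance
        coords[dim] = np.append(coord[:-1][keep], coord[-1])
    # Labels with no original value within tolerance are filled with NaN.
    return [array.reindex(coords, method='nearest', tolerance=tolerance) for array in arrays]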

da1r, da2r = reindex_all([da1, da2], ['x', 'y'], 0.2)
dsr = xr.Dataset({'da1': da1r, 'da2': da2r})
dsr['da1'].plot.imshow()
dsr['da2'].plot.imshow()

[Images: imshow plots of dsr['da1'] and dsr['da2']]
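To make the difference between the two approaches concrete, here is a self-contained sketch that compares the chained reindex_like calls with the proposed function on the example arrays. The `reindex_all` below is a restatement of the proposal with explicit NumPy conversion (`np.unique` over concatenated values, which I assume is equivalent to the `np.union1d` reduction here); names such as `chained_x`, `union_x` and `missing` are only illustrative.

```python
import numpy as np
import xarray as xr

data = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
da1 = xr.DataArray(data, coords=[[0, 1, 2], [0, 1, 2]], dims=['x', 'y'], name='da1')
da2 = xr.DataArray(data, coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=['x', 'y'], name='da2')


def reindex_all(arrays, dims, tolerance):
    # Restatement of the proposed function; not an existing xarray API.
    coords = {}
    for dim in dims:
        # Sorted, de-duplicated union of the coordinate values.
        coord = np.unique(np.concatenate([a[dim].values for a in arrays]))
        # Drop each value within tolerance of its successor; keep the last one.
        keep = np.abs(np.diff(coord)) > tolerance
        coords[dim] = np.append(coord[:-1][keep], coord[-1])
    return [a.reindex(coords, method='nearest', tolerance=tolerance) for a in arrays]


# Chained reindex_like: both arrays end up on the coordinates of da2.
chained1 = da1.reindex_like(da2, method='nearest', tolerance=0.2)
chained2 = da2.reindex_like(chained1, method='nearest', tolerance=0.2)
chained_x = chained2['x'].values.tolist()

# Proposed approach: both arrays end up on the merged union grid.
da1r, da2r = reindex_all([da1, da2], ['x', 'y'], 0.2)
dsr = xr.Dataset({'da1': da1r, 'da2': da2r})
union_x = dsr['x'].values.tolist()
missing = {name: int(dsr[name].isnull().sum()) for name in ('da1', 'da2')}

print(chained_x)
print(union_x)
print(missing)
```

With the merged grid, the only NaNs left are at the edges where one array really has no data within the tolerance.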
I have not found something equivalent. If you think this is worth it, I could try and send a PR to implement such a feature.
