feat: reindex multiple DataArrays

When creating a `Dataset` from multiple `DataArray`s that are supposed to share the same grid but are not exactly aligned (as is often the case with floating-point coordinates), we usually end up with undesirable `NaN`s inserted in the data set.
For instance, consider the following data arrays that are not exactly aligned:
```python
import xarray as xr

da1 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], coords=[[0, 1, 2], [0, 1, 2]], dims=['x', 'y']).rename('da1')
da2 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=['x', 'y']).rename('da2')
da1.plot.imshow()
da2.plot.imshow()
```
![image](https://user-images.githubusercontent.com/4711805/103482830-542bbe80-4de3-11eb-814b-bb1f705967c4.png) ![image](https://user-images.githubusercontent.com/4711805/103482836-61e14400-4de3-11eb-804b-f549c2551562.png)
They show gaps when combined in a data set:
```python
ds = xr.Dataset({'da1': da1, 'da2': da2})
ds['da1'].plot.imshow()
ds['da2'].plot.imshow()
```
![image](https://user-images.githubusercontent.com/4711805/103482959-3f9bf600-4de4-11eb-9513-900319cb485a.png) ![image](https://user-images.githubusercontent.com/4711805/103482966-47f43100-4de4-11eb-853b-2b44f7bc8d7f.png)
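The gaps can also be checked without plotting. The snippet below is a minimal sketch that rebuilds the same two arrays (using `np.arange(9).reshape(3, 3)` as shorthand for the nested list above) and counts the missing values after the combination; the variable names `n_valid` and `n_missing` are only for illustration.

```python
import numpy as np
import xarray as xr

# Same data and coordinates as da1 and da2 above
data = np.arange(9).reshape(3, 3)
da1 = xr.DataArray(data, coords=[[0, 1, 2], [0, 1, 2]], dims=['x', 'y'], name='da1')
da2 = xr.DataArray(data, coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=['x', 'y'], name='da2')

# Building the Dataset aligns both arrays on the union of their coordinates
ds = xr.Dataset({'da1': da1, 'da2': da2})

# Each dimension now holds 6 labels instead of 3: 0, 1, 1.1, 2, 2.1, 3.1
print(ds['x'].values)
print(ds['da1'].shape)

# Only the 9 original values of da1 survive; the other 27 cells are NaN
n_valid = int(ds['da1'].notnull().sum())
n_missing = int(ds['da1'].isnull().sum())
print(n_valid, n_missing)
```

Since `1` and `1.1` are treated as distinct labels, each array only fills the cells at its own labels, which is what produces the striped pattern in the plots.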
I think this is a frequent enough situation that we would like a function to re-align all the data arrays together. There is a `reindex_like` method, which accepts a tolerance, but calling it successively on every data array, like so:
```python
da1r = da1.reindex_like(da2, method='nearest', tolerance=0.2)
da2r = da2.reindex_like(da1r, method='nearest', tolerance=0.2)
```
would discard the coordinates that are present in only one of the arrays (here, everything ends up on the grid of `da2`), rather than producing the union of the coordinates. What I would like is a function like the following:

```python
import numpy as np
from functools import reduce

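def reindex_all(arrays, dims, tolerance):
    coords = {}
    for dim in dims:
        # Sorted union of this dimension's coordinate values across all arrays
        # (.values avoids index-aligned arithmetic on DataArrays below)
        coord = reduce(np.union1d, [array[dim].values for array in arrays[1:]], arrays[0][dim].values)
        # Drop every value that lies within `tolerance` of its successor
        keep = np.abs(np.diff(coord)) > tolerance
        coords[dim] = np.append(coord[:-1][keep], coord[-1])
    # Snap each array onto the merged grid; points without a match become NaN
    reindexed = [array.reindex(coords, method='nearest', tolerance=tolerance) for array in arrays]
    return reindexed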

da1r, da2r = reindex_all([da1, da2], ['x', 'y'], 0.2)
dsr = xr.Dataset({'da1': da1r, 'da2': da2r})
dsr['da1'].plot.imshow()
dsr['da2'].plot.imshow()
```
![image](https://user-images.githubusercontent.com/4711805/103483065-00ba7000-4de5-11eb-8581-fb156970a7e8.png) ![image](https://user-images.githubusercontent.com/4711805/103483072-0748e780-4de5-11eb-8b42-6bd9b248ab1e.png)
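To make the coordinate-merging step concrete, here is a NumPy-only walk-through of what `reindex_all` computes for the `x` values of the example. It is a sketch of the same logic, not a proposed API; the names `union`, `gaps`, and `merged` are only for illustration.

```python
import numpy as np

x1 = np.array([0, 1, 2])        # x coordinates of da1
x2 = np.array([1.1, 2.1, 3.1])  # x coordinates of da2
tolerance = 0.2

# Sorted union of both coordinate sets: 0, 1, 1.1, 2, 2.1, 3.1
union = np.union1d(x1, x2)

# Distance from each value to its successor
gaps = np.diff(union)

# A value is dropped when its successor is within the tolerance,
# so the larger value of each close pair survives (1.1 and 2.1 here)
keep = gaps > tolerance
merged = np.append(union[:-1][keep], union[-1])

print(merged)  # the merged grid: 0, 1.1, 2.1, 3.1
```

One consequence of this design is that a chain of values that are each within `tolerance` of the next collapses to its last value. With values `0`, `0.15`, `0.3` and a tolerance of `0.2`, only `0.3` would remain, and a point at `0` would then be too far away to be matched by `reindex`.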
I have not found anything equivalent in xarray. If you think this is worthwhile, I could try to send a PR implementing such a feature.

feat: reindex multiple DataArrays #4756
