Description
I use this stack, groupby, unstack pattern quite frequently, e.g. here.
An issue I have is that after groupby('allpoints').apply(), the coordinate names do not get carried through, i.e. the coordinate names are now allpoints_level_0 and allpoints_level_1. Then after unstacking I rename them back to lat/lon etc. Do you ever encounter this? Is there a way to carry them through, and is this an issue for others?
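For reference, the rename-after-unstacking workaround could be wrapped in a small helper so it is not repeated by hand each time. This is a minimal sketch, assuming the generic names follow the <dim>_level_<i> pattern seen in the output further down (allpoints_level_0, allpoints_level_1). restore_level_names is a hypothetical helper, not part of xarray, and the small DataArray is a stand-in for an unstacked groupby result.

```python
import numpy as np
import xarray as xr


def restore_level_names(obj, stacked_dim, original_names):
    """Hypothetical helper: rename '<stacked_dim>_level_<i>' back to original names.

    Only names that are actually present are renamed, so this is a no-op
    when the level names were already carried through.
    """
    mapping = {}
    for i, name in enumerate(original_names):
        generic = f"{stacked_dim}_level_{i}"
        if generic in obj.dims or generic in obj.coords:
            mapping[generic] = name
    return obj.rename(mapping) if mapping else obj


# Stand-in for an unstacked groupby result whose levels lost their names
unstacked = xr.DataArray(
    np.zeros((2, 3)),
    dims=("allpoints_level_0", "allpoints_level_1"),
    coords={"allpoints_level_0": [-89, -88], "allpoints_level_1": [-180, -179, -178]},
)
fixed = restore_level_names(unstacked, "allpoints", ["lat", "lon"])
print(fixed.dims)  # ('lat', 'lon')
```

Because it only renames names that are present, the helper is harmless on an xarray version that already carries the level names through.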
import xarray as xr
import numpy as np
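# random test data; np.ndarray() would return an uninitialised array
ds = xr.DataArray(np.random.randn(180, 360, 2000), dims=('lat', 'lon', 'time'),
                  coords={'lat': np.arange(90, -90, -1), 'lon': np.arange(-180, 180), 'time': range(2000)})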
ds
<xarray.DataArray (lat: 180, lon: 360, time: 2000)>
array([[[ 0.623891, -0.044304, ..., 1.015785, 0.009088],
[-0.7375 , 0.380369, ..., 0.788351, -0.69295 ],
...,
[ 0.171894, 0.517164, ..., -0.946908, -0.597802],
[ 0.353743, 0.005539, ..., -1.436965, -0.190099]],
....
Coordinates:
* lat (lat) int32 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 ...
* lon (lon) int32 -180 -179 -178 -177 -176 -175 -174 -173 -172 -171 ...
* time (time) int32 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ..
Now we stack lat and lon into a single allpoints dimension. Note that the original coordinate names (lat/lon) are still there as the levels of the MultiIndex:
dst = ds.stack(allpoints=['lat','lon'])
<xarray.DataArray (time: 2000, allpoints: 64800)>
array([[ 0.623891, -0.7375 , 0.053525, ..., 0.379701, 0.130618, 0.11094 ],
[-0.044304, 0.380369, -0.410632, ..., -0.739881, 0.203219, -0.506303],
[-1.762024, -1.019424, 2.580218, ..., 1.491677, 1.189149, -0.072223],
...,
[-0.896298, 0.333163, -1.751641, ..., 1.90315 , 2.642813, -0.913787],
[ 1.015785, 0.788351, 0.379997, ..., 0.864934, 0.889001, -1.363458],
[ 0.009088, -0.69295 , -1.276184, ..., 1.220656, 0.895599, 0.848757]])
Coordinates:
* time (time) int32 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* allpoints (allpoints) MultiIndex
- lat (allpoints) int64 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ...
- lon (allpoints) int64 -180 -179 -178 -177 -176 -175 -174 -173 ...
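The same check can be done programmatically by reading the level names off the pandas.MultiIndex behind the stacked dimension. A small sketch, using a reduced 3 x 4 x 5 array as an assumed stand-in for the 180 x 360 x 2000 one above:

```python
import numpy as np
import xarray as xr

# Small stand-in for the 180x360x2000 array in the example above
da = xr.DataArray(
    np.random.randn(3, 4, 5),
    dims=("lat", "lon", "time"),
    coords={"lat": [10, 0, -10], "lon": [-180, -90, 0, 90], "time": np.arange(5)},
)

# stack() appends the new 'allpoints' dimension at the end
stacked = da.stack(allpoints=["lat", "lon"])

# The stacked dimension is backed by a pandas.MultiIndex whose level names
# are the original dimension names
level_names = list(stacked.indexes["allpoints"].names)
print(stacked.dims)   # ('time', 'allpoints')
print(level_names)    # ['lat', 'lon']
```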
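Now apply groupby().apply():
# my_custom_function is the user's own per-point function; it reduces each time series to a single value
dsg = dst.groupby('allpoints').apply(my_custom_function)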
<xarray.DataArray (allpoints: 64800)>
array([ 0.013697, 0.006272, 0.009744, ..., -0.016265, -0.002108, -0.014733])
Coordinates:
* allpoints (allpoints) MultiIndex
- allpoints_level_0 (allpoints) int64 -89 -89 -89 -89 -89 -89 -89 -89 -89 ...
- allpoints_level_1 (allpoints) int64 -180 -179 -178 -177 -176 -175 -174 ...
So now we have lost the 'lat' and 'lon' names. However, if we skip the groupby part and go straight to unstack, they are carried through:
dst.unstack('allpoints')
<xarray.DataArray (time: 2000, lat: 180, lon: 360)>
array([[[ 0.623891, -0.7375 , ..., 0.171894, 0.353743],
[ 1.780691, -0.747431, ..., 0.038754, 0.615228],
...,
Coordinates:
* time (time) int32 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
* lat (lat) int64 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 ...
* lon (lon) int64 -180 -179 -178 -177 -176 -175 -174 -173 -172 -171 ...
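If my_custom_function only needs one point's 1-D time series and returns a scalar, one possible way to sidestep the groupby step (and with it the renamed levels) is xarray.apply_ufunc with input_core_dims=[["time"]] and vectorize=True. This is a swapped-in technique, not the groupby approach from the issue: it applies the function to every slice along 'time' and keeps the coordinates of the remaining dimension, including the MultiIndex. The body of my_custom_function below is hypothetical (a plain mean), and the small array is an assumed stand-in for the real data.

```python
import numpy as np
import xarray as xr

# Small stand-in for the real 180x360x2000 data
da = xr.DataArray(
    np.random.randn(3, 4, 5),
    dims=("lat", "lon", "time"),
    coords={"lat": [10, 0, -10], "lon": [-180, -90, 0, 90], "time": np.arange(5)},
)
stacked = da.stack(allpoints=["lat", "lon"])


def my_custom_function(ts):
    # Hypothetical per-point function: takes one 1-D time series, returns a scalar
    return float(np.mean(ts))


# apply_ufunc moves the core dim 'time' to the end; vectorize=True loops over
# the remaining 'allpoints' dim with np.vectorize, and the non-core coordinates
# (the allpoints MultiIndex with its lat/lon levels) are kept on the result
result = xr.apply_ufunc(
    my_custom_function,
    stacked,
    input_core_dims=[["time"]],
    vectorize=True,
)

print(list(result.indexes["allpoints"].names))  # ['lat', 'lon']
restored = result.unstack("allpoints")
print(restored.dims)  # ('lat', 'lon')
```

Note that vectorize=True is a Python-level loop, so for 64800 points it trades the groupby overhead for np.vectorize overhead rather than giving a true vectorised speed-up.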