Description
I noticed that when datasets have multiple conflicting dimension coords, concat can give pretty weird/counterintuitive results, at least compared to what the documentation suggests it should give:
from xarray import Dataset, concat
# Create two datasets with conflicting coordinates
objs = [Dataset({'x': [0], 'y': [1]}), Dataset({'y': [0], 'x': [1]})]
[<xarray.Dataset>
Dimensions: (x: 1, y: 1)
Coordinates:
* x (x) int64 0
* y (y) int64 1
Data variables:
*empty*,
<xarray.Dataset>
Dimensions: (x: 1, y: 1)
Coordinates:
* y (y) int64 0
* x (x) int64 1
Data variables:
*empty*]
# Try to join along only 'x'.
# coords='minimal', so concat should concatenate "Only coordinates in which the dimension already appears"
concat(objs, dim='x', coords='minimal')
<xarray.Dataset>
Dimensions: (x: 2, y: 2)
Coordinates:
* y (y) int64 0 1
* x (x) int64 0 1
Data variables:
*empty*
# It's joined along x and y!
Based on my reading of the docstring for concat, I would have expected this not to attempt to concatenate y, because coords='minimal', and instead to throw an error because 'y' is a "non-concatenated variable" whose values are not the same across datasets.
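To make that expectation concrete, here is a rough sketch of the kind of check I had in mind. The helper `check_unconcatenated_coords_equal` is hypothetical (it is not part of xarray, and not necessarily how concat should implement this); it only uses `Dataset.coords` and `Variable.equals`.

```python
from xarray import Dataset


def check_unconcatenated_coords_equal(datasets, dim):
    """Hypothetical helper (not part of xarray): raise if any coordinate
    that does not contain `dim` differs between the datasets."""
    first = datasets[0]
    for name, coord in first.coords.items():
        if dim in coord.dims:
            continue  # this coordinate would be concatenated, so skip it
        for other in datasets[1:]:
            if name not in other.coords or not coord.variable.equals(
                other.coords[name].variable
            ):
                raise ValueError(f"coordinate {name!r} not equal across datasets")


objs = [Dataset({'x': [0], 'y': [1]}), Dataset({'y': [0], 'x': [1]})]

try:
    check_unconcatenated_coords_equal(objs, dim='x')
except ValueError as err:
    print(err)
```

For the objs above, 'x' is skipped because it lies along the concatenation dimension, and 'y' is reported as conflicting, which is the error I expected concat itself to raise.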
Now let's try to get concat to broadcast 'y' across 'x':
# Try to join along only 'x' by setting coords='different'
concat(objs, dim='x', coords='different')
Since "Data variables which are not equal (ignoring attributes) across all datasets are also concatenated", I would have expected 'y' to be concatenated across 'x', i.e. for the 'y' coord to gain the 'x' dimension, like this:
<xarray.Dataset>
Dimensions: (x: 2, y: 1)
Coordinates:
* y (y, x) int64 1 0
* x (x) int64 0 1
Data variables:
*empty*
But that's not what we get! Instead:
<xarray.Dataset>
Dimensions: (x: 2, y: 2)
Coordinates:
* y (y) int64 0 1
* x (x) int64 0 1
Data variables:
*empty*
Same again but without dimension coords
If we create the same sort of objects but the variables are data vars not coords, then everything behaves exactly as expected:
objs2 = [Dataset({'a': ('x', [0]), 'b': ('y', [1])}), Dataset({'a': ('x', [1]), 'b': ('y', [0])})]
[<xarray.Dataset>
Dimensions: (x: 1, y: 1)
Dimensions without coordinates: x, y
Data variables:
a (x) int64 0
b (y) int64 1,
<xarray.Dataset>
Dimensions: (x: 1, y: 1)
Dimensions without coordinates: x, y
Data variables:
a (x) int64 1
b (y) int64 0]
concat(objs2, dim='x', data_vars='minimal')
ValueError: variable b not equal across datasets
concat(objs2, dim='x', data_vars='different')
<xarray.Dataset>
Dimensions: (x: 2, y: 1)
Dimensions without coordinates: x, y
Data variables:
a (x) int64 0 1
b (x, y) int64 1 0
Also, if you do the same again but with coordinates which are not dimension coords, i.e.:
objs3 = [Dataset(coords={'a': ('x', [0]), 'b': ('y', [1])}), Dataset(coords={'a': ('x', [1]), 'b': ('y', [0])})]
[<xarray.Dataset>
Dimensions: (x: 1, y: 1)
Coordinates:
a (x) int64 0
b (y) int64 1
Dimensions without coordinates: x, y
Data variables:
*empty*,
<xarray.Dataset>
Dimensions: (x: 1, y: 1)
Coordinates:
a (x) int64 1
b (y) int64 0
Dimensions without coordinates: x, y
Data variables:
*empty*]
then this again gives the expected concatenation behaviour.
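I didn't paste the concat calls for objs3 above. Presumably they mirror the objs2 ones with coords= in place of data_vars=, so here is a minimal sketch under that assumption. The helper `describe_concat` is hypothetical, and it catches any exception because the exact error type and message may differ between xarray versions.

```python
from xarray import Dataset, concat

objs3 = [
    Dataset(coords={'a': ('x', [0]), 'b': ('y', [1])}),
    Dataset(coords={'a': ('x', [1]), 'b': ('y', [0])}),
]


def describe_concat(datasets, **kwargs):
    """Hypothetical helper: return the result's dimension sizes, or the error text."""
    try:
        result = concat(datasets, dim='x', **kwargs)
    except Exception as err:  # the exception type may differ between versions
        return f"{type(err).__name__}: {err}"
    return dict(result.sizes)


# Compare the two coords modes discussed above
for mode in ('minimal', 'different'):
    print(mode, '->', describe_concat(objs3, coords=mode))
```

Running the same helper on objs from the first example makes it easy to compare the dimension-coord case side by side.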
So this implies that the compatibility checks done on data vars are not being done on coords, but only when those coords are dimension coordinates!
Either this is not the desired behaviour or the concat docstring needs to be a lot clearer. If we agree that this is not the desired behaviour then I will have a look inside concat to work out why it's happening.
EDIT: Presumably this has something to do with the ToDo in the code for concat: # TODO: support concatenating scalar coordinates even if the concatenated dimension already exists
...