Description
What happened?
We generally type dimensions as:
dims: Hashable | Iterable[Hashable]
However, this is in conflict with passing a tuple of independent dimensions to a method - e.g. da.mean(("x", "y"))
because a tuple is also hashable.
Also mypy requires an isinstance(dims, Hashable)
check when typing a function. We use an isinstance(dims, str)
check in many places to wrap a single dimension in a list. Changing this to isinstance(dims, Hashable)
will change the behavior for tuples.
What did you expect to happen?
In the community call today we discussed to change this to
dims: str | Iterable[Hashable]
i.e. if a single dim is passed it has to be a string and wrapping it in a list is a convenience function. Special use cases with Hashable
types should be wrapped in a Iterable
by the user. This probably best reflects the current state of the repo (dims = [dims] if isinstance(dims, str) else dims
).
The disadvantage could be that it is a bit more difficult to explain in the docstrings?
@shoyer - did I get this right from the discussion?
Other options
- Require
str
as dimension names.
This could be too restrictive. @keewis mentioned that tuple dimension names are already used somwehere in the xarray repo. Also we discussed in another issue or PR (which I cannot find right know) that we want to keep allowing Hashable
.
- Disallow passing tuples (only allow tuples if a dimension is a tuple), require lists to pass several dimensions.
This is too restrictive in the other direction and will probably lead to a lot of downstream troubles. Naming a single dimension with a tuple will be a very rare case, in contrast to passing several dimension names as a tuple.
- Special case tuples. We could potentially check if
dims
is a tuple and if there are any dimension names consisting of a tuple. Seems more complicated and potentially brittle for probably small gains (IMO).
Minimal Complete Verifiable Example
No response
Relevant log output
No response
Anything else we need to know?
- We need to check carefully where general
Hashable
are really allowed. E.g.dims
of aDataArray
are typed as
xarray/xarray/core/dataarray.py
Line 380 in e056cac
but tuples are not actually allowed:
import xarray as xr
xr.DataArray([1], dims=("x", "y"))
# ValueError: different number of dimensions on data and dims: 1 vs 2
xr.DataArray([1], dims=[("x", "y")])
# TypeError: dimension ('x', 'y') is not a string
- We need to be careful typing functions where only one dim is allowed, e.g.
xr.concat
, which should probably setdim: Hashable
(and make sure it works). - Do you have examples for other real-world hashable types except for
str
andtuple
? (Would be good for testing purposes).
Environment
N/A