Skip to content

dimensions: type as str | Iterable[Hashable]? #6142

Closed
@mathause

Description

@mathause

What happened?

We generally type dimensions as:

dims: Hashable | Iterable[Hashable]

However, this is in conflict with passing a tuple of independent dimensions to a method - e.g. da.mean(("x", "y")) because a tuple is also hashable.

Also mypy requires an isinstance(dims, Hashable) check when typing a function. We use an isinstance(dims, str) check in many places to wrap a single dimension in a list. Changing this to isinstance(dims, Hashable) will change the behavior for tuples.

What did you expect to happen?

In the community call today we discussed to change this to

dims: str | Iterable[Hashable]

i.e. if a single dim is passed it has to be a string and wrapping it in a list is a convenience function. Special use cases with Hashable types should be wrapped in a Iterable by the user. This probably best reflects the current state of the repo (dims = [dims] if isinstance(dims, str) else dims).

The disadvantage could be that it is a bit more difficult to explain in the docstrings?

@shoyer - did I get this right from the discussion?


Other options

  1. Require str as dimension names.

This could be too restrictive. @keewis mentioned that tuple dimension names are already used somwehere in the xarray repo. Also we discussed in another issue or PR (which I cannot find right know) that we want to keep allowing Hashable.

  1. Disallow passing tuples (only allow tuples if a dimension is a tuple), require lists to pass several dimensions.

This is too restrictive in the other direction and will probably lead to a lot of downstream troubles. Naming a single dimension with a tuple will be a very rare case, in contrast to passing several dimension names as a tuple.

  1. Special case tuples. We could potentially check if dims is a tuple and if there are any dimension names consisting of a tuple. Seems more complicated and potentially brittle for probably small gains (IMO).

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

  • We need to check carefully where general Hashable are really allowed. E.g. dims of a DataArray are typed as

dims: Union[Hashable, Sequence[Hashable], None] = None,

but tuples are not actually allowed:

import xarray as xr
xr.DataArray([1], dims=("x", "y"))
# ValueError: different number of dimensions on data and dims: 1 vs 2
xr.DataArray([1], dims=[("x", "y")])
# TypeError: dimension ('x', 'y') is not a string
  • We need to be careful typing functions where only one dim is allowed, e.g. xr.concat, which should probably set dim: Hashable (and make sure it works).
  • Do you have examples for other real-world hashable types except for str and tuple? (Would be good for testing purposes).

Environment

N/A

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions