Description
EDIT: see #4118 for ongoing discussion
This has probably been suggested already, but it would be nice if we could access Dataset data variables, coordinates and attributes via groups, similarly to netCDF4 groups.
Currently xarray allows loading a specific netCDF4 group into a Dataset. Different groups can be loaded as separate Dataset objects, which may then be combined into a single, flat Dataset. Yet in some cases it makes sense to represent data as a single object while keeping some nested structure. For example, a Dataset representing data on a staggered grid might have scalar_vars and flux_vars groups. Here are some potential uses for groups. When there are a lot of data variables and/or attributes, it would also help to have a more concise repr.
I am thinking of an implementation of Dataset.groups that would be specific to xarray, i.e., independent of any backend, and that would easily co-exist with the flat Dataset. A backend shouldn't be required to support groups (some existing backends simply don't). It would be up to each backend to map the Dataset.groups logic onto its own group logic, where it has one.
Dataset.groups might return a DatasetGroups object which, much like xarray.core.coordinates.DatasetCoordinates, would (1) hold a reference to the Dataset object, (2) basically consist of a Mapping of group names to data variable/coordinate/attribute names and (3) dynamically create another Dataset object (sub-dataset) on __getitem__. Keys of Dataset.groups should also be accessible as attributes, e.g., ds.groups['scalar_vars'] == ds.scalar_vars.
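To make the idea a bit more concrete, here is a minimal sketch of what such an object could look like. Nothing in it exists in xarray: the DatasetGroups class, its constructor signature and the variable names (temperature, pressure, u_flux) are all assumptions made for illustration. It only groups data variables (not attributes), and it exposes attribute access on the groups object rather than on the Dataset itself.

```python
from collections.abc import Mapping

import numpy as np
import xarray as xr


class DatasetGroups(Mapping):
    """Hypothetical mapping of group names to sub-datasets (not an xarray API)."""

    def __init__(self, dataset, groups):
        # (1) keep a reference to the parent Dataset
        self._dataset = dataset
        # (2) group name -> list of variable names
        self._groups = {name: list(members) for name, members in groups.items()}

    def __getitem__(self, key):
        # (3) build the sub-dataset on demand by selecting the listed variables
        return self._dataset[self._groups[key]]

    def __iter__(self):
        return iter(self._groups)

    def __len__(self):
        return len(self._groups)

    def __getattr__(self, name):
        # Only called when normal lookup fails; expose group names as attributes.
        groups = self.__dict__.get("_groups", {})
        if name in groups:
            return self[name]
        raise AttributeError(name)


# Illustrative data on a staggered grid (names are made up for this example).
ds = xr.Dataset(
    {
        "temperature": ("x", np.zeros(3)),
        "pressure": ("x", np.ones(3)),
        "u_flux": ("x_face", np.zeros(4)),
    }
)
groups = DatasetGroups(
    ds,
    {"scalar_vars": ["temperature", "pressure"], "flux_vars": ["u_flux"]},
)

scalar = groups["scalar_vars"]  # a regular Dataset with two data variables
flux = groups.flux_vars         # same as groups["flux_vars"]
```

Since each sub-dataset is created on access, the flat Dataset stays the single source of truth and the groups remain a thin view on top of it.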
Questions:
- How to handle hierarchies of > 1 levels (i.e., groups of groups...)?
- How to ensure that a variable / attribute in one group is not also present in another group?
- What about methods called from groups with inplace=True?
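For the second question, one possible answer (just a sketch, not a decided design) would be to validate the group mapping whenever it is set and to reject names that belong to more than one group. The helper names find_overlaps and validate_groups below are hypothetical.

```python
def find_overlaps(groups):
    """Return names that appear in more than one group.

    `groups` maps a group name to an iterable of variable/attribute names.
    The result maps each duplicated name to the sorted list of groups using it.
    """
    owners = {}
    for group, names in groups.items():
        for name in set(names):  # ignore repeats within a single group
            owners.setdefault(name, []).append(group)
    return {name: sorted(gs) for name, gs in owners.items() if len(gs) > 1}


def validate_groups(groups):
    """Raise ValueError if any name is assigned to more than one group."""
    overlaps = find_overlaps(groups)
    if overlaps:
        raise ValueError(f"names assigned to more than one group: {overlaps}")


validate_groups({"scalar_vars": ["temperature", "pressure"], "flux_vars": ["u_flux"]})
```

Whether overlapping groups should actually be an error, or simply be allowed, is still an open question.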