This repository was archived by the owner on Oct 24, 2024. It is now read-only.

Use path-like tools with DataTree #281

Closed
@eschalkargans

Hello,

I'm opening this issue to discuss integrating more path manipulation into the datatree library.

TL;DR:

  • pathlib-like methods for DataTree
  • Indexing with Path objects
  • The extent to which some path operations can be delegated to pathlib

Details

This is a suggestion regarding DataTree.

Since a DataTree can be accessed with strings representing Unix-like paths (slash-separated strings), and since Python has the pathlib library, I wondered whether DataTree could comply with some of the "pathlib API". This seems especially relevant for Zarr stores, which are persisted as literal directories on the file system. It seems natural to have the usual tools for path manipulation available on a DataTree.

For instance, with pathlib we can glob or rglob on a directory path. Would it be possible to glob and rglob in a DataTree too? There is currently a match function, but it introduces a new word, "match", for what pathlib users already know as globbing. Either dt.match("*/B") or dt.glob("*/B") would be fine for a single level, with dt.glob("**/B") or dt.rglob("B") for a recursive search.
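
As a rough illustration of the semantics I have in mind, here is a sketch that uses only the standard library and works on a plain list of node paths rather than on a real DataTree. The helper names `glob_paths` and `rglob_paths` are made up for this example and are not part of datatree, and `**` is deliberately not handled:

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def _parts(path):
    # Split "/a/c/B" into ("a", "c", "B"); the root "/" gives ().
    return PurePosixPath(path).relative_to("/").parts


def glob_paths(paths, pattern):
    """Paths whose components match `pattern` one-to-one, anchored at the root."""
    pat = PurePosixPath(pattern).parts
    return [
        p for p in paths
        if len(_parts(p)) == len(pat)
        and all(fnmatchcase(part, pp) for part, pp in zip(_parts(p), pat))
    ]


def rglob_paths(paths, pattern):
    """Paths whose trailing components match `pattern`, at any depth."""
    pat = PurePosixPath(pattern).parts
    return [
        p for p in paths
        if len(_parts(p)) >= len(pat)
        and all(fnmatchcase(part, pp) for part, pp in zip(_parts(p)[-len(pat):], pat))
    ]


# Hypothetical node paths, standing in for the paths of a DataTree.
node_paths = ["/", "/a", "/a/B", "/b", "/b/B", "/a/c", "/a/c/B"]
print(glob_paths(node_paths, "*/B"))
print(rglob_paths(node_paths, "B"))
```

With this toy list, the glob call only picks up the `B` nodes one level below the root, while the rglob call also finds the one nested under `/a/c`.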

The idea is that glob and rglob are already "standard" in the pathlib library and well known to Python developers who use pathlib.

To keep the path aspect separate from the datatree API, we could also imagine indexing a DataTree with a Path: dt[Path('/a/b/c')]. Or, as an equivalent to dt.rglob('B'), dt[Path('/').rglob('B')]. I don't know how tightly pathlib is tied to the underlying file system. We could imagine using it on a "virtual filesystem hierarchy" provided by the DataTree itself:

dt_path: DataTreePath = dt.path()
for path in dt_path.rglob('B'):
    print(path)

(Note: I saw that a path property already exists on the DataTree object.)
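
On the file-system question: as far as I can tell, pathlib's pure classes (`PurePosixPath` and friends) only manipulate the path itself and never perform I/O, whereas `glob` and `rglob` live on the concrete `Path` class and do read the disk. So the pure classes could probably be reused as-is for node paths, but globbing would have to be implemented against the tree. A small sketch, where the `lookup` helper and the dict standing in for a tree are made up for illustration:

```python
from pathlib import PurePosixPath

node_path = PurePosixPath("/a/b/c")

# Pure path operations: nothing here touches the disk.
print(node_path.parent)        # /a/b
print(node_path.name)          # c
print(node_path.parts)         # ('/', 'a', 'b', 'c')
print(node_path.parent / "d")  # /a/b/d

# Pure paths have no glob/rglob, since those need a real file system.
print(hasattr(node_path, "rglob"))  # False


def lookup(tree, path):
    # Hypothetical helper: accept either a str or a pure path as the key.
    return tree[str(path)]


toy_tree = {"/a/b/c": "leaf node"}  # stand-in for a DataTree
print(lookup(toy_tree, node_path))  # leaf node
```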

The current way to iterate over all nodes is to use .subtree:

for node in vertebrates.subtree:
    print(node.path)

For a Python developer who is discovering datatree but already knows how to work with paths, we could imagine:

for path, node in dt.rglob('*').items():
    ...
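
For what it's worth, a dict-returning rglob like this can be sketched on a toy tree. The `Node` class below is a stand-in written for this example, not the DataTree class, and the `rglob` shown here only matches the final path component:

```python
from fnmatch import fnmatchcase


class Node:
    """Toy tree node (not DataTree): a name plus named children."""

    def __init__(self, name="", children=()):
        self.name = name
        self.children = {child.name: child for child in children}

    def walk(self, prefix=""):
        # Yield (path, node) for every descendant, depth-first.
        for name, child in self.children.items():
            path = f"{prefix}/{name}"
            yield path, child
            yield from child.walk(path)

    def rglob(self, pattern):
        # Map path -> node for descendants whose own name matches the pattern.
        return {p: n for p, n in self.walk() if fnmatchcase(n.name, pattern)}


# Hypothetical tree: /a/B, /a/c/B and /b/B.
dt = Node(children=[
    Node("a", [Node("B"), Node("c", [Node("B")])]),
    Node("b", [Node("B")]),
])

for path, node in dt.rglob("B").items():
    print(path, node.name)
```

Returning a dict keeps the path and the node together, which is what the loop above relies on.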
