Support specifying chunk sizes using labels (e.g. frequency string) #7559

Closed
@dcherian

Is your feature request related to a problem?

dask.dataframe supports repartitioning or rechunking using a frequency string (freq kwarg).

I think this would be a useful addition to .chunk. It would help with some groupby problems (as suggested in this comment) and generally make a few problems amenable to blockwise/map_blocks solutions.

Describe the solution you'd like

  1. One solution is to allow .chunk(lon=5, time="MS"). There is some ugliness in that this syntax mixes an integer number of index values (lon=5) with a label-based frequency string (time="MS").
  2. So perhaps a second method, chunk_by_labels, would be useful, where chunk_by_labels(lon=5, time="MS") would rechunk the data so that a single chunk contains 5° of longitude points and a month of time. Alternatively, this could be .chunk(lon=5, time="MS", by="labels").
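
To make the index-versus-label distinction in the two options above concrete, here is a minimal sketch of what "5° of longitude per chunk" could mean in terms of chunk sizes. The helper name `chunk_sizes_by_width` is hypothetical and not part of xarray; it assumes a sorted, numeric coordinate and only uses the standard library.

```python
from itertools import groupby
from math import floor


def chunk_sizes_by_width(labels, width):
    """Count how many consecutive labels fall into each bin of size `width`.

    Hypothetical helper for illustration only; assumes `labels` is sorted
    in ascending order.
    """
    return tuple(
        len(list(group))
        for _, group in groupby(labels, key=lambda value: floor(value / width))
    )


# A 0.5-degree longitude grid: 0.0, 0.5, ..., 19.5
lon = [i * 0.5 for i in range(40)]

# Label-based: every chunk spans 5 degrees, i.e. 10 points on this grid.
sizes = chunk_sizes_by_width(lon, 5)
print(sizes)  # (10, 10, 10, 10)
```

With an integer interpretation, lon=5 on the same grid would instead mean 5 points (2.5°) per chunk, which is why mixing the two meanings in one call is awkward.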

Describe alternatives you've considered

Have the user do this manually, but that's kind of annoying and a bit advanced.
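
For reference, doing this manually for a time axis might look roughly like the sketch below. It assumes a sorted pandas DatetimeIndex, and `chunk_sizes_by_freq` is a hypothetical helper name, not an existing xarray or dask function. It counts the timestamps in each frequency bin and returns the counts as explicit chunk sizes.

```python
import pandas as pd


def chunk_sizes_by_freq(index: pd.DatetimeIndex, freq: str) -> tuple[int, ...]:
    """Return the number of timestamps in each `freq` bin of a sorted index.

    Hypothetical helper for illustration only.
    """
    counts = pd.Series(1, index=index).resample(freq).count()
    # Drop empty bins (gaps in the data) so no zero-length chunk is requested.
    return tuple(int(n) for n in counts if n > 0)


# Daily data for the first three months of a non-leap year.
time = pd.date_range("2001-01-01", "2001-03-31", freq="D")
sizes = chunk_sizes_by_freq(time, "MS")
print(sizes)  # (31, 28, 31)

# Assumption: for a dask-backed dataset `ds` (not defined here), these sizes
# could then be passed as explicit chunks, e.g. ds.chunk(time=sizes).
```

The user has to know that explicit chunk tuples exist and how to derive them from the coordinate, which is the "a bit advanced" part.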

Additional context

No response
