
Notebook for xarray.map_blocks lacks description of how chunks affect the computation #317


Open: wants to merge 2 commits into main


@geopanda1 commented Jun 27, 2025

Hi,

I think the current example for xarray.map_blocks (advanced/map_blocks/simple_map_blocks.ipynb) has two issues:

  1. In its current configuration the example works, but I think it does so for the wrong reason (see the explanation below).
  2. It falls a bit short of explaining how xarray.map_blocks actually passes pieces of data to the supplied function, which I think is essential for applying the function correctly. (I added an exercise to the notebook to illustrate that the function can return an unexpected result if the user doesn't set the chunks correctly.)
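To make item 2 more concrete, here is a toy model of block-wise application in plain Python. It is not xarray's implementation: the helpers `split_into_chunks` and `toy_map_blocks` are hypothetical, and the "data" is just a list of time indices. It only mirrors the documented behaviour that map_blocks calls the supplied function once per block, so each call sees one chunk's worth of data. The 2920 time steps match the length of the time axis in the air_temperature tutorial dataset.

```python
# Toy model (hypothetical helpers, NOT xarray's implementation) of how a
# block-wise map feeds data to a user function: the data is split along the
# chunked dimension and the function is called once per block.


def split_into_chunks(values, chunk_size):
    """Split a 1-D sequence into consecutive blocks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def toy_map_blocks(func, values, chunk_size):
    """Call func once per block and collect the per-block results."""
    return [func(block) for block in split_into_chunks(values, chunk_size)]


# 2920 time steps, chunked like chunks={"time": 100} in the notebook.
time_steps = list(range(2920))

# Passing len as the "user function" reveals how much data each call receives.
block_lengths = toy_map_blocks(len, time_steps, chunk_size=100)
print(len(block_lengths), block_lengths[0], block_lengths[-1])  # 30 100 20
```

With chunks={"time": 100}, a function passed to map_blocks therefore never sees more than 100 time steps in a single call.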

What's the issue with the current example?

  • Opening the dataset sets chunks on the time dimension
  • time_mean(obj) actually computes the mean along the lat dimension (given the function name and the chunks on the time dimension, I suspect this was not the intention)
  • The comparison with .identical(...) returns True, which suggests that everything worked and that this has to do with how the chunks on the time dimension are set in the current example:
```python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature", chunks={"time": 100})

...

def time_mean(obj):
    # use xarray's convenient API here
    # you could convert to a pandas dataframe and use pandas' extensive API
    # or use .plot() and plt.savefig to save visualizations to disk in parallel.
    return obj.mean("lat")

# this will calculate values and will return True if the computation works as expected
ds.map_blocks(time_mean).identical(ds.mean("lat"))
```

The problem I see is this: the .identical(...) comparison succeeds only because the mean is actually computed along the lat dimension, and lat happens not to be chunked at all. It has nothing to do with how the time chunks are set when the dataset is opened. If instead the mean is really computed along the time dimension, both in time_mean(obj) and via xarray's built-in .mean("time"), while keeping chunks={"time": 100}, the .identical(...) comparison fails precisely because the time dimension is chunked: map_blocks calls time_mean once per 100-step time block, so no single call sees the full time axis. I suggest fixing the notebook so it follows the original intention of computing the mean along the time dimension.
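The argument above can be checked numerically with a self-contained toy in plain Python (no xarray or dask). The grid, its values (`time index + lat index`) and the helper functions are hypothetical stand-ins for the notebook's dataset and `time_mean`; only the time axis is chunked, as with chunks={"time": 100}.

```python
# Toy comparison in plain Python (hypothetical data, NOT xarray): reduce a
# time x lat grid block by block, with only the time axis chunked.
from statistics import fmean

N_TIME, N_LAT, CHUNK = 2920, 25, 100

# Synthetic data: grid[t][lat] = t + lat.
grid = [[float(t + lat) for lat in range(N_LAT)] for t in range(N_TIME)]

# Split along time only; every block still holds all lat values.
blocks = [grid[i:i + CHUNK] for i in range(0, N_TIME, CHUNK)]


def mean_over_lat(block):
    """One mean per time step, over all lat values in that row."""
    return [fmean(row) for row in block]


def mean_over_time(block):
    """One mean per lat, but only over the time steps inside this block."""
    return [fmean(row[lat] for row in block) for lat in range(N_LAT)]


# Reducing along the unchunked lat axis: concatenating the per-block
# results reproduces the global result exactly.
lat_blockwise = [m for block in blocks for m in mean_over_lat(block)]
lat_global = mean_over_lat(grid)
print(lat_blockwise == lat_global)  # True

# Reducing along the chunked time axis: each block only sees its own
# time steps, so its "time mean" differs from the full-record mean.
time_global = mean_over_time(grid)
time_per_block = [mean_over_time(block) for block in blocks]
print(time_per_block[0] == time_global)  # False
```

This is the same asymmetry as in the notebook: the lat mean is safe only because lat is a single chunk, not because of anything the time chunks do.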

I have tried to update the notebook accordingly and added an exercise (now Exercise 1) to illustrate why the chunks have to be set with the computation in mind.
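One way to set the chunks correctly for a time mean can be sketched with the same kind of toy: chunk along lat and leave the time axis whole, so every block contains the full time series for its latitudes and the block-wise time means stitch back into the global result. This is plain Python with synthetic data; `toy_time_mean` and a `LAT_CHUNK` of 5 are hypothetical choices, not the notebook's code or Exercise 1's solution.

```python
# Toy sketch (hypothetical data, NOT xarray): chunk along lat only, so each
# block holds the complete time axis for a subset of latitudes.
from statistics import fmean

N_TIME, N_LAT, LAT_CHUNK = 2920, 25, 5

# Synthetic data: grid[t][lat] = t + lat.
grid = [[float(t + lat) for lat in range(N_LAT)] for t in range(N_TIME)]


def toy_time_mean(sub_grid):
    """Mean over all time steps (rows) for each lat column in sub_grid."""
    n_lat = len(sub_grid[0])
    return [fmean(row[j] for row in sub_grid) for j in range(n_lat)]


# Each block keeps every time step but only LAT_CHUNK latitude columns.
lat_blocks = [
    [row[start:start + LAT_CHUNK] for row in grid]
    for start in range(0, N_LAT, LAT_CHUNK)
]

# Concatenating the per-block results along lat gives the global time mean.
blockwise = [m for block in lat_blocks for m in toy_time_mean(block)]
print(blockwise == toy_time_mean(grid))  # True
```

In xarray terms this would correspond to rechunking before calling map_blocks, for example with ds.chunk({"time": -1, "lat": 5}), where -1 requests a single chunk spanning the whole time dimension; the lat chunk size here is arbitrary.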

All the best
Andreas


@geopanda1 changed the title from "updated notebook with proper example and new exercise" to "Notebook for xarray.map_blocks lacks description of how chunks affect the computation" on Jun 27, 2025