ENH: Plotting for groupby_bins #2152

Merged
merged 28 commits into from
Oct 23, 2018

@maahn maahn commented May 17, 2018

DataArrays created with e.g. groupby_bins have coordinate arrays consisting of pd._libs.interval.Interval objects. Therefore, they cannot be plotted. The small patch replaces the pd._libs.interval.Interval values with the interval's center point and adds _center to the label name. It looks like this: https://gist.github.com/maahn/91da0a8d299ef6567827749cbe2f1913
I don't think there is any need for additional documentation except the whats-new.rst or tests(?), but I'm also happy to add them if you think it is required.
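To illustrate the idea described above, here is a minimal sketch, not the code from this patch. It converts a sequence of `pd.Interval` objects to their mid points and builds the suffixed label. The helper name `interval_to_mid_points`, the bin edges, and the label `x_bins` are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def interval_to_mid_points(values):
    """Return the mid point of each pd.Interval in `values` as a float array."""
    return np.array([interval.mid for interval in values], dtype=float)


# Illustrative bin edges; iterating an IntervalIndex yields pd.Interval objects,
# the same kind of object the description says ends up in the coords.
intervals = list(pd.IntervalIndex.from_breaks([-1, 0, 1, 2]))

mid_points = interval_to_mid_points(intervals)
label = "x_bins" + "_center"  # hypothetical coordinate name plus the suffix
```

The resulting float array can be handed to matplotlib, which is the point of the conversion.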

  • Tests added (for all bug fixes or enhancements)
  • Tests passed (for all non-documentation changes)
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

DataArrays created with e.g. groupby_bins have coords consisting of pd._libs.interval.Interval. For plotting, the pd._libs.interval.Interval is replaced with the interval's center point. '_center' is appended to the label
@rabernat
Contributor

This seems like a good idea.

groupby_bins has a labels option, which can override the default labels generated by pandas. That's what is done in the [multidimensional groupby example](http://xarray.pydata.org/en/stable/examples/multidimensional-coords.html#multidimensional-groupby).
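As a rough illustration of what a `labels` override does, the sketch below calls `pandas.cut` directly rather than `groupby_bins`, since the default labels come from pandas. The sample values and bin edges are made up for the example.

```python
import pandas as pd

values = [-0.5, 0.2, 0.7, 1.5]   # made-up sample data
bins = [-1, 0, 1, 2]             # made-up bin edges

# Default behaviour: each category is a pd.Interval such as (-1, 0].
default = pd.cut(values, bins)

# Override: label every bin with its center instead of an Interval.
centers = [(left + right) / 2 for left, right in zip(bins[:-1], bins[1:])]
labelled = pd.cut(values, bins, labels=centers)

first_default_label = default.categories[0]
labelled_values = list(labelled)
```

With numeric labels the binned coordinate is plottable, but the bin boundaries are no longer carried along.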

I wonder if it's worth just making this behavior the default, or generating an additional coordinate for the bin centers automatically.

@maahn
Author

maahn commented May 22, 2018

Thanks for the comment, but I wouldn't use center labels by default when using groupby_bins; that could be misleading in the case of non-uniform (e.g. exponential) bin spacing. I guess that's acceptable for a plot, but not for a DataArray, because the information about the boundaries would be lost. And from a user perspective, I would find it a bit confusing if an additional coordinate showed up when using groupby_bins.
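The concern about non-uniform spacing can be made concrete with a small sketch. The exponential edges below are invented for the example, and the geometric mean is shown only as one alternative notion of "center", not as something this PR implements.

```python
import math

edges = [1.0, 10.0, 100.0, 1000.0]        # exponentially spaced, illustrative
bins = list(zip(edges[:-1], edges[1:]))   # (left, right) pairs

# Arithmetic mid points, i.e. what a "_center" label would show.
centers = [(left + right) / 2 for left, right in bins]

# Geometric centers, arguably the natural "middle" on a log axis.
geometric = [math.sqrt(left * right) for left, right in bins]

# Bin widths: exactly the information a center-only label drops.
widths = [right - left for left, right in bins]
```

The two kinds of center disagree for every bin, and neither one alone lets a reader recover the widths, which is why the boundaries are worth keeping on the DataArray itself.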

    with the Intervals' mid points. In addition, _center is
    appended to the label
    """
    if _valid_other_type(array, [pd._libs.interval.Interval]):
Member

Please use pd.Interval directly, which is how this is exposed as public API.
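For reference, a type check against the public name might look like the sketch below. `contains_only_intervals` is a hypothetical helper written for this example; it is not xarray's `_valid_other_type`.

```python
import pandas as pd


def contains_only_intervals(values):
    """Hypothetical helper: True if every element of `values` is a pd.Interval."""
    return all(isinstance(value, pd.Interval) for value in values)


# interval_range builds an IntervalIndex; iterating it yields pd.Interval objects.
intervals = list(pd.interval_range(start=0, end=3))
plain_numbers = [0.5, 1.5, 2.5]

only_intervals = contains_only_intervals(intervals)
only_numbers = contains_only_intervals(plain_numbers)
```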

Author

Thanks, changed.

@@ -267,6 +280,8 @@ def line(darray, *args, **kwargs):

_ensure_plottable(xplt)

xplt.values, xlabel = _interval_to_mid_points(xplt.values, xlabel)
Member

Rather than mutating xplt.values, please assign a new variable. Otherwise this could mutate an inline argument to this function.
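The difference between the two approaches can be sketched without xarray. `Coord` below is a hypothetical stand-in for any object with a mutable `.values` attribute, and plain `(left, right)` tuples stand in for intervals.

```python
class Coord:
    """Hypothetical stand-in for an object that exposes a mutable `.values`."""

    def __init__(self, values):
        self.values = values


def mid_points(bounds):
    """Mid point of each (left, right) pair."""
    return [(left + right) / 2 for left, right in bounds]


def plot_mutating(coord):
    # Overwrites the caller's data: the interval bounds are gone afterwards.
    coord.values = mid_points(coord.values)
    return coord.values


def plot_non_mutating(coord):
    # Keeps the converted values in a local variable; the caller's object is untouched.
    plot_values = mid_points(coord.values)
    return plot_values


safe = Coord([(-1, 0), (0, 1)])
plot_non_mutating(safe)

clobbered = Coord([(-1, 0), (0, 1)])
plot_mutating(clobbered)
```

After the calls, `safe.values` still holds the bounds, while `clobbered.values` has been replaced by mid points as a side effect of plotting.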

Author

OK, changed it here and in lines 628f. Note that this changes the type of xplt passed to ax.plot from DataArray to np.array, but I think this shouldn't matter.

Maximilian Maahn added 2 commits May 23, 2018 08:46
…original variable.

Note that this changes the type of xplt from DataArray to np.array in the line function.
@shoyer
Member

shoyer commented May 23, 2018

A couple of other thoughts:

  1. For 2D plots that already show regions (imshow/pcolormesh), I'm not sure it makes sense to update the label to include the word "center". This plot already shows intervals pretty clearly:
    image
  2. This needs tests in xarray/tests/test_plot.py, at least something to make sure that plotting with Intervals on an axis does not crash.

                   'mean', 'prod', 'sum',
                   'std', 'var', 'median']:
        gp = self.darray.groupby_bins(dim, [-1, 0, 1, 2])
        getattr(gp, method)().plot.hist(range=(-1,2))
Contributor

E231 missing whitespace after ','

Member

@shoyer shoyer left a comment

just a few comments on your tests -- I think they can be simplified

for dim in self.darray.dims:
    for method in ['argmax', 'argmin', 'max', 'min',
                   'mean', 'prod', 'sum',
                   'std', 'var', 'median']:
Member

I don't think we need to test all these different methods here. They all use the same logic internally, so just one groupby method should be enough.

Author

Ok, will use mean only.

@@ -297,6 +297,19 @@ def test_convenient_facetgrid_4d(self):
        with raises_regex(ValueError, '[Ff]acet'):
            d.plot(x='x', y='y', col='columns', ax=plt.gca())

    def test_coord_with_interval(self):
        for dim in self.darray.dims:
Member

I'm not sure what the point of testing multiple dimensions is -- do you expect different behavior for different dimensions? If not, I would probably just pick one dimension.

Author

good point, that was basically a copy paste error from the 2d version. Will change that.

primitive = ax.plot(xplt, yplt, *args, **kwargs)
# Remove pd.Intervals if contained in xplt.values.
if _valid_other_type(xplt.values, [pd.Interval]):
    xplt_val = _interval_to_mid_points(xplt.values)
Member

It might make sense to plot labels like [0, 10) instead of 5. But this is certainly an improvement over the current state of things, so I would be happy to potentially revise this later.
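To make the two labeling options concrete, here is a small sketch using pandas only. The intervals are illustrative, and nothing here touches matplotlib tick handling.

```python
import pandas as pd

# Illustrative left-closed bins of width 10.
intervals = [
    pd.Interval(0, 10, closed="left"),
    pd.Interval(10, 20, closed="left"),
]

# Option 1: numeric mid points (what the patch currently plots against).
mid_labels = [interval.mid for interval in intervals]

# Option 2: the string form of each interval, usable as a tick label.
interval_labels = [str(interval) for interval in intervals]
```

The string form keeps the boundaries and their open/closed sides visible, at the cost of longer tick labels.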

Author

I'm not sure; I guess in many cases there is not enough space for all tick labels, and labeling only some intervals might be confusing. Maybe something like a step plot would be an alternative? https://matplotlib.org/gallery/lines_bars_and_markers/step_demo.html
But is that always desired?

Member

Yeah, this is probably better default behavior. Potentially there could be a flag to choose.

Member

@shoyer shoyer left a comment

Let's also remove the loops over dims / groupby methods in the other tests?

(Plotting tests are kind of slow due to matplotlib, so we try to be a little more careful than usual to only add those that are necessary.)

@@ -975,13 +972,9 @@ def test_cmap_and_color_both(self):

    def test_2d_coord_with_interval(self):
        for dim in self.darray.dims:
Author

I left the loop here, because for the 2d plots, x and y axis are treated separately.

Maximilian Maahn added 2 commits May 29, 2018 11:42
New bool keyword `interval_step_plot` to turn it off.
@maahn
Author

maahn commented May 29, 2018

Ok, I added the interval_step_plot kwarg (default True) to the 1D line plot to make a step plot with the real (i.e. not interpolated) boundaries. After that I had the feeling that it's inconsistent if line uses the real boundaries but pcolormesh doesn't. So I also patched that, but I had to disable infer_intervals for these cases. See also:
https://gist.github.com/maahn/91da0a8d299ef6567827749cbe2f1913/530ed7bf77c9c3257cab46c3894c1412085a217a

Let me know what you think and I will also add some documentation.

@shoyer
Member

shoyer commented May 30, 2018

Rather than using interval_step_plot=True, what about making an entirely new plot method, plot.step()? That would feel a little cleaner to me.

I'm not sure that a step plot would be preferred by most users for a plot over an axis on which groupby_bins was applied. Personally, I would probably prefer a line + scatter plot (e.g., marker='s'). Maybe @rabernat (groupby_bins author) has opinions here?

xplt_val, yplt_val = _interval_to_double_bound_points(xplt.values,
yplt.values)
# just to be sure that matplotlib is not confused
kwargs['linestyle'] = kwargs['linestyle'].replace(
Author

I know this is quite ugly, but does it make sense to import re only for this one line?

@maahn
Author

maahn commented Jun 5, 2018

Good idea, it turned out the step function was quite easy to implement by using https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes/_axes.py#L1735
So now, 1D data defaults to the standard line plot, but you can use plot.step() instead. I guess changing the default plot to something more sophisticated would make the simple logic in plot() to determine the default quite complex?

See https://gist.github.com/maahn/91da0a8d299ef6567827749cbe2f1913

with the Intervals' mid points.
"""

return np.asarray(list(map(lambda x: x.mid, array)))
Member

Consider writing these with list comprehensions, e.g., np.array([x.mid for x in array])

xarray1 = np.asarray(list(map(lambda x: x.left, xarray)))
xarray2 = np.asarray(list(map(lambda x: x.right, xarray)))

xarray = ([x for x in itertools.chain.from_iterable(
Member

This can just be list(itertools.chain.from_iterable(zip(xarray1, xarray2)))
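The suggested one-liner interleaves the left and right bounds. A minimal sketch of that pattern follows, using plain `(left, right)` tuples in place of `pd.Interval` objects; the function name and sample data are assumptions for the example, not the code in this PR.

```python
import itertools


def double_bound_points(bounds, yvalues):
    """Repeat each y value at both edges of its bin so a line plot draws steps."""
    lefts = [left for left, _ in bounds]
    rights = [right for _, right in bounds]
    # zip pairs each left edge with its right edge; chain flattens the pairs.
    x = list(itertools.chain.from_iterable(zip(lefts, rights)))
    # Each y value appears twice, once per bin edge.
    y = list(itertools.chain.from_iterable(zip(yvalues, yvalues)))
    return x, y


x, y = double_bound_points([(-1, 0), (0, 1), (1, 2)], [3, 5, 4])
```

Plotting `x` against `y` as an ordinary line then traces flat segments across each bin with vertical jumps at the shared edges.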

    ylab_extra = '_center'
else:
    yplt = yval
    ylab_extra = ''
Member

Can we put this logic in a helper function? This function is already getting pretty long :)

@maahn
Author

maahn commented Jun 8, 2018

Thanks for the review.

doc/plotting.rst Outdated
Step plots
~~~~~~~~~~

As an alternative, also a step lot similar to matplotlib's ``plt.step`` can be
Contributor

"step lot" should be step plot?

Author

of course, thanks!

@maahn
Author

maahn commented Aug 9, 2018

Sorry for the delay; I finally merged upstream. It looks like the failed builds are unrelated to my changes, so it should be ready for merging?

dcherian added 4 commits October 10, 2018 15:30
* master: (51 commits)
  xarray.backends refactor (pydata#2261)
  Fix indexing error for data loaded with open_rasterio (pydata#2456)
  Properly support user-provided norm. (pydata#2443)
  pep8speaks (pydata#2462)
  isort (pydata#2469)
  tests shoudn't need to pass for a PR (pydata#2471)
  Replace the last of unittest with pytest (pydata#2467)
  Add python_requires to setup.py (pydata#2465)
  Update whats-new.rst (pydata#2466)
  Clean up _parse_array_of_cftime_strings (pydata#2464)
  plot.contour: Don't make cmap if colors is a single color. (pydata#2453)
  np.AxisError was added in numpy 1.13 (pydata#2455)
  Add CFTimeIndex.shift (pydata#2431)
  Fix FutureWarning in CFTimeIndex.date_type (pydata#2448)
  fix:2445 (pydata#2446)
  Enable use of cftime.datetime coordinates with differentiate and interp (pydata#2434)
  restore ddof support in std (pydata#2447)
  Future warning for default reduction dimension of groupby (pydata#2366)
  Remove incorrect statement about "drop" in the text docs (pydata#2439)
  Use profile mechanism, not no-op mutation (pydata#2442)
  ...
@pep8speaks

pep8speaks commented Oct 10, 2018

Hello @maahn! Thanks for updating the PR.

Line 1062:13: W504 line break after binary operator
Line 1063:14: W504 line break after binary operator
Line 1072:13: W504 line break after binary operator

Comment last updated on October 23, 2018 at 06:44 Hours UTC

@dcherian
Contributor

Really sorry for the delay, @maahn.

I've merged master, refactored out the utility functions to utils.py, fixed the tests locally and added a whats-new entry. I'll merge once this round of tests passes.

@maahn
Author

maahn commented Oct 10, 2018

great, thanks for taking care of that!

@dcherian dcherian merged commit 5ebed79 into pydata:master Oct 23, 2018
@dcherian
Contributor

The failed test is the cfgrib test.

Thanks @maahn

@maahn maahn deleted the groupy_plot2 branch October 23, 2018 15:20