-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
ENH: Plotting for groupby_bins #2152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
DataArrays created with e.g. groupy_bins have coords containing of pd._libs.interval.Interval. For plotting, the pd._libs.interval.Interval is replaced with the interval's center point. '_center' is appended to teh label
This seems like a good idea.
I wonder if it's worth just making this behavior the default, or generating an additional coordinate for the bin centers automatically. |
Thanks, for the comment, but I wouldn't use center labels when using |
xarray/plot/plot.py
Outdated
with the Intervals' mid points.In addition, _center is | ||
appended to the label | ||
""" | ||
if _valid_other_type(array, [pd._libs.interval.Interval]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use pd.Interval
directly, which is how this is exposed as public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, changed.
xarray/plot/plot.py
Outdated
@@ -267,6 +280,8 @@ def line(darray, *args, **kwargs): | |||
|
|||
_ensure_plottable(xplt) | |||
|
|||
xplt.values, xlabel = _interval_to_mid_points(xplt.values, xlabel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than mutating xplt.values
, please assign a new variable. Otherwise this could mutate an inline argument to this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, changed it here and in lines 628f. Note that this changes the type of xplt passed to ax.plot from DataArray to np.array, but I think this shouldn't matter.
…original variable. Note that this changes the the type of xplt from DataArray to np.array in the line function.
xarray/tests/test_plot.py
Outdated
'mean', 'prod', 'sum', | ||
'std', 'var', 'median']: | ||
gp = self.darray.groupby_bins(dim, [-1, 0, 1, 2]) | ||
getattr(gp, method)().plot.hist(range=(-1,2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E231 missing whitespace after ','
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few comments on your tests -- I think they can be simplified
xarray/tests/test_plot.py
Outdated
for dim in self.darray.dims: | ||
for method in ['argmax', 'argmin', 'max', 'min', | ||
'mean', 'prod', 'sum', | ||
'std', 'var', 'median']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to test all these different methods here. They all use the same logic internally, so just one groupby method should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will use mean only.
xarray/tests/test_plot.py
Outdated
@@ -297,6 +297,19 @@ def test_convenient_facetgrid_4d(self): | |||
with raises_regex(ValueError, '[Ff]acet'): | |||
d.plot(x='x', y='y', col='columns', ax=plt.gca()) | |||
|
|||
def test_coord_with_interval(self): | |||
for dim in self.darray.dims: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the point of testing multiple dimensions is -- do you expect different behavior for different dimensions? If not, I would probably just pick one dimension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, that was basically a copy paste error from the 2d version. Will change that.
xarray/plot/plot.py
Outdated
primitive = ax.plot(xplt, yplt, *args, **kwargs) | ||
# Remove pd.Intervals if contained in xplt.values. | ||
if _valid_other_type(xplt.values, [pd.Interval]): | ||
xplt_val = _interval_to_mid_points(xplt.values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to plot labels like [0, 10)
instead of 5
. But this is certainly an improvement over the current state of things, so I would be happy to potentially revise this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, I guess in many case there is not enough space for all tick labels. And labeling only some intervals might be confusing? Maybe something like a step plot would be an alternative? https://matplotlib.org/gallery/lines_bars_and_markers/step_demo.html
But is that always desired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is probably better default behavior. Potentially there could be a flag to choose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also remove the loops over dims / groupby methods in the other tests?
(Plotting tests are kind of slow due to matplotlib, so we try to be a little more careful than usual to only add those that are necessary.)
@@ -975,13 +972,9 @@ def test_cmap_and_color_both(self): | |||
|
|||
def test_2d_coord_with_interval(self): | |||
for dim in self.darray.dims: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left the loop here, because for the 2d plots, x and y axis are treated separately.
New bool keyword `interval_step_plot` to turn it off.
Ok, I added the interval_step_plot kwarg (default True) to the 1D line plot to make a step plot with the real (i.e. not interpolated) boundaries. After that I had the feeling that it's inconsistent if line uses the real boundaries but pcolormesh doesn't. So I also patched that, but I had to disable infer_intervals for these cases. See also: Let me know what you think and I will also add some documentation. |
Rather than using I'm not sure that a step plot would be preferred by most users for a plot over an axis on which groupby_bins was applied. Personally, I would probably prefer a line + scatter plot (e.g., |
xarray/plot/plot.py
Outdated
xplt_val, yplt_val = _interval_to_double_bound_points(xplt.values, | ||
yplt.values) | ||
# just to be sure that matplotlib is not confused | ||
kwargs['linestyle'] = kwargs['linestyle'].replace( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is quite ugly, but does it make sense to import re only for this one line?
Good idea, it turned out the step function was quite easy to implement by using https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes/_axes.py#L1735 See https://gist.github.com/maahn/91da0a8d299ef6567827749cbe2f1913 |
xarray/plot/plot.py
Outdated
with the Intervals' mid points. | ||
""" | ||
|
||
return np.asarray(list(map(lambda x: x.mid, array))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider writing these with list comprehensions, e.g., np.array([x.mid for x in array])
xarray/plot/plot.py
Outdated
xarray1 = np.asarray(list(map(lambda x: x.left, xarray))) | ||
xarray2 = np.asarray(list(map(lambda x: x.right, xarray))) | ||
|
||
xarray = ([x for x in itertools.chain.from_iterable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be list(itertools.chain.from_iterable(zip(xarray1, xarray2)))
xarray/plot/plot.py
Outdated
ylab_extra = '_center' | ||
else: | ||
yplt = yval | ||
ylab_extra = '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put this logic in a helper function? This function is already getting pretty long :)
Thanks for the review. |
doc/plotting.rst
Outdated
Step plots | ||
~~~~~~~~~~ | ||
|
||
As an alternative, also a step lot similar to matplotlib's ``plt.step`` can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"step lot" should be step plot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of course, thanks!
Sorry, for the delay, I finally merged upstream. Looks like the failed builds are unrelated to my changes, so it should be ready for merging? |
* master: (51 commits) xarray.backends refactor (pydata#2261) Fix indexing error for data loaded with open_rasterio (pydata#2456) Properly support user-provided norm. (pydata#2443) pep8speaks (pydata#2462) isort (pydata#2469) tests shoudn't need to pass for a PR (pydata#2471) Replace the last of unittest with pytest (pydata#2467) Add python_requires to setup.py (pydata#2465) Update whats-new.rst (pydata#2466) Clean up _parse_array_of_cftime_strings (pydata#2464) plot.contour: Don't make cmap if colors is a single color. (pydata#2453) np.AxisError was added in numpy 1.13 (pydata#2455) Add CFTimeIndex.shift (pydata#2431) Fix FutureWarning in CFTimeIndex.date_type (pydata#2448) fix:2445 (pydata#2446) Enable use of cftime.datetime coordinates with differentiate and interp (pydata#2434) restore ddof support in std (pydata#2447) Future warning for default reduction dimension of groupby (pydata#2366) Remove incorrect statement about "drop" in the text docs (pydata#2439) Use profile mechanism, not no-op mutation (pydata#2442) ...
Hello @maahn! Thanks for updating the PR.
Comment last updated on October 23, 2018 at 06:44 Hours UTC |
Really sorry for the delay, @maahn. I've merged master, refactored out the utility functions to |
great, thanks for taking care of that! |
Failed test is cfgrib test. Thanks @maahn |
DataArrays created with e.g. groupy_bins have coords arrays consisting of pd._libs.interval.Interval. Therefore, they cannot be plotted. The small patch replaces the the pd._libs.interval.Interval values with the interval's center point and adds
_center
to the label name. It looks like this: https://gist.github.com/maahn/91da0a8d299ef6567827749cbe2f1913I don't think there is any need for additional documentation except the
whats-new.rst
or tests(?), but I'm also happy to add them if you think it is required.whats-new.rst
for all changes andapi.rst
for new API