Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
SeriesGroupBy.nlargest and SeriesGroupBy.nsmallest filter the data and seem to match the description of filtration in the groupby documentation. However, unlike the filtration methods, they always put the group keys in the result, and they follow as_index to decide whether to put the key in the row index or in the columns.
Feature Description
import pandas as pd
df = pd.DataFrame([['a', 1]], index=['i1'])
# currently this gives a Series with a MultiIndex entry ('a', 'i1'):
"""
a i1 1
Name: 1, dtype: int64
"""
# instead it would give a Series with only the original index value 'i1':
"""
i1 1
Name: 1, dtype: int64
"""
df.groupby(0, as_index=True)[1].nlargest(1)
# currently this gives a new RangeIndex with 0 and
# puts the group key as a column in the DataFrame:
"""
0 1
0 a 1
"""
# instead it would give a Series with only the original index value 'i1':
"""
i1 1
Name: 1, dtype: int64
"""
df.groupby(0, as_index=False)[1].nlargest(1)
Note that this would then match the behavior of other filtrations such as head().
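To make the comparison with head() concrete, here is a minimal sketch. The three-row frame is illustrative and not taken from the example above, and droplevel(0) is only a workaround that approximates the requested index with the default as_index=True; it is not a proposed implementation.

```python
import pandas as pd

# Illustrative data (an assumption, not from the issue): two groups, 'a' and 'b'.
df = pd.DataFrame(
    [["a", 1], ["a", 3], ["b", 2]],
    index=["i1", "i2", "i3"],
)

grouped = df.groupby(0)[1]

# head() behaves as a filtration: the result keeps only the original index.
head_result = grouped.head(1)

# nlargest() prepends the group key as an extra index level.
nlargest_result = grouped.nlargest(1)

# Workaround approximating the requested result: drop the group-key level.
proposed_like = nlargest_result.droplevel(0)

print(head_result.index.tolist())
print(nlargest_result.index.tolist())
print(proposed_like.index.tolist())
```

With the workaround, the index of the nlargest() result has the same shape as the one head() returns, which is the consistency this issue asks for.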
Alternative Solutions
N/A
Additional Context
In this comment, @TomAugspurger says that nlargest and nsmallest should keep the index because:
"It can be useful, matches the Series.nlargest behavior, and changing it would be API breaking."
All those reasons apply to SeriesGroupBy.head() and tail(), both of which drop the group keys in the result but filter the data in a very similar way.