BUG: make SeriesGroupBy nlargest and nsmallest behave like other filtrations #53707

Open
@mvashishtha

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

SeriesGroupBy nlargest and nsmallest filter the data and seem to match the description of filtration in the groupby documentation. However, unlike the filtration methods, they always put the group keys in the result, and they follow as_index to decide whether to put the key in the row index or in the columns.

Feature Description

import pandas as pd

df = pd.DataFrame([['a', 1]], index=['i1'])

# currently this gives a series with a MultiIndex entry ('a', 'i1'):
"""
a  i1    1
Name: 1, dtype: int64
"""
# instead it would give just a series with just the original index value 'i1':
"""
i1    1
Name: 1, dtype: int64
"""
df.groupby(0, as_index=True)[1].nlargest(1)

# currently this gives a new RangeIndex with 0 and
# puts the group key as a column in the dataframe:
"""
   0  1
0  a  1
"""
# instead it would give just a series with just the original index value 'i1':
"""
i1    1
Name: 1, dtype: int64
"""
df.groupby(0, as_index=False)[1].nlargest(1)

Note that this would then match the behavior of other filtrations like head().
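For comparison, here is a minimal sketch of how head() treats the index next to nlargest(). The three-row frame is an illustrative assumption, not data from the report, and the nlargest() index shape depends on the pandas version in use, so the sketch prints it rather than assuming it.

```python
import pandas as pd

# Illustrative frame (an assumption for this sketch): column 0 holds the
# group keys, column 1 the values, and the index holds the original labels.
df = pd.DataFrame(
    [["a", 1], ["a", 3], ["b", 2]],
    index=["i1", "i2", "i3"],
)

# head() is a filtration: it keeps the original index labels and
# does not add the group keys to the result.
head_result = df.groupby(0)[1].head(1)
print(head_result.index.tolist())

# nlargest() selects rows in a similar way; inspect how many index
# levels the result carries on the installed pandas version.
largest_result = df.groupby(0)[1].nlargest(1)
print(largest_result.index.nlevels)
print(largest_result)
```

If the behavior described in this issue is present, the second result carries one more index level than the first, because the group key is prepended to the original label.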

Alternative Solutions

N/A

Additional Context

In this comment, @TomAugspurger says that nlargest and nsmallest should keep the index because

It can be useful, matches the Series.nlargest behavior, and changing it would be API breaking.

All those reasons also apply to SeriesGroupBy.head() and tail(), both of which drop the group keys from the result even though they filter the data in a very similar way.
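Until the behavior is aligned, the original index can be recovered by dropping the group-key level after the fact. The sketch below is one possible workaround, not part of the proposal: `nlargest_keep_index` is a hypothetical helper name, and the frame is illustrative. It only drops a level when the result actually has more index levels than the input, so it should be harmless if nlargest() stops adding the key.

```python
import pandas as pd


def nlargest_keep_index(series, keys, n):
    """Hypothetical helper: per-group nlargest with only the original index."""
    result = series.groupby(keys).nlargest(n)
    # Drop the leading group-key level only if nlargest() actually added one.
    if (
        isinstance(result.index, pd.MultiIndex)
        and result.index.nlevels > series.index.nlevels
    ):
        result = result.droplevel(0)
    return result


# Illustrative data (an assumption for this sketch).
df = pd.DataFrame(
    [["a", 1], ["a", 3], ["b", 2]],
    index=["i1", "i2", "i3"],
)

result = nlargest_keep_index(df[1], df[0], 1)
print(result)
```

Unlike head(), the rows here come back ordered by group rather than in their original order, so a further `sort_index()` or reindexing step may be needed if the original ordering matters.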
