Skip to content

BUG: MultiIndex.set_levels with emtpy levels #16987

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.21.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -325,6 +325,7 @@ Other API Changes
- Compression defaults in HDF stores now follow pytable standards. Default is no compression and if ``complib`` is missing and ``complevel`` > 0 ``zlib`` is used (:issue:`15943`)
- ``Index.get_indexer_non_unique()`` now returns a ndarray indexer rather than an ``Index``; this is consistent with ``Index.get_indexer()`` (:issue:`16819`)
- Removed the ``@slow`` decorator from ``pandas.util.testing``, which caused issues for some downstream packages' test suites. Use ``@pytest.mark.slow`` instead, which achieves the same thing (:issue:`16850`)
- func:`pandas.MultiIndex.set_levels()` now allows the setting of empty levels and, when `level` is a list-like object, each element of levels is confirmed to be list-like (:issue:`16147`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put this under bug fixes. And I think it'll be

:meth:`pandas.MultiIndex.set_levels` (but `:func:` may do the same thing)

Finally, I think you can remove the "when level is a list-like object, each element of levels is confirmed to be list-like" section. That's more of an internal thing I think, that we're checking all the items in levels instead of just the first.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, thanks. I'll sort this stuff out when I pass some of the other tests. I haven't accounted for all cases. Looks like it is indeed linked with #14823. I tried merging my code with that PR but still fail some tests, notably when levels consists of the Index class. I won't have much time over the next 2 weeks but we shall see.

- Moved definition of ``MergeError`` to the ``pandas.errors`` module.
- The signature of :func:`Series.set_axis` and :func:`DataFrame.set_axis` has been changed from ``set_axis(axis, labels)`` to ``set_axis(labels, axis=0)``, for consistency with the rest of the API. The old signature is deprecated and will show a ``FutureWarning`` (:issue:`14636`)
- :func:`Series.argmin` and :func:`Series.argmax` will now raise a ``TypeError`` when used with ``object`` dtypes, instead of a ``ValueError`` (:issue:`13595`)
Expand Down
28 changes: 20 additions & 8 deletions pandas/core/indexes/multi.py
Original file line number Diff line number Diff line change
Expand Up @@ -229,16 +229,28 @@ def set_levels(self, levels, level=None, inplace=False,
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=[u'foo', u'bar'])
"""
if level is not None and not is_list_like(level):
if not is_list_like(levels):
raise TypeError("Levels must be list-like")
if is_list_like(levels[0]):
raise TypeError("Levels must be list-like")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this validation is quite similar to .from_arrays. and should coordinate with changes in #14823 (cc @groutr)

level_list_like = level is None or is_list_like(level)
levels_list_like = (is_list_like(levels) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

further some of this validation is duplicative of _set_levels. So rather add any additional validation there directly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a try at this. Still some work to do on it

all(is_list_like(x) for x in levels) and
levels != [] )

if level_list_like:
# level is a list-like object of scalars
levels_error = ("`levels` and `level` are incompatible. "
"When `level` is list-like, `levels` must be a "
"list-like object containing list-like objects")
if not levels_list_like:
raise TypeError(levels_error)
elif not level_list_like:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be else I think?

# level is a scalar
levels_error = ("`levels` and `level` are incompatible. "
"When `level` is a scalar, `levels` must be a "
" list-like object of scalars.")
if levels_list_like:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will behave incorrectly for setting index.set_index(levels=[], level=0). That should work, just setting that level's levels to []. In other words, this test should pass

def test_thing(self):
    index = pd.MultiIndex(levels=[['a', 'b'], [1, 2]], labels=[[], []])
    result = index.set_levels([], level=0)
    expected = pd.MultiIndex(levels=[[], [1, 2]], labels=[[], []])
    tm.assert_index_equal(result, expected)

lmk if that makes sense. I may be incorrect.

raise TypeError(levels_error)
level = [level]
levels = [levels]
elif level is None or is_list_like(level):
if not is_list_like(levels) or not is_list_like(levels[0]):
raise TypeError("Levels must be list of lists-like")

if inplace:
idx = self
Expand Down
44 changes: 44 additions & 0 deletions pandas/tests/indexing/test_multiindex.py
Original file line number Diff line number Diff line change
Expand Up @@ -428,6 +428,50 @@ def test_xs_multiindex(self):
expected.columns = expected.columns.droplevel('lvl1')
tm.assert_frame_equal(result, expected)

def test_set_level_checkall(self):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference issue number here.

idx = MultiIndex.from_tuples([(1, u'one'), (1, u'two'),
(2, u'one'), (2, u'two')],
names=['foo', 'bar'])
result = idx.set_levels([['a', 'b'], [1, 2]])
expected = MultiIndex(levels=[[u'a', u'b'], [1, 2]],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=[u'foo', u'bar'])
tm.assert_index_equal(result, expected)

result = idx.set_levels(['a', 'b'], level=0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have these tests passing, it may be good to split these up into multiple tests so it's easier to review.

expected = MultiIndex(levels=[[u'a', u'b'], [u'one', u'two']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=[u'foo', u'bar'])
tm.assert_index_equal(result, expected)

result = idx.set_levels(['a', 'b'], level='bar')
expected = MultiIndex(levels=[[1, 2], [u'a', u'b']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=[u'foo', u'bar'])
tm.assert_index_equal(result, expected)

result = idx.set_levels([['a', 'b'], [1, 2]], level=[0, 1])
expected = MultiIndex(levels=[[u'a', u'b'], [1, 2]],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=[u'foo', u'bar'])
tm.assert_index_equal(result, expected)

# setting empty levels are allowed
idx = MultiIndex(levels=[['L1'], ['L2']], labels=[[], []],
names=['a', 'b'])
result = idx.set_levels([], level='a')
expected = MultiIndex(levels=[[], ['L2']], labels=[[], []],
names=['a', 'b'])
tm.assert_index_equal(result, expected)

idx = MultiIndex(levels=[['L1'], ['L2']], labels=[[], []],
names=['a', 'b'])
result = idx.set_levels([[], []], level=['a', 'b'])
expected = MultiIndex(levels=[[], []], labels=[[], []],
names=['a', 'b'])
tm.assert_index_equal(result, expected)

def test_multiindex_setitem(self):

# GH 3738
Expand Down