
SUMM: Consistent Casting and Length Checks in Ops #37086

Open
@jbrockmendel

xref #27911 API: when to cast list-like to ndarray, check len match
xref #13637 API: Index/Series/DataFrame op 1-d list-like coercion

Related bugs:
#21517 Comparison of MultiIndex to tuple interprets tuple as collection

One way to ensure consistent behavior is to enshrine it in ops.unpack_zerodim_and_defer, which nearly all of our ops now use. The following implementation matches our current behavior (more precisely: it passes the current test suite) when added to unpack_zerodim_and_defer, with one exception, described below:

        from pandas.core.dtypes.common import is_list_like
        from pandas.core.dtypes.generic import ABCDataFrame, ABCSeries

        # Runs inside the wrapper built by unpack_zerodim_and_defer,
        # where `self` is the left operand and `other` the right operand.
        if is_list_like(other) and len(other) != len(self):
            if isinstance(self, ABCDataFrame):
                # DataFrame method does this in align_method_FRAME
                pass
            elif isinstance(self, ABCSeries) and isinstance(other, ABCSeries):
                # We have _align_foo methods that will handle this
                pass
            elif self.dtype.kind == "O" and not hasattr(other, "dtype"):
                # With e.g. list or tuple and object dtype, it is ambiguous
                #  whether this is intended to be treated as a scalar.
                pass
            else:
                raise ValueError("Lengths must match")

The sticking point is what to do when other is listlike but not arraylike, generally a list or tuple. In this implementation, such an other is treated as arraylike (and so must match self in length) whenever self is not object dtype, and is left as-is otherwise.
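To make that rule concrete, here is a minimal standalone sketch of the proposed check. It assumes the public pd.DataFrame and pd.Series classes can stand in for the internal ABCDataFrame and ABCSeries, and check_len_match is a hypothetical name, not a pandas function.

```python
import pandas as pd
from pandas.api.types import is_list_like


def check_len_match(left, right):
    """Hypothetical standalone version of the length check proposed above.

    `left` plays the role of `self`, `right` the role of `other`.
    Public pandas classes replace the internal ABC* types (an assumption).
    """
    if is_list_like(right) and len(right) != len(right if False else left):
        if isinstance(left, pd.DataFrame):
            # DataFrame ops handle alignment/casting in their own code path
            return None
        if isinstance(left, pd.Series) and isinstance(right, pd.Series):
            # Series-vs-Series ops are aligned on the index first
            return None
        if left.dtype.kind == "O" and not hasattr(right, "dtype"):
            # list/tuple against object dtype: may be meant as a scalar
            return None
        raise ValueError("Lengths must match")
    return None


s_int = pd.Series(range(4), dtype="int64")
s_obj = pd.Series([(1, 2)] * 4, dtype=object)

check_len_match(s_obj, (1, 2))  # object dtype + tuple: left alone
try:
    check_len_match(s_int, [False])  # int64 + length-1 list: rejected
except ValueError as err:
    print(err)
```

The int64 case is exactly the failing test discussed next: a length-1 list against a length-4, non-object Series.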

The one test that fails under this implementation uses a length-1 list:

        from pandas import Series

        s_0123 = Series(range(4), dtype="int64")
        expected = Series([False] * 4)

        result = s_0123 & [False]

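At the moment, array_ops.logical_op casts a list (only a list, not a tuple) to ndarray[object]. The implementation above would instead raise ValueError here, because len([False]) == 1 does not match len(s_0123) == 4 and s_0123 is not object dtype.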
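The current cast can be sketched as follows. cast_like_logical_op is a hypothetical helper that only mirrors the "list, not tuple, becomes ndarray[object]" step; it is not the actual pandas code. It also assumes NumPy >= 1.20 for np.broadcast_shapes.

```python
import numpy as np


def cast_like_logical_op(other):
    """Hypothetical helper mirroring the cast described above:
    a list (but not a tuple) becomes an object-dtype ndarray."""
    if isinstance(other, list):
        return np.asarray(other, dtype=object)
    return other


as_list = cast_like_logical_op([False])
as_tuple = cast_like_logical_op((False,))

print(type(as_list).__name__, as_list.dtype, as_list.shape)  # ndarray object (1,)
print(type(as_tuple).__name__)  # tuple

# A shape-(1,) array broadcasts against 4 elements under numpy rules,
# consistent with the length-1 list not raising today.
print(np.broadcast_shapes((4,), as_list.shape))  # (4,)
```

Once the list is an ndarray, the length-1 case is plausibly resolved by numpy broadcasting rather than by any explicit length check in pandas.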

Outstanding questions:

  1. Allow length-1 others to broadcast like numpy?
  2. Allow length-1 self to broadcast like numpy?
  3. DataFrame always casts listlike to arraylike; should that be changed?
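For questions 1 and 2, a numpy-style rule for 1-D operands would accept equal lengths or a length of 1 on either side. The sketch below assumes NumPy >= 1.20 for np.broadcast_shapes; broadcast_length is a hypothetical name used only for illustration.

```python
import numpy as np


def broadcast_length(len_self, len_other):
    """Hypothetical 1-D length check following numpy broadcasting:
    lengths must be equal, or one of them must be 1."""
    # np.broadcast_shapes raises ValueError for incompatible shapes
    (n,) = np.broadcast_shapes((len_self,), (len_other,))
    return n


print(broadcast_length(4, 4))  # 4  (plain length match)
print(broadcast_length(4, 1))  # 4  (question 1: length-1 other)
print(broadcast_length(1, 4))  # 4  (question 2: length-1 self)
try:
    broadcast_length(4, 2)
except ValueError:
    print("incompatible")  # neither length is 1, so this raises
```

Answering yes to question 1 but no to question 2 would mean rejecting the len_self == 1 case before delegating to numpy.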

Labels: API - Consistency (Internal Consistency of API/Behavior), Numeric Operations (Arithmetic, Comparison, and Logical operations)