sum in pandas can concatenate strings #13916

Open
@shoyer

Possibly related: #13912

This looks wrong to me -- probably a bug?

In [36]: pd.Series(['a', 'b', 'c']).sum()
Out[36]: 'abc'

In [37]: pd.__version__
Out[37]: '0.18.1'

This happens on DataFrames, as well:

In [8]: pd.Series(['a', 'b', 'c']).to_frame().sum()
Out[8]:
0    abc
dtype: object

Note, of course, that summing strings with Python's built-in sum raises a TypeError:

In [2]: sum(['a', 'b', 'c'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-eba0487fc411> in <module>()
----> 1 sum(['a', 'b', 'c'])

TypeError: unsupported operand type(s) for +: 'int' and 'str'
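
The message above mentions 'int' because the built-in sum starts from the integer 0, so its first step is 0 + 'a'. A minimal sketch in plain Python of the difference between that and a fold that applies + with no initial value. Using functools.reduce here is only an illustrative stand-in, assumed for the analogy; it is not a claim about how pandas or NumPy implement their reductions internally:

```python
from functools import reduce
import operator

strings = ['a', 'b', 'c']

# Built-in sum() begins with the integer 0, so the first step is 0 + 'a',
# which raises TypeError.
try:
    sum(strings)
    builtin_result = "no error"
except TypeError:
    builtin_result = "TypeError"

# Folding with + and no initial value only ever adds str to str,
# so the strings are concatenated.
folded = reduce(operator.add, strings)

print(builtin_result)  # TypeError
print(folded)          # abc
```

A reduction that just applies + pairwise to Python objects would be expected to concatenate strings in the same way as the fold here.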

Interestingly, NumPy has the same (buggy?) behavior on dtype=object arrays:

In [11]: pd.Series(['a', 'b', 'c']).values.sum()
Out[11]: 'abc'

In [12]: pd.Series(['a', 'b', 'c']).values.astype(str).sum()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-4e25bf0067a8> in <module>()
----> 1 pd.Series(['a', 'b', 'c']).values.astype(str).sum()

/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/numpy/core/_methods.py in _sum(a, axis, dtype, out, keepdims)
     30
     31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
---> 32     return umr_sum(a, axis, dtype, out, keepdims)
     33
     34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: cannot perform reduce with flexible type

Labels: Bug, Reduction Operations (sum, mean, min, max, etc.), Strings (String extension data type and string data)
