
BUG: assigning pd.NA to a value in a float column changes the column's dtype to "object" #44199

Closed
@Demetrio92

Description


  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

mock_data = pd.DataFrame({
    'date': ['0', '1', '2', '3'],
    'value': [1, 2, 1, 1.5]
})
assert pd.api.types.is_numeric_dtype(mock_data.value)  # passes

mock_data.loc[2, 'value'] = pd.NA  # replace a single value with pd.NA
assert pd.api.types.is_numeric_dtype(mock_data.value)  # breaks
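For comparison, here is a minimal sketch that is not part of the original report. Assigning np.nan, the missing-value marker that numpy float columns support natively, leaves the column as float64. It assumes numpy is imported as np. Only the pd.NA case is what this issue is about.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: 'value' is inferred as numpy float64
mock_data = pd.DataFrame({
    'date': ['0', '1', '2', '3'],
    'value': [1, 2, 1, 1.5]
})
print(mock_data['value'].dtype)  # float64

# np.nan can be stored in a float64 array directly,
# so this assignment does not need to upcast the column
mock_data.loc[2, 'value'] = np.nan
print(mock_data['value'].dtype)  # float64
assert pd.api.types.is_numeric_dtype(mock_data['value'])
```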

Issue Description

Replacing a single value in a column with an NA/NULL value should not change the column's data type, which seems like a reasonable expectation. The functionality also appears to exist already. I am not entirely sure whether this is a bug or a feature.

The cause appears to be the dtype pandas assigns to the column by default: numpy float64 rather than the nullable pd.Float64Dtype(). I am not sure whether migrating to nullable dtypes by default is on the roadmap, but this bug could be an argument in its favor.
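A minimal sketch of opting into the nullable dtype when the frame is built, instead of converting the column afterwards. It assumes pandas 1.2 or later, where the "Float64" extension dtype exists. Passing pd.array(..., dtype="Float64") for the column skips the default float64 inference, so the later pd.NA assignment lands in a dtype that can hold it.

```python
import pandas as pd

# Request the nullable Float64 extension dtype up front instead of
# relying on the default numpy float64 inference
mock_data = pd.DataFrame({
    'date': ['0', '1', '2', '3'],
    'value': pd.array([1, 2, 1, 1.5], dtype="Float64"),
})
print(mock_data['value'].dtype)  # Float64

# pd.NA is the native missing-value marker for nullable dtypes,
# so the column keeps its Float64 dtype
mock_data.loc[2, 'value'] = pd.NA
print(mock_data['value'].dtype)  # Float64
```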

Expected Behavior

import pandas as pd

mock_data = pd.DataFrame({
    'date': ['0', '1', '2', '3'],
    'value': [1, 2, 1, 1.5]
})
mock_data.value = mock_data.value.astype(pd.Float64Dtype())  # this should happen by default
mock_data.value.dtype  # Float64Dtype()

mock_data.loc[2, 'value'] = pd.NA
assert pd.api.types.is_numeric_dtype(mock_data.value)  # still passes
mock_data.value.dtype  # Float64Dtype()
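Instead of calling astype column by column, DataFrame.convert_dtypes() can switch every column to a nullable dtype in one call. This is a minimal sketch, assuming pandas 1.2 or later with default arguments. Because 'value' contains the non-integer 1.5, it should become Float64 rather than Int64, and the later pd.NA assignment keeps the column numeric.

```python
import pandas as pd

mock_data = pd.DataFrame({
    'date': ['0', '1', '2', '3'],
    'value': [1, 2, 1, 1.5]
})

# convert_dtypes() picks the best-fitting nullable dtype for each column;
# 1.5 is not integer-like, so 'value' becomes Float64 instead of Int64
converted = mock_data.convert_dtypes()
print(converted.dtypes)

converted.loc[2, 'value'] = pd.NA
assert pd.api.types.is_numeric_dtype(converted['value'])
```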

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 945c9ed766a61c7d2c0a7cbb251b6edebf9cb7d5
python           : 3.9.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.11.0-34-generic
Version          : #36~20.04.1-Ubuntu SMP Fri Aug 27 08:06:32 UTC 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8
pandas           : 1.3.4
numpy            : 1.21.2
pytz             : 2021.3
dateutil         : 2.8.2
pip              : 21.3.1
setuptools       : 58.2.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.2
IPython          : 7.28.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 5.0.0
pyxlsb           : None
s3fs             : None
scipy            : 1.7.1
sqlalchemy       : 1.4.25
tables           : None
tabulate         : 0.8.9
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

Metadata

    Labels

    Bug
    Dtype Conversions: Unexpected or buggy dtype conversions
    Indexing: Related to indexing on series/frames, not to indexes themselves
    Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
    NA - MaskedArrays: Related to pd.NA and nullable extension arrays
