Description
The problem
If we create a Series with a defined dtype and a new row is then added to that Series, the dtype changes. The relevant documentation sections are linked here, followed by an example:
- https://pandas.pydata.org/pandas-docs/stable/missing_data.html#inserting-missing-data
- https://pandas.pydata.org/pandas-docs/stable/indexing.html#setting-with-enlargement
Example
import pandas as pd
import numpy as np
pd.__version__ # '0.23.1'
# with Series
s = pd.Series([1,2], dtype=np.float64)
print(s.dtype) # -> float64
s[3] = None
print(s.dtype) # -> object
# with DataFrames
d = pd.DataFrame([1,2], dtype=np.float64)
print(d.dtypes) # -> float64
d.loc[3, 0] = None
print(d.dtypes) # -> float64
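For comparison, here is a minimal sketch of the same enlargement using np.nan instead of None. It is not part of the original report; it assumes that the float NaN is treated as a native missing value for a float64 Series, so no cast to object is needed.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2], dtype=np.float64)

# Enlarge the Series with a float NaN rather than None.
# Label 3 does not exist yet, so this is "setting with enlargement".
s.loc[3] = np.nan

print(s.dtype)  # expected: float64
```

If this holds on the version in use, the dtype change looks specific to None being assigned to a label that does not exist yet.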
However, this doesn't happen when the row is already present:
In[12]: s = pd.Series([1,2]).astype(np.float64)
In[13]: s[3] = None
In[14]: s
Out[14]:
0       1
1       2
3    None
dtype: object
In[15]: s = s.astype(np.float64)
In[16]: s
Out[16]:
0    1.0
1    2.0
3    NaN
dtype: float64
# row 3 (position 2) is already present in s
In[18]: s.iloc[2] = None
In[19]: s
Out[19]:
0    1.0
1    2.0
3    NaN
dtype: float64
In[20]: s.loc[3] = None
In[21]: s
Out[21]:
0    1.0
1    2.0
3    NaN
dtype: float64
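The astype step from In[15] can be wrapped into a small helper as a possible workaround. This is only a sketch: set_preserving_dtype is a hypothetical name, not a pandas function, and it assumes the assigned value can be represented in the original dtype (None becomes NaN for float64).

```python
import numpy as np
import pandas as pd


def set_preserving_dtype(s, label, value):
    """Hypothetical helper: set s[label] = value on a copy, then cast
    back to the original dtype if the assignment changed it."""
    original_dtype = s.dtype
    out = s.copy()
    out.loc[label] = value  # may enlarge the Series and upcast to object
    if out.dtype != original_dtype:
        # Same cast as In[15] above: object -> float64 turns None into NaN.
        out = out.astype(original_dtype)
    return out


s = pd.Series([1, 2], dtype=np.float64)
result = set_preserving_dtype(s, 3, None)
print(result.dtype)
```

The helper works on a copy so the input Series is left untouched. The cast raises if the new value cannot be converted to the original dtype, which seems preferable to a silent change to object.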