BUG: to_datetime do not keep the date format throughout the column when using inferred format #34546

Closed
@lviani

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd

# Day/month first (no error is raised, but parsing should not continue silently)
print('First case - year last | no error raised')
df = pd.DataFrame({'timestamp':['11-05-2020','12-05-2020','14-05-2020','13-05-2020','01-06-2020','02-06-2020']})
print(pd.to_datetime(df['timestamp']))

# Year first (an error is raised, thus it is what I would expect)
print('\n\nSecond case - year first | error raised')
df = pd.DataFrame({'timestamp':['2020-11-05','2020-12-05','2020-14-05','2020-13-05','2020-01-06','2020-02-06']})
pd.to_datetime(df['timestamp'])

Problem description

The function to_datetime does not keep the same date format throughout the column when the format is inferred. This happens when the dates to parse do NOT start with the year.

Using the table below as input ('test.csv'); these values match the second (year-first) case in the code sample:
timestamp
2020-11-05
2020-12-05
2020-14-05
2020-13-05
2020-01-06
2020-02-06

Problem:

  1. The problem happens when the dates do not start with the year. In this case, pandas switches the month and day (see rows 2 and 3), so the detected format changes partway through the column.
    0 2020-11-05
    1 2020-12-05
    2 2020-05-14
    3 2020-05-13
    4 2020-01-06
    5 2020-02-06

  2. When the dates start with the year, an error is raised (in my view this should also happen in the previous case).
    ValueError: month must be in 1..12
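
To illustrate how the mixed result in item 1 can arise, here is a minimal sketch that parses each value independently and falls back to a second format when the first does not fit. It uses only the standard library; `CANDIDATE_FORMATS` and `parse_per_value` are hypothetical names, and this is not pandas' actual implementation, only an assumption about the kind of per-value fallback that would produce the output above.

```python
from datetime import datetime

# Hypothetical candidate formats, tried in this order for every value independently.
CANDIDATE_FORMATS = ("%m-%d-%Y", "%d-%m-%Y")


def parse_per_value(value):
    """Return (datetime, format) for the first candidate format that fits the value."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value, fmt), fmt
        except ValueError:
            continue
    raise ValueError(f"no candidate format matches {value!r}")


values = ["11-05-2020", "12-05-2020", "14-05-2020",
          "13-05-2020", "01-06-2020", "02-06-2020"]

results = [parse_per_value(v) for v in values]
for value, (parsed, fmt) in zip(values, results):
    print(value, "->", parsed.date(), "via", fmt)

# More than one format in this set means the column was parsed inconsistently.
formats_used = {fmt for _, fmt in results}
print("formats used:", sorted(formats_used))
```

Because each value is handled on its own, nothing stops rows 2 and 3 from silently using a different format than their neighbours; checking `formats_used` is one way such a mismatch could be detected.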

Expected Output

Raise an error if the datetime format of the input values changes within the same column.
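
As a sketch of the behaviour described here, passing an explicit `format` to `pd.to_datetime` applies one format to the whole column instead of inferring it. The variable names are illustrative, and the exact error message may differ between pandas versions.

```python
import pandas as pd

timestamps = pd.Series(
    ["11-05-2020", "12-05-2020", "14-05-2020",
     "13-05-2020", "01-06-2020", "02-06-2020"]
)

# Explicit day-first format: every row is parsed the same way.
parsed = pd.to_datetime(timestamps, format="%d-%m-%Y")
print(parsed)

# Explicit month-first format: "14-05-2020" cannot be month 14,
# so the whole call fails instead of silently switching formats.
try:
    pd.to_datetime(timestamps, format="%m-%d-%Y")
    raised = False
except ValueError as exc:
    raised = True
    print("month-first format rejected:", exc)
```

With a fixed format, a value that does not fit raises an error rather than being reinterpreted, which is the behaviour requested for the inferred case.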

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-25-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 5.3.2
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
