Triggered by the regression reported in #26206 and my attempt to fix it (#26848), I looked into how our various constructors handle different kinds of invalid datetimes.
Out-of-bounds datetimes (cases: a numpy datetime64 array with a non-ns unit, `datetime.datetime` objects, a list of `M8[D]` scalars, strings):
| constructor | `M8[D]` array out of bounds | `datetime.datetime` out of bounds | list of `M8[D]` scalars out of bounds | string out of bounds |
|---|---|---|---|---|
| `DataFrame` | datetime64[ns] 1842-11-22 0 ... | object | object | object |
| `DatetimeIndex` | OutOfBoundsDatetime | OutOfBoundsDatetime | OutOfBoundsDatetime | OutOfBoundsDatetime |
| `Index` | OutOfBoundsDatetime | object | object | object |
| `Index` (explicit dtype) | OutOfBoundsDatetime | OutOfBoundsDatetime | OutOfBoundsDatetime | OutOfBoundsDatetime |
| `pd.to_datetime` | OutOfBoundsDatetime | OutOfBoundsDatetime | OutOfBoundsDatetime | OutOfBoundsDatetime |
| `Series` | datetime64[ns] 1842-11-22 0 ... | object | object | object |
| `Series` (explicit dtype) | datetime64[ns] 1842-11-22 0 ... | datetime64[ns] 1842-11-22 0 ... | datetime64[ns] 1842-11-22 0 ... | datetime64[ns] 1842-11-22 0 ... |
Some remarks here:
- The above table is for master (and is the same for 0.24). However, on pandas 0.22, the out-of-bounds datetime64 array case raised an error for every constructor (so also for `Series(..)`; that is the regression reported in #26206, "REGR: no error anymore when converting out of bounds datetime64[non-ns] data").
- For `to_datetime`, all the `OutOfBoundsDatetime` errors are expected, as the default is `errors='raise'`.
- For the constructors with a specified dtype (either an explicit dtype passed, or `DatetimeIndex`), we also expect an error. So `DatetimeIndex` and `Index(.., dtype='M8[ns]')` are fine, but `Series(.., dtype='M8[ns]')` is clearly broken (it gives an incorrect date in all cases).
- The inconsistencies arise mainly when no dtype is enforced. In some cases (e.g. for `datetime.datetime` objects) we return an object-dtype Index/Series; in other cases we raise an error.
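To make the "incorrect date" failure mode concrete, here is a pure-Python sketch that converts a day-resolution value (as stored in an `M8[D]` array) to int64 nanoseconds since the epoch twice: once with a bounds check (roughly what raising `OutOfBoundsDatetime` amounts to) and once with unchecked int64 wraparound. That the wrong `1842-11-22` result comes from such an overflow is an assumption about one plausible mechanism, not a description of pandas internals. `OutOfBoundsError`, `to_ns_checked` and `to_ns_wrapping` are hypothetical names.

```python
from datetime import date, datetime, timedelta

# datetime64[ns] stores int64 nanoseconds since 1970-01-01.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NS_PER_DAY = 86_400 * 10**9
EPOCH = datetime(1970, 1, 1)


class OutOfBoundsError(ValueError):
    """Hypothetical stand-in for pandas' OutOfBoundsDatetime."""


def days_since_epoch(d: date) -> int:
    # Day-resolution integer, like the values inside a numpy M8[D] array.
    return d.toordinal() - date(1970, 1, 1).toordinal()


def to_ns_checked(days: int) -> int:
    # Strict conversion: refuse values that do not fit in int64 nanoseconds.
    ns = days * NS_PER_DAY
    if not INT64_MIN <= ns <= INT64_MAX:
        raise OutOfBoundsError(f"{days} days since epoch does not fit in datetime64[ns]")
    return ns


def to_ns_wrapping(days: int) -> int:
    # Unchecked conversion (hypothetical): emulate two's-complement int64 overflow.
    ns = days * NS_PER_DAY
    return (ns - INT64_MIN) % 2**64 + INT64_MIN


def ns_to_datetime(ns: int) -> datetime:
    # Microsecond precision is enough to see which date we ended up with.
    return EPOCH + timedelta(microseconds=ns // 1000)


out_of_bounds = days_since_epoch(date(1000, 1, 1))
try:
    to_ns_checked(out_of_bounds)
except OutOfBoundsError as exc:
    print("strict conversion raises:", exc)
print("unchecked conversion silently gives:", ns_to_datetime(to_ns_wrapping(out_of_bounds)))
```

The wrapped value always lands inside the representable range, which is why an unchecked path produces a plausible-looking but wrong date instead of an error.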
The last item is the main question: when inferring (no dtype enforced) and the data are clearly timestamps (either numpy datetime64 or `datetime.datetime`), should we raise an error or return object dtype?
Note that for `datetime.datetime`, returning object dtype might be more logical, as it means returning the input as is, whereas for datetime64[non-ns] it actually means converting data with a numpy datetime dtype to object data.
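The two options can be sketched as a toy inference rule. This is a hypothetical illustration, not pandas code: `infer_dtype_for` and its `on_out_of_bounds` parameter are invented names, inputs are restricted to naive `datetime.datetime` objects, and the resulting dtype is simply returned as a string.

```python
from datetime import datetime, timedelta
from typing import Literal, Sequence

EPOCH = datetime(1970, 1, 1)
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def fits_in_ns(dt: datetime) -> bool:
    # datetime64[ns] stores int64 nanoseconds since the epoch.
    ns = (dt - EPOCH) // timedelta(microseconds=1) * 1000
    # INT64_MIN itself is excluded because pandas reserves it for NaT.
    return INT64_MIN < ns <= INT64_MAX


def infer_dtype_for(
    values: Sequence[datetime],
    on_out_of_bounds: Literal["raise", "object"] = "raise",
) -> str:
    """Hypothetical inference rule for a constructor called without a dtype."""
    if all(fits_in_ns(v) for v in values):
        return "datetime64[ns]"
    if on_out_of_bounds == "raise":
        # Option 1: behave like DatetimeIndex / pd.to_datetime and raise.
        raise ValueError("out of bounds for datetime64[ns]")
    # Option 2: keep the input as-is, as Index/Series currently do for datetime.datetime.
    return "object"


values = [datetime(2019, 1, 1), datetime(1000, 1, 1)]
print(infer_dtype_for(values, on_out_of_bounds="object"))  # object
```

For `datetime.datetime` input, option 2 leaves the objects untouched; for datetime64[non-ns] input the same option would instead require boxing numpy values into objects, which is the asymmetry noted above.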