ENH: Allow tz-aware origin parameter in pandas.to_datetime #37482

Closed
@venaturum

Is your feature request related to a problem?

I wish I could use pandas to map back and forth between tz-aware timestamps and floats.

This has been raised before in #16842; however, it was argued there that the incumbent behaviour did not need altering. I'd like to plead my case below.

Describe the solution you'd like

Currently, if the origin parameter in pd.to_datetime is tz-aware, an exception is raised:
ValueError: origin offset ..... must be tz-naive

I believe that if a tz-aware origin is used, the result of pd.to_datetime should be a tz-aware timestamp.

API breaking implications

I believe an implementation is possible which does not break the API: the timezone would be extracted from the origin parameter (it could be None) and used to localize other timestamps in the code where necessary.

Describe alternatives you've considered

I have considered writing my own to_datetime that accepts timezone-aware origins, or using tz_convert as a mediator in the mapping. However, I would still argue that the default behaviour of to_datetime is not ideal.
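For illustration, the tz_convert alternative could be sketched roughly as below. The helper name to_datetime_tz_origin is hypothetical and not part of pandas; the sketch assumes the origin can be passed to pd.Timestamp, and it uses the Sydney origin from the example under "Additional context".

```python
import pandas as pd


def to_datetime_tz_origin(values, unit, origin):
    """Hypothetical helper (not a pandas API): accept a tz-aware origin
    and return tz-aware results in the origin's timezone."""
    origin = pd.Timestamp(origin)
    tz = origin.tz
    if tz is None:
        # tz-naive origin: defer to the existing behaviour unchanged
        return pd.to_datetime(values, unit=unit, origin=origin)
    # Express the origin as a naive UTC time, which to_datetime accepts
    naive_utc_origin = origin.tz_convert("UTC").tz_localize(None)
    naive_utc_result = pd.to_datetime(values, unit=unit, origin=naive_utc_origin)
    # Re-attach UTC, then convert back to the origin's timezone
    return naive_utc_result.tz_localize("UTC").tz_convert(tz)


origin = pd.Timestamp("2020-10-04", tz="Australia/Sydney")
result = to_datetime_tz_origin(2, unit="h", origin=origin)
print(result)
```

Because the offset is applied in UTC, the daylight saving transition is handled by the final tz_convert and no nonexistent local time is ever constructed.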

Additional context

Consider the following code. For context, in the Australia/Sydney timezone clocks were wound forward an hour at 2am on the 4th of October 2020.

import pandas as pd
import pytz

tz = pytz.timezone('Australia/Sydney')
origin = pd.Timestamp('2020-10-04', tz=tz)
test_date_1 = pd.Timestamp('2020-10-04 1:00', tz=tz)
test_date_2 = pd.Timestamp('2020-10-04 3:00', tz=tz)

print(f'There is {(test_date_1 - origin).total_seconds()/3600} hours from origin to {test_date_1}')
print(f'There is {(test_date_2 - origin).total_seconds()/3600} hours from origin to {test_date_2}')

The code will correctly calculate the time delta, with an origin set to the start of the day:
There is 1.0 hours from origin to 2020-10-04 01:00:00+10:00
There is 2.0 hours from origin to 2020-10-04 03:00:00+11:00

How can I map the number 2 back to test_date_2?

This code

pd.to_datetime(2, unit='h', origin=origin)

produces
ValueError: origin offset 2020-10-04 00:00:00+10:00 must be tz-naive

This code

pd.to_datetime(2, unit='h', origin=origin.tz_localize(None))

produces a tz-naive Timestamp: Timestamp('2020-10-04 02:00:00')

Trying to localize it does not work either, because 02:00 does not exist on that day in Sydney, and it raises an exception:

pd.to_datetime(2, unit='h', origin=origin.tz_localize(None)).tz_localize(tz)

NonExistentTimeError: 2020-10-04 02:00:00

The answer to "What time is 2hrs past midnight on the 4th of October 2020 in Sydney, Australia?" is not ambiguous. It has an answer and I believe pandas should be able to accommodate this.
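As a point of comparison, the round trip between floats and tz-aware timestamps can be sketched with plain Timedelta arithmetic, which operates on absolute elapsed time. This sidesteps pd.to_datetime entirely, so it is a workaround rather than the behaviour requested here; the variable names are only illustrative.

```python
import pandas as pd

origin = pd.Timestamp("2020-10-04", tz="Australia/Sydney")
stamps = pd.to_datetime(["2020-10-04 01:00", "2020-10-04 03:00"]).tz_localize(
    "Australia/Sydney"
)

# Timestamps -> floats: elapsed hours since the origin
hours = (stamps - origin) / pd.Timedelta(hours=1)

# Floats -> timestamps: add the elapsed time back onto the tz-aware origin
recovered = origin + pd.to_timedelta(hours, unit="h")

print(list(hours))
print(recovered)
```

The recovered index is tz-aware and crosses the 2am transition without raising, which is the result I would expect pd.to_datetime(2, unit='h', origin=origin) to give.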

Labels: Enhancement, Needs Triage