Description
I am not sure whether I should report this here or on numpy, but this is what led me to the problem:
```
In [11]: dAllTags.describe()
Out[11]:
                     finalPeriod
count                      74501
mean    -1 days +02:40:08.792662
std     500 days 06:32:37.640848
min       2 days 00:51:49.730000
25%     498 days 19:11:28.576000
50%     846 days 00:46:56.656000
75%    1245 days 17:11:58.493000
max    2224 days 07:03:26.593000
```
All the values are positive (the minimum is 2 days), yet the calculated mean is negative. This happens because the underlying type of `np.timedelta64` is `int64`, which overflows while calculating the mean.
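For illustration, here is a minimal sketch that reproduces the wrap-around (the count is taken from the `describe()` output above; the 800-day value is made up but of similar magnitude):

```python
import numpy as np

# ~75k durations of ~800 days each, stored as int64 nanoseconds.
# Their true sum (~5e21 ns) exceeds the int64 maximum (~9.2e18 ns,
# roughly 292 years), so the intermediate sum silently wraps around.
td = np.full(74501, np.timedelta64(800, 'D'), dtype='timedelta64[ns]')
print(td.sum())   # wrapped int64 total: nonsense, not the true sum
print(td.mean())  # likewise wrong (with my data it came out negative)
```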
Now, the issue of numerical stability in `numpy` has had a long history:
- Numerical stability numpy/numpy#4694
- numpy.mean(): accumulator default type should not be single precision (Trac #435) numpy/numpy#1033
- ndarray's mean method should be computed using double precision (Trac #465) numpy/numpy#1063
- Numerical-stable sum (similar to math.fsum) (Trac #1855) numpy/numpy#2448
And though some steps have been taken to improve precision (e.g. `math.fsum` in the standard library and pairwise summation in `numpy`), there doesn't seem to be a consensus on using a numerically stable method for `mean`.
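For concreteness, one overflow-free alternative is an incremental running mean, which never materializes the huge intermediate sum. This is only a sketch; `running_mean` is a hypothetical helper, not an existing numpy or pandas function:

```python
import numpy as np

# Hypothetical helper: an incremental running mean whose accumulator
# stays on the order of a single value, so it cannot overflow int64
# the way a plain sum-then-divide does.
def running_mean(td: np.ndarray) -> np.timedelta64:
    mean = 0.0
    for i, v in enumerate(td.view('i8'), start=1):
        mean += (v - mean) / i  # incremental update in float64
    return np.timedelta64(int(round(mean)), 'ns')

td = np.full(74501, np.timedelta64(800, 'D'), dtype='timedelta64[ns]')
print(running_mean(td))  # 800 days expressed in ns, no wrap-around
```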
I was wondering if something could be done at the Pandas level to resolve this issue. Currently, I am working around it with the rather elaborate scheme `df.finalPeriod.view(int).astype(float).mean()`, since `timedelta64` cannot be directly converted to `float64`. Is there a better/more intuitive way to do this?
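For reference, here is that workaround end to end on stand-in data (the column name follows the output above), using the same `.view` trick and converting the float result back into a `Timedelta`:

```python
import numpy as np
import pandas as pd

# Stand-in data; in the report above this would be dAllTags.
df = pd.DataFrame({'finalPeriod': pd.to_timedelta(
    np.full(74501, 800, dtype='int64'), unit='D')})

# Average the underlying nanosecond integers in float64 (whose range is
# ample, so nothing wraps), then wrap the result back into a Timedelta.
mean_ns = df['finalPeriod'].view('i8').astype('float64').mean()
print(pd.to_timedelta(mean_ns, unit='ns'))  # Timedelta('800 days 00:00:00')
```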