What kind of variance should var() return? #149

Closed
@milancurcic

Description

Playing with a toy example today, I was surprised to see the result (I guess I should have read the specification first, LOL).

Program:

use stdlib_experimental_stats, only: mean, var
real :: a(5) = [1, 2, 3, 4, 5]
print *, var(a), mean((a - mean(a))**2)
end

I expected two identical numbers. Result:

   2.50000000       2.00000000    

Then I looked at the code for var() and saw that it divides the sum of squared deviations by (N - 1) rather than by N.

Then I looked at this issue and the spec, and it's all good: it says that the variance is defined in such a way that we divide by (N - 1). The code works as advertised.

But then I wondered why N - 1 and not N, did some Google searching, and found that there are all kinds of variances out there: dividing by (N - 1) gives the sample variance, the best unbiased estimator of the population variance (Bessel's correction), while dividing by just N gives the population, or biased, variance.
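
To make the difference concrete for this example, here is a minimal, self-contained sketch (no stdlib dependency; the program name is just for illustration) that computes both flavors:

program variance_flavors
  implicit none
  real :: a(5) = [1, 2, 3, 4, 5]
  real :: ss
  ! Sum of squared deviations about the mean: 4 + 1 + 0 + 1 + 4 = 10
  ss = sum((a - sum(a) / size(a))**2)
  print *, ss / (size(a) - 1)  ! (N - 1) divisor, unbiased sample variance: 10/4 = 2.5
  print *, ss / size(a)        ! N divisor, population variance:            10/5 = 2.0
end program variance_flavors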

How we define this affects not only the numerical result but, in some cases, also the behavior of the program: the N-divisor variance of a scalar is 0, while the (N - 1)-divisor variance of a scalar is NaN. Some discussion here.

NumPy's np.var(), for example, defines it as a simple variance (divide by N), but you can optionally set a "delta degrees of freedom", so that np.var(x, ddof=1) corresponds to the unbiased (N - 1) sample variance.

I'm not a statistician but a wave physicist. I expect my variance to divide by N, and NumPy has served me well so far.

Question: Should we consider adding an optional "delta degrees of freedom" argument to make both statisticians and physicists happy?
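
For illustration only, here is a sketch of what such an option could look like; the function name var_ddof, the argument name ddof, and its default are assumptions for this example, not the actual stdlib interface:

! Hypothetical sketch; not the actual stdlib implementation.
pure function var_ddof(x, ddof) result(v)
  real, intent(in) :: x(:)
  integer, intent(in), optional :: ddof  ! delta degrees of freedom; assumed default of 1, i.e. the (N - 1) divisor
  real :: v
  integer :: d
  d = 1
  if (present(ddof)) d = ddof
  v = sum((x - sum(x) / size(x))**2) / (size(x) - d)
end function var_ddof

With this sketch, var_ddof(a) would return 2.5 and var_ddof(a, ddof=0) would return 2.0; note that NumPy makes the opposite choice and defaults ddof to 0.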
