# Why do we use n-1 in the denominator of the sample variance?


Hi.

So I understand the concept of degrees of freedom: it is the number of values we are free to choose. For example, if there are three numbers and their mean is 20, there are 2 degrees of freedom, because once we choose two of the numbers, the third one is determined.
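To make the three-number example concrete, here is a minimal sketch. The function name `third_number` and the two chosen values are made up for illustration.

```python
def third_number(a, b, mean=20):
    """Return the only value that gives (a, b, third) the requested mean."""
    # Three numbers with mean 20 must sum to 3 * 20 = 60,
    # so the third number is whatever is left over.
    return 3 * mean - a - b


a, b = 10, 25          # two freely chosen numbers (arbitrary picks)
c = third_number(a, b)
print(c)               # 25
print((a + b + c) / 3) # 20.0
```

Whatever two numbers you pick, the third is forced, which is why only 2 of the 3 values are free.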

I don't get, however, why samples of a population use n-1 in the denominator, when populations use n.

Let's say this is a population: 1, 3, 5, 2, 7, 8, 6, 9, 2, 7. The mean would be 50/10 = 5. The variance is 7.2.

Let's take a sample of this population: 5, 7, 6, 8, 7. The mean would be 33/5 = 6.6. The variance, using n in the denominator (which I know is incorrect), would be 1.04. With n-1 as the denominator, the answer is 1.3. I fail to see how 1.3 is a more accurate estimate of the population variance than 1.04, since neither is anywhere close to 7.2. I guess because it is ever so slightly closer to 7.2?

I don't really get it conceptually.

Also, suppose the sample was 1, 3, 7, 8, 9. The mean would be 5.6. The variance using n would be 9.44. The variance using n-1 would be 11.8. So in this case, the n version is actually closer to the real value of 7.2 than the n-1 version is.
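For anyone who wants to check the arithmetic above, here is a short sketch. The helper `variance` and its `ddof` parameter are names chosen for this example (`ddof=0` divides by n, `ddof=1` divides by n-1); they are not from any particular library.

```python
def variance(values, ddof=0):
    """Sum of squared deviations from the mean, divided by (n - ddof)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


population = [1, 3, 5, 2, 7, 8, 6, 9, 2, 7]
sample_a = [5, 7, 6, 8, 7]
sample_b = [1, 3, 7, 8, 9]

print(f"{variance(population):.2f}")        # 7.20
print(f"{variance(sample_a, ddof=0):.2f}")  # 1.04
print(f"{variance(sample_a, ddof=1):.2f}")  # 1.30
print(f"{variance(sample_b, ddof=0):.2f}")  # 9.44
print(f"{variance(sample_b, ddof=1):.2f}")  # 11.80
```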

##### Share on other sites

When you apply the standard formula for variance (dividing by $N$) to a sample, the statistic is in fact biased. Intuitively, this is because the variance depends on the mean, which in turn is computed from the same sample. To correct this and form an unbiased statistic, we must multiply the standard formula by $\frac{N}{N-1}$, which is the same as dividing by $N-1$ instead of $N$. This is called Bessel's correction.
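"Unbiased" is a statement about the average over all possible samples, not about any single sample, so one sample can still land far from 7.2 either way. Here is a rough sketch that checks this on the population from the first post. It assumes samples of size 3 drawn with replacement (so the draws are independent), and simply enumerates all 1000 such samples; the names `variance` and `average_estimate` are made up for this example.

```python
from itertools import product

population = [1, 3, 5, 2, 7, 8, 6, 9, 2, 7]


def variance(values, ddof=0):
    """Sum of squared deviations from the mean, divided by (n - ddof)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


def average_estimate(population, n, ddof):
    """Average the variance estimate over every ordered sample of size n,
    drawn with replacement (assumption: each sample is equally likely)."""
    samples = list(product(population, repeat=n))
    return sum(variance(s, ddof) for s in samples) / len(samples)


n = 3
print(round(average_estimate(population, n, ddof=0), 6))  # 4.8
print(round(average_estimate(population, n, ddof=1), 6))  # 7.2
```

Dividing by n comes out too small on average (4.8, which is 7.2 times 2/3), while dividing by n-1 averages out to exactly the population variance of 7.2.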

##### Share on other sites

To add to what DJBruce said: as I understand it, the degrees of freedom are just that, the number of entries in the residual vector that are 'allowed' to vary.

Since the residuals, $(x_1 - \overline{x}, \ldots, x_n - \overline{x})$, must sum to zero, the entire vector is fully determined by its first $n-1$ entries.
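A quick way to see this numerically, using the sample 5, 7, 6, 8, 7 from the first post (variable names are just for illustration):

```python
sample = [5, 7, 6, 8, 7]
mean = sum(sample) / len(sample)

# Residuals: each value minus the sample mean.
residuals = [x - mean for x in sample]

# Because the residuals sum to zero, the last one is minus the sum of the rest.
last_from_others = -sum(residuals[:-1])

# Compare with a tolerance, since these are floating-point numbers.
print(abs(sum(residuals)) < 1e-9)                    # True
print(abs(last_from_others - residuals[-1]) < 1e-9)  # True
```

So only n-1 = 4 of the 5 residuals carry independent information, which matches the n-1 in the denominator.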