Math Is Fun Forum / Standard deviation

Re: Standard deviation

2013-04-24T19:08:57Z

Variance formulas.

Sorry, this has turned out to be quite a long post. Hope you manage to stay awake to the end.

The definition formula is

\[ \text{variance} = \frac{\sum (x - \bar{x})^2}{n} \]

That's the one you used.

It is possible to do some algebra on that to get an alternative version that gives the same result:

\[ \text{variance} = \frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2 \]

This formula is handy if you are having to calculate the variance on paper as you don't have to work out the mean before you start doing the sum of squares total, so it saves time. It is also used by calculators that have statistical functions as it only requires three memories to do the calculations: one for the running count, n; one for the running total of x; and one for the running total of x squared. When you have entered all the data (which the calculator doesn't have to remember) there are built in functions to calculate the mean and variance.
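Here is a rough sketch of the "three memories" idea in Python. The class name RunningStats and the data values are made up for illustration; I'm not claiming any real calculator is programmed exactly like this. It keeps only n, the running total of x and the running total of x squared, and checks the answer against the definition formula.

```python
import math


class RunningStats:
    """Keeps three running values, like the three calculator memories."""

    def __init__(self):
        self.n = 0          # running count
        self.sum_x = 0.0    # running total of x
        self.sum_x2 = 0.0   # running total of x squared

    def add(self, x):
        # The data value itself is not stored, only the three totals.
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def mean(self):
        return self.sum_x / self.n

    def variance(self):
        # Alternative formula: (sum of x^2)/n - (mean)^2
        return self.sum_x2 / self.n - self.mean() ** 2


def definition_variance(data):
    """Definition formula: mean of the squared deviations from the mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values, not from this thread
stats = RunningStats()
for x in data:
    stats.add(x)

print(stats.mean(), stats.variance(), definition_variance(data))
```

One caution: on a computer the shortcut formula subtracts two nearly equal numbers when the values are large and the spread is small, so it can lose accuracy in floating point. For hand calculation with modest numbers that is not a worry.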

Now if you have a set of data and just want to get the variance, use either of the above.

But what happens if you just take a sample of values, calculate the sample mean and variance as usual; but now want to estimate the 'population' statistics?

eg. You have sampled the weight of bags of sugar coming off a production line, and now want to say what the mean and variance are for the whole production. Can you use the sample statistics?

The answer is YES and NO.

What do I mean by that? Well, imagine you keep taking samples and computing the mean and variance for each sample. The sample means will be clustered around the population mean, and their long-run average is exactly the population mean (this can be proved by something called 'expectation algebra' but it is a complicated proof so I'd rather not go into it.) So taking any particular sample mean won't give you the true population mean but it is said to be an unbiased estimator for it. By which I mean, there's no systematic error in taking one value; it may be too high; it may be too low; but on average the overshoots and undershoots cancel out. So you may use the sample mean as an estimate for the population mean.
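If you'd like to see the 'unbiased' idea without the expectation algebra, here is a small check in Python. The population values and the sample size are invented for the purpose, and I've assumed sampling with replacement so that every possible sample can be listed. It averages the sample mean over all possible samples and compares that with the population mean.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 10]   # made-up values, deliberately lopsided
n = 2                        # sample size (an arbitrary choice)

pop_mean = Fraction(sum(population), len(population))

# Every possible ordered sample of size n, drawn with replacement.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

# Average of the sample mean over all possible samples.
avg_of_sample_means = sum(sample_means) / len(sample_means)

print(pop_mean, avg_of_sample_means)
```

The two numbers agree exactly, even though no single sample has to hit the population mean.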

However, the same is not true for the sample variance. If you repeat the sampling many times and compute the variance each time, you again get a set of results that are clustered about a fixed value; but that value is not the population variance. The value you get will tend to be too low. I like to think of it like this: you've only taken a few values from the population, and you are measuring their spread about their own sample mean, so there's less of a spread in the results than if you took the whole population. This leads to a bias if you take the sample variance as the population variance. But the bias is by a predictable amount!

Expectation algebra shows that the mean of the sample variances (let's call a sample variance s^2, and the population variance \sigma^2) is given by this formula:

\[ E(s^2) = \frac{n-1}{n} \, \sigma^2 \]

As you can see this leads to a variance that is too low, but only by a tiny amount when n is very large. So if you take a big sample you could use the sample variance for the population variance and it probably wouldn't matter; but if n is small, it would because you'd be using a variance that is too small.
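The size of the shortfall can be checked by brute force. This is only a sketch with an invented population, assuming samples are drawn with replacement; the function name variance_n is mine. It averages the divide-by-n variance over every possible sample and compares it with (n - 1)/n times the population variance.

```python
from fractions import Fraction
from itertools import product


def variance_n(values):
    """Variance with divisor n (the definition formula), as an exact fraction."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / n


population = [1, 2, 3, 10]   # made-up values
n = 3                        # sample size (an arbitrary choice)

pop_var = variance_n(population)

# Divide-by-n variance of every possible ordered sample of size n.
sample_vars = [variance_n(s) for s in product(population, repeat=n)]
avg_sample_var = sum(sample_vars) / len(sample_vars)

print(pop_var, avg_sample_var, Fraction(n - 1, n) * pop_var)
```

The average of the sample variances comes out below the population variance, and it matches the (n - 1)/n factor exactly.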

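But you can easily unbias it. If you multiply the sample variance by

\[ \frac{n}{n-1} \]

you unbias it by just the right factor. So you could calculate the sample variance, and then adjust it by this multiplier. But, as the last step in calculating a variance is to divide by n, you can save some steps and divide by n - 1 instead:

\[ \text{estimate of population variance} = \frac{n}{n-1} \times \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum (x - \bar{x})^2}{n-1} \]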

This formula is often called (incorrectly) the sample variance formula. It isn't. It is the formula for estimating the population variance from a set of sample data. Calculators and math packages will probably have it labelled as s^2 but, hopefully, you can see this is not quite correct.
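As an example of how a math package labels these, Python's standard statistics module has both versions: pvariance divides by n and variance divides by n - 1 (its documentation calls the latter the "sample variance"). The data below is made up; the sketch just confirms that the two results differ by the factor n/(n - 1).

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values, not from this thread
n = len(sample)

divide_by_n = statistics.pvariance(sample)          # divisor n
divide_by_n_minus_1 = statistics.variance(sample)   # divisor n - 1

print(divide_by_n, divide_by_n_minus_1)

# The n - 1 version is the divide-by-n version scaled up by n / (n - 1).
print(math.isclose(divide_by_n_minus_1, divide_by_n * n / (n - 1)))
```

So before trusting a button or a function name, it is worth checking which divisor it actually uses.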

So, back to your original post. You said

Bobbym works at Pizza Hut. He wants to calculate the standard deviation of his weekly earnings.
Here is how much Bobbym earned this week :

Now it is debatable whether what you meant was "he wants to use this sample to calculate the standard deviation for all of his earnings", in which case you would divide the sum of the squared deviations from the mean by 6. But it isn't what you said. In any case, the estimator formula is only valid if you take a random sample across all his earnings. Taking values from just one week isn't random because sales may have been poor at that time of the year, leading to poor earnings. Or maybe this was an early week in his employment when he was keen and hard working, before he became cynical and disgruntled and ended up getting the sack. There is lots of potential for introducing bias if you just take 7 days, one after the other.

My Conclusion: you were right to divide by 7.

But, recommendation: Don't round off early in the calculation; maintain all the figures until the end and then round off. You were lucky to get 74 after all that rounding and one calculation error.

Bob

Re: Standard deviation

2013-04-24T09:58:37Z

OK, but it will have to be later. I'm part way through setting up an arch in my garden and only came in for a coffee break. I'll have a go this evening for you. (My time now is about 11am, BST.)

Bob

Re: Standard deviation

2013-04-24T09:56:36Z

Yup.

bob bundy wrote:

I've been waiting for a response from you.
Wolfram says this is an area that is often confused. I'm not confused. So I'm happy to have a go at explaining this if you want.
Just reply, "Yes, please".
Bob
jtm wrote:
bobbym wrote:
Okay, we will divide by 7.

Re: Standard deviation

2013-04-24T09:47:29Z

hi julianthemath

jtm wrote:

I've been waiting for a response from you.

Wolfram says this is an area that is often confused. I'm not confused. So I'm happy to have a go at explaining this if you want.

Just reply, "Yes, please".

Bob

Re: Standard deviation

2013-04-24T08:37:34Z

Re: Standard deviation

2013-04-22T19:55:55Z

Hi Bob;

Okay, we will divide by 7.

Re: Standard deviation

2013-04-22T18:37:13Z

hi bobbym,

Arhh, I see what you mean (pun not intended). He makes lots of approximations and adds in 169 twice. He was lucky to get 74 at the end.

But that doesn't change my opinion on what to divide by.

On this page,

http://www.mathsisfun.com/data/standard-deviation.html

we are encouraged to think there are two formulas for variance.

On this one,

http://www.mathsisfun.com/data/standard … mulas.html

the reason for this is explained more fully.

Re: Standard deviation

2013-04-22T16:35:24Z

Hi All;

Yes, we could discuss what sd is appropriate until they rehire me but what I was after was this:

9025+5929+9+11236+12321+169+25 = 38714

14954+11245+12490+194 = 38883

The little fellow seems to have found a way to pair off 7 numbers into 4 distinct pairs? Now to mention that in post #1276 seems picayune but in post #2 it was okay.
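The two sums above can be checked in a few lines of Python. The numbers are the ones quoted in this post; the variable names are just mine. It shows which pairs were formed and that the gap between the two totals is exactly the 169 that got used twice.

```python
# The seven squared deviations quoted above.
squared_devs = [9025, 5929, 9, 11236, 12321, 169, 25]
total = sum(squared_devs)

# The four "pair" sums quoted above.
pair_sums = [14954, 11245, 12490, 194]
paired_total = sum(pair_sums)

# How the pairs were formed: 169 appears in both of the last two pairs.
pairs = [
    squared_devs[0] + squared_devs[1],
    squared_devs[2] + squared_devs[3],
    squared_devs[4] + squared_devs[5],
    squared_devs[5] + squared_devs[6],
]

print(total, paired_total, paired_total - total)
print(pairs == pair_sums)
```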

Re: Standard deviation

2013-04-22T12:45:09Z

hi julianthemath and bobbym

See below for my calculations. There are 7 values so once the sum of the squared deviations has been determined this should be divided by 7.

So I agree with julianthemath.

The value of 80 comes from sum/6. Dividing by 6 (that is, n - 1) is what you do to get an unbiased estimate of a population variance, and from it an estimated population sd, from a sample of size n.
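To see where the 74 and the 80 both come from, here is a short Python check. It uses 38714, the sum of squared deviations quoted elsewhere in this thread; that figure was built from rounded deviations, so treat the decimals as approximate.

```python
import math

sum_sq_dev = 38714   # sum of squared deviations quoted in this thread (rounded inputs)
n = 7                # number of daily values

sd_divide_by_n = math.sqrt(sum_sq_dev / n)                # whole population known
sd_divide_by_n_minus_1 = math.sqrt(sum_sq_dev / (n - 1))  # estimating a wider population

print(round(sd_divide_by_n), round(sd_divide_by_n_minus_1))
```

The first rounds to 74 and the second to 80, so the disagreement is only about the divisor, not the arithmetic.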

julianthemath wrote:

He wants to calculate the standard deviation of his weekly earnings.

Maybe he should have said ".......of his earnings for one week", but, from what bobbym has told us, he didn't work there any longer than that because he got fired!

So, in this case, we know the whole population, => dividing by 7 is appropriate.

http://www.mathsisfun.com/data/standard … mulas.html

Bob

Re: Standard deviation

2013-04-22T10:55:34Z

Do you realise that if one of those daily numbers was a misprint then its error is exaggerated by taking its square?
Mean absolute deviation is better.

Indeed in experimental science (all data!) mean of the cube root of the deviation gives a more reliable answer as it gives more weight to the numbers whose deviation is smallest (more carefully measured?)
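The point about a misprint being exaggerated by squaring can be illustrated with a small Python sketch. The data is invented (six values near 50 and one "misprint" of 500), and it only compares how much of the total the bad value accounts for under each measure; it doesn't settle which measure is better in general.

```python
data = [50, 52, 48, 51, 49, 50, 500]   # made-up values; 500 plays the misprint

mean = sum(data) / len(data)
abs_devs = [abs(x - mean) for x in data]
sq_devs = [(x - mean) ** 2 for x in data]

# Fraction of each total that is due to the misprinted value alone.
share_abs = abs_devs[-1] / sum(abs_devs)
share_sq = sq_devs[-1] / sum(sq_devs)

print(round(share_abs, 2), round(share_sq, 2))
```

With these numbers the misprint accounts for about half of the total absolute deviation but roughly 86% of the total squared deviation.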

Re: Standard deviation

2013-04-22T10:47:33Z

Hi;

Without the senseless rounding I am getting

80.32582458486246

Re: Standard deviation

2013-04-22T04:39:42Z

Just go ahead. Use Mathematica to round off the standard deviation.

Re: Standard deviation

2013-04-22T01:35:36Z

If you want I can do the calculation by hand.

Re: Standard deviation

2013-04-22T01:20:15Z

Oh. Mathematica again?

Re: Standard deviation

2013-04-22T01:11:24Z

I used a program to get it and rounded it.