Topic review (newest first)
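The usual formula for the variance of a set of n values, with mean x̄, is
$$\text{variance} = \frac{\sum (x - \bar{x})^2}{n}$$
That's the one you used.
It is possible to do some algebra on that to get an alternative version that gives the same result:
$$\text{variance} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$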
This alternative formula is handy if you have to calculate the variance on paper, as you don't have to work out the mean before you start on the sum of squares, so it saves time. It is also used by calculators that have statistical functions, as it only requires three memories to do the calculations: one for the running count, n; one for the running total of x; and one for the running total of x squared. When you have entered all the data (which the calculator doesn't have to remember) there are built-in functions to calculate the mean and variance.
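Here is a rough sketch of how a calculator might do this with just three memories. The function name `running_variance` and the data values are made up for illustration; they are not from the original question.

```python
def running_variance(values):
    """Mean and variance (dividing by n) from three running totals."""
    n = 0          # running count
    sum_x = 0.0    # running total of x
    sum_x2 = 0.0   # running total of x squared
    for x in values:
        # Each value is used once and then forgotten.
        n += 1
        sum_x += x
        sum_x2 += x * x
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2   # the alternative formula
    return mean, variance


data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values only
mean, variance = running_variance(data)
print(mean, variance)   # prints 5.0 4.0
```

One caution: in floating point arithmetic this version can lose accuracy when the mean is very large compared with the spread, because it subtracts two nearly equal numbers.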
Now if you have a set of data and just want to get the variance, use either of the above.
But what happens if you just take a sample of values, calculate the sample mean and variance as usual; but now want to estimate the 'population' statistics?
E.g. you have sampled the weight of bags of sugar coming off a production line, and now want to say what the mean and variance are for the whole production. Can you use the sample statistics?
The answer is YES and NO.
What do I mean by that? Well, imagine you keep taking samples and computing the mean and variance for each sample. The sample means will be clustered around the population mean, and their long-run average is exactly the population mean (this can be proved by something called 'expectation algebra' but it is a complicated proof so I'd rather not go into it.) So taking any particular sample mean won't give you the true population mean but it is said to be an unbiased estimator for it. By which I mean, there's no systematic bias in taking one value; it may be too high; it may be too low; but over many samples the errors average out to zero. So you may use the sample mean as an estimate for the population mean.
However, the same is not true for the sample variance. If you repeat the sampling many times and compute the variance each time, you again get a set of results that are clustered about a fixed value; but that value is not the population variance. The value you get will tend to be too low. I like to think of it like this: you've only taken a few values from the population, and you are measuring their spread about their own sample mean, so there's less of a spread in the results than if you took the whole population. This leads to a bias if you take the sample variance as the population variance. But the bias is by a predictable amount!
Expectation algebra shows that the mean of the sample variances (let's call the sample variance s^2 and the population variance σ^2) is given by this formula:
$$E(s^2) = \frac{n - 1}{n}\,\sigma^2$$
As you can see this leads to a variance that is too low, but only by a tiny amount when n is very large. So if you take a big sample you could use the sample variance for the population variance and it probably wouldn't matter; but if n is small, it would because you'd be using a variance that is too small.
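If you'd rather see this than take the algebra on trust, a small simulation will do it. This is only a sketch under my own assumptions: the population is taken to be normal with standard deviation 2, and the names `biased_variance` and `mean_of_sample_variances`, the number of trials and the seed are all just illustrative choices.

```python
import random


def biased_variance(sample):
    """Sample variance: sum of squared deviations divided by n."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


def mean_of_sample_variances(n, trials=20000, sigma=2.0, seed=0):
    """Average the divide-by-n variance over many random samples of size n."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += biased_variance(sample)
    return total / trials


sigma = 2.0   # assumed population standard deviation, so the variance is 4
for n in (2, 5, 50):
    observed = mean_of_sample_variances(n, sigma=sigma)
    predicted = (n - 1) / n * sigma ** 2
    print(n, round(observed, 3), round(predicted, 3))
```

The observed averages should sit close to (n - 1)/n times 4, well below 4 for the small samples and nearly 4 when n is 50.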
But you can easily unbias it. If you multiply the sample variance by
$$\frac{n}{n - 1}$$
you unbias it by just the right factor. So you could calculate the sample variance, and then adjust it by this multiplier. But, as the last step in calculating a variance is to divide by n, you can save some steps by dividing by n - 1 instead:
$$\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
This formula is often called (incorrectly) the sample variance formula. It isn't. It is the formula for estimating the population variance from a set of sample data. Calculators and math packages will probably have it labelled as s^2 but, hopefully, you can see this is not quite correct.
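Python's standard `statistics` module is one example of the labelling issue. There, `pvariance` divides by n and `variance` divides by n - 1, and the documentation calls the second one the "sample variance". The short sketch below, using made-up data, checks that multiplying the divide-by-n result by n/(n - 1) agrees with the divide-by-(n - 1) function.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # illustrative values only
n = len(data)

sample_var = statistics.pvariance(data)   # divides by n
estimate = statistics.variance(data)      # divides by n - 1

# Unbias the divide-by-n result with the multiplier n / (n - 1).
adjusted = sample_var * n / (n - 1)

print(sample_var, estimate, adjusted)
```

The last two numbers printed should agree, apart from possible floating point rounding.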
So, back to your original post. You said
Now, arguably what you meant was "he wants to use this sample to calculate the standard deviation for all of his earnings", in which case you would divide the sum of the squared deviations from the mean by 6 (that is, n - 1 for 7 days). But it isn't what you said. In any case, the estimator formula is only valid if you take a random sample across all his earnings. Taking values from just one week isn't random because sales may have been poor at that time of the year, leading to poor earnings. Or maybe this was an early week in his employment when he was keen and hard working, before he became cynical and disgruntled and ended up getting the sack. There is a lot of potential for introducing bias if you just take 7 days, one after the other.
OK, but it will have to be later. I'm part way through setting up an arch in my garden and only came in for a coffee break. I'll have a go this evening for you. (My time now is about 11am, BST.)
I've been waiting for a response from you.
hi julianthemath and bobbym
Maybe he should have said "... of his earnings for one week", but, from what bobbym has told us, he didn't work there any longer than that because he got fired!
Do you realise that if one of those daily numbers was a misprint, then its error is exaggerated by taking its square?
Just go ahead. Use Mathematica to round off the standard deviation.
If you want I can do the calculation by hand.
Oh. Mathematica again?
I used a program to get it and rounded it.