Math Is Fun Forum / Statistics

Re: Statistics

2013-05-10T20:33:21Z

I do not know. You can do either of those. If you want we can continue here.

Of course, before answering his question, I need to know what it is and to understand what it is.

Re: Statistics

2013-05-10T20:28:33Z

Yes that was the one I was talking about. Is it better to post to that one or continue the discussion here?
Also is anyone curious like me about the answer to that one?

There is a mathematical topic called "Linear Statistical Modelling", for which I have an old textbook from about 1999.
I did some study of this some time ago, but my knowledge is a bit rusty, and I can no longer use the software that goes with
the course because it was for my old computer and the license agreement only allowed one year's use anyway.
The textbook uses an approximate model, but as I say it was a long
time ago that I studied this. Also, if it were an academic question, there would be some test data with questions to guide
the student through it. (Obviously it would not be like that in real life.)

EDIT/FOOTNOTE:

I have looked at the book ("Statistical Modelling Using Genstat") and there is a piece on converting the multinomial
distribution into a Poisson generalized linear model. It is in chapter 12 called "Loglinear models for contingency tables".
A theorem is used where it is basically trying to say that Poisson models for categorical data behave like multinomial
models. At the time of writing I do not fully understand the proof of the theorem so I am not going to paraphrase the proof.
The hypothesis test would use this model with something like what the software calls:
"Stats|Regression Analysis|Generalized Linear| Log-linear" (using the menu system of Genstat)
Of course the software may have changed by now, and other software may now be considered better for some reason.
The software may have been chosen for educational reasons. However it is advisable to use software for this because otherwise
the calculations by hand would be too complicated. It strikes me that a control group that just has the before and after test
but with no induced (bad) taste stage would be needed to avoid confounding the experiment with the effect of the before
part of the test. However the hypothesis test I think would be the above test.
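
The textbook's theorem is not reproduced here. As a minimal sketch of the standard fact behind it (independent Poisson counts, once you condition on their total, follow a multinomial distribution with cell probabilities proportional to the Poisson means), the check below compares the two probabilities numerically. The means and counts are hypothetical values chosen only for illustration; they are not data from the fish experiment.

```python
from math import exp, factorial, prod


def poisson_pmf(y: int, mu: float) -> float:
    """Probability of observing y events when the Poisson mean is mu."""
    return exp(-mu) * mu ** y / factorial(y)


def multinomial_pmf(counts, probs) -> float:
    """Probability of the given category counts under a multinomial model."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(p ** c for p, c in zip(probs, counts))


# Hypothetical Poisson means and observed counts for four categories.
means = [4.0, 3.0, 2.0, 1.0]
counts = [5, 2, 2, 1]

total_mean = sum(means)
n = sum(counts)

# P(counts | total = n) when the four counts are independent Poisson variables.
joint = prod(poisson_pmf(y, mu) for y, mu in zip(counts, means))
conditional = joint / poisson_pmf(n, total_mean)

# Multinomial probability with cell probabilities mu_i / sum(mu).
direct = multinomial_pmf(counts, [mu / total_mean for mu in means])

print(conditional, direct)
```

The two printed numbers should agree up to floating-point rounding, which is the sense in which a Poisson model for the cell counts "behaves like" a multinomial model.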

Re: Statistics

2013-05-10T20:12:20Z

http://www.mathisfunforum.com/viewtopic.php?id=19064?

Re: Statistics

2013-05-10T20:01:08Z

Thank you anonimnystefy. I have worked it out myself by plugging in the values (57, 43, 100 etc.) and using an online
integral calculator (because frankly, integrating that one as a pen and paper exercise would be complicated:
apart from having a huge number of terms and a large scope for silly mistakes, it would probably take ages).
Reassuringly enough, I agree with your answer exactly.

So according to this model the player that lost the contest of 100 games has an 8.16% (approx.) chance of being the
better of the two players, from a theoretical point of view.

By the way on an entirely different topic (but related to statistics) I remember a post from someone who was doing
some research, and the person wanted help with a hypothesis test for a situation along the following lines:
(I may have changed the problem a little)

You have 100 fish (or n fish perhaps?) in an experiment to do with taste aversion. A before and after trial is performed on each fish.
Before and after the fish is classified according to four categories:

(1) Avoidance of Bait
(2) Approach Bait but does not eat
(3) Small bite/taste of the Bait
(4) Rapid change in swimming speed/motion

These categories give some idea of how each fish reacts to the food, and possibly of taste aversion.
Whether they are ordinal I am not exactly sure. I think that
ideally they are ordinal, but they could be 4 categories that cannot be put into a precise order. (I need to think about that point.)
Of course, by chance alone, there are going to be average proportions for these categories, which could be anything between 0 and 1,
and whatever those averages are, there will be a certain degree of variation around them. Ideally we need a hypothesis test that models
the random variation under a null hypothesis and gives us some way of deciding whether a statistically significant result
has been obtained at a certain level (e.g. p = 0.01, i.e. 99% confidence).

I did think that an extension of the binomial distribution might work, except with 4 categories instead of two.
Is this referred to as a multinomial distribution?
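
For illustration only, here is a minimal sketch of the multinomial probability for four categories, the direct extension of the binomial formula. The counts and the equal category probabilities are hypothetical, not results from the fish experiment, and this is the probability of one outcome rather than a complete hypothesis test.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs) -> float:
    """P(counts) when sum(counts) independent trials each land in
    category i with probability probs[i]."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)  # stays an exact integer at every step
    return coefficient * prod(p ** c for p, c in zip(probs, counts))


# Hypothetical example: 100 fish spread over the four categories,
# under a null hypothesis that every category is equally likely.
counts = (40, 30, 20, 10)
null_probs = (0.25, 0.25, 0.25, 0.25)
print(multinomial_pmf(counts, null_probs))

# With only two categories the formula reduces to the binomial distribution.
print(multinomial_pmf((57, 43), (0.5, 0.5)))
```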

I realise that the post from which I got this happened some time ago (two months approx.) and the person who posted
the problem probably has found out the answer by some method.

Purely out of curiosity, it might be interesting to try to answer this one, however it is a difficult one, and I suspect it probably
requires some proper statistical software to answer this anyway.

Re: Statistics

2013-05-10T18:33:31Z

Hi Steve

The pdf of the distribution is derived from Bayes' law for continuous distributions.

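Using their formula I got P(p<=1/2|h=57,t=43)=0.0816..., where p is the 57-game winner's chance of winning a single game (equivalently, the chance that the loser's per-game probability is at least 1/2).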
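
A minimal sketch of how that figure can be reproduced, assuming a uniform prior on the per-game win probability (the prior is an assumption here, as are the function names). Writing q for the per-game win probability of the player who won 43 games, the posterior is Beta(44, 58), and the quantity wanted is P(q >= 1/2). The sketch computes it two ways: exactly, through the identity linking a Beta tail with integer parameters to a binomial sum, and numerically with Simpson's rule.

```python
from fractions import Fraction
from math import comb


def prob_loser_is_better(loser_wins: int, winner_wins: int) -> Fraction:
    """Posterior P(q >= 1/2) for the loser's per-game win probability q.

    Assumes a uniform prior on q, so the posterior is
    Beta(loser_wins + 1, winner_wins + 1).
    """
    n = loser_wins + winner_wins
    # For integer parameters the Beta tail is a binomial sum:
    # P(q >= 1/2) = P(X <= loser_wins) with X ~ Binomial(n + 1, 1/2).
    favourable = sum(comb(n + 1, k) for k in range(loser_wins + 1))
    return Fraction(favourable, 2 ** (n + 1))


def simpson(f, a: float, b: float, steps: int = 2000) -> float:
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def unnormalised_posterior(q: float) -> float:
    # Likelihood of 43 wins and 57 losses multiplied by a flat prior.
    return q ** 43 * (1 - q) ** 57


exact = prob_loser_is_better(43, 57)
numeric = (simpson(unnormalised_posterior, 0.5, 1.0)
           / simpson(unnormalised_posterior, 0.0, 1.0))
print(float(exact), numeric)
```

Both values come out at roughly 0.0816, in line with the figure quoted above.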

Re: Statistics

2013-05-10T18:28:53Z

bobbym wrote:

Hi;
It is not necessarily true that the victorious player is superior. In the most simplistic sense if we flip a coin 5 times, heads or tails will be victorious with 3 or more wins out of 5. Two players evenly matched ( fair coin ) can produce extremely long runs of such dominance just by luck. Take a random walk for example. The player in the lead will have that lead for more than 90% of the time, even though they are perfectly even. It would be very easy for that player to assume he was much better than the other!

I realise (and did realise) that the victorious player is not necessarily superior.
In fact, what I meant by my question was: what is the probability that this "common sense view" is wrong?

The form of my question was confusing because I used the term "common sense superior" to mean "victorious".

In the long term the more games are played (assuming that the probability of each winning is constant) the better estimate
we get of one player being better than the other modelled by a probability P of one being victorious in a single game.

There may be some use in temporarily assigning the value P = 0.5 for some reason. I found the Wikipedia entry confusing,
though it does sound quite sophisticated. It is very unlikely that P is exactly 0.5. I agree with anonimnystefy that if we take
this literally, then the probability of it converges to zero as the number of decimal places
of accuracy tends to infinity. In practice we could never have such perfect accuracy, but in that case an interval ought to be
given, like 0.45 to 0.55 or 0.495 to 0.505.
However no such range was given. Do we really mean this? [The range chosen would be an arbitrary choice.]
In other words, would we be likely to want to ask the question in this way?
Or is it better to ask the question: What is the probability that the non-victorious player is better in terms of underlying ability?
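
To make the interval version of the question concrete, here is a rough sketch, assuming a uniform prior on P (an assumption, not something stated in the thread). With 57 wins and 43 losses the posterior for the winner's P is then Beta(58, 44), and its cumulative distribution can be written as a binomial sum. The helper names are hypothetical.

```python
from math import comb


def posterior_cdf(x: float, wins: int, losses: int) -> float:
    """P(P <= x) when P ~ Beta(wins + 1, losses + 1), i.e. a uniform
    prior updated on the observed results."""
    n = wins + losses + 1
    # Beta CDF with integer parameters equals a binomial upper tail:
    # P(P <= x) = P(Y >= wins + 1) with Y ~ Binomial(n, x).
    return sum(comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(wins + 1, n + 1))


def prob_in_interval(low: float, high: float,
                     wins: int = 57, losses: int = 43) -> float:
    """Posterior probability that P lies between low and high."""
    return posterior_cdf(high, wins, losses) - posterior_cdf(low, wins, losses)


wide = prob_in_interval(0.45, 0.55)
narrow = prob_in_interval(0.495, 0.505)
print(f"P(0.45  <= P <= 0.55)  = {wide:.4f}")
print(f"P(0.495 <= P <= 0.505) = {narrow:.4f}")
```

The narrower the interval around 0.5, the smaller the probability, which is the point made above about P being exactly 0.5.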

According to the method in the Wikipedia link by anonimnystefy what would the answer be ?

To be quite frank I did not follow the method given by the Wikipedia entry. I could plug in some values and see if I can get
an answer. This did refer to testing a coin for bias, but it might well be applicable to this exercise/puzzle.
Even a perfectly "fair" coin has some degree of bias. I think I remember reading once about an experiment that was done
which established that there was a small bias in a regular coin, however I cannot remember the details and frankly it can
only have been slightly weighted one way or the other.

Re: Statistics

2013-05-10T18:20:16Z

I am sorry I asked you to come over here. You do not have to reply here ever again if you do not want to.

Just one thing. If you had ever read the quote in gAr's signature, you would have known the reason I do not agree with your solution.

Re: Statistics

2013-05-10T18:13:18Z

which is not even related to the question.

That is not true.

Okay, I have an answer you do not. I asked you to compute the answer using your method. Then I would show you why mine is an exact answer.

I agreed to come away from the problems I was doing because you said that if I posted a problem ( very difficult to do, a lot of work ) you would solve it. Instead you are arguing over terms. If you already know the answer to these, then what the heck am I wasting my time for?

I am sorry, I do not need a protracted debate. Thanks for looking at the problem.

Re: Statistics

2013-05-10T18:02:56Z

Sorry, you computed P(h>=57|p=1/2), which is not even related to the question.
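
For reference, the two conditional probabilities named in this exchange are different quantities, and both differ from the posterior P(p>=1/2|data). A short sketch computing them directly from the binomial distribution (the variable names are illustrative):

```python
from math import comb

n, heads = 100, 57

# P(h = 57, t = 43 | p = 1/2): exactly 57 wins out of 100
# for two evenly matched players.
p_exact = comb(n, heads) / 2 ** n

# P(h >= 57 | p = 1/2): 57 or more wins out of 100
# for two evenly matched players.
p_tail = sum(comb(n, k) for k in range(heads, n + 1)) / 2 ** n

print(f"P(h = 57 | p = 1/2)  = {p_exact:.4f}")
print(f"P(h >= 57 | p = 1/2) = {p_tail:.4f}")
```

These come to about 0.030 and 0.097. Both condition on p = 1/2 and describe the data, whereas the question asks about p given the data.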

Re: Statistics

2013-05-10T17:54:16Z

That is exactly what the question is asking. The question was designed around the concept of dealing with a result that could be luck.

The answer you got is P(h=57,t=43|p=1/2)

That is not what I computed.

Re: Statistics

2013-05-10T17:51:16Z

The answer you got is P(h=57,t=43|p=1/2), which is not what the question is asking for.

Re: Statistics

2013-05-10T17:21:04Z

Hi;

It is not necessarily true that the victorious player is superior. In the most simplistic sense if we flip a coin 5 times, heads or tails will be victorious with 3 or more wins out of 5. Two players evenly matched ( fair coin ) can produce extremely long runs of such dominance just by luck. Take a random walk for example. The player in the lead will have that lead for more than 90% of the time, even though they are perfectly even. It would be very easy for that player to assume he was much better than the other!
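
The random walk remark can be explored with a small simulation. This is only a sketch with assumed settings (1000 games per match, 1000 simulated matches, a fixed seed, ties ignored when counting who is ahead). The arcsine law for a fair random walk says that the share of time one side spends in the lead tends to be close to 0 or 1 rather than 1/2. In the limit, the chance that one of two evenly matched players leads for more than 90% of the match is about 41%.

```python
import random


def leader_share(n_games: int, rng: random.Random) -> float:
    """Share of non-tied moments during which the player who is
    ahead more often holds the lead."""
    score = 0  # player A's wins minus player B's wins
    a_ahead = b_ahead = 0
    for _ in range(n_games):
        score += 1 if rng.random() < 0.5 else -1  # fair game
        if score > 0:
            a_ahead += 1
        elif score < 0:
            b_ahead += 1
    return max(a_ahead, b_ahead) / (a_ahead + b_ahead)


rng = random.Random(2013)  # fixed seed so the run is repeatable
trials = 1000
shares = [leader_share(1000, rng) for _ in range(trials)]

# Fraction of matches in which one player led more than 90% of the time.
dominated = sum(s > 0.9 for s in shares) / trials
print(f"one player led > 90% of the time in {dominated:.0%} of matches")
```

So long stretches of one-sided dominance are common between equal players, even though they do not happen in every match.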

Re: Statistics

2013-05-10T17:16:23Z

What is the probability that the apparently inferior player (according to common sense interpretation that the player that won
57 games out of 100 is "superior" and the player that won only 43 is "inferior") is in fact a player whose underlying ability
is greater than the common sense "superior" player with the assumption that a player has greater underlying ability if and
only if the theoretical probability of him/her winning in the long term P is greater than or equal to 0.5 ?

That is the probability of ( P >= 0.5 ) ?

Re: Statistics

2013-05-10T10:14:07Z

The coin probability can take any value between 0 and 1 and we just want the probability that the probability is 1/2. That means we need a continuous distribution.

What are you proposing?

Re: Statistics

2013-05-10T09:59:30Z

We are testing the assertion that they are equal in strength, therefore the probability of either winning is 1 / 2. The continuous one is only an approximation to the discrete one. It is used when you cannot calculate the discrete one.