Thank you. I am now convinced that you have answered the Question 1 part correctly.
I quite often find that I am okay once I have established which test is the right one to use
out of a very large number of hypothesis tests that have been invented.
Of course I deliberately designed the pseudo-study to be unusual, in that it would only be
valid to think of the months in a categorical way if there were an artificial construct that
made one unknown month very different (perhaps) from all of the others (or at least gave
each month its own independent character).
If, as in the "common cold" case, you wanted to test whether cold times of the year had
some link to the frequency of catching the cold virus, you would not of course expect there
to be a sudden jump from February to March, for instance. A natural occurrence rather than
a sudden artificial leap would be happening, so the study would be best designed in a way
that gives us more accurate information to test the correlation between temperature and
the number of cases of the cold illness.
I suppose if a factory were giving off a pollutant in January only and in no other month,
and the chemical were to make monthitis more likely to occur at birth, then maybe there
might be a very rare use for this. On the other hand, why would the newborn infant suddenly be
able/unable to "catch" the "condition" ONLY exactly at birth and not after birth?
In practice it would not happen like that, so even in this case the study would be daft.
To be frank I cannot at this time think of a serious reason why this would be a statistically desirable
hypothesis test study design.
On the other hand given the question that I asked I would give your answer full marks.
Nice one, and a good refresher lesson for me in Chi-Squared tests.
That will work fine:
I will use a criterion of 5%.
Theory: there is no correlation between months and the number of illnesses.
x^2 = 20.936
Was the "x squared" bit based upon a statistical test? Was it a Chi-Squared Test? Are you sure that this is the correct test?
Or was this based upon linear regression?
There was ONLY SUPPOSED to be EXACTLY ONE illness concerned, which ALL the people were known to have.
When you say "illnesses", was that a typing error, or did you model it as if you were counting how many illnesses in total
were reported to have occurred overall in that month? At the time of writing I have not been able to check how you could
model this. Perhaps linear regression can be used if that is the assumption, but that would be your question not mine.
You are probably much more experienced than me in this, better qualified, and have better software and a newer computer,
so don't get me wrong, I am not arguing with you; it's just that you have used a different test to the one I would have leapt
for had I still had access to Genstat. Out of interest, do you have a statistical software package? If so, which one, if you don't
mind me asking? (Is there a free tool for this with Wolfram? A Wolfram website I looked at some time ago did not seem to do this.)
p = 0.03404
(3.404 %)
My calculation seemed to suggest about 0.9 %, but as you could tell I did NOT trust my answer at all. 3.404 % seems roughly right.
That agrees with my rough intuition better than my attempt at a calculation. Was that a linear regression related probability?
Or Chi-Squared? Or Loglinear? Or Multinomial simulation?
By conventional criteria, this difference is considered to be statistically significant.
The p value answers this question: If the theory that generated the expected values were correct, what is the probability of observing such a large discrepancy (or larger) between observed and expected values? A small P value is evidence that the data are not sampled from the distribution you expected.
I would reject the theory that months and illness are not correlated, since there is only a 3% chance that the above result happened by chance. If your test wanted 1% then we cannot rule out that the above data is chance.
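Out of curiosity I checked the quoted figures myself. Here is a minimal sketch in Python, assuming the test was a one-sample Chi-Squared goodness-of-fit test against a uniform expectation, and using the made-up Question 1 monthly counts from later in the thread:

[code]
# Chi-squared goodness-of-fit check of the quoted figures, assuming a
# uniform expectation of 1000/12 cases per month (11 degrees of freedom).
from scipy.stats import chisquare

observed = [114, 77, 78, 77, 74, 82, 72, 77, 71, 89, 93, 96]  # Jan..Dec

# With no expected frequencies supplied, chisquare assumes all categories
# are equally likely.
chi2, p = chisquare(observed)
print(f"chi-squared = {chi2:.3f}")  # 20.936
print(f"p value     = {p:.5f}")     # 0.03404
[/code]

That reproduces both the 20.936 statistic and the 3.404 % p value, so it does look like a Chi-Squared test rather than linear regression.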
Yes, I seemed to have "concluded" that we could, but realised that there was something wrong with my attempt.
I thought that using a graphics calculator to fudge a rough solution by trying to "simplify" things was not really going
to give a very accurate answer. I was going to suggest a Loglinear Contingency Table style solution to this, but needed
Genstat to remind me as to whether this was appropriate. I have still got the textbook that accompanied the course
which I studied in 2006 on this topic (for which I got a grade 2 pass in a system where grade 1 is best and grade 4 is
just an ordinary pass grade, so I was above average, but not exactly amazing at Linear Statistical Modelling).
I think I need a bit of a refresher course on this if I ever do get to use it for something serious, but as yet I have never
had a job in which I would need to do this. You never know what might happen, though.
I have decided that I accept your answer as correct, but do not know how you reached the answer nor what formal test
you did. You may even have done a simulation of the "exact" situation to see how many times something as significant
or more significant would happen. In which case you could consider that an unbeatable answer.
Thanks.
Question 1 was an entirely fictional example so I have made up some data.
Okay here goes:
Off the top of my head though suppose for instance that it was:
January: 114 people born in this month suffer from monthitis
February: 77
March: 78
April: 77
May: 74
June: 82
July: 72
August: 77
September: 71
October: 89
November: 93
December: 96
Right so the question is that given that no statement in advance has been made about January having a higher frequency
of cases than any other month and that a two tailed test is needed, how statistically significant is the above result if it were
a study by a research group using a conventional hypothesis test?
Obviously my question is entirely made up including all of the data.
The sample size is n = 1000
All numbers for each month are confirmed cases of the fictional medical condition or illness.
Consider a fictional hypothetical illness. Let us call it "monthitis". It is thought that there is a serious reason why the condition might be more
common in people born in a particular month. A study is to be done, but we do not have any advance knowledge
of which month that would be. A month is to be modelled for simplicity so that each month is the same
length; that is to say, the fact that it may be 28, 29, 30 or 31 days long can be disregarded for an initial model to avoid unnecessarily high
precision.
(Obviously if you can program a computer to work out a better solution then by all means go ahead. I have used a binomial model with
a normal approximation without continuity correction to get an estimate. Really a multinomial model is best here, but I cannot do that
without a computer package to help, because the calculations are likely to be very complicated and error prone, unless someone
has a bright idea of an easy way of doing this analytically by hand in a simplified fashion.)
A normal approximation to a binomial model with p = (1/12), q = (11/12) and n = 1000 (sample size 1000) will be considered
preferable, to make this something that could be done using a calculator and a good table of Normal Z values.
Correct me if I am wrong, but this is what I think:
mean = np = 83.333...
variance = npq = 76.388...
Therefore we want the model N(83.333, 76.388...) [N = Normal distribution with mean then variance]
The standard deviation is 8.74 (approx.)
Now how many standard deviations do we need for a robust piece of evidence that allows the null hypothesis to be conventionally
rejected at the 1% level of significance? [You may be able to phrase that in better statistical terms]
Should we divide the value SP = 0.01 by 12 to allow for the number of months not being stated in advance ?
This intuitively seems right to me, but is it correct to divide by 12 statistically speaking ?
A value of SP = 0.005 (halving the SP to account for a two-tailed test) gave me 2.58 standard deviations.
However if you divide that by 12 then SP = 0.0004166....
this gave me a figure of 3.34 standard deviations (ie. 29.1916 which added to the mean of 83.333 is 112.5... or let us say 113 people)
So if we have 113 people with monthitis in the most commonly encountered month for people with this condition, is this the point at
which statistical significance can be argued to be reasonable at the 1% level or have I missed something out ?
Intuitively I would expect the figure to be higher. (eg. 160 approx.) [Perhaps this is a reflection of the unwise notion of not stating the month in advance with a reason for the choice.]
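As a quick check on the arithmetic, here is a short sketch of the same normal-approximation threshold calculation, with and without the division of SP by 12 (whether that division is statistically justified is exactly the question I am asking):

[code]
# Normal-approximation thresholds for the most common month, with and
# without dividing the significance level by 12 (a Bonferroni-style
# correction for not naming the month in advance).
from scipy.stats import norm

n, p = 1000, 1 / 12
mean = n * p                      # 83.333...
sd = (n * p * (1 - p)) ** 0.5     # 8.74 approx.

for alpha in (0.01, 0.01 / 12):
    z = norm.ppf(1 - alpha / 2)   # two-tailed critical value
    print(f"SP = {alpha:.6f}: z = {z:.2f}, threshold = {mean + z * sd:.1f}")
# SP = 0.01 gives a threshold of about 106 people; SP = 0.01/12 about 113.
[/code]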
Question 1:
Can anyone do this with a more exact model using a computer assuming 28 days in February and the correct number of days for all
other months, using a multinomial model (or Loglinear generalized linear model)?
Obviously I ask just out of interest here (I am not studying anything) and as a challenge/exercise/discussion.
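Since I do not have a package to hand, here is roughly how I imagine a Monte Carlo version of the exact multinomial model could go in Python. The chi-squared discrepancy is used as the test statistic here, which is one reasonable choice rather than the only one:

[code]
# Monte Carlo simulation of Question 1 under a multinomial model with the
# real month lengths (28-day February), estimating a p value by counting
# how often a discrepancy at least as large arises by chance.
import numpy as np

days = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
probs = days / days.sum()
observed = np.array([114, 77, 78, 77, 74, 82, 72, 77, 71, 89, 93, 96])
expected = 1000 * probs

obs_stat = ((observed - expected) ** 2 / expected).sum()

rng = np.random.default_rng(0)
sims = rng.multinomial(1000, probs, size=200_000)
sim_stats = ((sims - expected) ** 2 / expected).sum(axis=1)
print("simulated p =", (sim_stats >= obs_stat).mean())
[/code]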
In reality of course a proper scientific theory would be needed to justify the initial hypothesis. It would probably mean in practice that
an illness could be more likely to be caught in a particular season such as winter or summer for instance (the fact that I mentioned
birth is irrelevant really in terms of statistical testing, so a similar principle can be applied to a question involving the start date of
the illness).
A better question might therefore be:
Question 2
If a statistical test were done to establish whether the common cold were more likely to be caught in the winter (i.e. from the 1st
December to 28th February) in a country in which the cold season is at this time of year according to climate statistics, what threshold
would need to be crossed to confirm using a hypothesis test in terms of number of people out of 1000 cases of a cold being caught
(using a suitable design of a study) being in winter relative to the number caught at a different time of year?
(A method would be nice for that, ideally. Personally I would start by calculating the number of days from the 1st of December to the
28th of February and divide by 365, then use this as p. Then use the binomial model, with a normal approximation if no computer is
available but a pocket calculator and a normal distribution table is. Use SP = 0.01. I wonder whether any adjustment is necessary; it
is not really two tailed. It would be a good idea to check whether a continuity correction makes a significant difference, though I don't
think the difference is going to matter much in this case.)
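Here is a sketch of that method in Python, using the exact binomial rather than the normal approximation (so the continuity-correction question does not arise); the 90/365 value of p and the one-tailed SP = 0.01 are as described above:

[code]
# Smallest number of winter colds out of 1000 that would be significant
# at the one-tailed 1% level under H0: p = 90/365.
from scipy.stats import binom

p = (31 + 31 + 28) / 365          # days from 1 Dec to 28 Feb, over 365
n, alpha = 1000, 0.01

threshold = int(binom.isf(alpha, n, p)) + 1   # P(X >= threshold) <= alpha
print("mean under H0   =", n * p)             # about 246.6
print("threshold count =", threshold)         # about 279
[/code]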
The second question is better because a natural climate related reason has been given together with the potential for a plausible
theory. The first question is not good practice in science because no good reason has been given in advance for a particular month of
birth of the monthitis sufferer having a greater likelihood of getting the condition later on in life. It is very unusual for an illness to be
just most likely in a particular month without the neighbouring month or months being also more likely to a lesser extent (whether it is
birth or catching the illness that is being considered).
There was a suggestion that I heard about in a serious study* where, because the education system starts the academic year in
September, a psychological or even physical bias had been noticed towards better performance in those born near the start of the
academic year (September) relative to the end (August). [*IFS according to the BBC website in Oct 2007.]
This is very different to question 1 of course because I have given a scientific basis for the hypothesis (also the response variable of the
academic performance example above is a numeric concept allowing more detail not true or false as with my made up monthitis). You
would expect a similar result, to a lesser extent, in people born in October relative to those born in September, a sudden jump between
September and August, and then a minor improvement in July. I suppose with that one you could define the start of the year to be on a
certain day (with careful checks on the official start dates in the country concerned), and do a test based upon the number of days into
the academic year rather than use months. Exactly what statistical test would be used there I do not know (correlation/more linear
statistical modelling?). Obviously a correlation would not prove a causal relationship (the correlation is probably quite weak and does
not prove the matter completely even if it is strong; the sample size however in the real study was probably massive which is still not
proof of course, but makes the evidence stronger). No statistics can ever really prove a causal relationship, but on the other hand what
other method would we use for a social science related study?
Hi Mandy, SteveB here.
I have sent an email reply.
The cubic equation for dy/dx=0 was:
0 = 12x^3 + 3(k-18)x^2 + (64-12k)x + 8k - 48
The discriminant DELTA tells us whether this cubic equation has:
3 distinct real roots when DELTA > 0
multiple root with all roots real when DELTA = 0
or one real root and two complex conjugate roots when DELTA < 0
(Source: Wikipedia entry for cubic polynomial equations)
Using an algebraic package you can solve this for k where DELTA = 0
So DELTA = 432k^4 + 12096k^3 + 83520k^2 - 345600k - 3998208
I then used a graphical calculator with a polynomial of order 4 solver to get numerical solutions:
Two of them were the complex conjugate pair -12.1161977552 +/- 3.62069124665i; these are irrelevant for this analysis.
The real solutions are relevant, one of them was Bob's k = 5.9536152399
The other was also mentioned by Bob earlier to less accuracy and I am getting: k = -9.72121972948
These solutions for k indicate that there is a multiple root with all roots real and a graph shows that they have two stationary points
in terms of the original quartic expression involving x and k. One is an inflection and the other a minimum.
You could work out whether there are 3 turning points or 1 turning point for the ranges either side and in between these
if you wanted to give a full analysis of how many turning points you get for all values of real k.
(Perhaps draw a graph on a graphics calculator for the function of DELTA in terms of k and see where it is above zero
and where it is below zero. Then use the wiki quote that I gave above.)
If I am understanding this correctly there is one stationary point in between the two values (k > -9.7212197... and k < 5.9536152...)
and there are three stationary points for k < -9.7212197... and for k > 5.9536152...
The exact formula for calculating the roots of k is extremely complicated and I would not like to attempt that one
without something like Wolfram or another computer algebra package.
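For anyone who wants to reproduce the numbers without a graphics calculator, the quartic in k can be solved numerically in a couple of lines of Python (assuming the DELTA coefficients quoted above):

[code]
# Numerical roots of DELTA = 432k^4 + 12096k^3 + 83520k^2 - 345600k - 3998208.
# numpy.roots takes coefficients from the highest power down.
import numpy as np

print(np.roots([432, 12096, 83520, -345600, -3998208]))
# Gives approximately 5.9536152399, -9.72121972948 and the complex pair
# -12.1161977552 +/- 3.62069124665j, matching the values above.
[/code]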
I am completely lost and I need to know how to work out these problems. I have the following:
N = C(14,5) = 2002
I know that is the answer for this step of the problem but I don't know how 2002 was reached. Please help.
14! / (9! x 5!) = 2002
14! = 14x13x12x11x10x9x8x7x6x5x4x3x2x1
9! = 9x8x7x6x5x4x3x2x1
5! = 5x4x3x2x1
14! / (9! x 5!) = (14x13x12x11x10) / (5x4x3x2x1) = 2002
So we have 14 people and we are choosing 5 of them, where the order within the chosen 5 (and within the remaining 9) does not matter.
Imagine that we have 14 people and we wish to choose one. There are 14 ways of doing this.
Then choose one of the remaining 13, then one of the remaining 12, and so on down to 10.
That procedure counts each group of five chosen people once for every ordering, and there are 5! = 120 orderings of five people.
The division by 5! is to allow for this. The number obtained is 2002.
Hence there are 2002 ways of choosing 5 people from 14.
(Assuming that the order of the 5 people is not important eg. {1,2,3,4,5} = {5,3,4,1,2} etc.)
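If you have Python to hand, you can check the count directly:

[code]
# "14 choose 5": the number of ways of picking 5 people from 14, order ignored.
import math

print(math.comb(14, 5))                                               # 2002
print(math.factorial(14) // (math.factorial(9) * math.factorial(5)))  # 2002
[/code]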
I may have joined this discussion too late, but this is my way of doing this question:
The density of water is about 1 g per ml
So 0.0018 ml = 0.0018 g (approx.)
The molar mass of water is: 2x1 + 1x16 = 18 g per mol (approx.)
Then I divided by the molar mass of water to get the number of moles of water:
0.0018 / 18 = 1 x 10^-4 (mol H2O approx.)
Then multiply by Avogadro's constant: 6.022... x 10^23 (atoms or molecules per mol)
1 x 10^-4 x 6.022 x 10^23 = 6.02 x 10^19 (approx.)
Some of the figures I have used are approximate, but since the 0.0018 ml is only
supposed to be approximate I don't think much more accuracy was intended.
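The same estimate as a few lines of Python, using the approximate constants above:

[code]
# Approximate number of water molecules in 0.0018 ml of water.
volume_ml = 0.0018
density = 1.0          # g per ml, approximate for water
molar_mass = 18.0      # g per mol: 2*1 + 16
avogadro = 6.022e23    # molecules per mol

moles = volume_ml * density / molar_mass   # 1 x 10^-4 mol
print(moles * avogadro)                    # about 6.02 x 10^19 molecules
[/code]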
I have just read the original post again, and I can see a bit which I did not read the first time:
No repayments made
This makes things a lot easier, and is not like a mortgage repayment because that is based upon the assumption
that repayments will be made each month which are taken off the loan (well it is in the case I was thinking of).
If I am understanding things correctly you are supposed to be doing something like this:
Let A = 200000 (The amount of the original loan.)
Let i = 14/(12 * 100)
So i = 0.01166666667 (to the accuracy of a calculator)
The extra division by 100 is to convert a percentage to a decimal.
The division by 12 is to convert into an amount per month.
(Strictly speaking the 12th root should be taken, but apparently in USA conventions this is not
how it is done. Instead the annual rate is simply divided by 12, and they do not worry about the
fact that compounding this monthly rate 12 times does not reproduce the annual rate:
compare 1.01166666667^12 = 1.1493 (approx.) to 1.14; they are not the same.)
So if I add 1 to the value of i to represent adding 100%
F = i + 1
F = 1.01166666667
My variable of F is supposed to be the factor of increase in the amount owed per month of compounding.
Using my interpretation, and this is the bit that I do not know whether it is correct, we should compute A x F^n.
I am getting: 430032.30
This assumes that there are n=66 months of accumulation. (Not sure whether this is correct.)
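In Python the accumulation I have described looks like this (the n = 66 months figure is the assumption I am unsure about):

[code]
# Amount owed after n months of monthly compounding with no repayments.
A = 200_000             # original loan
i = 14 / (12 * 100)     # monthly rate: 14% nominal annual, divided by 12
n = 66                  # months of accumulation (the uncertain assumption)

print(round(A * (1 + i) ** n, 2))   # 430032.30
[/code]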
Hi
I should point out the following things:
(1) I have not done interest calculations for ages.
(2) The conventions used in the USA may be different to the ones that I studied many years ago.
(3) There are some terms used in the question whose exact meaning I do not know.
The following link might be helpful:
http://en.wikipedia.org/wiki/Compound_interest
Look at the bit about "monthly mortgage payments" for a mortgage loan calculation.
(I now realise that this will go into some unnecessary depth, but it may still be useful so I have left the link in.)
I am not sure whether this is the same system implied by the question.
I am not sure what "in arrears calculated and charged on monthly rests" in terms of the monthly iteration formula.
However the principal of it may be the same with a few adjustments.
(Actually I have re-read the question and I can now see that the problem is easier than I had thought.)
Thanks anonimnystefy. I thought there might be a division by zero somewhere !!
Yes. I agree. If you let k=0 then z = 1
Obviously (1 + 1)/1 = 2
So this cannot give us 1 when raised to the power of 5.
Why does my algebraic argument not work for k = 0 ?
Usually a fifth root gives us five solutions. (????)
I can see your point about the negative sign as well .....
I was working on those formulas as you were writing that post.
I think it works and produces the right result.
There are 5 solutions.
I am not sure about this because I cannot do the very last bit of trigonometry but this is what I have got so far:
Using a right angled triangle in the Argand diagram:
Using standard trigonometric results we need this in terms of cot(A/2):
I have put in the minus sign in now. Well spotted anonimnystefy.
k = 0 is not a solution I agree Stefy, but why does my answer not work for k = 0 ?
EDIT: I can now see that cot(0) is not defined. Given that cot X = 1/(tan X), if X = 0 then tan X = 0 and we have a division by zero.
Therefore the argument is not valid for this case.
To tidy things up I have decided to answer this one now:
This is already fully cancelled because there are no whole numbers above one that you can divide
both 13 and 21 by to leave a whole number.
Notice that it is possible to check the answer by doing 13 divided by 21 on a calculator and then
checking that this is the same as ((2 divided by 7) + (1 divided by 3)) the things in brackets are
done first. Both equal 0.619047.... (etc.) with the six digits {6,1,9,0,4,7} repeating in a cycle.
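The same check can be done with exact arithmetic rather than decimals, for example with Python's fractions module:

[code]
# Exact check that 2/7 + 1/3 = 13/21; Fraction reduces to lowest terms.
from fractions import Fraction

print(Fraction(2, 7) + Fraction(1, 3))   # 13/21
[/code]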
The second attempt, which I originally did not see because I was writing my post at the same time as yours, was correct.
2 x 3 = 6
7 x 3 = 21
So (2/7) = (6/21)
The next step is to do something similar to the fraction: (1/3)
The choice of denominator of 21 is okay.
However the fraction (2/21) is not correct.
Let's look first at the fraction (2/7).
In order to keep its value the same, the numerator has to go up by the same factor as the denominator.
This will need you to multiply top and bottom by 3. Try again ?
EDIT: Be careful not to confuse multiplication of fractions with addition.
Now that the denominators are the same the addition is easy:
The second one was:
Do you want to have a go at this one yourself ?
With this one I would make sure that the denominators are the same first (the denominator is the number under the line).
So since the denominators are 4 and 5, it is best to multiply them.
This gives us 20.
To get each denominator to be 20, we have to multiply the numerator and denominator of each fraction by the same factor.
If you don't do this they will no longer represent the same amount mathematically.
Mandy: I have decided to post the first of those questions that I gave you on Friday:
(Q1)
I agree with Bob that the implies interpretation is the most sensible.
If we are going to think of H(x) as meaning
"a person x put through a function that outputs true if x is happy and false if x is not happy"
then we need the logical connective of implies which has a boolean item on both sides.
On the other hand let us look at another way of thinking about this:
Suppose H(x) is true if x is a member of the set H of all happy people and false otherwise.
Now let T(x) be true similarly if x is a member of set T of all theatre goers (and false otherwise).
Now let H(x) => T(x) using the interpretation using Bob's method.
Compare with "H is a subset of T".
Let us make up a set of four people: {person1, person2, person3, person4}
Let H = {person1, person2}
Let T = {person1, person2, person3}
H is a subset of T. If you are in H you must be in T.
If you are in T then you are not always in H.
Does this mean that in terms of the way I have defined them "H is a subset of T" means the same as "H(x) implies T(x)"?
As far as I know it does. However for domain reasons, if we are dealing with logic, the implies definition is essential.
You could redefine the whole exercise using set operators rather than boolean operators, using Union, Intersection
and Complement from the universal set of all people.
These could replace OR, AND and NOT. Subset would replace implies... (you get the general idea).
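Here is a tiny sketch of that equivalence using the made-up sets above; with Python sets, H <= T tests "H is a subset of T":

[code]
# Check that "H is a subset of T" matches "H(x) implies T(x) for every x".
people = {"person1", "person2", "person3", "person4"}
H = {"person1", "person2"}             # happy people
T = {"person1", "person2", "person3"}  # theatre goers

implies_holds = all((x not in H) or (x in T) for x in people)
print(H <= T, implies_holds)           # True True: the two readings agree
[/code]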
h) there is at least one happy theatre goer who is quiet.
3) Ex((Hx ^ Tx) ^ Qx)
With the statement (h) this to my mind reads "There exists at least one person x who is (happy and a theatre goer and is quiet)".
The statement (3) seems to be the compact form of this.
I will try to help with a bit of LaTeX which can be done using math tags in square brackets, and using things like
\exists
\forall
I am not sure how to do a proper "and" symbol in LaTeX (perhaps \wedge, with \vee for "or" and \neg for "not"). I have used text for these.
When you have used the ~ symbol does this mean "not" or "a negation of" ?
Try comparing (2) with (a) for example.
U rotated clockwise a quarter turn
I wonder whether this means "is a subset of" or "is a proper subset of". (A "proper subset" means a strictly smaller subset.)
Example: Let a set be {1,2,3}; a subset could be the set itself, but a proper subset would have to be smaller.
{1,2} is a proper subset of {1,2,3} (and also a subset).
{1,2,3} is a subset of {1,2,3} (but is not a proper subset).
Try comparing (7) and (g).
Are we saying "not(the set of happy people is a proper subset of the set of theatre goers)" ?
I have looked back at the first post and I can see what you mean:
You made one mistake: the LaTeX needed math tags in square brackets around the R_1 bit.
Apart from that I agree with your answer.
I am not sure about this, mainly because I would probably have to have the textbook (or sheet of paper)
that you have quoted that from to be certain. As far as I know this is not a universal concept in terms of the abbreviation.
The equation you have multiplied is done correctly - that is to say you have indeed multiplied by -2.
(As a critical observation: When you write 1/2 x it would be clearer if you wrote (1/2)x because it could be confused with (1/(2x))
which is obviously different and not what you meant. I am being picky here and you knew what you meant because the answer
you then gave was correct.)
The term "R" may mean "row". That is to say it may refer to a row operation - in this case multiplying the entire row by
a factor (in this case -2). If this is the case then I can see what you mean with your answer of -2R.
Whether this is "correct" as such needs a little more information I would think because I am not sure.
Are we defining R to be equal to the entire first row? R = [(1/2)x - 4y = 5]
If this is the case then -2R = [-x + 8y = -10]
I wonder whether you are supposed to use a subscript notation or something to indicate which row it is referring to.
With matrix calculations there is some notation to do with this. If it is solving simultaneous equations then I am not
sure what the currently taught notation is; it might vary from course to course and differ according to the country
and so on.
However it looks like a sensible answer to me.
I suppose this is designed to teach you to solve 2 simultaneous equations, or possibly 2 by 2 matrix calculations.
To my mind at least it is the solving a problem bit, rather than what abbreviation should be used, that is the more
important thing.
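If it helps, the row operation itself is easy to check numerically (assuming R does stand for the first row, written as its coefficients and constant):

[code]
# The -2R row operation on the augmented row (1/2)x - 4y = 5.
import numpy as np

R = np.array([0.5, -4.0, 5.0])   # coefficients of x, y and the constant
print(-2 * R)                    # [ -1.   8. -10.], i.e. -x + 8y = -10
[/code]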
4 - Verify which, if any, of the formulas ¬ A v B and A v B, substituted for x, makes the formula ( A -> x ) -> ( x -> B ) a tautology.
Part 2: Mathematical Structures
Let us first substitute ((NOT A) OR B), which I hope means the same as ¬ A v B, for x.
Right so that makes: (A -> (¬ A v B)) -> ((¬ A v B) -> B)
(LEFT HAND SIDE PART:This is true if and only if (A is false) OR ((¬ A v B) is true).)
(RIGHT HAND SIDE PART:This is true if and only if ((¬ A v B) is false) OR (B is true).)
There are four possible combinations of A and B. I would usually use a truth table for this, but that is tricky with text only.
So. Case 1. A and B are true. Since B is true (¬ A v B) is true because of the OR logic. By the implication
statement logic the whole thing is true. Now let's look at the (x -> B) bit. (¬ A v B) -> B
well B is true so the whole thing is true (true -> true) gives a true output.
Case 2: A is true, but B is false. LHS: (¬ A v B) is false, because both ¬ A and B are false, so (A -> (¬ A v B)) is
(true -> false), which outputs false. RHS: For (x -> B), that is ((¬ A v B) -> B), the left hand side is false, so the implication
will output true. Overall: (false -> true) = true.
Case 3: A is false, but B is true. LHS: The statement simplifies to true. RHS: The statement simplifies to true. Overall: true.
Case 4: A is false, B is false. LHS: true (A is false). RHS: (¬ A v B) is true because ¬ A is true, but B is false, so
((¬ A v B) -> B) is false. Overall: (true -> false) = false.
So unless I have gone wrong there, Case 4 gives a false output and this substitution does not produce a tautology after all.
I am not sure that this is the intended method and have no idea how a course designer of something like this would intend
you to present an answer. However I hope that helps with understanding a bit. Apologies for any errors.
For (A OR B) a shortcut might be to observe that to try to produce a false output we need B to be false.
So let B be false in the expression:
(A -> (A OR B)) -> ((A OR B) -> B)
Consider the case A is true. (A OR B) is true. A is true, and B is false.
The left hand side is true (true -> true).
The right hand side is false (true -> false).
Overall: (true -> true) -> (true -> false) = (true -> false) = false
Hence this is not a tautology.
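For anyone who wants to double-check the case analysis by machine, here is a quick brute-force truth table in Python (my own sketch, not part of the original exercise):

[code]
# Brute-force truth-table check of both substitutions for x in
# (A -> x) -> (x -> B).
from itertools import product

def implies(p, q):
    return (not p) or q

def formula(A, B, x):
    return implies(implies(A, x), implies(x, B))

for name, sub in [("(NOT A) OR B", lambda A, B: (not A) or B),
                  ("A OR B",       lambda A, B: A or B)]:
    ok = all(formula(A, B, sub(A, B))
             for A, B in product([True, False], repeat=2))
    print(f"{name}: tautology = {ok}")
# Both print False: (NOT A) OR B fails at A = false, B = false,
# and A OR B fails at A = true, B = false.
[/code]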