Math Is Fun Forum



#1 2007-11-19 03:53:14

petetheat
Guest

Statistics Correlation coefficient

Hey guys,

I've got a problem with the correlation coefficient. I've got a set of measured data and for some odd reason over a period of 3 minutes the data remains constant. I now want to determine the correlation coefficient and expected it to be 1 as the data has the same constant value over this period. As the standard deviation is obviously zero, I'm dividing by zero, and hence the result was NaN.

However, if I manipulate the recorded data and change one value slightly (e.g. I write 9.99999999 instead of 10), the result is nowhere near 1, but seems to settle at about -0.1287.

I'm an engineer, not a mathematician, so I don't really know what's going on. I'd really appreciate it if someone could help me out!

#2 2007-11-19 19:19:08

George,Y
Member
Registered: 2006-03-12
Posts: 1,379

Re: Statistics Correlation coefficient

Think of your goal as solving for r in

r*0*0=0

Here r is the correlation you want to solve for. But you cannot: r has no solution here. In the real case, though, you can define it as 1. If you are sure the two quantities are bound to stabilize at the same time, it makes sense to say they are completely related.


X'(y-Xβ)=0


#3 2007-11-19 19:46:46

petetheat
Guest

Re: Statistics Correlation coefficient

I was thinking as well that it should be 1. But how come, if I change one value slightly, the result is nowhere near 1 but -0.1287?

Let's say I've got this set of data

t / s  |  1  |  2  |  3  |  4  | ... | 180
--------------------------------------------
x / m  | 10  | 10  | 10  | 10  | ... |  10

then obviously I divide by 0 and can't solve for r. However, if I change say the last value to 9.99999999, the result is -.1287, which indicates that there is no linearity between x and t, even though there is.

So if the data wasn't constant (which it somehow is), I'd have a correlation coefficient that doesn't indicate linearity.

#4 2007-11-19 19:55:47

petetheat
Guest

Re: Statistics Correlation coefficient

Or to be precise, why doesn't a straight line with a gradient of zero have a correlation coefficient of 1?

#5 2007-11-20 16:57:14

George,Y
Member
Registered: 2006-03-12
Posts: 1,379

Re: Statistics Correlation coefficient

" r has no solution here."
Correction: I should have said there are many solutions rather than no solution.
0r=5 has no solution, whereas 0r=0 has solutions, just no single determined one.

I guess this makes the "unrelated" case possible as well, in a way.

Let's see why this works.
Suppose you are telling jokes to a friend. You tell one joke, and s/he doesn't laugh. You keep telling jokes, but no matter how many you tell, s/he still sits there like a block of wood.
Are we sure that your friend's good mood is unrelated to how many jokes you tell?
Sure, unrelated. No correlation.

Now we can remodel this.
Every time you tell a joke, your friend laughs.
Suppose we divide time into intervals, each as long as it takes you to tell one joke.
Then we can record how many jokes you tell in each interval and how many times your friend laughs in the same interval.
Then
1 1 1 1 ...
1 1 1 1 ...
Are you sure that when x=1, y is guaranteed to be 1? And that if x=0, y=0? If you reach that conclusion and are sure that they share exactly the same probability, then they synchronize. They can determine each other, so they are totally related.

Note: if you use the cumulative counts of jokes and laughs, you escape the 0/0 problem and get 1.
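
A minimal sketch of that note, assuming one joke and one laugh per interval over six intervals. The sample values and the helper name pearson_r are made up for illustration and are not from the thread; returning NaN for a constant series is also an assumption.

```python
import math
from itertools import accumulate


def pearson_r(xs, ys):
    """Pearson correlation; returns NaN if either series has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # the 0/0 case
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one joke and one laugh in each of six intervals.
jokes = [1, 1, 1, 1, 1, 1]
laughs = [1, 1, 1, 1, 1, 1]

r_per_interval = pearson_r(jokes, laughs)  # NaN: both series are constant

# Running totals 1, 2, 3, ... vary over time, so the denominator is non-zero.
cum_jokes = list(accumulate(jokes))
cum_laughs = list(accumulate(laughs))
r_cumulative = pearson_r(cum_jokes, cum_laughs)  # 1.0 up to rounding

print(r_per_interval, r_cumulative)
```

The cumulative series only "works" because it introduces variation over time; the per-interval data still carry no information about what happens when x is not 1.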

Now suppose your friend's laughing falls somewhere in the middle, between these two cases. How can you judge that? Well, statisticians have come up with the correlation formula, where
correlation=covariance/(stdv(X)stdv(Y))
covariance=Sum((Xi-Xmean)(Yi-Ymean))/n
stdv(X)=sqrt(Sum((Xi-Xmean)^2)/n)

They had good reasons, based on two extreme cases. If X behaves the same way no matter how Y changes, we know that they are unrelated, and the formula gives 0; if Y is a linear function of X, which means Y=a+bX, the formula returns 1 (or -1 when b is negative). Further, they have used an inequality (Cauchy-Schwarz) to prove that r^2 is no larger than 1.
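
As a rough check of the linear case, here is a small sketch with made-up numbers (a=3, b=2 and b=-2 are arbitrary choices, and pearson_r is a hypothetical helper written from the formula above):

```python
import math


def pearson_r(xs, ys):
    """correlation = covariance / (stdv(X) * stdv(Y)); the 1/n factors cancel."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when either series is constant
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys_rising = [3 + 2 * x for x in xs]   # Y = a + bX with b > 0
ys_falling = [3 - 2 * x for x in xs]  # Y = a + bX with b < 0

r_rising = pearson_r(xs, ys_rising)    # +1 up to rounding
r_falling = pearson_r(xs, ys_falling)  # -1 up to rounding

print(r_rising, r_falling)
```

With b=0 the series Y is constant, so this sketch returns NaN rather than 1, which is the situation in the original question.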

But that's the best the definition of correlation can give. It isn't a truth or a fact; it is just an arbitrary, artificial index made by statisticians, which happens to have these three properties that give it some sense. But it has its pitfalls as well.

How about Y=X^2 or Y=2^X? They are totally related, but their correlation is not 1. The correlation index cannot capture a non-linear relationship.
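
A small sketch of this pitfall, with sample ranges chosen purely for illustration (X from -3 to 3 for the square, X from 0 to 6 for the power of two; pearson_r is a hypothetical helper, not something from the thread):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; returns NaN if either series has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


# Y = X^2 on a range symmetric about zero: Y is fully determined by X,
# yet the positive and negative halves cancel in the covariance.
xs_sym = [-3, -2, -1, 0, 1, 2, 3]
r_square = pearson_r(xs_sym, [x ** 2 for x in xs_sym])

# Y = 2^X: monotonic and fully determined by X, but not a straight line,
# so the correlation is positive and still below 1.
xs_pos = [0, 1, 2, 3, 4, 5, 6]
r_power = pearson_r(xs_pos, [2 ** x for x in xs_pos])

print(r_square, r_power)
```

On a different range the numbers change (Y=X^2 over positive X only gives a high correlation, though still below 1), so the values here depend on the assumed samples.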

Another pitfall is that the denominator cannot be 0. This comes down to a pitfall of linear language. Just recall what I wrote:

If X behaves the same way no matter how Y changes, we know that they are unrelated, and the formula gives 0; if Y is a linear function of X, which means Y=a+bX, the formula returns 1 (or -1 when b is negative).

Swap X and Y, and it makes sense as well. But why don't I treat them equally in the sentence? Because I am handicapped by the language too. Language flows linearly, from the subject to the object, hindering me from expressing the mutual relationship. The only way out is to state the inverse as well. But that only means this direction plus that direction, which does not perfectly reflect the nondirectional truth.

That's how people deal with slope or ratio. People can only say that y will change by 2 if x changes by 1, or that x will change by .5 if y changes by 1; they have to simplify the two-way process to a one-directional one in order to think about it. That results in the slope's inability to capture every line: dY/dX fails for a vertical line and dX/dY fails for a horizontal one. You have to jump from one direction to the other, or switch to the more complicated undirected vector thinking (dX,dY)=k(1,2).

However, this is not a big issue. A function, to begin with, bears the role of discovering a causal relationship. How do we know whether, of two things, one is the cause of the other, or both are the result of some third thing? Bacon puts it:
If one increases and the other increases or decreases correspondingly, we can infer they have a causal relationship.
Or at least we are sure they have a conditional relationship (which is just weaker than a causal relationship, and avoids the debate over which is the cause of which, or whether there exists a third factor as their shared cause).

But what happens with two constant things? The earth has a constant amount of the element carbon, just as it has a constant travelling speed, almost. But can you say that one is the other's cause, or that they share a common cause? Not necessarily, right?

So here is the key. You have to show some variation to prove even such a conditional relationship. Conditions mean more than one possibility. So, back to the correlation case, either a 0 and 0 sample or a 2 and 2 sample has to be added to the 1 and 1 samples before it makes sense to predict an if-then relationship.

Last edited by George,Y (2007-11-20 17:02:46)


X'(y-Xβ)=0


#6 2007-11-20 21:42:28

NullRoot
Member
Registered: 2007-11-19
Posts: 162

Re: Statistics Correlation coefficient

Think of it this way, Pete. If you graphed this with "t / s" being X and "x / m" being Y, then your equation would be:
y = 0x + 10

It's a line, with slope 0 and a y-intercept of 10, right? Looking at that equation, is there any correlation between X and Y? Does your X input have any effect on the value of Y?

Now if you change the last value to something other than 10 (9.99999, for instance), y = 0x + 10 no longer holds as the equation of the line. Because the value of Y now differs for one input of X, there will be some correlation, but it will most likely be very small.
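
To put numbers on this, here is a minimal sketch using the data from post #3 (t = 1..180 s, x = 10 m, last value changed to 9.99999999). The helper pearson_r is a hypothetical name, and returning NaN for a constant series is an assumption about how the original tool behaves. Working the formula through for a single changed last point gives r = -sqrt(3/(n+1)), whatever the size of the change, which for n = 180 is about -0.1287. That would explain why the result stays put however small the tweak is.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; returns NaN if either series has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # constant data: 0/0, correlation undefined
    return sxy / math.sqrt(sxx * syy)


t = list(range(1, 181))                 # 1, 2, ..., 180 seconds
x_const = [10.0] * 180                  # the recorded constant data
x_bumped = [10.0] * 179 + [9.99999999]  # last value nudged down

r_const = pearson_r(t, x_const)    # NaN
r_bumped = pearson_r(t, x_bumped)  # about -0.1287

# Closed form for "all points equal except the last one, which is lower".
predicted = -math.sqrt(3 / (len(t) + 1))

print(r_const, r_bumped, predicted)
```

The sign is negative only because the changed point sits at the end and was moved down; moving it up instead flips the sign, and the magnitude shrinks as more points are recorded.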



EDIT: Put the equation on its own line. It was carrying over multiple lines lol

Last edited by NullRoot (2007-11-20 21:43:53)


Trillian: Five to one against and falling. Four to one against and falling… Three to one, two, one. Probability factor of one to one. We have normality. I repeat, we have normality. Anything you still can’t cope with is therefore your own problem.

