
**bechau** **Member**
- Registered: 2014-04-07
- Posts: 4

Currently I'm working with positively skewed data.

I would like to calculate the range of values that represent x% of the data (CI). How do I do this with skewed data?


**bobbym** **Administrator**
- From: Bumpkinland
- Registered: 2009-04-12
- Posts: 100,366

Hi;

Why not post the exact problem?

**In mathematics, you don't understand things. You just get used to them.** **If it ain't broke, fix it until it is.** **Thinking is cheating.**


**bechau** **Member**
- Registered: 2014-04-07
- Posts: 4

The exact problem is that I'm trying to calculate the income range that covers 98% of the population.

The income distribution is positively skewed.

Normally, if the data is normally distributed, I would:

1. Calculate the mean and the standard deviation.

2. Go 2 standard deviations below the mean to get the answer (that about 98% of the population earns mean - 2 std dev or more).
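To see why the two-standard-deviation rule does not carry over to skewed data, here is a minimal Python sketch. It assumes a log-normal sample as a stand-in for positively skewed incomes (the thread has no real data); the names `incomes` and `lower_bound` and the log-normal parameters are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean, stdev

# For a normal distribution, the share of values at or above mean - 2*sd.
share_normal = 1 - NormalDist().cdf(-2)  # roughly 0.977

# Hypothetical positively skewed "incomes": a log-normal sample (an assumption).
random.seed(0)
incomes = [random.lognormvariate(10, 1) for _ in range(10_000)]

# Apply the same rule to the skewed sample.
m, s = mean(incomes), stdev(incomes)
lower_bound = m - 2 * s
share_skewed = sum(x >= lower_bound for x in incomes) / len(incomes)

print(f"normal share above mean - 2 sd: {share_normal:.3f}")
print(f"skewed sample, mean - 2 sd:     {lower_bound:.0f}")
print(f"skewed share above that bound:  {share_skewed:.3f}")
```

For a sample this skewed, mean - 2 std dev lands below zero, so the bound covers everyone and says nothing useful about the lowest earners.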


**bobbym** **Administrator**
- From: Bumpkinland
- Registered: 2009-04-12
- Posts: 100,366

Is that all you have? What is the PDF? How can anyone compute the area under a curve that is unknown? Do you have the data?


**bechau** **Member**
- Registered: 2014-04-07
- Posts: 4

sorry bobbym, I don't have the data.

This is just a theoretical question, as my professor couldn't answer my question regarding the use of the CI from the normal distribution on non-normally distributed data, i.e., the income distribution.

Am I right to assume that the use of CI from normal distribution theory cannot be used in explaining the non-normal distribution data?

If yes, how do I calculate the CI for non-normally distributed data?


**bob bundy** **Moderator**
- Registered: 2010-06-20
- Posts: 7,444

hi bechau

Welcome to the forum.

The normal distribution is probably the most studied statistical function. It's symmetrical and just two parameters (mean and standard deviation) are sufficient to completely determine its behaviour. Once you start looking at new distributions you have to have data in order to make similar analyses.

The starting point would have to be to gather lots of actual figures for income. Once you have that, you might be able to fit a function to the data, but you'll probably already have the answer to your question, as it will be embedded in the data.
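As a rough illustration of the answer being "embedded in the data", the range can be read straight off the sample percentiles with no distributional assumption. This is a minimal Python sketch; the log-normal sample below is a made-up stand-in for real income figures, and the variable names are illustrative.

```python
import random
from statistics import quantiles

# Hypothetical income sample; replace with real figures when available.
random.seed(1)
incomes = [random.lognormvariate(10, 1) for _ in range(10_000)]

# quantiles(..., n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
cuts = quantiles(incomes, n=100)
p1, p2, p99 = cuts[0], cuts[1], cuts[98]

# Central 98% of the sample lies between the 1st and 99th percentiles.
central_share = sum(p1 <= x <= p99 for x in incomes) / len(incomes)

# Alternatively: 98% of the sample earns at least the 2nd percentile.
above_share = sum(x >= p2 for x in incomes) / len(incomes)

print(f"central 98% range: {p1:.0f} to {p99:.0f}")
print(f"98% earn at least: {p2:.0f}")
```

In a spreadsheet, Excel's PERCENTILE function performs a comparable lookup on a column of values, although its interpolation rule differs slightly from the one used here.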

Also, bear in mind that anything you can calculate will only be valid for that 'population'. Income values vary enormously around the world.

There's a small article at

http://www.mathsisfun.com/data/skewness.html

and a longer one at

http://en.wikipedia.org/wiki/Skewness

Bob

You cannot teach a man anything; you can only help him find it within himself..........Galileo Galilei


**bobbym** **Administrator**
- From: Bumpkinland
- Registered: 2009-04-12
- Posts: 100,366

bechau wrote: "Am I right to assume that the use of CI from normal distribution theory cannot be used in explaining the non-normal distribution data?"

Without knowing more I can do little. Skewed curves can look very different from the standard normal curve (SNC).

You mention the income distribution. Is that the distribution you want to compute the percentages from?

Read the links given by Bob and see if that helps.


**bechau** **Member**
- Registered: 2014-04-07
- Posts: 4

Thanks guys.

Actually, the short article is the one that got me thinking and asking the professor that very question.

In his class, he showed a theoretical distribution of income that looks exactly like the one in the short article, a positively skewed distribution (mean is towards the lower income). There was no real data behind it, and we were just debating how to estimate the CI. I said that since the data is skewed, we cannot use the value from Excel's STDEV to mark the CI range. He agreed. But when I asked him how to define the CI for skewed data, he couldn't answer my question.

As for the long article, I read it but didn't find the answer I was looking for, i.e., what is the income range that 95% of the population belongs to?

If I had the real data, I could work from the cumulative number of people in each income interval and find the answer to my question.

However, what I really want to know is how to use Excel on a sample of non-normally distributed data to answer this question.
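The cumulative-count approach described above can be sketched in a few lines of Python. The income bands and head counts below are invented for illustration, and `income_at_share` is a hypothetical helper that interpolates linearly inside a band, which assumes people are spread evenly within each band.

```python
# Hypothetical income bands: (lower bound, upper bound, number of people).
bands = [
    (0, 10_000, 150),
    (10_000, 20_000, 300),
    (20_000, 40_000, 350),
    (40_000, 80_000, 150),
    (80_000, 200_000, 50),
]

def income_at_share(bands, share):
    """Income below which `share` of people fall, interpolating within a band."""
    total = sum(count for _, _, count in bands)
    target = share * total
    cumulative = 0
    for lower, upper, count in bands:
        if count > 0 and cumulative + count >= target:
            fraction = (target - cumulative) / count
            return lower + fraction * (upper - lower)
        cumulative += count
    return bands[-1][1]

# Range holding the central 95% of people: 2.5% cut off at each end.
low = income_at_share(bands, 0.025)
high = income_at_share(bands, 0.975)
print(f"95% of people earn between {low:.0f} and {high:.0f}")
```

The same running total can be kept in a spreadsheet column next to the counts; the band where the running total first passes the target share is the one to interpolate in.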


**bob bundy** **Moderator**
- Registered: 2010-06-20
- Posts: 7,444

I'll check out the Excel function. Back later.

*Last edited by bob bundy (2014-04-07 23:11:42)*


**bob bundy** **Moderator**
- Registered: 2010-06-20
- Posts: 7,444

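The Excel function SKEW calculates

$$\text{SKEW} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$

where $n$ is the number of values, $\bar{x}$ is their mean and $s$ is the sample standard deviation.

This is only one of a number of formulas for calculating skewness.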

In one of my examples I had a frequency distribution (i.e. a column of x values and a column of frequencies). I calculated the skew from the formula above and with SKEW, and got wildly different answers. I then realised that the Excel SKEW function has no input for frequency, so it was treating each of my x values as if it occurred just once. So it cannot be used in these circumstances. I haven't found a 'formula' that will act as a generating function. You probably have to create a scatter graph of frequency against income and then try to fit a function by trial and improvement.
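The mismatch described above is easy to reproduce. This minimal Python sketch applies the sample skewness formula that Excel documents for SKEW, once to the bare x values and once with each value weighted by its frequency. The table and the function names `sample_skew` and `sample_skew_from_table` are made up for illustration.

```python
from math import sqrt

def sample_skew(values):
    """Sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)**3)."""
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

def sample_skew_from_table(xs, freqs):
    """Same formula, but each x counts `freq` times."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(xs, freqs)) / n
    s = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(xs, freqs)) / (n - 1))
    cubes = sum(f * ((x - mean) / s) ** 3 for x, f in zip(xs, freqs))
    return n / ((n - 1) * (n - 2)) * cubes

# Hypothetical frequency table: most observations sit at the low x values.
xs = [1, 2, 3, 4, 5]
freqs = [10, 6, 3, 2, 1]

ignoring_freqs = sample_skew(xs)               # treats each x as occurring once
weighted = sample_skew_from_table(xs, freqs)   # respects the frequencies
expanded = sample_skew([x for x, f in zip(xs, freqs) for _ in range(f)])

print(f"ignoring frequencies: {ignoring_freqs:.3f}")
print(f"using frequencies:    {weighted:.3f}")
```

Expanding the table into one row per observation, as in `expanded`, is also a workable route in a spreadsheet, since a function without a frequency input can then be applied to the expanded column.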

Bob


**bobbym** **Administrator**
- From: Bumpkinland
- Registered: 2009-04-12
- Posts: 100,366

bechau wrote: "This is just a theoretical question, as my professor couldn't answer my question regarding the use of the CI from the normal distribution on non-normally distributed data, i.e., the income distribution."

A smooth kernel distribution is the theoretical answer to your theoretical question. With it you can compute the mean, variance and other moments. Also, the PDF can be integrated to give the area under the curve, and hence the probabilities you want.

bechau wrote: "However, what I really want to know is how to use Excel on a sample of non-normally distributed data to answer this question."

A smooth kernel distribution is possible provided you have the data points. This should be useful because we can then treat it as any other PDF. If you have a picture of the curve, post it and the data points can be retrieved.

The trouble is that, although many programming languages can create this smooth kernel distribution, Excel is not one of them.
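For readers who want to try this outside Excel, here is a minimal Python sketch. It reads "smooth kernel distribution" as a Gaussian kernel density estimate, which is an assumption, and uses a common normal-reference rule of thumb for the bandwidth. The data are a made-up log-normal sample, and `kde_pdf`, `kde_cdf` and `kde_probability` are illustrative names.

```python
import random
from statistics import NormalDist, stdev

# Hypothetical income sample standing in for real data points.
random.seed(2)
incomes = [random.lognormvariate(10, 1) for _ in range(500)]

# Rule-of-thumb bandwidth (normal reference rule); an assumption, not a tuned value.
h = 1.06 * stdev(incomes) * len(incomes) ** (-1 / 5)
std_normal = NormalDist()

def kde_pdf(x):
    """Smoothed density: average of Gaussian bumps centred on each data point."""
    return sum(std_normal.pdf((x - xi) / h) for xi in incomes) / (len(incomes) * h)

def kde_cdf(x):
    """Area under the smoothed density to the left of x."""
    return sum(std_normal.cdf((x - xi) / h) for xi in incomes) / len(incomes)

def kde_probability(a, b):
    """Estimated probability that an income falls between a and b."""
    return kde_cdf(b) - kde_cdf(a)

prob = kde_probability(10_000, 100_000)
print(f"estimated share earning between 10,000 and 100,000: {prob:.3f}")
```

Because each kernel is a normal curve, the integral has a closed form and no numerical integration is needed. For heavily skewed data a single bandwidth can oversmooth the low end, so estimating on the log of income may behave better.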
