As usual, our starting point is a random experiment, modeled by a probability space (Ω, ℱ, P). So to review, Ω is the set of outcomes, ℱ the collection of events, and P the probability measure on the sample space. Suppose that X is a real-valued random variable for the experiment. Recall that the mean of X is a measure of the center of the distribution of X. Furthermore, the variance of X is the second moment of X about the mean, and measures the spread of the distribution of X about the mean. The third and fourth moments of X about the mean also measure interesting (but more subtle) features of the distribution. The third moment measures skewness, the lack of symmetry, while the fourth moment measures kurtosis, roughly a measure of the fatness in the tails. The actual numerical measures of these characteristics are standardized to eliminate the physical units, by dividing by an appropriate power of the standard deviation. As usual, we assume that all expected values given below exist, and we will let μ = E(X) and σ² = var(X). We assume that σ > 0, so that the random variable is really random.
Basic Theory
Skewness
The skewness of X is the third moment of the standard score of X:
skew(X) = E[((X − μ)/σ)³]
The distribution of X is said to be positively skewed, negatively skewed or unskewed depending on whether skew(X) is positive, negative, or 0.
In the unimodal case, if the distribution is positively skewed then the probability density function has a long tail to the right, and if the distribution is negatively skewed then the probability density function has a long tail to the left. A symmetric distribution is unskewed.
Suppose that the distribution of X is symmetric about a. Then
E(X) = a
skew(X) = 0
Details:
By assumption, the distribution of X − a is the same as the distribution of a − X. We proved part (a), that E(X) = a, in the section on properties of expected value. Thus, skew(X) = E[(X − a)³]/σ³. But by symmetry and linearity, E[(X − a)³] = E[(a − X)³] = −E[(X − a)³], so it follows that E[(X − a)³] = 0 and hence skew(X) = 0.
The converse is not true—a non-symmetric distribution can have skewness 0. Examples are given in [30] and [31].
skew(X) can be expressed in terms of the first three moments of X:
skew(X) = [E(X³) − 3μE(X²) + 2μ³]/σ³ = [E(X³) − 3μσ² − μ³]/σ³
Details:
Note that skew(X) = E[(X − μ)³]/σ³. From the linearity of expected value we have
E[(X − μ)³] = E(X³) − 3μE(X²) + 3μ²E(X) − μ³ = E(X³) − 3μE(X²) + 2μ³
The second expression follows from substituting E(X²) = σ² + μ².
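As a quick computational aside, the formula above is easy to code for a distribution with a finite set of values. The following Python sketch (the dictionary representation and the two example distributions are our own illustration, not part of the text) computes skewness from the raw moments, and can be checked directly against the definition in terms of the standard score:

```python
def raw_moment(pdf, n):
    """n-th raw moment E(X^n) of a discrete distribution {value: probability}."""
    return sum(p * x**n for x, p in pdf.items())

def skewness(pdf):
    """skew(X) = [E(X^3) - 3*mu*E(X^2) + 2*mu^3] / sigma^3."""
    mu = raw_moment(pdf, 1)
    sigma2 = raw_moment(pdf, 2) - mu**2
    m3 = raw_moment(pdf, 3) - 3 * mu * raw_moment(pdf, 2) + 2 * mu**3
    return m3 / sigma2**1.5

die = {x: 1/6 for x in range(1, 7)}   # symmetric, so skewness 0
tilted = {0: 0.7, 1: 0.2, 2: 0.1}     # long right tail, so positive skewness
```

The symmetric distribution has skewness 0, while the right-tailed one has positive skewness, as the discussion above predicts.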
Since skewness is defined in terms of an odd power of the standard score, it's invariant under a linear transformation with positive slope (a location-scale transformation of the distribution). On the other hand, if the slope is negative, skewness changes sign.
Suppose that Y = a + bX where a ∈ ℝ and b ≠ 0. Then
skew(Y) = skew(X) if b > 0
skew(Y) = −skew(X) if b < 0
Details:
Let Z = (X − μ)/σ, the standard score of X. Recall from the section on variance that the standard score of Y = a + bX is Z if b > 0 and is −Z if b < 0.
Recall that location-scale transformations often arise when physical units are changed, such as inches to centimeters, or degrees Fahrenheit to degrees Celsius.
Kurtosis
The kurtosis of X is the fourth moment of the standard score:
kurt(X) = E[((X − μ)/σ)⁴]
Kurtosis comes from the Greek word for bulging. Kurtosis is always positive, since we have assumed that σ > 0 (the random variable really is random), and therefore P(X ≠ μ) > 0. In the unimodal case, the probability density function of a distribution with large kurtosis has fatter tails, compared with the probability density function of a distribution with smaller kurtosis.
kurt(X) can be expressed in terms of the first four moments of X:
kurt(X) = [E(X⁴) − 4μE(X³) + 6μ²E(X²) − 3μ⁴]/σ⁴ = [E(X⁴) − 4μE(X³) + 6μ²σ² + 3μ⁴]/σ⁴
Details:
Note that kurt(X) = E[(X − μ)⁴]/σ⁴. From linearity of expected value, we have
E[(X − μ)⁴] = E(X⁴) − 4μE(X³) + 6μ²E(X²) − 4μ³E(X) + μ⁴ = E(X⁴) − 4μE(X³) + 6μ²E(X²) − 3μ⁴
The second expression follows from the substitution E(X²) = σ² + μ².
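The kurtosis formula can be coded the same way. In the Python sketch below (again our own illustration, not part of the text), the fair-coin distribution attains the minimum possible kurtosis of 1, since E(Z⁴) ≥ [E(Z²)]² = 1 for the standard score Z:

```python
def raw_moment(pdf, n):
    """n-th raw moment E(X^n) of a discrete distribution {value: probability}."""
    return sum(p * x**n for x, p in pdf.items())

def kurtosis(pdf):
    """kurt(X) = [E(X^4) - 4*mu*E(X^3) + 6*mu^2*E(X^2) - 3*mu^4] / sigma^4."""
    mu = raw_moment(pdf, 1)
    sigma2 = raw_moment(pdf, 2) - mu**2
    m4 = (raw_moment(pdf, 4) - 4 * mu * raw_moment(pdf, 3)
          + 6 * mu**2 * raw_moment(pdf, 2) - 3 * mu**4)
    return m4 / sigma2**2

coin = {0: 1/2, 1: 1/2}   # fair coin: kurtosis 1, the minimum possible value
```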
Since kurtosis is defined in terms of an even power of the standard score, it's invariant under linear transformations.
Suppose that Y = a + bX where a ∈ ℝ and b ≠ 0. Then kurt(Y) = kurt(X).
Details:
As before, let Z denote the standard score of X. Then the standard score of Y = a + bX is Z if b > 0 and is −Z if b < 0. In either case Z⁴ = (−Z)⁴, so kurt(Y) = kurt(X).
We will show in [21] that the kurtosis of the standard normal distribution is 3. Using the standard normal distribution as a benchmark, the excess kurtosis of a random variable X is defined to be kurt(X) − 3. Some authors use the term kurtosis to mean what we have defined as excess kurtosis.
Computational Exercises
As always, be sure to try the exercises yourself before expanding the details.
Indicator Variables
Recall that an indicator random variable is one that just takes the values 0 and 1. Indicator variables are the building blocks of many counting random variables. The corresponding distribution is known as the Bernoulli distribution, named for Jacob Bernoulli.
Suppose that X is an indicator variable with P(X = 1) = p where p ∈ (0, 1). Then
E(X) = p
var(X) = p(1 − p)
skew(X) = (1 − 2p)/√(p(1 − p))
kurt(X) = [1 − 3p(1 − p)]/[p(1 − p)]
Details:
Parts (a) and (b) have been derived before. All four parts follow easily from the fact that Xⁿ = X and hence E(Xⁿ) = p for n ∈ ℕ₊.
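To double-check the closed forms in this result, one can compare them with a direct computation from the standard score. A small Python sketch (ours, not from the text):

```python
from math import sqrt

def bernoulli_skew_kurt(p):
    """Closed forms from the result above."""
    q = p * (1 - p)
    return (1 - 2 * p) / sqrt(q), (1 - 3 * q) / q

def direct_skew_kurt(p):
    """Direct computation from the standard score Z = (X - p)/sqrt(p(1-p))."""
    sigma = sqrt(p * (1 - p))
    z0, z1 = (0 - p) / sigma, (1 - p) / sigma
    skew = (1 - p) * z0**3 + p * z1**3
    kurt = (1 - p) * z0**4 + p * z1**4
    return skew, kurt
```

For p = 1/2 the distribution is symmetric, so the skewness is 0 and the kurtosis takes its minimum value 1.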
Open the binomial coin experiment and set n = 1 to get an indicator variable. Vary p and note the change in the shape of the probability density function.
Dice
Recall that a fair die is one in which the faces are equally likely. In addition to fair dice, there are various types of crooked dice. Here are three:
An ace-six flat die is a six-sided die in which faces 1 and 6 have probability 1/4 each while faces 2, 3, 4, and 5 have probability 1/8 each.
A two-five flat die is a six-sided die in which faces 2 and 5 have probability 1/4 each while faces 1, 3, 4, and 6 have probability 1/8 each.
A three-four flat die is a six-sided die in which faces 3 and 4 have probability 1/4 each while faces 1, 2, 5, and 6 have probability 1/8 each.
A flat die, as the name suggests, is a die that is not a cube, but rather is shorter in one of the three directions. The particular probabilities that we use (1/4 and 1/8) are fictitious, but the essential property of a flat die is that the opposite faces on the shorter axis have slightly larger probabilities than the other four faces. Flat dice are sometimes used by gamblers to cheat.
A standard, fair die is thrown and the score X is recorded. Compute each of the following: E(X), var(X), skew(X), kurt(X).
Details:
E(X) = 7/2, var(X) = 35/12, skew(X) = 0, kurt(X) = 303/175
An ace-six flat die is thrown and the score X is recorded. Compute each of the following: E(X), var(X), skew(X), kurt(X).
Details:
E(X) = 7/2, var(X) = 15/4, skew(X) = 0, kurt(X) = 37/25
A two-five flat die is thrown and the score X is recorded. Compute each of the following: E(X), var(X), skew(X), kurt(X).
Details:
E(X) = 7/2, var(X) = 11/4, skew(X) = 0, kurt(X) = 197/121
A three-four flat die is thrown and the score X is recorded. Compute each of the following: E(X), var(X), skew(X), kurt(X).
Details:
E(X) = 7/2, var(X) = 9/4, skew(X) = 0, kurt(X) = 59/27
All four die distributions above have the same mean and are symmetric (and hence have skewness 0), but differ in variance and kurtosis.
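This claim is easy to verify numerically. The Python sketch below (our own illustration; the probability assignments follow the definitions of the flat dice above) computes the four moments for each die:

```python
def moments(pdf):
    """Mean, variance, skewness, kurtosis of a discrete distribution."""
    mu = sum(p * x for x, p in pdf.items())
    var = sum(p * (x - mu)**2 for x, p in pdf.items())
    skew = sum(p * (x - mu)**3 for x, p in pdf.items()) / var**1.5
    kurt = sum(p * (x - mu)**4 for x, p in pdf.items()) / var**2
    return mu, var, skew, kurt

fair = {x: 1/6 for x in range(1, 7)}
ace_six = {1: 1/4, 2: 1/8, 3: 1/8, 4: 1/8, 5: 1/8, 6: 1/4}
two_five = {1: 1/8, 2: 1/4, 3: 1/8, 4: 1/8, 5: 1/4, 6: 1/8}
three_four = {1: 1/8, 2: 1/8, 3: 1/4, 4: 1/4, 5: 1/8, 6: 1/8}
```

Each die has mean 7/2 and skewness 0, while the kurtosis increases from 37/25 (ace-six) through 197/121 (two-five) and 303/175 (fair) to 59/27 (three-four).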
Open the dice experiment and set n = 1 to get a single die. Select each of the following, and note the shape of the probability density function in comparison with the computational results above. In each case, run the experiment 1000 times and compare the empirical density function to the probability density function.
fair
ace-six flat
two-five flat
three-four flat
Uniform Distributions
Recall that the continuous uniform distribution on a bounded interval corresponds to selecting a point at random from the interval. Continuous uniform distributions arise in geometric probability and a variety of other applied problems.
Suppose that X has the uniform distribution on the interval [a, b], where a, b ∈ ℝ and a < b. Then
E(X) = (a + b)/2
var(X) = (b − a)²/12
skew(X) = 0
kurt(X) = 9/5
Details:
Parts (a) and (b) we have seen before. For parts (c) and (d), recall that X = a + (b − a)U where U has the uniform distribution on [0, 1] (the standard uniform distribution). Hence it follows from the formulas for skewness in [4] and kurtosis in [7] under linear transformations that skew(X) = skew(U) and kurt(X) = kurt(U). Since E(Uⁿ) = 1/(n + 1) for n ∈ ℕ₊, it's easy to compute the skewness and kurtosis of U from the computational formulas for skewness in [3] and kurtosis in [6]. Of course, the fact that skew(X) = 0 also follows trivially from the symmetry of the distribution of X about the mean.
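As an independent numerical check of parts (c) and (d) (our own sketch, not part of the text), a midpoint-rule approximation of the moments of the standard uniform distribution reproduces skewness 0 and kurtosis 9/5; by the invariance results under linear transformations, the same values hold on any interval [a, b]:

```python
# Midpoint-rule approximation of the moments of the standard uniform
# distribution, using N equally spaced midpoints of [0, 1].
N = 100_000
xs = [(i + 0.5) / N for i in range(N)]
mu = sum(xs) / N                                      # approx 1/2
var = sum((x - mu)**2 for x in xs) / N                # approx 1/12
skew = sum((x - mu)**3 for x in xs) / N / var**1.5    # approx 0
kurt = sum((x - mu)**4 for x in xs) / N / var**2      # approx 9/5
```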
Open the special distribution simulator, and select the continuous uniform distribution. Vary the parameters and note the shape of the probability density function in comparison with the moment results in the last exercise. For selected values of the parameter, run the simulation 1000 times and compare the empirical density function to the probability density function.
The Exponential Distribution
Recall that the exponential distribution is a continuous distribution on [0, ∞) with probability density function f given by
f(t) = r e^(−rt), t ∈ [0, ∞)
where r > 0 is the rate parameter. This distribution is widely used to model failure times and other arrival times, particularly in the context of the Poisson model.
Suppose that X has the exponential distribution with rate parameter r > 0. Then
E(X) = 1/r
var(X) = 1/r²
skew(X) = 2
kurt(X) = 9
Details:
These results follow from the computational formulas for skewness in [3] and kurtosis in [6] and the general moment formula E(Xⁿ) = n!/rⁿ for n ∈ ℕ.
Note that the skewness and kurtosis do not depend on the rate parameter r. That's because 1/r is a scale parameter for the exponential distribution.
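The independence from the rate parameter can be verified numerically. The Python sketch below (ours, not part of the text) computes the skewness and kurtosis from the moment formula E(Xⁿ) = n!/rⁿ for several values of r:

```python
from math import factorial

def exponential_skew_kurt(r):
    """Skewness and kurtosis of the exponential distribution with rate r,
    computed from the raw moments E(X^n) = n!/r^n."""
    m = [factorial(n) / r**n for n in range(5)]   # m[n] = E(X^n)
    mu, var = m[1], m[2] - m[1]**2
    skew = (m[3] - 3 * mu * m[2] + 2 * mu**3) / var**1.5
    kurt = (m[4] - 4 * mu * m[3] + 6 * mu**2 * m[2] - 3 * mu**4) / var**2
    return skew, kurt
```

For every rate the function returns skewness 2 and kurtosis 9, reflecting the fact that 1/r is a scale parameter.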
Open the gamma experiment and set n = 1 to get the exponential distribution. Vary the rate parameter r and note the shape of the probability density function in comparison to the moment results in the last exercise. For selected values of the parameter, run the experiment 1000 times and compare the empirical density function to the true probability density function.
Pareto Distribution
Recall that the Pareto distribution is a continuous distribution on [1, ∞) with probability density function f given by
f(x) = a/x^(a+1), x ∈ [1, ∞)
where a > 0 is a parameter. The Pareto distribution, named for Vilfredo Pareto, is a heavy-tailed distribution that is widely used to model financial variables such as income.
Suppose that X has the Pareto distribution with shape parameter a > 0. Then
E(X) = a/(a − 1) if a > 1
var(X) = a/[(a − 1)²(a − 2)] if a > 2
skew(X) = [2(1 + a)/(a − 3)]√((a − 2)/a) if a > 3
kurt(X) = 3(a − 2)(3a² + a + 2)/[a(a − 3)(a − 4)] if a > 4
Details:
These results follow from the standard computational formulas for skewness in [3] and kurtosis in [6] and the general moment formula E(Xⁿ) = a/(a − n) if n ∈ ℕ and n < a.
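The same computation is easy to carry out numerically. The Python sketch below (our own illustration) evaluates the skewness and kurtosis from the moment formula E(Xⁿ) = a/(a − n); note that it assumes a > 4 so that all four moments exist:

```python
def pareto_skew_kurt(a):
    """Skewness and kurtosis of the Pareto distribution with shape a > 4,
    computed from the raw moments E(X^n) = a/(a - n)."""
    m = [a / (a - n) for n in range(5)]   # m[n] = E(X^n)
    mu, var = m[1], m[2] - m[1]**2
    skew = (m[3] - 3 * mu * m[2] + 2 * mu**3) / var**1.5
    kurt = (m[4] - 4 * mu * m[3] + 6 * mu**2 * m[2] - 3 * mu**4) / var**2
    return skew, kurt
```

At a = 5, for instance, the closed forms give skewness 12√(3/5)/2 ≈ 4.6476 and kurtosis 73.8, and the moment computation agrees; both are large, reflecting the heavy right tail.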
Open the special distribution simulator and select the Pareto distribution. Vary the shape parameter and note the shape of the probability density function in comparison to the moment results in the last exercise. For selected values of the parameter, run the experiment 1000 times and compare the empirical density function to the true probability density function.
The Normal Distribution
Recall that the standard normal distribution is a continuous distribution on ℝ with probability density function φ given by
φ(z) = (1/√(2π)) e^(−z²/2), z ∈ ℝ
Normal distributions are widely used to model physical measurements subject to small, random errors.
Suppose that Z has the standard normal distribution. Then
E(Z) = 0
var(Z) = 1
skew(Z) = 0
kurt(Z) = 3
Details:
Parts (a) and (b) were derived in the previous sections on expected value and variance. Part (c) follows from symmetry. For part (d), recall that integration by parts gives E(Z⁴) = 3E(Z²) = 3.
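Part (d) can also be checked by direct numerical integration. The Python sketch below (ours, not part of the text) integrates zⁿφ(z) by the midpoint rule; the tails beyond ±10 contribute a negligible amount:

```python
from math import exp, pi, sqrt

# Midpoint-rule integration of z^n * phi(z) over [-10, 10].
N = 100_000
a, b = -10.0, 10.0
h = (b - a) / N
moment = [0.0] * 5
for i in range(N):
    z = a + (i + 0.5) * h
    w = h * exp(-z * z / 2) / sqrt(2 * pi)   # phi(z) * dz
    for n in range(5):
        moment[n] += w * z ** n

# Since mu = 0 and sigma = 1, the third and fourth raw moments are the
# skewness and kurtosis directly.
skew_Z, kurt_Z = moment[3], moment[4]
```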
More generally, for μ ∈ ℝ and σ ∈ (0, ∞), recall that the normal distribution with mean μ and standard deviation σ is a continuous distribution on ℝ with probability density function f given by
f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²)), x ∈ ℝ
However, we also know that μ and σ are location and scale parameters, respectively. That is, if Z has the standard normal distribution then X = μ + σZ has the normal distribution with mean μ and standard deviation σ.
If X has the normal distribution with mean μ ∈ ℝ and standard deviation σ ∈ (0, ∞), then
skew(X) = 0
kurt(X) = 3
Details:
The results follow immediately from [21] and the formulas for skewness in [4] and kurtosis in [7] under linear transformations.
Open the special distribution simulator and select the normal distribution. Vary the parameters and note the shape of the probability density function in comparison to the moment results in the last exercise. For selected values of the parameters, run the experiment 1000 times and compare the empirical density function to the true probability density function.
The Beta Distribution
The distributions in this subsection belong to the family of beta distributions, which are continuous distributions on the interval [0, 1] widely used to model random proportions and probabilities.
Suppose that has probability density function given by for . Find each of the following:
Details:
Suppose that has probability density function given by for . Find each of the following:
Details:
Suppose that has probability density function given by for . Find each of the following:
Details:
Open the special distribution simulator and select the beta distribution. Select the parameter values below to get the distributions in [24], [25], and [26]. In each case, note the shape of the probability density function in relation to the calculated moment results. Run the simulation 1000 times and compare the empirical density function to the probability density function.
,
,
,
Suppose that X has probability density function f given by f(x) = 1/(π√(x(1 − x))) for x ∈ (0, 1). Find each of the following:
Details:
0
96
The particular beta distribution in the last exercise is also known as the (standard) arcsine distribution. It governs the last time that the standard Brownian motion process hits 0 during the time interval [0, 1].
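Numerically, the endpoint singularities of the arcsine density can be removed with the substitution x = sin²(t), under which moments become ordinary integrals over [0, π/2]. The Python sketch below (our own illustration, not part of the text) recovers the arcsine distribution's skewness 0 and kurtosis 3/2:

```python
from math import sin, pi

# E(X^n) = (2/pi) * integral over [0, pi/2] of sin(t)^(2n) dt,
# via the substitution x = sin^2(t); midpoint rule on N points.
N = 100_000
def arcsine_moment(n):
    return sum(sin((i + 0.5) * (pi / 2) / N) ** (2 * n) for i in range(N)) / N

m = [arcsine_moment(n) for n in range(5)]
mu, var = m[1], m[2] - m[1] ** 2                     # 1/2 and 1/8
skew = (m[3] - 3 * mu * m[2] + 2 * mu ** 3) / var ** 1.5
kurt = (m[4] - 4 * mu * m[3] + 6 * mu ** 2 * m[2] - 3 * mu ** 4) / var ** 2
```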
Open the Brownian motion experiment and select the last zero. Note the shape of the probability density function in relation to the moment results in the last exercise. Run the simulation 1000 times and compare the empirical density function to the probability density function.
Counterexamples
The following exercise gives a simple example of a discrete distribution that is not symmetric but has skewness 0.
Suppose that X is a discrete random variable with probability density function given by …, …, …. Find each of the following and then show that the distribution of X is not symmetric.
Details:
E(X) = 0
var(X) = 3
skew(X) = 0
The PDF is clearly not symmetric about 0, and the mean is the only possible point of symmetry.
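For concreteness, here is a Python check of one distribution of this type; the particular values P(X = −3) = 1/10, P(X = −1) = 1/2, P(X = 2) = 2/5 are our own hypothetical choice, selected to match the answers above, and may differ from the values in the exercise:

```python
# Hypothetical probability density function, chosen to match the stated
# answers (not necessarily the one in the exercise):
pdf = {-3: 1/10, -1: 1/2, 2: 2/5}

mean = sum(p * x for x, p in pdf.items())                             # 0
var = sum(p * (x - mean) ** 2 for x, p in pdf.items())                # 3
skew = sum(p * (x - mean) ** 3 for x, p in pdf.items()) / var ** 1.5  # 0
# Symmetry about the mean 0 would force P(X = x) = P(X = -x) for every x,
# which clearly fails here:
symmetric = all(abs(pdf.get(x, 0) - pdf.get(-x, 0)) < 1e-12 for x in pdf)
```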
The following exercise gives a more complicated continuous distribution that is not symmetric but has skewness 0. It is one of a collection of distributions constructed by Erik Meijer.
Suppose that U, V, and I are independent random variables, that U is normally distributed with mean ν₁ and variance τ₁², V is normally distributed with mean ν₂ and variance τ₂², and I is an indicator variable with P(I = 1) = p. Let X = IU + (1 − I)V. Find each of the following and then show that the distribution of X is not symmetric.
Details:
The distribution of X is a mixture of normal distributions. The PDF is f = p g + (1 − p) h, where g is the normal PDF of U and h is the normal PDF of V. However, it's best to work with the random variables. For n ∈ ℕ₊, note that Iⁿ = I and (1 − I)ⁿ = 1 − I, and note also that the random variable I(1 − I) just takes the value 0. It follows that
Xⁿ = [IU + (1 − I)V]ⁿ = IUⁿ + (1 − I)Vⁿ, and hence E(Xⁿ) = pE(Uⁿ) + (1 − p)E(Vⁿ)
So now, using standard results for the normal distribution, E(Uⁿ) and E(Vⁿ) can be computed for n = 1, 2, 3, and hence so can E(X), var(X), and E(X³). For the parameter values in the exercise the third central moment vanishes, so skew(X) = 0.
The graph of the PDF f of X is given below. Note that f is not symmetric about 0. (Again, the mean is the only possible point of symmetry.)
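A numerical check of a mixture of this type is easy, since mixture moments are weighted averages of component moments. In the Python sketch below, the parameters (p = 1/3, U ~ N(2, 1), V ~ N(−1, 2)) are our own hypothetical choice, not necessarily those of the exercise; they are tuned so that the third central moment vanishes even though the density is plainly asymmetric:

```python
# Hypothetical parameters (not necessarily those of the exercise):
# X = U with probability p and X = V with probability 1 - p,
# where U ~ N(2, 1) and V ~ N(-1, 2).
p = 1/3
m1, v1 = 2.0, 1.0    # mean and variance of U
m2, v2 = -1.0, 2.0   # mean and variance of V

# Raw moments of a normal N(m, v): E(Y) = m, E(Y^2) = m^2 + v,
# E(Y^3) = m^3 + 3*m*v.  Mixture moments are weighted averages.
ex1 = p * m1 + (1 - p) * m2
ex2 = p * (m1**2 + v1) + (1 - p) * (m2**2 + v2)
ex3 = p * (m1**3 + 3 * m1 * v1) + (1 - p) * (m2**3 + 3 * m2 * v2)

var = ex2 - ex1**2
skew = (ex3 - 3 * ex1 * ex2 + 2 * ex1**3) / var**1.5   # 0, yet asymmetric
```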