
4. Skewness and Kurtosis

As usual, our starting point is a random experiment, modeled by a probability space $(\Omega, \mathcal{F}, P)$. To review, $\Omega$ is the set of outcomes, $\mathcal{F}$ the collection of events, and $P$ the probability measure on the sample space $(\Omega, \mathcal{F})$. Suppose that $X$ is a real-valued random variable for the experiment. Recall that the mean of $X$ is a measure of the center of the distribution of $X$. Furthermore, the variance of $X$ is the second moment of $X$ about the mean, and measures the spread of the distribution of $X$ about the mean. The third and fourth moments of $X$ about the mean also measure interesting (but more subtle) features of the distribution. The third moment measures skewness, the lack of symmetry, while the fourth moment measures kurtosis, roughly a measure of the fatness in the tails. The actual numerical measures of these characteristics are standardized to eliminate the physical units, by dividing by an appropriate power of the standard deviation. As usual, we assume that all expected values given below exist, and we will let $\mu = \operatorname{E}(X)$ and $\sigma^2 = \operatorname{var}(X)$. We assume that $\sigma > 0$, so that the random variable is really random.

Basic Theory

Skewness

The skewness of $X$ is the third moment of the standard score of $X$:
\[ \operatorname{skew}(X) = \operatorname{E}\left[\left(\frac{X - \mu}{\sigma}\right)^3\right] \]
The distribution of $X$ is said to be positively skewed, negatively skewed, or unskewed depending on whether $\operatorname{skew}(X)$ is positive, negative, or 0.

In the unimodal case, if the distribution is positively skewed then the probability density function has a long tail to the right, and if the distribution is negatively skewed then the probability density function has a long tail to the left. A symmetric distribution is unskewed.

Suppose that the distribution of X is symmetric about a. Then

  1. E(X)=a
  2. skew(X)=0.
Details:

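By assumption, the distribution of $a - X$ is the same as the distribution of $X - a$. We proved part (a) in the section on properties of expected value. Thus, $\operatorname{skew}(X) = \operatorname{E}\left[(X - a)^3\right] / \sigma^3$. But by symmetry and linearity, $\operatorname{E}\left[(X - a)^3\right] = \operatorname{E}\left[(a - X)^3\right] = -\operatorname{E}\left[(X - a)^3\right]$, so it follows that $\operatorname{E}\left[(X - a)^3\right] = 0$.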

The converse is not true: a non-symmetric distribution can have skewness 0. Examples are given in the Counterexamples subsection below ([30] and [31]).

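$\operatorname{skew}(X)$ can be expressed in terms of the first three moments of $X$.
\[ \operatorname{skew}(X) = \frac{\operatorname{E}\left(X^3\right) - 3 \mu \operatorname{E}\left(X^2\right) + 2 \mu^3}{\sigma^3} = \frac{\operatorname{E}\left(X^3\right) - 3 \mu \sigma^2 - \mu^3}{\sigma^3} \]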

Details:

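Note that $(X - \mu)^3 = X^3 - 3 X^2 \mu + 3 X \mu^2 - \mu^3$. From the linearity of expected value we have
\[ \operatorname{E}\left[(X - \mu)^3\right] = \operatorname{E}\left(X^3\right) - 3 \mu \operatorname{E}\left(X^2\right) + 3 \mu^2 \operatorname{E}(X) - \mu^3 = \operatorname{E}\left(X^3\right) - 3 \mu \operatorname{E}\left(X^2\right) + 2 \mu^3 \]
The second expression follows from substituting $\operatorname{E}\left(X^2\right) = \sigma^2 + \mu^2$.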
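As a quick numerical check of the computational formula, the following sketch compares it with the definition (the third moment of the standard score) for a small discrete distribution. The distribution and the function names `raw_moment`, `skew_from_moments`, and `skew_from_definition` are hypothetical choices made for illustration only.

```python
from fractions import Fraction
from math import sqrt

def raw_moment(pmf, n):
    """E(X^n) for a discrete distribution given as {value: probability}."""
    return sum(p * x**n for x, p in pmf.items())

def skew_from_moments(pmf):
    """Skewness via [E(X^3) - 3 mu E(X^2) + 2 mu^3] / sigma^3."""
    mu, m2, m3 = (raw_moment(pmf, n) for n in (1, 2, 3))
    sigma = sqrt(m2 - mu**2)
    return float(m3 - 3 * mu * m2 + 2 * mu**3) / sigma**3

def skew_from_definition(pmf):
    """Skewness as the third moment of the standard score."""
    mu = raw_moment(pmf, 1)
    sigma = sqrt(raw_moment(pmf, 2) - mu**2)
    return sum(float(p) * (float(x - mu) / sigma) ** 3 for x, p in pmf.items())

# Hypothetical example: a distribution on {0, 1, 2} with a longer right tail.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
print(skew_from_moments(pmf), skew_from_definition(pmf))
```

The two values should agree up to floating-point rounding, and both are positive, consistent with the longer right tail.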

Since skewness is defined in terms of an odd power of the standard score, it's invariant under a linear transformation with positive slope (a location-scale transformation of the distribution). On the other hand, if the slope is negative, skewness changes sign.

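Suppose that $a \in \mathbb{R}$ and $b \in \mathbb{R} \setminus \{0\}$. Then

  1. $\operatorname{skew}(a + b X) = \operatorname{skew}(X)$ if $b > 0$
  2. $\operatorname{skew}(a + b X) = -\operatorname{skew}(X)$ if $b < 0$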
Details:

Let $Z = (X - \mu) / \sigma$, the standard score of $X$. Recall from the section on variance that the standard score of $a + b X$ is $Z$ if $b > 0$ and is $-Z$ if $b < 0$.
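The sign behavior under a linear transformation can be illustrated numerically. The sketch below uses a hypothetical discrete distribution and the Fahrenheit-style map $x \mapsto 32 + 1.8 x$ (and its negative-slope counterpart); the helper names `skew` and `transform` are assumptions for illustration.

```python
from math import sqrt

def skew(pmf):
    """Third moment of the standard score for a {value: probability} distribution."""
    mu = sum(p * x for x, p in pmf.items())
    sigma = sqrt(sum(p * (x - mu) ** 2 for x, p in pmf.items()))
    return sum(p * ((x - mu) / sigma) ** 3 for x, p in pmf.items())

def transform(pmf, a, b):
    """Distribution of a + bX; the map is one-to-one because b != 0."""
    return {a + b * x: p for x, p in pmf.items()}

pmf = {0: 0.5, 1: 0.3, 4: 0.2}            # hypothetical skewed distribution
s = skew(pmf)
s_pos = skew(transform(pmf, 32, 1.8))     # positive slope: skewness unchanged
s_neg = skew(transform(pmf, 32, -1.8))    # negative slope: skewness changes sign
print(s, s_pos, s_neg)
```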

Recall that location-scale transformations often arise when physical units are changed, such as inches to centimeters, or degrees Fahrenheit to degrees Celsius.

Kurtosis

The kurtosis of $X$ is the fourth moment of the standard score:
\[ \operatorname{kurt}(X) = \operatorname{E}\left[\left(\frac{X - \mu}{\sigma}\right)^4\right] \]

Kurtosis comes from the Greek word for bulging. Kurtosis is always positive, since we have assumed that $\sigma > 0$ (the random variable really is random), and therefore $P(X \ne \mu) > 0$. In the unimodal case, the probability density function of a distribution with large kurtosis has fatter tails, compared with the probability density function of a distribution with smaller kurtosis.

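$\operatorname{kurt}(X)$ can be expressed in terms of the first four moments of $X$.
\[ \operatorname{kurt}(X) = \frac{\operatorname{E}\left(X^4\right) - 4 \mu \operatorname{E}\left(X^3\right) + 6 \mu^2 \operatorname{E}\left(X^2\right) - 3 \mu^4}{\sigma^4} = \frac{\operatorname{E}\left(X^4\right) - 4 \mu \operatorname{E}\left(X^3\right) + 6 \mu^2 \sigma^2 + 3 \mu^4}{\sigma^4} \]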

Details:

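Note that $(X - \mu)^4 = X^4 - 4 X^3 \mu + 6 X^2 \mu^2 - 4 X \mu^3 + \mu^4$. From linearity of expected value, we have
\[ \operatorname{E}\left[(X - \mu)^4\right] = \operatorname{E}\left(X^4\right) - 4 \mu \operatorname{E}\left(X^3\right) + 6 \mu^2 \operatorname{E}\left(X^2\right) - 4 \mu^3 \operatorname{E}(X) + \mu^4 = \operatorname{E}\left(X^4\right) - 4 \mu \operatorname{E}\left(X^3\right) + 6 \mu^2 \operatorname{E}\left(X^2\right) - 3 \mu^4 \]
The second expression follows from the substitution $\operatorname{E}\left(X^2\right) = \sigma^2 + \mu^2$.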
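Because $\sigma^4$ is the square of the variance, the kurtosis of a distribution with rational probabilities is itself rational and can be computed exactly. The sketch below checks the computational formula against the definition using Python's `fractions` module; the example distribution and function names are hypothetical.

```python
from fractions import Fraction

def raw_moment(pmf, n):
    """E(X^n) for a discrete distribution given as {value: probability}."""
    return sum(p * x**n for x, p in pmf.items())

def kurt_from_moments(pmf):
    """Kurtosis via [E(X^4) - 4 mu E(X^3) + 6 mu^2 E(X^2) - 3 mu^4] / sigma^4."""
    mu, m2, m3, m4 = (raw_moment(pmf, n) for n in (1, 2, 3, 4))
    var = m2 - mu**2
    return (m4 - 4 * mu * m3 + 6 * mu**2 * m2 - 3 * mu**4) / var**2

def kurt_from_definition(pmf):
    """Kurtosis as the fourth central moment divided by the squared variance."""
    mu = raw_moment(pmf, 1)
    var = raw_moment(pmf, 2) - mu**2
    return sum(p * (x - mu) ** 4 for x, p in pmf.items()) / var**2

# Hypothetical example distribution on {0, 1, 2}.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
print(kurt_from_moments(pmf), kurt_from_definition(pmf))
```

Both routes give the same exact fraction, which is a convenient way to catch sign errors in the expansion.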

Since kurtosis is defined in terms of an even power of the standard score, it's invariant under linear transformations.

Suppose that $a \in \mathbb{R}$ and $b \in \mathbb{R} \setminus \{0\}$. Then $\operatorname{kurt}(a + b X) = \operatorname{kurt}(X)$.

Details:

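As before, let $Z = (X - \mu) / \sigma$ denote the standard score of $X$. Then the standard score of $a + b X$ is $Z$ if $b > 0$ and is $-Z$ if $b < 0$. In either case the fourth power is $Z^4$, so the kurtosis is unchanged.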

We will show below, in the subsection on the normal distribution, that the kurtosis of the standard normal distribution is 3. Using the standard normal distribution as a benchmark, the excess kurtosis of a random variable $X$ is defined to be $\operatorname{kurt}(X) - 3$. Some authors use the term kurtosis to mean what we have defined as excess kurtosis.

Computational Exercises

As always, be sure to try the exercises yourself before expanding the details.

Indicator Variables

Recall that an indicator random variable is one that just takes the values 0 and 1. Indicator variables are the building blocks of many counting random variables. The corresponding distribution is known as the Bernoulli distribution, named for Jacob Bernoulli.

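Suppose that $X$ is an indicator variable with $P(X = 1) = p$ where $p \in (0, 1)$. Then

  1. $\operatorname{E}(X) = p$
  2. $\operatorname{var}(X) = p (1 - p)$
  3. $\operatorname{skew}(X) = \frac{1 - 2 p}{\sqrt{p (1 - p)}}$
  4. $\operatorname{kurt}(X) = \frac{1 - 3 p + 3 p^2}{p (1 - p)}$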
Details:

Parts (a) and (b) have been derived before. All four parts follow easily from the fact that $X^n = X$ and hence $\operatorname{E}\left(X^n\right) = p$ for $n \in \mathbb{N}_+$.
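The reduction can be sketched in code: substituting $\operatorname{E}(X^n) = p$ into the computational formulas should reproduce the closed forms $(1 - 2p)/\sqrt{p(1 - p)}$ and $(1 - 3p + 3p^2)/[p(1 - p)]$. The function names below are hypothetical, and the values of $p$ are arbitrary test points.

```python
from math import sqrt

def indicator_skew_kurt(p):
    """Skewness and kurtosis of an indicator variable, using E(X^n) = p for n >= 1."""
    mu, var = p, p * (1 - p)
    third = p - 3 * mu * p + 2 * mu**3                    # E[(X - mu)^3]
    fourth = p - 4 * mu * p + 6 * mu**2 * p - 3 * mu**4   # E[(X - mu)^4]
    return third / var**1.5, fourth / var**2

def closed_form(p):
    """The closed-form skewness and kurtosis stated in the result above."""
    return (1 - 2 * p) / sqrt(p * (1 - p)), (1 - 3 * p + 3 * p**2) / (p * (1 - p))

for p in (0.1, 0.5, 0.9):
    print(p, indicator_skew_kurt(p), closed_form(p))
```

Note that the skewness is positive for $p < \frac{1}{2}$, zero at $p = \frac{1}{2}$, and negative for $p > \frac{1}{2}$.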

Open the binomial coin experiment and set n=1 to get an indicator variable. Vary p and note the change in the shape of the probability density function.

Dice

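Recall that a fair die is one in which the faces are equally likely. In addition to fair dice, there are various types of crooked dice. Here are three:

  1. An ace-six flat die is a six-sided die in which faces 1 and 6 have probability $\frac{1}{4}$ each while faces 2, 3, 4, and 5 have probability $\frac{1}{8}$ each.
  2. A two-five flat die is a six-sided die in which faces 2 and 5 have probability $\frac{1}{4}$ each while faces 1, 3, 4, and 6 have probability $\frac{1}{8}$ each.
  3. A three-four flat die is a six-sided die in which faces 3 and 4 have probability $\frac{1}{4}$ each while faces 1, 2, 5, and 6 have probability $\frac{1}{8}$ each.

A flat die, as the name suggests, is a die that is not a cube, but rather is shorter in one of the three directions. The particular probabilities that we use ($\frac{1}{4}$ and $\frac{1}{8}$) are fictitious, but the essential property of a flat die is that the opposite faces on the shorter axis have slightly larger probabilities than the other four faces. Flat dice are sometimes used by gamblers to cheat.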

A standard, fair die is thrown and the score X is recorded. Compute each of the following:

  1. E(X)
  2. var(X)
  3. skew(X)
  4. kurt(X)
Details:
  1. $\frac{7}{2}$
  2. $\frac{35}{12}$
  3. $0$
  4. $\frac{303}{175}$

An ace-six flat die is thrown and the score X is recorded. Compute each of the following:

  1. E(X)
  2. var(X)
  3. skew(X)
  4. kurt(X)
Details:
  1. $\frac{7}{2}$
  2. $\frac{15}{4}$
  3. $0$
  4. $\frac{37}{25}$

A two-five flat die is thrown and the score X is recorded. Compute each of the following:

  1. E(X)
  2. var(X)
  3. skew(X)
  4. kurt(X)
Details:
  1. $\frac{7}{2}$
  2. $\frac{11}{4}$
  3. $0$
  4. $\frac{197}{121}$

A three-four flat die is thrown and the score X is recorded. Compute each of the following:

  1. E(X)
  2. var(X)
  3. skew(X)
  4. kurt(X)
Details:
  1. $\frac{7}{2}$
  2. $\frac{9}{4}$
  3. $0$
  4. $\frac{59}{27}$

All four die distributions above have the same mean $\frac{7}{2}$ and are symmetric (and hence have skewness 0), but differ in variance and kurtosis.
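The four sets of answers can be reproduced exactly with rational arithmetic. This is a minimal sketch, assuming that each flat die puts probability $\frac{1}{4}$ on each of the two named faces and $\frac{1}{8}$ on each of the other four; the helper names `central_moment` and `flat_die` are hypothetical.

```python
from fractions import Fraction as F

def central_moment(pmf, n):
    """E[(X - mu)^n] for a discrete distribution given as {value: probability}."""
    mu = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mu) ** n for x, p in pmf.items())

def flat_die(heavy):
    """Assumed flat die: faces in `heavy` get probability 1/4, the rest 1/8."""
    return {x: F(1, 4) if x in heavy else F(1, 8) for x in range(1, 7)}

dice = {
    "fair": {x: F(1, 6) for x in range(1, 7)},
    "ace-six flat": flat_die({1, 6}),
    "two-five flat": flat_die({2, 5}),
    "three-four flat": flat_die({3, 4}),
}

results = {}
for name, pmf in dice.items():
    var = central_moment(pmf, 2)
    third = central_moment(pmf, 3)            # zero for a symmetric distribution
    kurt = central_moment(pmf, 4) / var**2
    results[name] = (var, third, kurt)
    print(name, var, third, kurt)
```

Moving probability toward the extreme faces (ace-six) raises the variance and lowers the kurtosis, while moving it toward the middle faces (three-four) does the opposite.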

Open the dice experiment and set n=1 to get a single die. Select each of the following, and note the shape of the probability density function in comparison with the computational results above. In each case, run the experiment 1000 times and compare the empirical density function to the probability density function.

  1. fair
  2. ace-six flat
  3. two-five flat
  4. three-four flat

Uniform Distributions

Recall that the continuous uniform distribution on a bounded interval corresponds to selecting a point at random from the interval. Continuous uniform distributions arise in geometric probability and a variety of other applied problems.

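Suppose that $X$ has the uniform distribution on the interval $[a, b]$, where $a, \, b \in \mathbb{R}$ and $a < b$. Then

  1. $\operatorname{E}(X) = \frac{1}{2}(a + b)$
  2. $\operatorname{var}(X) = \frac{1}{12}(b - a)^2$
  3. $\operatorname{skew}(X) = 0$
  4. $\operatorname{kurt}(X) = \frac{9}{5}$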
Details:

Parts (a) and (b) we have seen before. For parts (c) and (d), recall that $X = a + (b - a) U$ where $U$ has the uniform distribution on $[0, 1]$ (the standard uniform distribution). Hence it follows from the formulas for skewness in [4] and kurtosis in [7] under linear transformations that $\operatorname{skew}(X) = \operatorname{skew}(U)$ and $\operatorname{kurt}(X) = \operatorname{kurt}(U)$. Since $\operatorname{E}\left(U^n\right) = 1 / (n + 1)$ for $n \in \mathbb{N}_+$, it's easy to compute the skewness and kurtosis of $U$ from the computational formulas for skewness in [3] and kurtosis in [6]. Of course, the fact that $\operatorname{skew}(X) = 0$ also follows trivially from the symmetry of the distribution of $X$ about the mean.
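The computation for the standard uniform variable $U$ is short enough to carry out exactly. The following sketch plugs $\operatorname{E}(U^n) = 1/(n + 1)$ into the computational formulas; the variable names are illustrative only.

```python
from fractions import Fraction as F

# Raw moments of the standard uniform variable U: E(U^n) = 1 / (n + 1)
m1, m2, m3, m4 = (F(1, n + 1) for n in range(1, 5))

var = m2 - m1**2                                         # second central moment
third = m3 - 3 * m1 * m2 + 2 * m1**3                     # third central moment
fourth = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4   # fourth central moment
kurt = fourth / var**2

print(var, third, kurt)   # 1/12 0 9/5
```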

Open the special distribution simulator, and select the continuous uniform distribution. Vary the parameters and note the shape of the probability density function in comparison with the moment results in the last exercise. For selected values of the parameter, run the simulation 1000 times and compare the empirical density function to the probability density function.

The Exponential Distribution

Recall that the exponential distribution is a continuous distribution on $[0, \infty)$ with probability density function $f$ given by
\[ f(t) = r e^{-r t}, \quad t \in [0, \infty) \]
where $r \in (0, \infty)$ is the rate parameter. This distribution is widely used to model failure times and other arrival times, particularly in the context of the Poisson model.

Suppose that X has the exponential distribution with rate parameter r>0. Then

  1. $\operatorname{E}(X) = \frac{1}{r}$
  2. $\operatorname{var}(X) = \frac{1}{r^2}$
  3. skew(X)=2
  4. kurt(X)=9
Details:

These results follow from the computational formulas for skewness in [3] and kurtosis in [6] and the general moment formula $\operatorname{E}\left(X^n\right) = n! / r^n$ for $n \in \mathbb{N}$.

Note that the skewness and kurtosis do not depend on the rate parameter $r$. That's because $1 / r$ is a scale parameter for the exponential distribution.
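For readers who want to see the cancellation of $r$ explicitly, here is the calculation written out, using $\operatorname{E}(X^n) = n!/r^n$ with $\mu = 1/r$ and $\sigma = 1/r$ in the computational formulas:

```latex
\begin{align*}
\operatorname{E}\left[(X - \mu)^3\right] &= \frac{6}{r^3} - 3 \cdot \frac{1}{r} \cdot \frac{2}{r^2} + 2 \cdot \frac{1}{r^3} = \frac{2}{r^3},
  & \operatorname{skew}(X) &= \frac{2 / r^3}{1 / r^3} = 2 \\
\operatorname{E}\left[(X - \mu)^4\right] &= \frac{24}{r^4} - 4 \cdot \frac{1}{r} \cdot \frac{6}{r^3} + 6 \cdot \frac{1}{r^2} \cdot \frac{2}{r^2} - 3 \cdot \frac{1}{r^4} = \frac{9}{r^4},
  & \operatorname{kurt}(X) &= \frac{9 / r^4}{1 / r^4} = 9
\end{align*}
```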

Open the gamma experiment and set n=1 to get the exponential distribution. Vary the rate parameter and note the shape of the probability density function in comparison to the moment results in the last exercise. For selected values of the parameter, run the experiment 1000 times and compare the empirical density function to the true probability density function.

Pareto Distribution

Recall that the Pareto distribution is a continuous distribution on $[1, \infty)$ with probability density function $f$ given by
\[ f(x) = \frac{a}{x^{a + 1}}, \quad x \in [1, \infty) \]
where $a \in (0, \infty)$ is a parameter. The Pareto distribution, named for Vilfredo Pareto, is a heavy-tailed distribution that is widely used to model financial variables such as income.

Suppose that X has the Pareto distribution with shape parameter a>0. Then

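  1. $\operatorname{E}(X) = \frac{a}{a - 1}$ if $a > 1$
  2. $\operatorname{var}(X) = \frac{a}{(a - 1)^2 (a - 2)}$ if $a > 2$
  3. $\operatorname{skew}(X) = \frac{2 (1 + a)}{a - 3} \sqrt{1 - \frac{2}{a}}$ if $a > 3$
  4. $\operatorname{kurt}(X) = \frac{3 (a - 2)(3 a^2 + a + 2)}{a (a - 3)(a - 4)}$ if $a > 4$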
Details:

These results follow from the standard computational formulas for skewness in [3] and kurtosis in [6] and the general moment formula $\operatorname{E}\left(X^n\right) = \frac{a}{a - n}$ if $n \in \mathbb{N}$ and $n < a$.
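The algebra behind the Pareto results is tedious, so a numerical spot check is useful. The sketch below evaluates the computational formulas from $\operatorname{E}(X^n) = a/(a - n)$ and compares them with the closed forms $\frac{2(1 + a)}{a - 3}\sqrt{1 - \frac{2}{a}}$ and $\frac{3(a - 2)(3a^2 + a + 2)}{a(a - 3)(a - 4)}$ at the arbitrary test value $a = 5$. The function name `pareto_skew_kurt` is hypothetical.

```python
from math import sqrt

def pareto_skew_kurt(a):
    """Skewness and kurtosis from raw moments E(X^n) = a / (a - n); requires a > 4."""
    m1, m2, m3, m4 = (a / (a - n) for n in range(1, 5))
    var = m2 - m1**2
    third = m3 - 3 * m1 * m2 + 2 * m1**3
    fourth = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return third / var**1.5, fourth / var**2

a = 5.0   # arbitrary shape parameter with a > 4, so that all four moments exist
s, k = pareto_skew_kurt(a)
s_closed = 2 * (1 + a) / (a - 3) * sqrt(1 - 2 / a)
k_closed = 3 * (a - 2) * (3 * a**2 + a + 2) / (a * (a - 3) * (a - 4))
print(s, s_closed)
print(k, k_closed)
```

Even with the moderately large shape parameter $a = 5$, the kurtosis is far above the normal benchmark of 3, which reflects the heavy right tail.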

Open the special distribution simulator and select the Pareto distribution. Vary the shape parameter and note the shape of the probability density function in comparison to the moment results in the last exercise. For selected values of the parameter, run the experiment 1000 times and compare the empirical density function to the true probability density function.

The Normal Distribution

Recall that the standard normal distribution is a continuous distribution on $\mathbb{R}$ with probability density function $\phi$ given by
\[ \phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} z^2}, \quad z \in \mathbb{R} \]
Normal distributions are widely used to model physical measurements subject to small, random errors.

Suppose that Z has the standard normal distribution. Then

  1. E(Z)=0
  2. var(Z)=1
  3. skew(Z)=0
  4. kurt(Z)=3
Details:

Parts (a) and (b) were derived in the previous sections on expected value and variance. Part (c) follows from symmetry. For part (d), recall that $\operatorname{E}\left(Z^4\right) = 3 \operatorname{E}\left(Z^2\right) = 3$.
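One way to recover the identity used in part (d), if it is not fresh in mind, is integration by parts with $u = z^3$ and $dv = z \phi(z) \, dz$, using the fact that $\phi'(z) = -z \phi(z)$ so that $v = -\phi(z)$:

```latex
\begin{align*}
\operatorname{E}\left(Z^4\right) &= \int_{-\infty}^\infty z^3 \left[z \phi(z)\right] dz
  = \Big[-z^3 \phi(z)\Big]_{-\infty}^\infty + 3 \int_{-\infty}^\infty z^2 \phi(z) \, dz \\
  &= 0 + 3 \operatorname{E}\left(Z^2\right) = 3
\end{align*}
```

The boundary term vanishes because $\phi(z)$ decays faster than any power of $z$ grows.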

More generally, for $\mu \in \mathbb{R}$ and $\sigma \in (0, \infty)$, recall that the normal distribution with mean $\mu$ and standard deviation $\sigma$ is a continuous distribution on $\mathbb{R}$ with probability density function $f$ given by
\[ f(x) = \frac{1}{\sqrt{2 \pi} \sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right], \quad x \in \mathbb{R} \]
However, we also know that $\mu$ and $\sigma$ are location and scale parameters, respectively. That is, if $Z$ has the standard normal distribution then $X = \mu + \sigma Z$ has the normal distribution with mean $\mu$ and standard deviation $\sigma$.

If $X$ has the normal distribution with mean $\mu \in \mathbb{R}$ and standard deviation $\sigma \in (0, \infty)$, then

  1. skew(X)=0
  2. kurt(X)=3
Details:

The results follow immediately from [21] and the formulas for skewness in [4] and kurtosis in [7] under linear transformations.

Open the special distribution simulator and select the normal distribution. Vary the parameters and note the shape of the probability density function in comparison to the moment results in the last exercise. For selected values of the parameters, run the experiment 1000 times and compare the empirical density function to the true probability density function.

The Beta Distribution

The distributions in this subsection belong to the family of beta distributions, which are continuous distributions on [0,1] widely used to model random proportions and probabilities.

Suppose that $X$ has probability density function $f$ given by $f(x) = 6 x (1 - x)$ for $x \in [0, 1]$. Find each of the following:

  1. E(X)
  2. var(X)
  3. skew(X)
  4. kurt(X)
Details:
  1. $\frac{1}{2}$
  2. $\frac{1}{20}$
  3. $0$
  4. $\frac{15}{7}$

Suppose that $X$ has probability density function $f$ given by $f(x) = 12 x^2 (1 - x)$ for $x \in [0, 1]$. Find each of the following:

  1. E(X)
  2. var(X)
  3. skew(X)
  4. kurt(X)
Details:
  1. $\frac{3}{5}$
  2. $\frac{1}{25}$
  3. $-\frac{2}{7}$
  4. $\frac{33}{14}$

Suppose that $X$ has probability density function $f$ given by $f(x) = 12 x (1 - x)^2$ for $x \in [0, 1]$. Find each of the following:

  1. E(X)
  2. var(X)
  3. skew(X)
  4. kurt(X)
Details:
  1. $\frac{2}{5}$
  2. $\frac{1}{25}$
  3. $\frac{2}{7}$
  4. $\frac{33}{14}$

Open the special distribution simulator and select the beta distribution. Select the parameter values below to get the distributions in exercises [24], [25], and [26]. In each case, note the shape of the probability density function in relation to the calculated moment results. Run the simulation 1000 times and compare the empirical density function to the probability density function.

  1. a=2, b=2
  2. a=3, b=2
  3. a=2, b=3
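Since the three densities above are polynomials on $[0, 1]$, their moments can be found by integrating term by term: $\int_0^1 c \, x^k \cdot x^n \, dx = c / (k + n + 1)$. The sketch below does this with exact fractions; the coefficient-list representation and the function names are hypothetical conveniences, and only the skewness is returned as a float since it involves a square root in general.

```python
from fractions import Fraction as F

def raw_moment(coeffs, n):
    """E(X^n) for the density sum_k coeffs[k] * x^k on [0, 1], integrated term by term."""
    return sum(F(c, k + n + 1) for k, c in enumerate(coeffs))

def summary(coeffs):
    """Return (mean, variance, skewness as float, kurtosis)."""
    m1, m2, m3, m4 = (raw_moment(coeffs, n) for n in range(1, 5))
    var = m2 - m1**2
    third = m3 - 3 * m1 * m2 + 2 * m1**3
    fourth = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return m1, var, float(third) / float(var) ** 1.5, fourth / var**2

densities = {
    "6x(1-x)": [0, 6, -6],             # 6x - 6x^2
    "12x^2(1-x)": [0, 0, 12, -12],     # 12x^2 - 12x^3
    "12x(1-x)^2": [0, 12, -24, 12],    # 12x - 24x^2 + 12x^3
}
for name, coeffs in densities.items():
    print(name, summary(coeffs))
```

The second and third densities are mirror images of each other about $x = \frac{1}{2}$, so their skewness values differ only in sign while the variance and kurtosis coincide.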

Suppose that $X$ has probability density function $f$ given by $f(x) = \frac{1}{\pi \sqrt{x (1 - x)}}$ for $x \in (0, 1)$. Find

  1. E(X)
  2. var(X)
  3. skew(X)
  4. kurt(X)
Details:
  1. $\frac{1}{2}$
  2. $\frac{1}{8}$
  3. $0$
  4. $\frac{3}{2}$
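These answers can be checked by hand. The calculation below assumes the standard moment formula for this density, $\operatorname{E}(X^n) = \binom{2n}{n} / 4^n$ (a known fact about the beta distribution with both parameters equal to $\frac{1}{2}$, not derived in this section), which gives $\operatorname{E}(X) = \frac{1}{2}$, $\operatorname{E}(X^2) = \frac{3}{8}$, $\operatorname{E}(X^3) = \frac{5}{16}$, and $\operatorname{E}(X^4) = \frac{35}{128}$:

```latex
\begin{align*}
\operatorname{var}(X) &= \frac{3}{8} - \frac{1}{4} = \frac{1}{8} \\
\operatorname{E}\left[(X - \mu)^3\right] &= \frac{5}{16} - 3 \cdot \frac{1}{2} \cdot \frac{3}{8} + 2 \cdot \frac{1}{8} = 0 \\
\operatorname{E}\left[(X - \mu)^4\right] &= \frac{35}{128} - 4 \cdot \frac{1}{2} \cdot \frac{5}{16} + 6 \cdot \frac{1}{4} \cdot \frac{3}{8} - 3 \cdot \frac{1}{16} = \frac{3}{128} \\
\operatorname{kurt}(X) &= \frac{3 / 128}{(1 / 8)^2} = \frac{3}{2}
\end{align*}
```

The kurtosis is smaller than that of the uniform distribution because the density is U-shaped, with most of the mass near the endpoints rather than in tails far from the mean.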

The particular beta distribution in the last exercise is also known as the (standard) arcsine distribution. It governs the last time that the Brownian motion process hits 0 during the time interval [0,1].

Open the Brownian motion experiment and select the last zero. Note the shape of the probability density function in relation to the moment results in the last exercise. Run the simulation 1000 times and compare the empirical density function to the probability density function.

Counterexamples

The following exercise gives a simple example of a discrete distribution that is not symmetric but has skewness 0.

Suppose that $X$ is a discrete random variable with probability density function $f$ given by $f(-3) = \frac{1}{10}$, $f(-1) = \frac{1}{2}$, $f(2) = \frac{2}{5}$. Find each of the following and then show that the distribution of $X$ is not symmetric.

  1. E(X)
  2. var(X)
  3. skew(X)
  4. kurt(X)
Details:
  1. 0
  2. 3
  3. 0
  4. $\frac{5}{3}$

The PDF f is clearly not symmetric about 0, and the mean is the only possible point of symmetry.
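The counterexample is small enough to verify exactly. The sketch below takes the density to be $f(-3) = \frac{1}{10}$, $f(-1) = \frac{1}{2}$, $f(2) = \frac{2}{5}$, computes the central moments with rational arithmetic, and tests symmetry about the mean directly; the names `central` and `symmetric` are hypothetical.

```python
from fractions import Fraction as F

pmf = {-3: F(1, 10), -1: F(1, 2), 2: F(2, 5)}
mu = sum(p * x for x, p in pmf.items())

def central(n):
    """E[(X - mu)^n] for the distribution above."""
    return sum(p * (x - mu) ** n for x, p in pmf.items())

var = central(2)
third = central(3)              # zero, so the skewness is zero
kurt = central(4) / var**2

# Symmetry about mu would require f(2 mu - x) = f(x) for every x in the support.
symmetric = all(pmf.get(2 * mu - x, 0) == p for x, p in pmf.items())
print(mu, var, third, kurt, symmetric)
```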

The following exercise gives a more complicated continuous distribution that is not symmetric but has skewness 0. It is one of a collection of distributions constructed by Erik Meijer.

Suppose that $U$, $V$, and $I$ are independent random variables, and that $U$ is normally distributed with mean $\mu = -2$ and variance $\sigma^2 = 1$, $V$ is normally distributed with mean $\nu = 1$ and variance $\tau^2 = 2$, and $I$ is an indicator variable with $P(I = 1) = p = \frac{1}{3}$. Let $X = I U + (1 - I) V$. Find each of the following and then show that the distribution of $X$ is not symmetric.

  1. E(X)
  2. var(X)
  3. skew(X)
  4. kurt(X)
Details:

The distribution of $X$ is a mixture of normal distributions. The PDF is $f = p g + (1 - p) h$ where $g$ is the normal PDF of $U$ and $h$ is the normal PDF of $V$. However, it's best to work with the random variables. For $n \in \mathbb{N}_+$, note that $I^n = I$ and $(1 - I)^n = 1 - I$, and note also that the random variable $I (1 - I)$ just takes the value 0. It follows that
\[ X^n = I U^n + (1 - I) V^n, \quad n \in \mathbb{N}_+ \]
So now, using standard results for the normal distribution,

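  1. $\operatorname{E}(X) = p \mu + (1 - p) \nu = 0$
  2. $\operatorname{var}(X) = \operatorname{E}\left(X^2\right) = p (\sigma^2 + \mu^2) + (1 - p) (\tau^2 + \nu^2) = \frac{11}{3}$
  3. $\operatorname{E}\left(X^3\right) = p (3 \mu \sigma^2 + \mu^3) + (1 - p)(3 \nu \tau^2 + \nu^3) = 0$ so $\operatorname{skew}(X) = 0$
  4. $\operatorname{E}\left(X^4\right) = p (3 \sigma^4 + 6 \sigma^2 \mu^2 + \mu^4) + (1 - p) (3 \tau^4 + 6 \tau^2 \nu^2 + \nu^4) = 31$ so $\operatorname{kurt}(X) = \frac{279}{121} \approx 2.306$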
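The arithmetic in these four parts can be reproduced exactly, taking $\mu = -2$, $\sigma^2 = 1$, $\nu = 1$, $\tau^2 = 2$, and $p = \frac{1}{3}$. The sketch below also evaluates the mixture density at $\pm 1$ as a simple asymmetry check; the function names and the choice of test points are assumptions made for illustration.

```python
from fractions import Fraction as F
from math import exp, pi, sqrt

def normal_raw_moments(mean, var):
    """E(Y), E(Y^2), E(Y^3), E(Y^4) for Y normal with the given mean and variance."""
    return (mean, var + mean**2, 3 * mean * var + mean**3,
            3 * var**2 + 6 * var * mean**2 + mean**4)

def normal_pdf(x, mean, var):
    """Normal density with the given mean and variance, evaluated at x."""
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

p = F(1, 3)
u = normal_raw_moments(-2, 1)   # U: mean -2, variance 1
v = normal_raw_moments(1, 2)    # V: mean 1, variance 2

# E(X^n) = p E(U^n) + (1 - p) E(V^n), because X^n = I U^n + (1 - I) V^n
m1, m2, m3, m4 = (p * a + (1 - p) * b for a, b in zip(u, v))
var = m2 - m1**2
third = m3 - 3 * m1 * m2 + 2 * m1**3
kurt = (m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4) / var**2
print(m1, var, third, kurt)

def f(x):
    """Mixture density p g + (1 - p) h."""
    return float(p) * normal_pdf(x, -2, 1) + float(1 - p) * normal_pdf(x, 1, 2)

print(f(1.0), f(-1.0))   # unequal values, so f is not symmetric about the mean 0
```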

The graph of the PDF f of X is given below. Note that f is not symmetric about 0. (Again, the mean is the only possible point of symmetry.)

[Figure: The PDF of $X$]