The Normal Distribution

The normal distribution holds an honored role in probability and statistics, mostly because of the central limit theorem, one of the fundamental theorems that forms a bridge between the two subjects. In addition, as we will see, the normal distribution has many nice mathematical properties. The normal distribution is also called the Gaussian distribution, in honor of Carl Friedrich Gauss, who was among the first to use the distribution.

The Standard Normal Distribution

Distribution Functions

The standard normal distribution is a continuous distribution on \(\mathbb{R}\) with probability density function \(\phi\) given by \[ \phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad z \in \mathbb{R} \]

Details:

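Let \(c = \int_{-\infty}^\infty e^{-z^2/2} \, dz\). We need to show that \(c = \sqrt{2\pi}\). That is, \(\sqrt{2\pi}\) is the normalizing constant for the function \(z \mapsto e^{-z^2/2}\). The proof uses a nice trick: \[ c^2 = \int_{-\infty}^\infty e^{-x^2/2} \, dx \int_{-\infty}^\infty e^{-y^2/2} \, dy = \int_{\mathbb{R}^2} e^{-(x^2 + y^2)/2} \, d(x, y) \] We now convert the double integral to polar coordinates: \(x = r \cos\theta\), \(y = r \sin\theta\) where \(r \in [0, \infty)\) and \(\theta \in [0, 2\pi)\). So, \(x^2 + y^2 = r^2\) and \(d(x, y) = r \, d(r, \theta)\). Thus, converting back to iterated integrals, \[ c^2 = \int_0^{2\pi} \int_0^\infty r e^{-r^2/2} \, dr \, d\theta \] Substituting \(u = r^2/2\) in the inner integral gives \(\int_0^\infty e^{-u} \, du = 1\) and then the outer integral is \(\int_0^{2\pi} 1 \, d\theta = 2\pi\). Thus, \(c^2 = 2\pi\) and so \(c = \sqrt{2\pi}\).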
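As an informal numerical sanity check that the normalizing constant is the square root of 2π, here is a minimal Python sketch that approximates the integral with the trapezoid rule. The helper name `integrate`, the truncation of the real line to the interval from -10 to 10, and the number of subintervals are assumptions made for illustration; they are not part of the proof.

```python
import math

def integrate(f, a, b, n=100_000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Truncate the real line to [-10, 10]; the tails beyond this interval
# contribute far less than 1e-12 to the integral.
c = integrate(lambda z: math.exp(-z * z / 2), -10.0, 10.0)

print(c)                       # numerical approximation of the integral
print(math.sqrt(2 * math.pi))  # the claimed normalizing constant
```

The two printed values should agree to many decimal places, since the integrand is smooth and decays very rapidly.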

The standard normal probability density function has the famous bell shape that is known to just about everyone.

The standard normal density function ϕ satisfies the following properties:

  1. \(\phi\) is symmetric about \(z = 0\).
  2. \(\phi\) increases and then decreases, with mode \(z = 0\).
  3. \(\phi\) is concave upward and then downward and then upward again, with inflection points at \(z = \pm 1\).
  4. \(\phi(z) \to 0\) as \(z \to \infty\) and as \(z \to -\infty\).
Details:

These results follow from standard calculus. Note that \(\phi^\prime(z) = -z \phi(z)\) (which gives (b)) and hence also \(\phi^{\prime\prime}(z) = (z^2 - 1) \phi(z)\) (which gives (c)).

In the Special Distribution Simulator, select the normal distribution and keep the default settings. Note the shape and location of the standard normal density function. Run the simulation 1000 times, and compare the empirical density function to the probability density function.

The standard normal distribution function \(\Phi\), given by \[ \Phi(z) = \int_{-\infty}^z \phi(t) \, dt = \int_{-\infty}^z \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt \] and its inverse, the quantile function \(\Phi^{-1}\), cannot be expressed in closed form in terms of elementary functions. However, approximate values of these functions can be obtained from the quantile app, and from most mathematics and statistics software. Indeed, these functions are so important that they are considered special functions of mathematics.
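As one illustration of the software route, here is a minimal Python sketch that uses only the standard library. It relies on the standard identity that expresses the standard normal distribution function through the error function, and compares the result with `statistics.NormalDist`. The function name `Phi` and the sample arguments are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def Phi(z):
    """Standard normal distribution function, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(Phi(1.0))          # approximately 0.8413
print(Z.cdf(1.0))        # the library's value of the same quantity
print(Z.inv_cdf(0.975))  # quantile of order 0.975, approximately 1.96
```

The quantile function has no comparably simple expression, so the sketch simply calls the library's `inv_cdf` method.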

The standard normal distribution function Φ satisfies the following properties:

  1. \(\Phi(-z) = 1 - \Phi(z)\) for \(z \in \mathbb{R}\)
  2. \(\Phi^{-1}(p) = -\Phi^{-1}(1 - p)\) for \(p \in (0, 1)\)
  3. \(\Phi(0) = \frac{1}{2}\), so the median is 0.
Details:

Part (a) follows from the symmetry of ϕ. Part (b) follows from part (a). Part (c) follows from part (a) with z=0.

In the quantile app, select the normal distribution and keep the default settings.

  1. Note the shape of the density function and the distribution function.
  2. Find the first and third quartiles.
  3. Compute the interquartile range.

In the quantile app, select the normal distribution and keep the default settings. Find the quantiles of the following orders for the standard normal distribution:

  1. q=0.005, q=0.995
  2. q=0.05, q=0.95
  3. q=0.1, q=0.9
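The answers to the two quantile exercises above can be cross-checked in software. The following is a minimal Python sketch using `statistics.NormalDist` in place of the quantile app; the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# First and third quartiles and the interquartile range.
q1, q3 = Z.inv_cdf(0.25), Z.inv_cdf(0.75)
iqr = q3 - q1

# Quantiles of the orders listed above, in symmetric pairs.
orders = [(0.005, 0.995), (0.05, 0.95), (0.1, 0.9)]
quantiles = {p: Z.inv_cdf(p) for pair in orders for p in pair}

print(q1, q3, iqr)
for lo, hi in orders:
    print(lo, quantiles[lo], hi, quantiles[hi])
```

By the symmetry of the distribution, the two quantiles in each pair should be negatives of each other, up to rounding.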

Moments

Suppose that random variable Z has the standard normal distribution.

The mean and variance of Z are

  1. E(Z)=0
  2. var(Z)=1
Details:
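  1. Of course, by symmetry, if \(Z\) has a mean, the mean must be 0, but we have to argue that the mean exists. Actually it's not hard to compute the mean directly. Note that \[ E(Z) = \int_{-\infty}^\infty z \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz = \int_{-\infty}^0 z \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz + \int_0^\infty z \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz \] The integrals on the right can be evaluated explicitly using the simple substitution \(u = z^2/2\). The result is \(E(Z) = -1/\sqrt{2\pi} + 1/\sqrt{2\pi} = 0\).
  2. By part (a), note that \[ \operatorname{var}(Z) = E\left(Z^2\right) = \int_{-\infty}^\infty z^2 \phi(z) \, dz \] Integrate by parts, using the parts \(u = z\) and \(dv = z \phi(z) \, dz\). Thus \(du = dz\) and \(v = -\phi(z)\). Note that \(z \phi(z) \to 0\) as \(z \to \infty\) and as \(z \to -\infty\). Thus, the integration by parts formula gives \(\operatorname{var}(Z) = \int_{-\infty}^\infty \phi(z) \, dz = 1\).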

In the Special Distribution Simulator, select the normal distribution and keep the default settings. Note the shape and size of the mean ± standard deviation bar. Run the simulation 1000 times, and compare the empirical mean and standard deviation to the true mean and standard deviation.

More generally, we can compute all of the moments. The key is the following recursion formula.

For \(n \in \mathbb{N}_+\), \(E\left(Z^{n+1}\right) = n E\left(Z^{n-1}\right)\)

Details:

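First we use the differential equation in the proof of [2] above, namely \(\phi^\prime(z) = -z \phi(z)\). \[ E\left(Z^{n+1}\right) = \int_{-\infty}^\infty z^{n+1} \phi(z) \, dz = \int_{-\infty}^\infty z^n z \phi(z) \, dz = -\int_{-\infty}^\infty z^n \phi^\prime(z) \, dz \] Now we integrate by parts, with \(u = z^n\) and \(dv = \phi^\prime(z) \, dz\) to get \[ E\left(Z^{n+1}\right) = -z^n \phi(z) \bigg|_{-\infty}^\infty + \int_{-\infty}^\infty n z^{n-1} \phi(z) \, dz = 0 + n E\left(Z^{n-1}\right) \]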

The moments of the standard normal distribution are now easy to compute.

For \(n \in \mathbb{N}\),

  1. \(E\left(Z^{2n+1}\right) = 0\)
  2. \(E\left(Z^{2n}\right) = 1 \cdot 3 \cdots (2n - 1) = (2n)! \big/ (n! \, 2^n)\)
Details:

The result follows from the mean and variance in [7] and the recursion relation in [9].

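  1. Since \(E(Z) = 0\) it follows that \(E\left(Z^n\right) = 0\) for every odd \(n \in \mathbb{N}\).
  2. Since \(E\left(Z^2\right) = 1\), it follows that \(E\left(Z^4\right) = 1 \cdot 3\) and then \(E\left(Z^6\right) = 1 \cdot 3 \cdot 5\), and so forth. You can use induction, if you like, for a more formal proof.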
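The recursion and the closed form for the even moments can be compared directly. Here is a minimal Python sketch; the function names `std_normal_moment` and `even_moment_closed_form` are hypothetical, introduced only for this illustration.

```python
import math

def std_normal_moment(n):
    """Moment of order n of the standard normal distribution, computed from
    the recursion: moment(k + 1) = k * moment(k - 1)."""
    if n == 0:
        return 1   # the moment of order 0 is always 1
    if n == 1:
        return 0   # the mean is 0
    return (n - 1) * std_normal_moment(n - 2)

def even_moment_closed_form(n):
    """Moment of order 2n from the closed form (2n)! / (n! 2^n)."""
    return math.factorial(2 * n) // (math.factorial(n) * 2 ** n)

for n in range(9):
    print(n, std_normal_moment(n))
```

The odd-order rows print 0, and the even-order rows print the products 1, 1·3, 1·3·5, and so on.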

Of course, the fact that the odd-order moments are 0 also follows from the symmetry of the distribution. The following theorem gives the skewness and kurtosis of the standard normal distribution.

The skewness and kurtosis of Z are

  1. skew(Z)=0
  2. kurt(Z)=3
Details:
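  1. This follows immediately from the symmetry of the distribution. Directly, since \(Z\) has mean 0 and variance 1, \(\operatorname{skew}(Z) = E\left(Z^3\right) = 0\).
  2. Since \(E(Z) = 0\) and \(\operatorname{var}(Z) = 1\), \(\operatorname{kurt}(Z) = E\left(Z^4\right) = 3\).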

Because of the last result (and the use of the standard normal distribution literally as a standard), the excess kurtosis of a random variable is defined to be the ordinary kurtosis minus 3. Thus, the excess kurtosis of the normal distribution is 0.

Many other important properties of the normal distribution are most easily obtained using the moment generating function or the characteristic function.

The moment generating function m and characteristic function χ of Z are given by

  1. \(m(t) = e^{t^2/2}\) for \(t \in \mathbb{R}\).
  2. \(\chi(t) = e^{-t^2/2}\) for \(t \in \mathbb{R}\).
Details:
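  1. Note that \[ m(t) = E\left(e^{tZ}\right) = \int_{-\infty}^\infty e^{tz} \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz = \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} z^2 + t z\right) dz \] We complete the square in \(z\) to get \(-\frac{1}{2} z^2 + t z = -\frac{1}{2}(z - t)^2 + \frac{1}{2} t^2\). Thus we have \[ E\left(e^{tZ}\right) = e^{\frac{1}{2} t^2} \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}(z - t)^2\right] dz \] In the integral, if we use the simple substitution \(u = z - t\) then the integral becomes \(\int_{-\infty}^\infty \phi(u) \, du = 1\). Hence \(E\left(e^{tZ}\right) = e^{\frac{1}{2} t^2}\).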
  2. This follows from (a) since χ(t)=m(it).

Thus, the standard normal distribution has the curious property that the characteristic function is a multiple of the probability density function: \[ \chi = \sqrt{2\pi} \, \phi \] The moment generating function can be used to give another derivation of the moments of \(Z\), since we know that \(E\left(Z^n\right) = m^{(n)}(0)\).
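For example, the moments can be read off from the power series of the moment generating function. The following is a sketch of that computation. It uses only the formula for the moment generating function given above and the standard fact that the coefficient of the power of order k in that series is the moment of order k divided by k!.

```latex
\begin{align*}
m(t) = e^{t^2/2} &= \sum_{n=0}^\infty \frac{1}{n!} \left(\frac{t^2}{2}\right)^n
                  = \sum_{n=0}^\infty \frac{1}{2^n \, n!} \, t^{2n} \\
m(t) &= \sum_{k=0}^\infty \frac{E\left(Z^k\right)}{k!} \, t^k
\end{align*}
Matching the coefficients of $t^{2n}$ in the two series gives
\[ E\left(Z^{2n}\right) = \frac{(2n)!}{2^n \, n!} \]
and since no odd powers of $t$ appear in the first series,
$E\left(Z^{2n+1}\right) = 0$ for every $n \in \mathbb{N}$.
```

This agrees with the moments obtained from the recursion formula.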

The General Normal Distribution

The general normal distribution is the location-scale family associated with the standard normal distribution.

Suppose that \(\mu \in \mathbb{R}\) and \(\sigma \in (0, \infty)\) and that \(Z\) has the standard normal distribution. Then \(X = \mu + \sigma Z\) has the normal distribution with location parameter \(\mu\) and scale parameter \(\sigma\).

Distribution Functions

Suppose that \(X\) has the normal distribution with location parameter \(\mu \in \mathbb{R}\) and scale parameter \(\sigma \in (0, \infty)\). The basic properties of the density function and distribution function of \(X\) follow from general results for location-scale families.

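The probability density function \(f\) of \(X\) is given by \[ f(x) = \frac{1}{\sigma} \phi\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi} \, \sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right], \quad x \in \mathbb{R} \]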

Details:

This follows from the change of variables formula corresponding to the transformation x=μ+σz.
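The location-scale relationship between the two density functions translates directly into code. Here is a minimal Python sketch, checked against `statistics.NormalDist`; the function names `phi` and `normal_pdf` and the parameter values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def phi(z):
    """Standard normal probability density function."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def normal_pdf(x, mu, sigma):
    """Normal density with location mu and scale sigma, obtained by
    rescaling the standard normal density."""
    return phi((x - mu) / sigma) / sigma

mu, sigma = 2.0, 0.5  # hypothetical parameter values
print(normal_pdf(2.3, mu, sigma))
print(NormalDist(mu, sigma).pdf(2.3))  # library value for comparison
```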

The probability density function f satisfies the following properties:

  1. \(f\) is symmetric about \(x = \mu\).
  2. \(f\) increases and then decreases with mode \(x = \mu\).
  3. \(f\) is concave upward then downward then upward again, with inflection points at \(x = \mu \pm \sigma\).
  4. \(f(x) \to 0\) as \(x \to \infty\) and as \(x \to -\infty\).
Details:

These properties follow from the corresponding properties of ϕ in [2].

In the special distribution simulator, select the normal distribution. Vary the parameters and note the shape and location of the probability density function. With your choice of parameter settings, run the simulation 1000 times and compare the empirical density function to the true probability density function.

Let F denote the distribution function of X, and as above, let Φ denote the standard normal distribution function.

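The distribution function \(F\) and quantile function \(F^{-1}\) satisfy the following properties:

  1. \(F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)\) for \(x \in \mathbb{R}\).
  2. \(F^{-1}(p) = \mu + \sigma \Phi^{-1}(p)\) for \(p \in (0, 1)\).
  3. \(F(\mu) = \frac{1}{2}\) so the median occurs at \(x = \mu\).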
Details:

Part (a) follows since X=μ+σZ. Parts (b) and (c) follow from (a).
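In practice, these identities mean that any normal distribution function or quantile can be computed from the standard normal ones. Here is a minimal Python sketch of that reduction; the function names `normal_cdf` and `normal_quantile` and the parameter values are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal

def normal_cdf(x, mu, sigma):
    """Distribution function of the normal distribution with parameters
    mu and sigma, computed by standardizing x."""
    return std.cdf((x - mu) / sigma)

def normal_quantile(p, mu, sigma):
    """Quantile of order p, computed from the standard normal quantile."""
    return mu + sigma * std.inv_cdf(p)

mu, sigma = 100.0, 15.0  # hypothetical parameter values
print(normal_cdf(115.0, mu, sigma))     # one standard deviation above the mean
print(normal_quantile(0.5, mu, sigma))  # the median
```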

In the quantile app, select the normal distribution. Vary the parameters and note the shape of the density function and the distribution function.

Moments

Suppose again that \(X\) has the normal distribution with location parameter \(\mu \in \mathbb{R}\) and scale parameter \(\sigma \in (0, \infty)\). As the notation suggests, the location and scale parameters are also the mean and standard deviation, respectively.

The mean and variance of X are

  1. \(E(X) = \mu\)
  2. \(\operatorname{var}(X) = \sigma^2\)
Details:

This follows from the representation X=μ+σZ and basic properties of expected value and variance.

So the parameters of the normal distribution are usually referred to as the mean and standard deviation rather than location and scale. The central moments of X can be computed easily from the moments of the standard normal distribution. The ordinary (raw) moments of X can be computed from the central moments, but the formulas are a bit messy.

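For \(n \in \mathbb{N}\),

  1. \(E\left[(X - \mu)^{2n}\right] = 1 \cdot 3 \cdots (2n - 1) \sigma^{2n} = (2n)! \, \sigma^{2n} \big/ (n! \, 2^n)\)
  2. \(E\left[(X - \mu)^{2n+1}\right] = 0\)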

All of the odd central moments of X are 0, a fact that also follows from the symmetry of the probability density function.

In the special distribution simulator select the normal distribution. Vary the mean and standard deviation and note the size and location of the mean ± standard deviation bar. With your choice of parameter settings, run the simulation 1000 times and compare the empirical mean and standard deviation to the true mean and standard deviation.

The following result gives the skewness and kurtosis.

The skewness and kurtosis of X are

  1. skew(X)=0
  2. kurt(X)=3
Details:

The skewness and kurtosis of a variable are defined in terms of the standard score, so these results follow from the corresponding result for \(Z\) in [11].

The moment generating function M and characteristic function χ of X are given by

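  1. \(M(t) = \exp\left(\mu t + \frac{1}{2} \sigma^2 t^2\right)\) for \(t \in \mathbb{R}\).
  2. \(\chi(t) = \exp\left(i \mu t - \frac{1}{2} \sigma^2 t^2\right)\) for \(t \in \mathbb{R}\).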
Details:
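  1. This follows from the representation \(X = \mu + \sigma Z\), basic properties of expected value, and the MGF of \(Z\) in [12]: \[ E\left(e^{tX}\right) = E\left(e^{t\mu + t\sigma Z}\right) = e^{t\mu} E\left(e^{t\sigma Z}\right) = e^{t\mu} e^{\frac{1}{2} t^2 \sigma^2} = e^{t\mu + \frac{1}{2} \sigma^2 t^2} \]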
  2. This follows from (a) since χ(t)=M(it).

Related Distributions

The normal family of distributions satisfies two very important properties: invariance under linear transformations of the variable and invariance with respect to sums of independent variables. The first property is essentially a restatement of the fact that the normal distribution is a location-scale family.

Suppose that \(X\) is normally distributed with mean \(\mu\) and variance \(\sigma^2\). If \(a \in \mathbb{R}\) and \(b \in \mathbb{R} \setminus \{0\}\), then \(a + b X\) is normally distributed with mean \(a + b\mu\) and variance \(b^2 \sigma^2\).

Details:

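The MGF of \(a + b X\) is \[ E\left[e^{t(a + bX)}\right] = e^{ta} E\left[e^{(tb)X}\right] = e^{ta} e^{\mu(tb) + \sigma^2 (tb)^2/2} = e^{(a + b\mu)t + b^2 \sigma^2 t^2/2} \] which we recognize as the MGF of the normal distribution with mean \(a + b\mu\) and variance \(b^2 \sigma^2\).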

Recall that in general, if \(X\) is a random variable with mean \(\mu\) and standard deviation \(\sigma \gt 0\), then \(Z = (X - \mu)/\sigma\) is the standard score of \(X\). A corollary of the last result is that if \(X\) has a normal distribution then the standard score \(Z\) has a standard normal distribution. Conversely, any normally distributed variable can be constructed from a standard normal variable.

Standard score.

  1. If \(X\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\) then \(Z = \frac{X - \mu}{\sigma}\) has the standard normal distribution.
  2. If \(Z\) has the standard normal distribution and if \(\mu \in \mathbb{R}\) and \(\sigma \in (0, \infty)\), then \(X = \mu + \sigma Z\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\).
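The linear transformation result and the standard score can both be illustrated with Python's `statistics.NormalDist`, which supports adding a constant to a normal distribution object and multiplying or dividing it by a constant. The parameter values below are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

X = NormalDist(mu=10.0, sigma=2.0)  # hypothetical mean and standard deviation

# Linear transformation a + b X: the mean becomes a + b * mu and the
# standard deviation becomes |b| * sigma.
a, b = 3.0, -1.5
Y = a + b * X
print(Y.mean, Y.stdev)

# Standard score: subtract the mean and divide by the standard deviation.
Z = (X - X.mean) / X.stdev
print(Z.mean, Z.stdev)
```

Note that the standard deviation of the transformed variable involves the absolute value of b, consistent with the variance formula in the theorem.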

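Suppose that \(X_1\) and \(X_2\) are independent random variables, and that \(X_i\) is normally distributed with mean \(\mu_i\) and variance \(\sigma_i^2\) for \(i \in \{1, 2\}\). Then \(X_1 + X_2\) is normally distributed with

  1. \(E(X_1 + X_2) = \mu_1 + \mu_2\)
  2. \(\operatorname{var}(X_1 + X_2) = \sigma_1^2 + \sigma_2^2\)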
Details:

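The MGF of \(X_1 + X_2\) is the product of the MGFs, so \[ E\left(\exp\left[t(X_1 + X_2)\right]\right) = \exp\left(\mu_1 t + \sigma_1^2 t^2/2\right) \exp\left(\mu_2 t + \sigma_2^2 t^2/2\right) = \exp\left[(\mu_1 + \mu_2) t + \left(\sigma_1^2 + \sigma_2^2\right) t^2/2\right] \] which we recognize as the MGF of the normal distribution with mean \(\mu_1 + \mu_2\) and variance \(\sigma_1^2 + \sigma_2^2\).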
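As an informal check of the mean and variance of the sum, here is a minimal Python sketch that compares the exact values with a seeded Monte Carlo simulation. The parameter values, the seed, and the sample size are assumptions made for illustration. Adding two `NormalDist` objects treats them as independent.

```python
import random
import statistics
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
mu1, sigma1 = 1.0, 2.0
mu2, sigma2 = -3.0, 1.5

# Exact distribution of the sum of the two independent normal variables.
total = NormalDist(mu1, sigma1) + NormalDist(mu2, sigma2)

# Monte Carlo check with a fixed seed so the run is reproducible.
rng = random.Random(2024)
sums = [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(100_000)]
sample_mean = statistics.fmean(sums)
sample_var = statistics.variance(sums)

print(total.mean, total.variance)  # exact mean and variance of the sum
print(sample_mean, sample_var)     # should be close to the exact values
```

The simulation only checks the first two moments; the fact that the sum is again normal is the content of the theorem.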

Theorem [26] generalizes to a sum of n independent, normal variables. The important part is that the sum is still normal; the expressions for the mean and variance are standard results that hold for the sum of independent variables generally. As a consequence of this result and the one for linear transformations [24], it follows that the normal distribution is stable.

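The normal distribution is stable. Specifically, suppose that \(X\) has the normal distribution with mean \(\mu \in \mathbb{R}\) and variance \(\sigma^2 \in (0, \infty)\). If \((X_1, X_2, \ldots, X_n)\) are independent copies of \(X\), then \(X_1 + X_2 + \cdots + X_n\) has the same distribution as \(\left(n - \sqrt{n}\right)\mu + \sqrt{n} X\), namely normal with mean \(n\mu\) and variance \(n\sigma^2\).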

Details:

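By [26], \(X_1 + X_2 + \cdots + X_n\) has the normal distribution with mean \(n\mu\) and variance \(n\sigma^2\). By [24], \(\left(n - \sqrt{n}\right)\mu + \sqrt{n} X\) has the normal distribution with mean \(\left(n - \sqrt{n}\right)\mu + \sqrt{n}\mu = n\mu\) and variance \(\left(\sqrt{n}\right)^2 \sigma^2 = n\sigma^2\).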

All stable distributions are infinitely divisible, so the normal distribution belongs to this family as well. For completeness, here is the explicit statement:

The normal distribution is infinitely divisible. Specifically, if \(X\) has the normal distribution with mean \(\mu \in \mathbb{R}\) and variance \(\sigma^2 \in (0, \infty)\), then for \(n \in \mathbb{N}_+\), \(X\) has the same distribution as \(X_1 + X_2 + \cdots + X_n\) where \((X_1, X_2, \ldots, X_n)\) are independent, and each has the normal distribution with mean \(\mu/n\) and variance \(\sigma^2/n\).

Finally, the normal distribution belongs to the family of general exponential distributions.

The normal distribution with mean \(\mu\) and variance \(\sigma^2\) is a two-parameter exponential family with natural parameters \(\left(\frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2}\right)\), and natural statistics \(\left(X, X^2\right)\).

Details:

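Expanding the square, the normal PDF can be written in the form \[ f(x) = \frac{1}{\sqrt{2\pi} \, \sigma} \exp\left(-\frac{\mu^2}{2\sigma^2}\right) \exp\left(\frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2\right), \quad x \in \mathbb{R} \] so the result follows from the definition of the general exponential family.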
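The factored form of the density can be checked numerically against the usual form. Here is a minimal Python sketch; the function name `exp_family_pdf`, the variable names `eta1` and `eta2` for the natural parameters, and the parameter values are hypothetical, introduced only for this illustration.

```python
import math
from statistics import NormalDist

def exp_family_pdf(x, mu, sigma):
    """Normal density written in exponential family form."""
    eta1 = mu / sigma**2           # natural parameter paired with x
    eta2 = -1.0 / (2 * sigma**2)   # natural parameter paired with x^2
    # Factor that depends on the parameters only, not on x.
    normalizer = math.exp(-mu**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)
    return normalizer * math.exp(eta1 * x + eta2 * x**2)

mu, sigma = 1.5, 0.8  # hypothetical parameter values
for x in (-1.0, 0.0, 1.5, 3.0):
    print(x, exp_family_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

The two columns of density values should agree up to rounding error.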

A number of other special distributions studied in this chapter are constructed from normally distributed variables. These include

Also, as mentioned at the beginning of this section, the importance of the normal distribution stems in large part from the central limit theorem, one of the fundamental theorems of probability. By virtue of this theorem, the normal distribution is connected to many other distributions, by means of limits and approximations, including the special distributions in the following list. Details are given in the individual sections.

Computational Exercises

Suppose that the volume of beer in a bottle of a certain brand is normally distributed with mean 0.5 liter and standard deviation 0.01 liter.

  1. Find the probability that a bottle will contain at least 0.48 liter.
  2. Find the volume that corresponds to the 95th percentile.
Details:

Let \(X\) denote the volume of beer in liters.

  1. \(P(X \gt 0.48) = 0.9772\)
  2. \(x_{0.95} = 0.51645\)
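These two answers can be reproduced in a few lines. The following is a minimal Python sketch using `statistics.NormalDist`; the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

X = NormalDist(mu=0.5, sigma=0.01)  # volume of beer in liters

p_at_least = 1 - X.cdf(0.48)  # probability of at least 0.48 liter
x_95 = X.inv_cdf(0.95)        # 95th percentile

print(round(p_at_least, 4), round(x_95, 5))
```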

A metal rod is designed to fit into a circular hole on a certain assembly. The radius of the rod is normally distributed with mean 1 cm and standard deviation 0.002 cm. The radius of the hole is normally distributed with mean 1.01 cm and standard deviation 0.003 cm. The machining processes that produce the rod and the hole are independent. Find the probability that the rod is too big for the hole.

Details:

Let \(X\) denote the radius of the rod and \(Y\) the radius of the hole. \(P(Y - X \lt 0) = 0.0028\)
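The computation rests on the fact that the difference of two independent normal variables is normal, with the variances adding. Here is a minimal Python sketch, using the subtraction of `NormalDist` objects, which treats them as independent; the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

rod = NormalDist(mu=1.0, sigma=0.002)    # radius of the rod, in cm
hole = NormalDist(mu=1.01, sigma=0.003)  # radius of the hole, in cm

# Clearance between hole and rod: mean 0.01 and
# variance 0.002**2 + 0.003**2.
gap = hole - rod

p_too_big = gap.cdf(0.0)  # probability that the clearance is negative
print(gap.mean, gap.stdev, round(p_too_big, 4))
```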

The weight of a peach from a certain orchard is normally distributed with mean 8 ounces and standard deviation 1 ounce. Find the probability that the combined weight of 5 peaches exceeds 45 ounces.

Details:

Let X denote the combined weight of the 5 peaches, in ounces. P(X>45)=0.0127
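The combined weight is a sum of 5 independent normal variables, so it is normal with mean 40 and variance 5. Here is a minimal Python sketch of the computation; it assumes the 5 weights are independent (as the exercise intends), and the variable names are chosen for illustration.

```python
import math
from statistics import NormalDist

peach = NormalDist(mu=8.0, sigma=1.0)  # weight of one peach, in ounces

# Sum of 5 independent peaches. Note that 5 * peach would instead describe
# five times the weight of a single peach, which has standard deviation 5.
total = peach
for _ in range(4):
    total = total + peach

p_exceeds = 1 - total.cdf(45)
print(total.mean, total.stdev, round(p_exceeds, 4))
```

The standard deviation of the total is the square root of 5, not 5, which is what makes the distinction between a sum of independent copies and a multiple of one variable matter.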

A Further Generalization

In some settings, it's convenient to consider a constant as having a normal distribution (with mean being the constant and variance 0, of course). This convention simplifies the statements of theorems and definitions in these settings. Of course, the formulas for the probability density function [14] and the distribution function [17] do not hold for a constant, but the other results involving the moment generating function [23], linear transformations [24], and sums [26] are still valid. Moreover, the result for linear transformations [24] would hold for all a and b.