Initially, we defined expected value separately for discrete distributions, continuous distributions, and mixed distributions, in each case using density functions. Later we showed how these definitions can be unified, by first defining expected value for nonnegative random variables in terms of the right-tail distribution function. However, by far the best and most elegant definition of expected value is as an integral with respect to the underlying probability measure. This definition and a review of the properties of expected value are the goals of this section. No proofs are necessary (you will be happy to know), since all of the results follow from the general theory of integration. If you are a new student of probability, or are not interested in the measure-theoretic detail of the subject, you can safely skip this section.
Definitions
As usual, our starting point is a random experiment modeled by a probability space \((\Omega, \mathscr{F}, \mathbb{P})\). So \(\Omega\) is the set of outcomes, \(\mathscr{F}\) is the \(\sigma\)-algebra of events, and \(\mathbb{P}\) is the probability measure on the sample space \((\Omega, \mathscr{F})\).
Recall that a random variable \(X\) for the experiment is simply a measurable function from \((\Omega, \mathscr{F})\) into another measurable space \((S, \mathscr{S})\). When \(S \subseteq \mathbb{R}^n\), we assume that \(S\) is Lebesgue measurable, and we take \(\mathscr{S}\) to be the \(\sigma\)-algebra of Lebesgue measurable subsets of \(S\). As noted above, here is the measure-theoretic definition:
If \(X\) is a real-valued random variable on the probability space, the expected value of \(X\) is defined as the integral of \(X\) with respect to \(\mathbb{P}\), assuming that the integral exists:
\[ \mathbb{E}(X) = \int_\Omega X \, d\mathbb{P} \]
Let's review how the integral is defined in stages, but now using the notation of probability theory.
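Let \(S\) denote the support set of \(X\), so that \(S\) is a measurable subset of \(\mathbb{R}\).
- If \(S\) is finite, then \(\mathbb{E}(X) = \sum_{x \in S} x \, \mathbb{P}(X = x)\).
- If \(S \subseteq [0, \infty)\), then \(\mathbb{E}(X) = \sup\{\mathbb{E}(Y) : Y \text{ has finite range and } 0 \le Y \le X\}\).
- For general \(S \subseteq \mathbb{R}\), \(\mathbb{E}(X) = \mathbb{E}(X^+) - \mathbb{E}(X^-)\) as long as the right side is not of the form \(\infty - \infty\), where \(X^+\) and \(X^-\) denote the positive and negative parts of \(X\).
- If \(A \in \mathscr{F}\), then \(\mathbb{E}(X; A) = \mathbb{E}(X \mathbf{1}_A)\), assuming that the expected value on the right exists.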
Thus, as with integrals generally, an expected value can exist as a number in \(\mathbb{R}\) (in which case \(X\) is integrable), can exist as \(\infty\) or \(-\infty\), or can fail to exist. In reference to part (a), a random variable with a finite set of values in \(\mathbb{R}\) is a simple function in the terminology of general integration. In reference to part (b), note that the expected value of a nonnegative random variable always exists in \([0, \infty]\). In reference to part (c), \(\mathbb{E}(X)\) exists if and only if either \(\mathbb{E}(X^+) < \infty\) or \(\mathbb{E}(X^-) < \infty\).
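The staged definition can be traced on a small finite space. The following is a minimal sketch, assuming a hypothetical experiment of two fair coin tosses and a hypothetical variable (number of heads minus 1); none of these names come from the text. It computes the expected value of the positive and negative parts by the finite-range formula of part (a) and compares their difference, as in part (c), with the direct computation.

```python
from fractions import Fraction

# Hypothetical finite probability space: two fair coin tosses.
omega = ["HH", "HT", "TH", "TT"]
prob = {w: Fraction(1, 4) for w in omega}

def X(w):
    """Number of heads minus 1, so X takes the values -1, 0, 1."""
    return w.count("H") - 1

def expected_value(Y):
    """Part (a): sum of y * P(Y = y) over the finite range of Y."""
    values = {Y(w) for w in omega}
    return sum(y * sum(prob[w] for w in omega if Y(w) == y) for y in values)

def positive_part(w):
    return max(X(w), 0)

def negative_part(w):
    return max(-X(w), 0)

e_plus = expected_value(positive_part)    # E(X^+)
e_minus = expected_value(negative_part)   # E(X^-)
e_x = expected_value(X)                   # E(X) computed directly
print(e_plus, e_minus, e_x)
```

Exact rational arithmetic is used so that the identity between the direct value and the difference of the two parts holds without rounding.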
Our next goal is to restate the basic theorems and properties of integrals, but in the notation of probability. Unless otherwise noted, all random variables are assumed to be real-valued.
Basic Properties
The Linear Properties
Perhaps the most important and basic properties are the linear properties. Part (a) is the additive property and part (b) is the scaling property.
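Suppose that \(X\) and \(Y\) are random variables whose expected values exist, and that \(c \in \mathbb{R}\). Then
- \(\mathbb{E}(X + Y) = \mathbb{E}(X) + \mathbb{E}(Y)\) as long as the right side is not of the form \(\infty - \infty\).
- \(\mathbb{E}(c X) = c \, \mathbb{E}(X)\)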
Thus, part (a) holds if at least one of the expected values on the right is finite, or if both are \(\infty\), or if both are \(-\infty\). What is ruled out are the two cases where one expected value is \(\infty\) and the other is \(-\infty\), and this is what is meant by the indeterminate form \(\infty - \infty\).
Equality and Order
Our next set of properties deals with equality and order. First, the expected value of a random variable over a null set is 0.
If \(X\) is a random variable and \(A\) is an event with \(\mathbb{P}(A) = 0\), then \(\mathbb{E}(X; A) = 0\).
Random variables that are equivalent have the same expected value.
If \(X\) is a random variable whose expected value exists, and \(Y\) is a random variable with \(\mathbb{P}(X = Y) = 1\), then \(\mathbb{E}(X) = \mathbb{E}(Y)\).
Our next result is the positive property of expected value.
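Suppose that \(X\) is a random variable and \(\mathbb{P}(X \ge 0) = 1\). Then
- \(\mathbb{E}(X) \ge 0\)
- \(\mathbb{E}(X) = 0\) if and only if \(\mathbb{P}(X = 0) = 1\).
So, if \(X\) is a nonnegative random variable then \(\mathbb{E}(X) > 0\) if and only if \(\mathbb{P}(X > 0) > 0\). The next result is the increasing property of expected value, perhaps the most important property after linearity.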
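Suppose that \(X\) and \(Y\) are random variables whose expected values exist, and that \(\mathbb{P}(X \le Y) = 1\). Then
- \(\mathbb{E}(X) \le \mathbb{E}(Y)\)
- Except in the case that both expected values are \(\infty\) or both \(-\infty\), \(\mathbb{E}(X) = \mathbb{E}(Y)\) if and only if \(\mathbb{P}(X = Y) = 1\).
So if \(X \le Y\) with probability 1 then, except in the two cases mentioned, \(\mathbb{E}(X) < \mathbb{E}(Y)\) if and only if \(\mathbb{P}(X < Y) > 0\). The next result is the absolute value inequality.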
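Suppose that \(X\) is a random variable whose expected value exists. Then
- \(\left|\mathbb{E}(X)\right| \le \mathbb{E}\left(|X|\right)\)
- If \(\mathbb{E}(X)\) is finite, then equality holds in (a) if and only if \(\mathbb{P}(X \ge 0) = 1\) or \(\mathbb{P}(X \le 0) = 1\).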
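The basic properties above can be checked exactly on a finite space. Here is a small sketch, assuming a hypothetical experiment (one roll of a fair die) and hypothetical variables chosen only for illustration. It verifies the additive and scaling properties, the increasing property, and the strict form of the absolute value inequality for a variable that takes both signs.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair die, outcomes 1..6.
omega = range(1, 7)
p = Fraction(1, 6)

def E(Y):
    """Expected value on the finite space: sum of Y(w) * P({w})."""
    return sum(Y(w) * p for w in omega)

X = lambda w: w - 3          # takes both signs
Y = lambda w: w * w          # satisfies X <= Y for every outcome
c = Fraction(5, 2)

lhs_add = E(lambda w: X(w) + Y(w))
lhs_scale = E(lambda w: c * X(w))
print(lhs_add == E(X) + E(Y))               # additive property
print(lhs_scale == c * E(X))                # scaling property
print(E(X) <= E(Y))                         # increasing property
print(abs(E(X)) < E(lambda w: abs(X(w))))   # strict, since X changes sign
```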
Change of Variables and Density Functions
The Change of Variables Theorem
Suppose now that \(X\) is a general random variable on the probability space \((\Omega, \mathscr{F}, \mathbb{P})\), taking values in a measurable space \((S, \mathscr{S})\). Recall that the probability distribution of \(X\) is the probability measure \(P\) on \((S, \mathscr{S})\) given by \(P(A) = \mathbb{P}(X \in A)\) for \(A \in \mathscr{S}\). This is a special case of a new positive measure induced by a given positive measure and a measurable function. If \(g: S \to \mathbb{R}\) is measurable, then \(g(X)\) is a real-valued random variable. The following result shows how to compute the expected value of \(g(X)\) as an integral with respect to the distribution of \(X\), and is known as the change of variables theorem.
If \(g: S \to \mathbb{R}\) is measurable then, assuming that the expected value exists,
\[ \mathbb{E}[g(X)] = \int_S g(x) \, dP(x) \]
So, using the original definition and the change of variables theorem, and giving the variables explicitly for emphasis, we have
\[ \mathbb{E}[g(X)] = \int_\Omega g[X(\omega)] \, d\mathbb{P}(\omega) = \int_S g(x) \, dP(x) \]
The Radon-Nikodym Theorem
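Suppose now that \(\mu\) is a \(\sigma\)-finite positive measure on \((S, \mathscr{S})\), and that the distribution of \(X\) is absolutely continuous with respect to \(\mu\). Recall that this means that \(\mu(A) = 0\) implies \(P(A) = \mathbb{P}(X \in A) = 0\) for \(A \in \mathscr{S}\). By the Radon-Nikodym theorem, named for Johann Radon and Otto Nikodym, \(X\) has a probability density function \(f\) with respect to \(\mu\). That is,
\[ P(A) = \mathbb{P}(X \in A) = \int_A f \, d\mu, \quad A \in \mathscr{S} \]
In this case, we can write the expected value of \(g(X)\) as an integral with respect to the probability density function.
If \(g: S \to \mathbb{R}\) is measurable then, assuming that the expected value exists,
\[ \mathbb{E}[g(X)] = \int_S g f \, d\mu \]
Again, giving the variables explicitly for emphasis, we have the following chain of integrals:
\[ \mathbb{E}[g(X)] = \int_\Omega g[X(\omega)] \, d\mathbb{P}(\omega) = \int_S g(x) \, dP(x) = \int_S g(x) f(x) \, d\mu(x) \]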
There are two critically important special cases.
Discrete Distributions
Suppose first that \((S, \mathscr{S}, \#)\) is a discrete measure space, so that \(S\) is countable, \(\mathscr{S}\) is the collection of all subsets of \(S\), and \(\#\) is counting measure on \((S, \mathscr{S})\). Thus, \(X\) has a discrete distribution on \(S\), and this distribution is always absolutely continuous with respect to \(\#\). Specifically, \(\#(A) = 0\) if and only if \(A = \emptyset\) and of course \(\mathbb{P}(X \in \emptyset) = 0\). The probability density function \(f\) of \(X\) with respect to \(\#\), as we know, is simply \(f(x) = \mathbb{P}(X = x)\) for \(x \in S\). Moreover, integrals with respect to \(\#\) are sums, so
\[ \mathbb{E}[g(X)] = \sum_{x \in S} g(x) f(x) \]
assuming that the expected value exists. Existence in this case means that either the sum of the positive terms is finite or the sum of the negative terms is finite, so that the sum makes sense (and in particular does not depend on the order in which the terms are added). Specializing further, if \(X\) itself is real-valued and \(g(x) = x\) for \(x \in S\), we have
\[ \mathbb{E}(X) = \sum_{x \in S} x f(x) \]
which was our original definition of expected value in the discrete case.
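In the discrete case the chain of integrals reduces to two finite sums that can be compared directly. The sketch below assumes a hypothetical experiment (two fair dice, with the variable equal to the sum of the scores) and a hypothetical function \(g(x) = (x - 7)^2\). It computes the expected value once as a sum over the sample space and once as a sum over the values of the variable weighted by the density with respect to counting measure.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

# Hypothetical experiment: two fair dice; X is the sum of the scores.
omega = list(product(range(1, 7), repeat=2))
p_omega = Fraction(1, 36)

def X(w):
    return w[0] + w[1]

def g(x):
    return (x - 7) ** 2

# Integral over the sample space: sum of g(X(w)) P({w}).
over_omega = sum(g(X(w)) * p_omega for w in omega)

# Density of X with respect to counting measure: f(x) = P(X = x).
f = defaultdict(Fraction)
for w in omega:
    f[X(w)] += p_omega

# Integral over S = {2, ..., 12}: sum of g(x) f(x).
over_s = sum(g(x) * fx for x, fx in f.items())
print(over_omega, over_s)
```

The first sum has 36 terms and the second only 11, which is the practical point of the change of variables theorem: the distribution of the variable carries everything needed to compute the expected value.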
Continuous Distributions
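For the second special case, suppose that \((S, \mathscr{S}, \lambda_n)\) is a Euclidean measure space, so that \(S\) is a Lebesgue measurable subset of \(\mathbb{R}^n\) for some \(n \in \mathbb{N}_+\), \(\mathscr{S}\) is the \(\sigma\)-algebra of Lebesgue measurable subsets of \(S\), and \(\lambda_n\) is Lebesgue measure on \((S, \mathscr{S})\). The distribution of \(X\) is absolutely continuous with respect to \(\lambda_n\) if \(\lambda_n(A) = 0\) implies \(\mathbb{P}(X \in A) = 0\) for \(A \in \mathscr{S}\). If this is the case, then a probability density function \(f\) of \(X\) has its usual meaning. Thus,
\[ \mathbb{E}[g(X)] = \int_S g(x) f(x) \, d\lambda_n(x) \]
assuming that the expected value exists. When \(g f\) is a typically nice function, this integral reduces to an ordinary \(n\)-dimensional Riemann integral of calculus. Specializing further, if \(X\) is itself real-valued and \(g(x) = x\) for \(x \in S\), then
\[ \mathbb{E}(X) = \int_S x f(x) \, dx \]
which was our original definition of expected value in the continuous case.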
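When the integrand is nice, the integral against the density can be approximated by an ordinary Riemann sum. The following is a rough numerical sketch, assuming for illustration the exponential density with rate 1 on \([0, \infty)\) (not a distribution discussed in this section) and a hypothetical helper `expect` that truncates the integral at a large upper limit and applies the midpoint rule.

```python
import math

def f(x):
    """Assumed density for illustration: exponential with rate 1 on [0, inf)."""
    return math.exp(-x)

def expect(g, upper=50.0, steps=200_000):
    """Midpoint-rule approximation of the integral of g(x) f(x) over [0, upper]."""
    h = upper / steps
    return h * sum(g((i + 0.5) * h) * f((i + 0.5) * h) for i in range(steps))

mean = expect(lambda x: x)             # approximates E(X) = 1
second = expect(lambda x: x * x)       # approximates E(X^2) = 2
print(round(mean, 4), round(second, 4))
```

The truncation is harmless here only because the exponential tail decays quickly; for a heavy-tailed density the same shortcut could hide a divergent integral.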
Interchange Properties
In this subsection, we review properties that allow the interchange of expected value and other operations: limits of sequences, infinite sums, and integrals. We assume again that the random variables are real-valued unless otherwise specified.
Limits
Our first set of convergence results deals with the interchange of expected value and limits. We start with the expected value version of Fatou's lemma, named in honor of Pierre Fatou. Its usefulness stems from the fact that no assumptions are placed on the random variables, except that they be nonnegative.
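Suppose that \(X_n\) is a nonnegative random variable for \(n \in \mathbb{N}_+\). Then
\[ \mathbb{E}\left(\liminf_{n \to \infty} X_n\right) \le \liminf_{n \to \infty} \mathbb{E}(X_n) \]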
Our next set of results gives conditions for the interchange of expected value and limits.
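Suppose that \(X_n\) is a random variable for each \(n \in \mathbb{N}_+\). Then
\[ \mathbb{E}\left(\lim_{n \to \infty} X_n\right) = \lim_{n \to \infty} \mathbb{E}(X_n) \]
in each of the following cases:
- \(X_n\) is nonnegative for each \(n \in \mathbb{N}_+\) and \(X_n\) is increasing in \(n\).
- \(\mathbb{E}(X_n)\) exists for each \(n \in \mathbb{N}_+\), \(\mathbb{E}(X_1) > -\infty\), and \(X_n\) is increasing in \(n\).
- \(\mathbb{E}(X_n)\) exists for each \(n \in \mathbb{N}_+\), \(\mathbb{E}(X_1) < \infty\), and \(X_n\) is decreasing in \(n\).
- \(\lim_{n \to \infty} X_n\) exists, and \(|X_n| \le Y\) for \(n \in \mathbb{N}_+\) where \(Y\) is a nonnegative random variable with \(\mathbb{E}(Y) < \infty\).
- \(\lim_{n \to \infty} X_n\) exists, and \(|X_n| \le c\) for \(n \in \mathbb{N}_+\) where \(c\) is a positive constant.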
Statements about the random variables in the preceding theorem (nonnegative, increasing, existence of limit, etc.) need only hold with probability 1. Part (a) is the monotone convergence theorem, one of the most important convergence results and, in a sense, essential to the definition of the integral in the first place. Parts (b) and (c) are slight generalizations of the monotone convergence theorem. In parts (a), (b), and (c), note that \(\lim_{n \to \infty} X_n\) exists (with probability 1), although the limit may be \(\infty\) in parts (a) and (b) and \(-\infty\) in part (c) (with positive probability). Part (d) is the dominated convergence theorem, another of the most important convergence results. It is sometimes also known as Lebesgue's dominated convergence theorem in honor of Henri Lebesgue. Part (e) is a corollary of the dominated convergence theorem, and is known as the bounded convergence theorem.
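A standard counterexample shows why some hypothesis is needed for the interchange, and that the inequality in Fatou's lemma can be strict. The sketch below is an illustration that is not taken from the text: it assumes \(U\) is uniformly distributed on \((0, 1)\) and sets \(X_n = n\) when \(U \le 1/n\) and \(X_n = 0\) otherwise. The function names are hypothetical.

```python
from fractions import Fraction

def X(n, u):
    """X_n evaluated at the outcome u in (0, 1): n on (0, 1/n], 0 elsewhere."""
    return n if u <= Fraction(1, n) else 0

def expected(n):
    """E(X_n) = n * P(U <= 1/n), with U uniform on (0, 1)."""
    return n * Fraction(1, n)

# Pointwise limit: for each fixed u > 0, X_n(u) = 0 once n > 1/u.
sample_points = [Fraction(1, 3), Fraction(1, 10), Fraction(1, 250)]
tails = [[X(n, u) for n in range(1000, 1005)] for u in sample_points]

means = [expected(n) for n in (1, 10, 100, 1000)]
print(tails)   # all zeros: the limit of X_n is 0 at these points
print(means)   # every mean is 1, so lim E(X_n) = 1 > 0 = E(lim X_n)
```

None of the cases (a) through (e) applies: the sequence is not monotone, and the smallest dominating variable, \(\sup_n X_n\), has infinite expected value.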
Infinite Series
Our next results involve the interchange of expected value and an infinite sum, so these results generalize the basic additivity property of expected value.
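Suppose that \(X_n\) is a random variable for \(n \in \mathbb{N}_+\). Then
\[ \mathbb{E}\left(\sum_{n=1}^\infty X_n\right) = \sum_{n=1}^\infty \mathbb{E}(X_n) \]
in each of the following cases:
- \(X_n\) is nonnegative for each \(n \in \mathbb{N}_+\).
- \(\mathbb{E}\left(\sum_{n=1}^\infty |X_n|\right) < \infty\)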
Part (a) is a consequence of the monotone convergence theorem, and part (b) is a consequence of the dominated convergence theorem. In (b), note that \(\sum_{n=1}^\infty |X_n| < \infty\) and hence \(\sum_{n=1}^\infty X_n\) is absolutely convergent with probability 1. Our next result is the additivity of the expected value over a countably infinite collection of disjoint events.
Suppose that \(X\) is a random variable whose expected value exists, and that \(\{A_n : n \in \mathbb{N}_+\}\) is a disjoint collection of events. Let \(A = \bigcup_{n=1}^\infty A_n\). Then
\[ \mathbb{E}(X; A) = \sum_{n=1}^\infty \mathbb{E}(X; A_n) \]
Of course, the previous theorem applies in particular if \(X\) is nonnegative.
Integrals
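Suppose that \((T, \mathscr{T}, \mu)\) is a \(\sigma\)-finite measure space, and that \(X_t\) is a real-valued random variable for each \(t \in T\). Thus we can think of \(\{X_t : t \in T\}\) as a stochastic process indexed by \(T\). We assume that \((\omega, t) \mapsto X_t(\omega)\) is measurable, as a function from the product space \((\Omega \times T, \mathscr{F} \otimes \mathscr{T})\) into \(\mathbb{R}\). Our next result involves the interchange of expected value and integral, and is a consequence of Fubini's theorem, named for Guido Fubini.
Under the assumptions above,
\[ \mathbb{E}\left[\int_T X_t \, d\mu(t)\right] = \int_T \mathbb{E}(X_t) \, d\mu(t) \]
in each of the following cases:
- \(X_t\) is nonnegative for each \(t \in T\).
- \(\int_T \mathbb{E}\left(|X_t|\right) \, d\mu(t) < \infty\)
Fubini's theorem actually states that the two iterated integrals above equal the joint integral
\[ \int_{\Omega \times T} X_t(\omega) \, d(\mathbb{P} \otimes \mu)(\omega, t) \]
where of course, \(\mathbb{P} \otimes \mu\) is the product measure on \((\Omega \times T, \mathscr{F} \otimes \mathscr{T})\). However, our interest is usually in evaluating the iterated integral above on the left in terms of the iterated integral on the right. Part (a) is the expected value version of Tonelli's theorem, named for Leonida Tonelli.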
Examples and Exercises
You may have worked some of the computational exercises before, but try to see them in a new light, in terms of the general theory of integration.
The Cauchy Distribution
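Recall that the Cauchy distribution, named for Augustin Cauchy, is a continuous distribution with probability density function \(f\) given by
\[ f(x) = \frac{1}{\pi \left(1 + x^2\right)}, \quad x \in \mathbb{R} \]
Suppose that \(X\) has the Cauchy distribution.
- Show that \(\mathbb{E}(X)\) does not exist.
- Find \(\mathbb{E}\left(X^2\right)\)
Details:
- \(\mathbb{E}(X^+) = \mathbb{E}(X^-) = \int_0^\infty \frac{x}{\pi \left(1 + x^2\right)} \, dx = \infty\), so \(\mathbb{E}(X)\) has the form \(\infty - \infty\).
- \(\mathbb{E}\left(X^2\right) = \infty\)
Open the Cauchy Experiment and keep the default parameters. Run the experiment 1000 times and note the behavior of the sample mean.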
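For readers without access to the app, the same experiment can be imitated in a few lines. This is a minimal sketch rather than a reproduction of the Cauchy Experiment itself: it assumes standard Cauchy values generated from the quantile function \(\tan[\pi(u - 1/2)]\), a fixed seed, and hypothetical helper names. It prints the running sample mean at several sample sizes alongside the sample median.

```python
import math
import random
import statistics

def cauchy_sample(n, seed=0):
    """Draw n standard Cauchy values via the quantile function tan(pi (u - 1/2))."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def running_means(xs):
    """Sample mean after 1, 2, ..., len(xs) observations."""
    total, out = 0.0, []
    for k, x in enumerate(xs, start=1):
        total += x
        out.append(total / k)
    return out

xs = cauchy_sample(10_000)
means = running_means(xs)
print([round(means[k - 1], 3) for k in (10, 100, 1000, 10_000)])
print(round(statistics.median(xs), 3))  # the median, by contrast, is stable near 0
```

Because the expected value does not exist, the law of large numbers does not apply, and the running mean can jump whenever an extreme value arrives. Trying several seeds makes the erratic behavior easier to see.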
The Pareto Distribution
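Recall that the Pareto distribution, named for Vilfredo Pareto, is a continuous distribution with probability density function \(f\) given by
\[ f(x) = \frac{a}{x^{a+1}}, \quad x \in [1, \infty) \]
where \(a > 0\) is the shape parameter.
Suppose that \(X\) has the Pareto distribution with shape parameter \(a\). Find \(\mathbb{E}(X)\) in the following cases:
- \(0 < a \le 1\)
- \(a > 1\)
Answer
- \(\infty\)
- \(\frac{a}{a - 1}\)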
Open the special distribution simulator and select the Pareto distribution. Vary the shape parameter and note the shape of the probability density function and the location of the mean. For various values of the parameter, run the experiment 1000 times and compare the sample mean with the distribution mean.
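A rough stand-in for the simulator exercise is sketched below. It assumes the Pareto density \(a / x^{a+1}\) on \([1, \infty)\), whose quantile function gives \(X = U^{-1/a}\) for \(U\) uniform on \((0, 1]\), and it uses the fact that the mean is \(a / (a - 1)\) when \(a > 1\) and infinite otherwise. The function names, the seed, and the choice \(a = 3\) are all illustrative.

```python
import random

def pareto_sample(a, n, seed=0):
    """Draw n Pareto(a) values via the quantile function: X = U^(-1/a), U in (0, 1]."""
    rng = random.Random(seed)
    return [(1.0 - rng.random()) ** (-1.0 / a) for _ in range(n)]

def pareto_mean(a):
    """Distribution mean: a / (a - 1) when a > 1, infinite when 0 < a <= 1."""
    return a / (a - 1) if a > 1 else float("inf")

a = 3.0
xs = pareto_sample(a, 100_000)
sample_mean = sum(xs) / len(xs)
print(round(sample_mean, 3), pareto_mean(a))
```

Rerunning with a shape parameter at or below 1 shows the sample mean growing without settling, in line with the infinite distribution mean.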
Suppose that \(X\) has the Pareto distribution with shape parameter \(a\). Find \(\mathbb{E}\left(1 / X^n\right)\) for \(n \in \mathbb{N}_+\).
Details:
\(\frac{a}{a + n}\)
Special Results for Nonnegative Variables
For a nonnegative variable, the moments can be obtained from integrals of the right-tail distribution function.
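If \(X\) is a nonnegative random variable then
\[ \mathbb{E}\left(X^n\right) = \int_0^\infty n x^{n-1} \, \mathbb{P}(X > x) \, dx, \quad n \in \mathbb{N}_+ \]
Details:
By Fubini's theorem (part (a) of the interchange result for integrals above), we can interchange an expected value and integral when the integrand is nonnegative. Hence
\[ \int_0^\infty n x^{n-1} \, \mathbb{P}(X > x) \, dx = \int_0^\infty n x^{n-1} \, \mathbb{E}\left[\mathbf{1}(X > x)\right] \, dx = \mathbb{E}\left(\int_0^\infty n x^{n-1} \, \mathbf{1}(X > x) \, dx\right) = \mathbb{E}\left(\int_0^X n x^{n-1} \, dx\right) = \mathbb{E}\left(X^n\right) \]
When \(n = 1\) we have \(\mathbb{E}(X) = \int_0^\infty \mathbb{P}(X > x) \, dx\). We saw this result before, but now we can understand the proof in terms of Fubini's theorem.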
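The tail formula for moments can be checked numerically. The sketch below assumes, purely for illustration, the exponential distribution with rate 1, for which the right-tail function is \(e^{-x}\) and the moment of order \(n\) is \(n!\). The helper name and the truncation point are hypothetical choices.

```python
import math

def tail(x):
    """Assumed right-tail function P(X > x) = exp(-x): exponential, rate 1."""
    return math.exp(-x)

def moment_from_tail(n, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of n x^(n-1) P(X > x) on [0, upper]."""
    h = upper / steps
    return h * sum(n * ((i + 0.5) * h) ** (n - 1) * tail((i + 0.5) * h)
                   for i in range(steps))

# For this distribution E(X^n) = n!, so the two columns should agree closely.
for n in (1, 2, 3):
    print(n, round(moment_from_tail(n), 4), math.factorial(n))
```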
For a random variable taking nonnegative integer values, the moments can be computed from sums involving the right-tail distribution function.
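Suppose that \(N\) has a discrete distribution, taking values in \(\mathbb{N}\). Then
\[ \mathbb{E}\left(N^n\right) = \sum_{k=1}^\infty \left[k^n - (k - 1)^n\right] \mathbb{P}(N \ge k), \quad n \in \mathbb{N}_+ \]
Details:
By part (a) of the interchange result for infinite series above, we can interchange expected value and infinite series when the terms are nonnegative. Hence
\[ \sum_{k=1}^\infty \left[k^n - (k - 1)^n\right] \mathbb{P}(N \ge k) = \sum_{k=1}^\infty \left[k^n - (k - 1)^n\right] \mathbb{E}\left[\mathbf{1}(N \ge k)\right] = \mathbb{E}\left(\sum_{k=1}^\infty \left[k^n - (k - 1)^n\right] \mathbf{1}(N \ge k)\right) = \mathbb{E}\left(\sum_{k=1}^N \left[k^n - (k - 1)^n\right]\right) = \mathbb{E}\left(N^n\right) \]
When \(n = 1\) we have \(\mathbb{E}(N) = \sum_{k=1}^\infty \mathbb{P}(N \ge k)\). We also saw this result before, but now we can understand the proof in terms of the interchange of sum and expected value.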
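For a variable with finitely many values the tail-sum formula can be verified exactly. This sketch assumes a hypothetical example, the score of one fair die, and compares the moment computed from the density with the moment computed from the right-tail probabilities, using rational arithmetic so that the two agree exactly.

```python
from fractions import Fraction

# Hypothetical example: N is the score of one fair die, so N takes values in {1, ..., 6}.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def moment_direct(n):
    """E(N^n) as the sum of k^n P(N = k)."""
    return sum(k ** n * p for k, p in pmf.items())

def moment_from_tail(n):
    """E(N^n) as the sum of [k^n - (k-1)^n] P(N >= k) over k >= 1."""
    top = max(pmf)
    return sum((k ** n - (k - 1) ** n) * sum(p for j, p in pmf.items() if j >= k)
               for k in range(1, top + 1))

for n in (1, 2, 3):
    print(n, moment_direct(n), moment_from_tail(n))
```

The sum over \(k\) stops at the largest value of the variable because the tail probability is 0 beyond it; for a variable with infinitely many values the series would have to be truncated or summed in closed form.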