For our basic ingredients, we start with a stochastic process $X = \{X_t: t \in T\}$ on an underlying probability space $(\Omega, \mathscr{F}, \mathbb{P})$, having state space $\mathbb{R}$, and where the index set $T$ (representing time) is either $\mathbb{N}$ (discrete time) or $[0, \infty)$ (continuous time). So to review what all this means, $\Omega$ is the sample space, $\mathscr{F}$ the $\sigma$-algebra of events, $\mathbb{P}$ the probability measure on $(\Omega, \mathscr{F})$, and $X_t$ is a random variable with values in $\mathbb{R}$ for each $t \in T$. Next, we have a filtration $\mathfrak{F} = \{\mathscr{F}_t: t \in T\}$, and we assume that $X$ is adapted to $\mathfrak{F}$. To review again, $\mathfrak{F}$ is an increasing family of sub-$\sigma$-algebras of $\mathscr{F}$, so that $\mathscr{F}_s \subseteq \mathscr{F}_t \subseteq \mathscr{F}$ for $s, t \in T$ with $s \le t$, and $X_t$ is measurable with respect to $\mathscr{F}_t$ for $t \in T$. We think of $\mathscr{F}_t$ as the collection of events up to time $t$, thus encoding the information available at time $t$. Finally, we assume that $\mathbb{E}(|X_t|) < \infty$, so that the mean of $X_t$ exists as a real number, for each $t \in T$.
There are two important special cases of the basic setup. The simplest case, of course, is when $\mathscr{F}_t = \sigma\{X_s: s \in T, s \le t\}$ for $t \in T$, so that $\mathfrak{F}$ is the natural filtration associated with $X$. Another case that arises frequently is when we have a second stochastic process $Y = \{Y_t: t \in T\}$ on $(\Omega, \mathscr{F}, \mathbb{P})$ with values in a general measure space $(S, \mathscr{S})$, and $\mathfrak{F}$ is the natural filtration associated with $Y$. So in this case, our main assumption is that $X_t$ is measurable with respect to $\sigma\{Y_s: s \in T, s \le t\}$ for $t \in T$.
The theory of martingales is beautiful, elegant, and mostly accessible in discrete time, when $T = \mathbb{N}$. But as with the theory of Markov processes, martingale theory is technically much more complicated in continuous time, when $T = [0, \infty)$. In this case, additional assumptions about the continuity of the sample paths and the filtration are often necessary in order to have a nice theory. Specifically, we will assume that the process $X$ is right continuous and has left limits, and that the filtration $\mathfrak{F}$ is right continuous and complete. These are the standard assumptions in continuous time.
Definitions
For the basic definitions that follow, you may need to review conditional expected value with respect to a $\sigma$-algebra.
The process $X$ is a martingale with respect to $\mathfrak{F}$ if $\mathbb{E}(X_t \mid \mathscr{F}_s) = X_s$ for all $s, t \in T$ with $s \le t$.
In the special case that $\mathfrak{F}$ is the natural filtration associated with $X$, we simply say that $X$ is a martingale, without reference to the filtration. In the special case that we have a second stochastic process $Y$ and $\mathfrak{F}$ is the natural filtration associated with $Y$, we say that $X$ is a martingale with respect to $Y$.
The term martingale originally referred to a portion of the harness of a horse, and was later used to describe gambling strategies, such as the one used in the St. Petersburg paradox, in which bets are doubled when a game is lost. To interpret the definitions above in terms of gambling, suppose that a gambler is at a casino, and that $X_t$ represents her fortune at time $t$ and $\mathscr{F}_t$ the information available to her at time $t$. Suppose now that $s, t \in T$ with $s < t$ and that we think of $s$ as the current time, so that $t$ is a future time. If $X$ is a martingale with respect to $\mathfrak{F}$ then the games are fair in the sense that the gambler's expected fortune at the future time $t$ is the same as her current fortune at time $s$: $\mathbb{E}(X_t \mid \mathscr{F}_s) = X_s$. To venture a bit from the casino, suppose that $X_t$ is the price of a stock, or the value of a stock index, at time $t$. If $X$ is a martingale, then the expected value at a future time, given all of our information, is the present value.
An English-style breastplate with a running martingale attachment. By Danielle M., CC BY 3.0, from Wikipedia
But as we will see, martingales are useful in probability far beyond the application to gambling and even far beyond financial applications generally. Indeed, martingales are of fundamental importance in modern probability theory. Here are two related definitions, with equality in the martingale condition replaced by inequalities.
Suppose again that the process $X$ and the filtration $\mathfrak{F}$ satisfy the basic assumptions above.
$X$ is a sub-martingale with respect to $\mathfrak{F}$ if $\mathbb{E}(X_t \mid \mathscr{F}_s) \ge X_s$ for all $s, t \in T$ with $s \le t$.
$X$ is a super-martingale with respect to $\mathfrak{F}$ if $\mathbb{E}(X_t \mid \mathscr{F}_s) \le X_s$ for all $s, t \in T$ with $s \le t$.
In the gambling setting, a sub-martingale models games that are favorable to the gambler on average, while a super-martingale models games that are unfavorable to the gambler on average. To venture again from the casino, suppose that $X_t$ is the price of a stock, or the value of a stock index, at time $t$. If $X$ is a sub-martingale, the expected value at a future time, given all of our information, is greater than the present value, and if $X$ is a super-martingale then the expected value at the future time is less than the present value. One hopes that a stock index is a sub-martingale.
Clearly $X$ is a martingale with respect to $\mathfrak{F}$ if and only if it is both a sub-martingale and a super-martingale. Finally, recall that the conditional expected value of a random variable with respect to a $\sigma$-algebra is itself a random variable, and so the equations and inequalities in the definitions should be interpreted as holding with probability 1. In this section generally, statements involving random variables are assumed to hold with probability 1.
The conditions that define martingale, sub-martingale, and super-martingale make sense if the index set $T$ is any totally ordered set. In some applications that we will consider later, $T = \{0, 1, \ldots, n\}$ for fixed $n \in \mathbb{N}$. In the section on backwards martingales, $T = \{\ldots, -2, -1, 0\}$ or $T = (-\infty, 0]$. In the case of discrete time when $T = \mathbb{N}$, we can simplify the definitions slightly.
Suppose that $X = \{X_n: n \in \mathbb{N}\}$ satisfies the basic assumptions above.
$X$ is a martingale with respect to $\mathfrak{F}$ if and only if $\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = X_n$ for all $n \in \mathbb{N}$.
$X$ is a sub-martingale with respect to $\mathfrak{F}$ if and only if $\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) \ge X_n$ for all $n \in \mathbb{N}$.
$X$ is a super-martingale with respect to $\mathfrak{F}$ if and only if $\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) \le X_n$ for all $n \in \mathbb{N}$.
Details:
The conditions in the definitions clearly imply the conditions here, so we just need to show the opposite implications. Thus, assume that the condition in (a) holds and suppose that $k, n \in \mathbb{N}$ with $k < n$. Then $\mathscr{F}_k \subseteq \mathscr{F}_{n-1}$, so by the tower property,
$$\mathbb{E}(X_n \mid \mathscr{F}_k) = \mathbb{E}[\mathbb{E}(X_n \mid \mathscr{F}_{n-1}) \mid \mathscr{F}_k] = \mathbb{E}(X_{n-1} \mid \mathscr{F}_k)$$
Repeating the argument (or using induction), we get to $\mathbb{E}(X_n \mid \mathscr{F}_k) = \mathbb{E}(X_{k+1} \mid \mathscr{F}_k) = X_k$.
The proofs for sub-martingales and super-martingales are analogous, with inequalities replacing the equalities.
The relations that define martingales, sub-martingales, and super-martingales hold for the ordinary (unconditional) expected values.
Suppose that $s, t \in T$ with $s \le t$.
If $X$ is a martingale with respect to $\mathfrak{F}$ then $\mathbb{E}(X_s) = \mathbb{E}(X_t)$.
If $X$ is a sub-martingale with respect to $\mathfrak{F}$ then $\mathbb{E}(X_s) \le \mathbb{E}(X_t)$.
If $X$ is a super-martingale with respect to $\mathfrak{F}$ then $\mathbb{E}(X_s) \ge \mathbb{E}(X_t)$.
Details:
The results follow directly from the definitions, and the critical fact that $\mathbb{E}[\mathbb{E}(X_t \mid \mathscr{F}_s)] = \mathbb{E}(X_t)$ for $s, t \in T$ with $s \le t$.
So if $X$ is a martingale then $X$ has constant expected value, and this value is referred to as the mean of $X$.
Examples
The goal for the remainder of this section is to give some classical examples of martingales, and by doing so, to show the wide variety of applications in which martingales occur. We will return to many of these examples in subsequent sections. Without further ado, we assume that all random variables are real-valued, unless otherwise specified, and that all expected values mentioned below exist in $\mathbb{R}$. Be sure to try the proofs yourself before expanding the details.
Constant Sequence
Our first example is rather trivial, but still worth noting.
Suppose that $\mathfrak{F} = \{\mathscr{F}_t: t \in T\}$ is a filtration on the probability space $(\Omega, \mathscr{F}, \mathbb{P})$ and that $X$ is a random variable that is measurable with respect to $\mathscr{F}_0$ and satisfies $\mathbb{E}(|X|) < \infty$. Let $X_t = X$ for $t \in T$. Then $\{X_t: t \in T\}$ is a martingale with respect to $\mathfrak{F}$.
Details:
Since $X$ is measurable with respect to $\mathscr{F}_0$, it is measurable with respect to $\mathscr{F}_t$ for all $t \in T$. Hence the process is adapted to $\mathfrak{F}$. If $s, t \in T$ with $s \le t$, then $\mathbb{E}(X_t \mid \mathscr{F}_s) = \mathbb{E}(X \mid \mathscr{F}_s) = X = X_s$, since $X$ is measurable with respect to $\mathscr{F}_s$.
Partial Sums
For our next discussion, we start with one of the most basic martingales in discrete time, and the one with the simplest interpretation in terms of gambling. Suppose that $V = \{V_n: n \in \mathbb{N}\}$ is a sequence of independent random variables with $\mathbb{E}(|V_n|) < \infty$ for $n \in \mathbb{N}$. Let
$$X_n = \sum_{i=0}^n V_i, \quad n \in \mathbb{N}$$
so that $X = \{X_n: n \in \mathbb{N}\}$ is simply the partial sum process associated with $V$.
For the partial sum process $X$,
If $\mathbb{E}(V_n) \ge 0$ for $n \in \mathbb{N}_+$ then $X$ is a sub-martingale.
If $\mathbb{E}(V_n) \le 0$ for $n \in \mathbb{N}_+$ then $X$ is a super-martingale.
If $\mathbb{E}(V_n) = 0$ for $n \in \mathbb{N}_+$ then $X$ is a martingale.
Details:
Let $\mathscr{F}_n = \sigma\{V_0, V_1, \ldots, V_n\}$ for $n \in \mathbb{N}$. Note first that
$$\mathbb{E}(|X_n|) \le \sum_{i=0}^n \mathbb{E}(|V_i|) < \infty$$
Next,
$$\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = \mathbb{E}(X_n + V_{n+1} \mid \mathscr{F}_n) = X_n + \mathbb{E}(V_{n+1})$$
The last equality holds since $X_n$ is measurable with respect to $\mathscr{F}_n$ and $V_{n+1}$ is independent of $\mathscr{F}_n$. The results now follow from the definitions.
In terms of gambling, if $V_0$ is the gambler's initial fortune and $V_i$ is the gambler's net winnings on the $i$th game, then $X_n$ is the gambler's net fortune after $n$ games for $n \in \mathbb{N}_+$. But partial sum processes associated with independent sequences are important far beyond gambling. In fact, much of classical probability deals with partial sums of independent and identically distributed variables. The entire chapter on Random Samples explores this setting.
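The martingale property is easy to check numerically. Here is a minimal simulation sketch (our own code, not part of the text, assuming NumPy is available) that generates many sample paths of a partial sum process with mean-zero steps and verifies that the sample mean of $X_n$ stays essentially constant in $n$, as a martingale requires.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
paths, steps = 100_000, 20

# Independent mean-zero steps: V_n uniform on {-1, 0, 1} for n >= 1.
V = rng.choice([-1.0, 0.0, 1.0], size=(paths, steps))
V[:, 0] = 10.0                        # constant initial fortune V_0 = 10
X = np.cumsum(V, axis=1)              # X_n = V_0 + V_1 + ... + V_n

# A martingale has constant mean: E(X_n) = E(X_0) for all n.
print(np.round(X.mean(axis=0), 2))    # every entry should be close to 10
```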
Note that $\mathbb{E}(X_n) = \sum_{i=0}^n \mathbb{E}(V_i)$ for $n \in \mathbb{N}$. Hence condition (a) is equivalent to $n \mapsto \mathbb{E}(X_n)$ increasing, condition (b) is equivalent to $n \mapsto \mathbb{E}(X_n)$ decreasing, and condition (c) is equivalent to $n \mapsto \mathbb{E}(X_n)$ constant. Here is another martingale associated with the partial sum process, known as the second moment martingale.
Suppose that $\mathbb{E}(V_n) = 0$ for $n \in \mathbb{N}_+$ and $\mathbb{E}(V_n^2) < \infty$ for $n \in \mathbb{N}$. Let
$$Y_n = X_n^2 - \operatorname{var}(X_n), \quad n \in \mathbb{N}$$
Then $Y = \{Y_n: n \in \mathbb{N}\}$ is a martingale with respect to $V$.
Details:
Again, let $\mathscr{F}_n = \sigma\{V_0, V_1, \ldots, V_n\}$ for $n \in \mathbb{N}$. Since the sequence $V$ is independent, note that
$$\operatorname{var}(X_n) = \sum_{i=0}^n \operatorname{var}(V_i)$$
Also, $\operatorname{var}(V_n) = \mathbb{E}(V_n^2)$ for $n \in \mathbb{N}_+$ since $\mathbb{E}(V_n) = 0$. In particular, $\mathbb{E}(|Y_n|) < \infty$ for $n \in \mathbb{N}$. Next for $n \in \mathbb{N}$,
$$\mathbb{E}(X_{n+1}^2 \mid \mathscr{F}_n) = \mathbb{E}[(X_n + V_{n+1})^2 \mid \mathscr{F}_n] = X_n^2 + 2 X_n \mathbb{E}(V_{n+1}) + \mathbb{E}(V_{n+1}^2)$$
since $X_n$ is measurable with respect to $\mathscr{F}_n$ and $V_{n+1}$ is independent of $\mathscr{F}_n$. But $\mathbb{E}(V_{n+1}) = 0$ and $\mathbb{E}(V_{n+1}^2) = \operatorname{var}(V_{n+1})$. Hence $\mathbb{E}(X_{n+1}^2 \mid \mathscr{F}_n) = X_n^2 + \operatorname{var}(V_{n+1})$, and subtracting $\operatorname{var}(X_{n+1}) = \operatorname{var}(X_n) + \operatorname{var}(V_{n+1})$ from both sides gives $\mathbb{E}(Y_{n+1} \mid \mathscr{F}_n) = Y_n$ for $n \in \mathbb{N}$.
So under the assumptions in this theorem, both $X$ and $Y$ are martingales. We will generalize the results for partial sum processes in the discussion on processes with independent increments below.
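As a numerical sanity check of the second moment martingale (again a sketch of our own, with an arbitrary step distribution), we can verify that the sample mean of $Y_n = X_n^2 - \operatorname{var}(X_n)$ is constant in $n$:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
paths, steps = 200_000, 15

# Mean-zero steps with variance 1/2; X_0 = V_0 = 0.
V = rng.choice([-1.0, 0.0, 1.0], size=(paths, steps), p=[0.25, 0.5, 0.25])
V[:, 0] = 0.0
X = np.cumsum(V, axis=1)

var_X = 0.5 * np.arange(steps)        # var(X_n) = sum of the step variances
Y = X**2 - var_X                      # second moment martingale
print(np.round(Y.mean(axis=0), 3))    # constant (here 0) up to simulation error
```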
Martingale Difference Sequences
In the last discussion, we saw that the partial sum process associated with a sequence of independent, mean 0 variables is a martingale. Conversely, every martingale in discrete time can be written as a partial sum process of uncorrelated mean 0 variables. This representation gives some significant insight into the theory of martingales generally. Suppose that $X = \{X_n: n \in \mathbb{N}\}$ is a martingale with respect to the filtration $\mathfrak{F} = \{\mathscr{F}_n: n \in \mathbb{N}\}$.
Let $V_0 = X_0$ and $V_n = X_n - X_{n-1}$ for $n \in \mathbb{N}_+$. The process $V = \{V_n: n \in \mathbb{N}\}$ is the martingale difference sequence associated with $X$, and
$$X_n = \sum_{i=0}^n V_i, \quad n \in \mathbb{N}$$
As promised, the martingale difference variables have mean 0, and in fact satisfy a stronger property.
Suppose that $V$ is the martingale difference sequence associated with $X$. Then
$V$ is adapted to $\mathfrak{F}$.
$\mathbb{E}(V_n \mid \mathscr{F}_k) = 0$ for $k, n \in \mathbb{N}$ with $k < n$.
$\mathbb{E}(V_n) = 0$ for $n \in \mathbb{N}_+$.
Details:
Of course $V_0 = X_0$ is measurable with respect to $\mathscr{F}_0$. For $n \in \mathbb{N}_+$, $X_n$ and $X_{n-1}$, and hence $V_n = X_n - X_{n-1}$, are measurable with respect to $\mathscr{F}_n$.
Let $n \in \mathbb{N}_+$. By the martingale and adapted properties,
$$\mathbb{E}(V_n \mid \mathscr{F}_{n-1}) = \mathbb{E}(X_n \mid \mathscr{F}_{n-1}) - \mathbb{E}(X_{n-1} \mid \mathscr{F}_{n-1}) = X_{n-1} - X_{n-1} = 0$$
Next, by the tower property, for $k \in \mathbb{N}$ with $k < n$,
$$\mathbb{E}(V_n \mid \mathscr{F}_k) = \mathbb{E}[\mathbb{E}(V_n \mid \mathscr{F}_{n-1}) \mid \mathscr{F}_k] = 0$$
Since $X$ is a martingale, it has constant mean, as noted above. Hence $\mathbb{E}(V_n) = \mathbb{E}(X_n) - \mathbb{E}(X_{n-1}) = 0$ for $n \in \mathbb{N}_+$. We could also use part (b).
Also as promised, if the martingale variables have finite variance, then the martingale difference variables are uncorrelated.
Suppose again that $V$ is the martingale difference sequence associated with the martingale $X$. Assume that $\mathbb{E}(X_n^2) < \infty$ for $n \in \mathbb{N}$. Then $V$ is an uncorrelated sequence. Moreover,
$$\operatorname{var}(X_n) = \sum_{i=0}^n \operatorname{var}(V_i), \quad n \in \mathbb{N}$$
Details:
Let $k, n \in \mathbb{N}$ with $k < n$. To show that $V_k$ and $V_n$ are uncorrelated, we just need to show that $\mathbb{E}(V_k V_n) = 0$ (since $\mathbb{E}(V_n) = 0$). But $V_k$ is measurable with respect to $\mathscr{F}_k$, so by the previous result,
$$\mathbb{E}(V_k V_n) = \mathbb{E}[\mathbb{E}(V_k V_n \mid \mathscr{F}_k)] = \mathbb{E}[V_k \mathbb{E}(V_n \mid \mathscr{F}_k)] = 0$$
Finally, the variance of a sum of uncorrelated variables is the sum of the variances. Since $V_n$ has mean 0 for $n \in \mathbb{N}_+$, $\operatorname{var}(V_n) = \mathbb{E}(V_n^2)$, and hence the formula for $\operatorname{var}(X_n)$ holds.
We now know that a discrete-time martingale is the partial sum process associated with a sequence of uncorrelated variables. Hence we might hope that there are martingale versions of the fundamental theorems that hold for a partial sum process associated with an independent sequence. This turns out to be true, and is a basic reason for the importance of martingales.
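To illustrate, here is a small sketch (our own example) using the second moment martingale of a simple symmetric random walk: its differences are dependent, yet uncorrelated, and the variances of the differences add up to the variance of the martingale.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
paths, steps = 200_000, 10

eps = rng.choice([-1, 1], size=(paths, steps))
S = np.cumsum(eps, axis=1)                 # simple symmetric random walk
X = S**2 - np.arange(1, steps + 1)         # a martingale (second moment martingale)

V = np.diff(X, axis=1)                     # martingale differences V_n = X_n - X_{n-1}

# Distinct differences are uncorrelated, even though they are dependent:
print(np.round(np.corrcoef(V[:, 2], V[:, 7])[0, 1], 3))     # near 0

# Variances add: var(X_n) is the sum of the variances of the differences.
print(np.round(X[:, -1].var(), 2),
      np.round(X[:, 0].var() + V.var(axis=0).sum(), 2))     # nearly equal
```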
Discrete-Time Random Walks
Suppose that $V = \{V_n: n \in \mathbb{N}\}$ is a sequence of independent random variables with $\{V_n: n \in \mathbb{N}_+\}$ identically distributed. We assume that $\mathbb{E}(|V_n|) < \infty$ for $n \in \mathbb{N}$ and we let $\mu = \mathbb{E}(V_n)$ denote the common mean of $\{V_n: n \in \mathbb{N}_+\}$. Let $X = \{X_n: n \in \mathbb{N}\}$ be the partial sum process associated with $V$, so that
$$X_n = \sum_{i=0}^n V_i, \quad n \in \mathbb{N}$$
This setting is a special case of the more general partial sum process considered above. The process $X$ is sometimes called a (discrete-time) random walk. The initial position $V_0$ of the walker can have an arbitrary distribution, but then the steps that the walker takes are independent and identically distributed. In terms of gambling, $V_0$ is the initial fortune of the gambler playing a sequence of independent and identical games. If $V_i$ is the amount won (or lost) on game $i \in \mathbb{N}_+$, then $X_n$ is the gambler's net fortune after $n$ games.
For the random walk $X$,
$X$ is a martingale if $\mu = 0$.
$X$ is a sub-martingale if $\mu \ge 0$.
$X$ is a super-martingale if $\mu \le 0$.
For the second moment martingale, suppose that $\{V_n: n \in \mathbb{N}_+\}$ has common mean $\mu = 0$ and common variance $\sigma^2 < \infty$, and that $\operatorname{var}(V_0) < \infty$.
Let $Y_n = X_n^2 - n \sigma^2$ for $n \in \mathbb{N}$. Then $Y = \{Y_n: n \in \mathbb{N}\}$ is a martingale with respect to $V$.
Details:
This follows from the corresponding result for a general partial sum process above, since $\operatorname{var}(X_n) = \operatorname{var}(V_0) + n \sigma^2$ for $n \in \mathbb{N}$, and a martingale remains a martingale when the constant $\operatorname{var}(V_0)$ is added to each term.
Partial Products
Our next discussion is similar to the previous one on partial sum processes, but with products instead of sums. So suppose that $V = \{V_n: n \in \mathbb{N}\}$ is an independent sequence of nonnegative random variables with $\mathbb{E}(V_n) < \infty$ for $n \in \mathbb{N}$. Let
$$X_n = \prod_{i=0}^n V_i, \quad n \in \mathbb{N}$$
so that $X = \{X_n: n \in \mathbb{N}\}$ is the partial product process associated with $V$.
For the partial product process $X$,
If $\mathbb{E}(V_n) = 1$ for $n \in \mathbb{N}_+$ then $X$ is a martingale with respect to $V$.
If $\mathbb{E}(V_n) \ge 1$ for $n \in \mathbb{N}_+$ then $X$ is a sub-martingale with respect to $V$.
If $\mathbb{E}(V_n) \le 1$ for $n \in \mathbb{N}_+$ then $X$ is a super-martingale with respect to $V$.
Details:
Let $\mathscr{F}_n = \sigma\{V_0, V_1, \ldots, V_n\}$ for $n \in \mathbb{N}$. Since the variables are independent and nonnegative,
$$\mathbb{E}(X_n) = \prod_{i=0}^n \mathbb{E}(V_i) < \infty$$
Next,
$$\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = \mathbb{E}(X_n V_{n+1} \mid \mathscr{F}_n) = X_n \mathbb{E}(V_{n+1})$$
since $X_n$ is measurable with respect to $\mathscr{F}_n$ and $V_{n+1}$ is independent of $\mathscr{F}_n$. The results now follow from the definitions.
As with random walks, a special case of interest is when $\{V_n: n \in \mathbb{N}_+\}$ is an identically distributed sequence.
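A quick simulation sketch of the mean-one case (our own code, with an arbitrary factor distribution): the sample mean of the partial products stays near 1, consistent with the martingale property.

```python
import numpy as np

rng = np.random.default_rng(seed=11)
paths, steps = 100_000, 12

# Nonnegative factors with E(V_n) = 1: V_n uniform on {1/2, 3/2}.
V = rng.choice([0.5, 1.5], size=(paths, steps))
V[:, 0] = 1.0                         # X_0 = V_0 = 1
X = np.cumprod(V, axis=1)             # partial products

print(np.round(X.mean(axis=0), 3))    # every column should be close to 1
```

Note that although the mean stays at 1, a typical sample path of this product tends to 0, since $\mathbb{E}(\ln V_n) < 0$; the martingale convergence theory in later sections makes this precise.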
The Simple Random Walk
Suppose now that $V = \{V_n: n \in \mathbb{N}\}$ is a sequence of independent random variables with $\mathbb{P}(V_n = 1) = p$ and $\mathbb{P}(V_n = -1) = 1 - p$ for $n \in \mathbb{N}_+$, where $p \in (0, 1)$. Let $X = \{X_n: n \in \mathbb{N}\}$ be the partial sum process associated with $V$, so that
$$X_n = \sum_{i=0}^n V_i, \quad n \in \mathbb{N}$$
Then $X$ is the simple random walk with parameter $p$, and of course, is a special case of the more general random walk studied above. In terms of gambling, our gambler plays a sequence of independent and identical games, and on each game, wins €1 with probability $p$ and loses €1 with probability $1 - p$. So if $V_0$ is the gambler's initial fortune, then $X_n$ is her net fortune after $n$ games.
Since the common step mean is $\mathbb{E}(V_n) = 2p - 1$ for $n \in \mathbb{N}_+$, the general result above shows that (a) $X$ is a sub-martingale if $p > \frac{1}{2}$, (b) $X$ is a super-martingale if $p < \frac{1}{2}$, and (c) $X$ is a martingale if $p = \frac{1}{2}$. So case (a) corresponds to favorable games, case (b) to unfavorable games, and case (c) to fair games.
Open the simulation of the simple symmetric random walk. For various values of the number of trials $n$, run the simulation 1000 times and note the general behavior of the sample paths.
Here is the second moment martingale for the simple, symmetric random walk.
Consider the simple random walk with parameter $p = \frac{1}{2}$, and let $Y_n = X_n^2 - n$ for $n \in \mathbb{N}$. Then $Y = \{Y_n: n \in \mathbb{N}\}$ is a martingale with respect to $V$.
Details:
Note that $\mathbb{E}(V_n) = 0$ and $\operatorname{var}(V_n) = 1$ for each $n \in \mathbb{N}_+$ when $p = \frac{1}{2}$, so the result follows from the second moment martingale for random walks above.
But there is another martingale that can be associated with the simple random walk, known as De Moivre's martingale and named for one of the early pioneers of probability theory, Abraham De Moivre. With $q = 1 - p$, let $Z_n = (q/p)^{X_n}$ for $n \in \mathbb{N}$. Then $Z = \{Z_n: n \in \mathbb{N}\}$ is a martingale with respect to $V$: given $\mathscr{F}_n$, the extra factor $(q/p)^{V_{n+1}}$ in $Z_{n+1}$ has expected value $p(q/p) + q(p/q) = q + p = 1$.
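Here is a short simulation sketch of De Moivre's martingale (our own code; $p = 0.6$ is an arbitrary choice). The walk itself drifts upward, but the mean of $Z_n = (q/p)^{X_n}$ stays constant:

```python
import numpy as np

rng = np.random.default_rng(seed=5)
paths, steps, p = 200_000, 20, 0.6
q = 1 - p

V = rng.choice([1.0, -1.0], size=(paths, steps), p=[p, q])
V[:, 0] = 0.0                          # start the walk at X_0 = 0
X = np.cumsum(V, axis=1)

Z = (q / p) ** X                       # De Moivre's martingale
print(np.round(X.mean(axis=0)[[0, 9, 19]], 2))    # the walk drifts upward
print(np.round(Z.mean(axis=0)[[0, 9, 19]], 3))    # but E(Z_n) stays at 1
```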
The Beta-Bernoulli Process
Recall that the beta-Bernoulli process is constructed by randomizing the success parameter in a Bernoulli trials process with a beta distribution. Specifically, we have a random variable $P$ that has the beta distribution with parameters $a, b \in (0, \infty)$, and a sequence of indicator variables $X = \{X_n: n \in \mathbb{N}_+\}$ such that given $P = p$, $X$ is a sequence of independent variables with $\mathbb{P}(X_n = 1 \mid P = p) = p$ for $n \in \mathbb{N}_+$. As usual, we couch this in reliability terms, so that $X_n = 1$ means success on trial $n$ and $X_n = 0$ means failure. In our study of this process, we showed that the finite-dimensional distributions are given by
$$\mathbb{P}(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = \frac{a^{[k]} b^{[n-k]}}{(a + b)^{[n]}}, \quad (x_1, x_2, \ldots, x_n) \in \{0, 1\}^n$$
where $k = x_1 + x_2 + \cdots + x_n$, and where we use the ascending power notation $r^{[j]} = r (r + 1) \cdots (r + j - 1)$ for $r \in \mathbb{R}$ and $j \in \mathbb{N}$. Next, let $Y = \{Y_n: n \in \mathbb{N}\}$ denote the partial sum process associated with $X$, so that once again,
$$Y_n = \sum_{i=1}^n X_i, \quad n \in \mathbb{N}$$
Of course $Y_n$ is the number of successes in the first $n$ trials and has the beta-binomial distribution defined by
$$\mathbb{P}(Y_n = k) = \binom{n}{k} \frac{a^{[k]} b^{[n-k]}}{(a + b)^{[n]}}, \quad k \in \{0, 1, \ldots, n\}$$
Now let
$$Z_n = \frac{a + Y_n}{a + b + n}, \quad n \in \mathbb{N}$$
This variable also arises naturally. Let $\mathscr{F}_n = \sigma\{X_1, X_2, \ldots, X_n\}$ for $n \in \mathbb{N}$. Then as shown in the section on the beta-Bernoulli process, $\mathbb{P}(X_{n+1} = 1 \mid \mathscr{F}_n) = Z_n$ and $Z_n = \mathbb{E}(P \mid \mathscr{F}_n)$. In statistical terms, the second equation means that $Z_n$ is the Bayesian estimator of the unknown success probability $p$ in a sequence of Bernoulli trials, when $p$ is modeled by the random variable $P$.
$Z = \{Z_n: n \in \mathbb{N}\}$ is a martingale with respect to $X$.
Details:
Note that $0 \le Z_n \le 1$, so $\mathbb{E}(|Z_n|) < \infty$ for $n \in \mathbb{N}$. Next,
$$\mathbb{E}(Z_{n+1} \mid \mathscr{F}_n) = \mathbb{E}\left(\frac{a + Y_{n+1}}{a + b + n + 1} \,\bigg|\, \mathscr{F}_n\right) = \frac{a + Y_n + \mathbb{E}(X_{n+1} \mid \mathscr{F}_n)}{a + b + n + 1}$$
As noted above, $\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = \mathbb{P}(X_{n+1} = 1 \mid \mathscr{F}_n) = \frac{a + Y_n}{a + b + n}$. Substituting into the displayed equation above and doing a bit of algebra we have
$$\mathbb{E}(Z_{n+1} \mid \mathscr{F}_n) = \frac{(a + Y_n)(a + b + n + 1)}{(a + b + n)(a + b + n + 1)} = \frac{a + Y_n}{a + b + n} = Z_n$$
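The martingale property of $Z$ is easy to see in simulation. The sketch below (our own code; the parameter values $a = 2$, $b = 3$ are arbitrary) generates beta-Bernoulli paths and checks that $\mathbb{E}(Z_n)$ stays at $a/(a+b)$:

```python
import numpy as np

rng = np.random.default_rng(seed=13)
paths, steps, a, b = 100_000, 25, 2.0, 3.0

P = rng.beta(a, b, size=paths)                 # randomized success probability
X = rng.random((paths, steps)) < P[:, None]    # indicators X_1, X_2, ..., given P
Y = np.cumsum(X, axis=1)                       # number of successes Y_n

n = np.arange(1, steps + 1)
Z = (a + Y) / (a + b + n)                      # the Bayesian estimator of P
print(np.round(Z.mean(axis=0), 3))             # constant at a/(a+b) = 0.4
```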
Open the beta-Binomial experiment. Run the simulation 1000 times for various values of the parameters, and compare the empirical probability density function with the true probability density function.
Pólya's Urn Process
Recall that in the simplest version of Pólya's urn process, we start with an urn containing $a$ red and $b$ green balls. At each discrete time step, we select a ball at random from the urn and then replace the ball and add $c$ new balls of the same color to the urn. For the parameters, we need $a, b \in \mathbb{N}_+$ and $c \in \mathbb{N}$. For $n \in \mathbb{N}_+$, let $X_n$ denote the color of the ball selected on the $n$th draw, where 1 means red and 0 means green. The process $X = \{X_n: n \in \mathbb{N}_+\}$ is a classical example of a sequence of exchangeable yet dependent variables. Let $Y = \{Y_n: n \in \mathbb{N}\}$ denote the partial sum process associated with $X$, so that once again,
$$Y_n = \sum_{i=1}^n X_i, \quad n \in \mathbb{N}$$
Of course $Y_n$ is the total number of red balls selected in the first $n$ draws. Hence at time $n$, the total number of red balls in the urn is $a + c Y_n$, while the total number of balls in the urn is $a + b + c n$, and so the proportion of red balls in the urn is
$$Z_n = \frac{a + c Y_n}{a + b + c n}, \quad n \in \mathbb{N}$$
$Z = \{Z_n: n \in \mathbb{N}\}$ is a martingale with respect to $X$.
Details:
Indirect proof: If $c = 0$ then $Z_n = \frac{a}{a + b}$ for $n \in \mathbb{N}$, so $Z$ is a constant martingale. If $c \in \mathbb{N}_+$ then $X$ is equivalent to the beta-Bernoulli process with parameters $a/c$ and $b/c$. Moreover,
$$Z_n = \frac{a + c Y_n}{a + b + c n} = \frac{a/c + Y_n}{a/c + b/c + n}$$
So $Z$ is a martingale by [18].
Direct proof: Trivially, $0 \le Z_n \le 1$, so $\mathbb{E}(|Z_n|) < \infty$ for $n \in \mathbb{N}$. Let $\mathscr{F}_n = \sigma\{X_1, X_2, \ldots, X_n\}$. For $n \in \mathbb{N}$,
$$\mathbb{E}(Z_{n+1} \mid \mathscr{F}_n) = \mathbb{E}\left(\frac{a + c Y_{n+1}}{a + b + c(n + 1)} \,\bigg|\, \mathscr{F}_n\right) = \frac{a + c Y_n + c \, \mathbb{E}(X_{n+1} \mid \mathscr{F}_n)}{a + b + c(n + 1)}$$
since $Y_n$ is measurable with respect to $\mathscr{F}_n$. But the probability of selecting a red ball on draw $n + 1$, given the history of the process up to time $n$, is simply the proportion of red balls in the urn at time $n$. That is,
$$\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = \mathbb{P}(X_{n+1} = 1 \mid \mathscr{F}_n) = \frac{a + c Y_n}{a + b + c n} = Z_n$$
Substituting and simplifying gives $\mathbb{E}(Z_{n+1} \mid \mathscr{F}_n) = Z_n$.
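Here is a direct simulation sketch of the urn itself (our own code; the parameter values are arbitrary), confirming that the proportion of red balls in the urn has constant mean $a/(a+b)$:

```python
import numpy as np

rng = np.random.default_rng(seed=17)
paths, steps = 50_000, 30
a, b, c = 1, 1, 2              # initial red and green counts, balls added per draw

red = np.full(paths, float(a))
total = float(a + b)
means = []
for n in range(steps):
    draw_red = rng.random(paths) < red / total    # draw a ball at random
    red += c * draw_red                           # add c balls of the drawn color
    total += c
    means.append((red / total).mean())            # E(Z_n) should stay at a/(a+b)

print(np.round(means, 3))                         # all entries close to 0.5
```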
Open the simulation of Pólya's Urn Experiment. Run the simulation 1000 times for various values of the parameters, and compare the empirical probability density function of the number of red balls selected to the true probability density function.
Processes with Independent Increments
Our first example above concerned the partial sum process $X$ associated with a sequence of independent random variables $V$. Such processes are the only ones in discrete time that have independent increments. That is, for $m, n \in \mathbb{N}$ with $m \le n$, $X_n - X_m$ is independent of $\sigma\{X_0, X_1, \ldots, X_m\}$. The random walk process has the additional property of stationary increments. That is, the distribution of $X_n - X_m$ is the same as the distribution of $X_{n-m} - X_0$ for $m, n \in \mathbb{N}$ with $m \le n$. Let's consider processes in discrete or continuous time with these properties. Thus, suppose that $X = \{X_t: t \in T\}$ satisfies the basic assumptions above relative to the filtration $\mathfrak{F} = \{\mathscr{F}_t: t \in T\}$. Here are the two definitions.
The process $X$ has
Independent increments if $X_t - X_s$ is independent of $\mathscr{F}_s$ for all $s, t \in T$ with $s \le t$.
Stationary increments if $X_t - X_s$ has the same distribution as $X_{t-s} - X_0$ for all $s, t \in T$ with $s \le t$.
Processes with stationary and independent increments were studied in the chapter on Markov processes. In continuous time (with the continuity assumptions we have imposed), such a process is known as a Lévy process, named for Paul Lévy, and also as a continuous-time random walk. For a process with independent increments (not necessarily stationary), the connection with martingales depends on the mean function $m$ given by $m(t) = \mathbb{E}(X_t)$ for $t \in T$.
Suppose that $X$ has independent increments.
If $m$ is increasing then $X$ is a sub-martingale.
If $m$ is decreasing then $X$ is a super-martingale.
If $m$ is constant then $X$ is a martingale.
Details:
The proof is just like the one above for partial sum processes. Suppose that $s, t \in T$ with $s \le t$. Then
$$\mathbb{E}(X_t \mid \mathscr{F}_s) = \mathbb{E}[X_s + (X_t - X_s) \mid \mathscr{F}_s]$$
But $X_s$ is measurable with respect to $\mathscr{F}_s$ and $X_t - X_s$ is independent of $\mathscr{F}_s$. So
$$\mathbb{E}(X_t \mid \mathscr{F}_s) = X_s + \mathbb{E}(X_t - X_s) = X_s + m(t) - m(s)$$
The results now follow from the definitions.
Compare this result with the corresponding result for the partial sum process in [6]. Suppose now that $X = \{X_t: t \in T\}$ is a stochastic process as above, with mean function $m$, and let $Y_t = X_t - m(t)$ for $t \in T$. The process $Y = \{Y_t: t \in T\}$ is sometimes called the compensated process associated with $X$ and has mean function 0. If $X$ has independent increments, then clearly so does $Y$. Hence the following result is a trivial corollary to our previous theorem.
Suppose that $X$ has independent increments. The compensated process $Y$ is a martingale.
Next we give the second moment martingale for a process with independent increments, generalizing the second moment martingale for a partial sum process in [7].
Suppose that $X$ has independent increments, with constant mean function and with $\operatorname{var}(X_t) < \infty$ for $t \in T$. Then $Y = \{Y_t: t \in T\}$ is a martingale, where $Y_t = X_t^2 - \operatorname{var}(X_t)$ for $t \in T$.
Details:
The proof is essentially the same as for the partial sum process in discrete time. Suppose that $s, t \in T$ with $s \le t$. Note that $\operatorname{var}(X_t) = \operatorname{var}(X_s) + \operatorname{var}(X_t - X_s)$ by the independent increments property. Next,
$$\mathbb{E}(X_t^2 \mid \mathscr{F}_s) = \mathbb{E}\{[X_s + (X_t - X_s)]^2 \mid \mathscr{F}_s\} = X_s^2 + 2 \mathbb{E}[X_s (X_t - X_s) \mid \mathscr{F}_s] + \mathbb{E}[(X_t - X_s)^2 \mid \mathscr{F}_s]$$
But $X_t - X_s$ is independent of $\mathscr{F}_s$, $X_s$ is measurable with respect to $\mathscr{F}_s$, and $\mathbb{E}(X_t - X_s) = 0$ since the mean function is constant, so
$$\mathbb{E}[X_s (X_t - X_s) \mid \mathscr{F}_s] = X_s \mathbb{E}(X_t - X_s) = 0$$
But also by independence and since $X_t - X_s$ has mean 0,
$$\mathbb{E}[(X_t - X_s)^2 \mid \mathscr{F}_s] = \operatorname{var}(X_t - X_s) = \operatorname{var}(X_t) - \operatorname{var}(X_s)$$
Putting the pieces together gives
$$\mathbb{E}(X_t^2 \mid \mathscr{F}_s) = X_s^2 + \operatorname{var}(X_t) - \operatorname{var}(X_s)$$
and hence $\mathbb{E}(Y_t \mid \mathscr{F}_s) = Y_s$.
Of course, since the mean function is constant, $X$ is also a martingale. For processes with independent and stationary increments (that is, random walks), the last two theorems simplify, because the mean and variance functions simplify.
Suppose that $X$ has stationary, independent increments, and let $a = \mathbb{E}(X_1) - \mathbb{E}(X_0)$. Then
$X$ is a martingale if $a = 0$.
$X$ is a sub-martingale if $a \ge 0$.
$X$ is a super-martingale if $a \le 0$.
Details:
Recall that the mean function is given by $m(t) = \mathbb{E}(X_0) + a t$ for $t \in T$, so the result follows from the corresponding result in [23] for a process with independent increments.
Compare this result with the corresponding result in [11] for discrete-time random walks. Our next result is the second moment martingale. Compare this with the second moment martingale in [12] for discrete-time random walks.
Suppose that $X$ has stationary, independent increments with $\mathbb{E}(X_1) = \mathbb{E}(X_0)$ and $\sigma^2 = \operatorname{var}(X_1 - X_0) < \infty$. Then $Y = \{Y_t: t \in T\}$ is a martingale, where $Y_t = X_t^2 - \sigma^2 t$ for $t \in T$.
Details:
Recall that if $a = 0$ then $X$ has constant mean function. Also, $\operatorname{var}(X_t) = \operatorname{var}(X_0) + \sigma^2 t$ for $t \in T$, so the result follows from the corresponding result for a process with independent increments, since adding the constant $\operatorname{var}(X_0)$ preserves the martingale property.
In discrete time, as we have mentioned several times, all of these results reduce to the earlier results for partial sum processes and random walks. In continuous time, the Poisson process, named of course for Simeon Poisson, provides examples. The standard (homogeneous) Poisson counting process $N = \{N_t: t \in [0, \infty)\}$ with constant rate $r \in (0, \infty)$ has stationary, independent increments and mean function given by $m(t) = r t$ for $t \in [0, \infty)$. More generally, suppose that the rate function $r: [0, \infty) \to (0, \infty)$ is piecewise continuous (and non-constant). The non-homogeneous Poisson counting process with rate function $r$ has independent increments and mean function given by
$$m(t) = \int_0^t r(s) \, ds, \quad t \in [0, \infty)$$
The increment $N_t - N_s$ has the Poisson distribution with parameter $m(t) - m(s)$ for $s, t \in [0, \infty)$ with $s \le t$, so the process does not have stationary increments. In all cases, $m$ is increasing, so the following results are corollaries of our general results:
Let $N$ be the Poisson counting process with rate function $r$. Then
$N$ is a sub-martingale.
The compensated process $\{N_t - m(t): t \in [0, \infty)\}$ is a martingale (see the sketch below).
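As a continuous-time sketch (our own code, with the arbitrary rate function $r(t) = 1 + t$, so that $m(t) = t + t^2/2$), we can simulate the counting process from its independent Poisson increments on a grid and check that the compensated process $N_t - m(t)$ has mean near 0:

```python
import numpy as np

rng = np.random.default_rng(seed=19)
paths, t_max, k = 100_000, 5.0, 50
t = np.linspace(0.0, t_max, k + 1)

m = t + t**2 / 2                      # mean function for the rate r(t) = 1 + t
dm = np.diff(m)                       # mean of each increment

# Independent Poisson increments determine the counting process on the grid.
N = np.cumsum(rng.poisson(dm, size=(paths, k)), axis=1)

compensated = N - m[1:]               # the compensated process N_t - m(t)
print(np.round(compensated.mean(axis=0)[::10], 3))   # near 0 at all grid times
```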
Open the simulation of the Poisson counting experiment. For various values of $r$ and $t$, run the experiment 1000 times and compare the empirical probability density function of the number of arrivals with the true probability density function.
We will see further examples of processes with stationary, independent increments in continuous time (and so also examples of continuous-time martingales) in our study of Brownian motion.
Likelihood Ratio Tests
Suppose that $(S, \mathscr{S}, \mu)$ is a general measure space, and that $X = \{X_n: n \in \mathbb{N}_+\}$ is a sequence of independent, identically distributed random variables, taking values in $S$. In statistical terms, $X$ corresponds to sampling from the common distribution, which is usually not completely known. Indeed, the central problem in statistics is to draw inferences about the distribution from observations of $X$. Suppose now that the underlying distribution either has probability density function $g_0$ or probability density function $g_1$, with respect to $\mu$. We assume that $g_0$ and $g_1$ are positive on $S$. Of course the common special cases of this setup are
$S$ is a measurable subset of $\mathbb{R}^k$ for some $k \in \mathbb{N}_+$ and $\mu$ is $k$-dimensional Lebesgue measure on $S$.
$S$ is a countable set and $\mu$ is counting measure on $S$.
The likelihood ratio test is a hypothesis test in which the null and alternative hypotheses are $H_0$: the sampling density is $g_0$, versus $H_1$: the sampling density is $g_1$. The test is based on the test statistic
$$L_n = \prod_{i=1}^n \frac{g_0(X_i)}{g_1(X_i)}, \quad n \in \mathbb{N}_+$$
known as the likelihood ratio test statistic. Small values of the test statistic are evidence in favor of the alternative hypothesis $H_1$. Here is our result.
Under the alternative hypothesis $H_1$, the process $L = \{L_n: n \in \mathbb{N}_+\}$ is a martingale with respect to $X$, known as the likelihood ratio martingale.
Details:
Let $\mathscr{F}_n = \sigma\{X_1, X_2, \ldots, X_n\}$. For $n \in \mathbb{N}_+$,
$$\mathbb{E}(L_{n+1} \mid \mathscr{F}_n) = \mathbb{E}\left[L_n \frac{g_0(X_{n+1})}{g_1(X_{n+1})} \,\bigg|\, \mathscr{F}_n\right] = L_n \, \mathbb{E}\left[\frac{g_0(X_{n+1})}{g_1(X_{n+1})}\right]$$
since $L_n$ is measurable with respect to $\mathscr{F}_n$ and $X_{n+1}$ is independent of $\mathscr{F}_n$. But under $H_1$, and using the change of variables formula for expected value, we have
$$\mathbb{E}\left[\frac{g_0(X_{n+1})}{g_1(X_{n+1})}\right] = \int_S \frac{g_0(x)}{g_1(x)} g_1(x) \, d\mu(x) = \int_S g_0(x) \, d\mu(x) = 1$$
This result also follows essentially from the result above on partial products. The sequence $\{g_0(X_n) / g_1(X_n): n \in \mathbb{N}_+\}$ is independent and identically distributed, and as just shown, has mean 1 under $H_1$.
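A numerical sketch (our own example: $g_0$ the standard normal density and $g_1$ the normal density with mean 1 and variance 1): sampling under $H_1$, the likelihood ratio statistic keeps mean 1 at every $n$, even though typical values shrink toward 0.

```python
import numpy as np

rng = np.random.default_rng(seed=23)
paths, steps = 200_000, 5

# Sample under the alternative H_1: X_i ~ N(1, 1); the null density g_0 is N(0, 1).
X = rng.normal(loc=1.0, scale=1.0, size=(paths, steps))

# log[g_0(x)/g_1(x)] for these normal densities (normalizing constants cancel):
log_ratio = (-X**2 + (X - 1.0) ** 2) / 2.0
L = np.exp(np.cumsum(log_ratio, axis=1))       # likelihood ratio martingale

print(np.round(L.mean(axis=0), 3))             # near 1 at every n
print(np.round(np.median(L, axis=0), 3))       # medians decrease toward 0
```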
Branching Processes
In the simplest model of a branching process, we have a system of particles each of which can die out or split into new particles of the same type. The fundamental assumption is that the particles act independently, each with the same offspring distribution on $\mathbb{N}$. We will let $f$ denote the (discrete) probability density function of the number of offspring of a particle, $m$ the mean of the distribution, and $\phi$ the probability generating function of the distribution. Thus, if $U$ is the number of children of a particle, then $f(k) = \mathbb{P}(U = k)$ for $k \in \mathbb{N}$, $m = \mathbb{E}(U)$, and $\phi(t) = \mathbb{E}(t^U)$, defined at least for $t \in [-1, 1]$.
Our interest is in generational time rather than absolute time: the original particles are in generation 0, and recursively, the children of a particle in generation $n$ belong to generation $n + 1$. Thus, the stochastic process of interest is $X = \{X_n: n \in \mathbb{N}\}$ where $X_n$ is the number of particles in the $n$th generation for $n \in \mathbb{N}$. The process $X$ is a Markov chain and was studied in the section on discrete-time branching chains. In particular, one of the fundamental problems is to compute the probability of extinction starting with a single particle:
$$q = \mathbb{P}(X_n = 0 \text{ for some } n \in \mathbb{N} \mid X_0 = 1)$$
Then, since the particles act independently, the probability of extinction starting with $x \in \mathbb{N}$ particles is simply $q^x$. We will assume that $f(0) > 0$ and $f(0) + f(1) < 1$. This is the interesting case, since it means that a particle has a positive probability of dying without children and a positive probability of producing more than 1 child. The fundamental result, you may recall, is that $q$ is the smallest fixed point of $\phi$ (so that $\phi(q) = q$) in the interval $(0, 1]$. Here are two martingales associated with the branching process:
Each of the following is a martingale with respect to $X$.
$Y = \{Y_n: n \in \mathbb{N}\}$ where $Y_n = X_n / m^n$ for $n \in \mathbb{N}$.
$Z = \{Z_n: n \in \mathbb{N}\}$ where $Z_n = q^{X_n}$ for $n \in \mathbb{N}$.
Details:
Let $\mathscr{F}_n = \sigma\{X_0, X_1, \ldots, X_n\}$. For $n \in \mathbb{N}$, note that $X_{n+1}$ can be written in the form
$$X_{n+1} = \sum_{i=1}^{X_n} U_i$$
where $(U_1, U_2, \ldots)$ is a sequence of independent variables, each with PDF $f$ (and hence mean $m$ and PGF $\phi$), and with $(U_1, U_2, \ldots)$ independent of $\mathscr{F}_n$. Think of $U_i$ as the number of children of the $i$th particle in generation $n$.
For $n \in \mathbb{N}$,
$$\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = \mathbb{E}\left(\sum_{i=1}^{X_n} U_i \,\bigg|\, \mathscr{F}_n\right) = m X_n$$
and hence $\mathbb{E}(Y_{n+1} \mid \mathscr{F}_n) = m X_n / m^{n+1} = X_n / m^n = Y_n$.
For $n \in \mathbb{N}$,
$$\mathbb{E}(Z_{n+1} \mid \mathscr{F}_n) = \mathbb{E}\left(\prod_{i=1}^{X_n} q^{U_i} \,\bigg|\, \mathscr{F}_n\right) = [\phi(q)]^{X_n} = q^{X_n} = Z_n$$
since $\phi(q) = q$.
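Both martingales can be checked by simulation. In the sketch below (our own code), the offspring distribution is geometric on $\mathbb{N}$ with $\mathbb{P}(U = k) = s(1-s)^k$ and $s = 0.4$, so that $m = (1-s)/s = 1.5$ and, solving $\phi(q) = q$, the extinction probability is $q = 2/3$; the sum of $X_n$ independent geometric offspring counts is sampled as a negative binomial variable.

```python
import numpy as np

rng = np.random.default_rng(seed=29)
paths, gens = 100_000, 10
s, m, q = 0.4, 1.5, 2.0 / 3.0     # success parameter, offspring mean, extinction prob

X = np.ones(paths, dtype=np.int64)    # X_0 = 1 particle on every path
for n in range(1, gens + 1):
    alive = X > 0
    children = np.zeros(paths, dtype=np.int64)
    # Sum of X_n iid geometric(s) offspring counts is negative binomial.
    children[alive] = rng.negative_binomial(X[alive], s)
    X = children
    # E(X_n / m^n) = 1 and E(q^{X_n}) = q for every generation n.
    print(n, round((X / m**n).mean(), 3), round((q**X).mean(), 3))
```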
Doob's Martingale
Our next example is one of the simplest, but most important. Indeed, as we will see later in the section on convergence, this type of martingale is almost universal in the sense that every uniformly integrable martingale is of this type. The process is constructed by conditioning a fixed random variable on the $\sigma$-algebras in a given filtration, and thus accumulating information about the random variable.
Suppose that $\mathfrak{F} = \{\mathscr{F}_t: t \in T\}$ is a filtration on the probability space $(\Omega, \mathscr{F}, \mathbb{P})$, and that $X$ is a real-valued random variable with $\mathbb{E}(|X|) < \infty$. Define $X_t = \mathbb{E}(X \mid \mathscr{F}_t)$ for $t \in T$. Then $\{X_t: t \in T\}$ is a martingale with respect to $\mathfrak{F}$.
Details:
For $t \in T$, recall that $|X_t| = |\mathbb{E}(X \mid \mathscr{F}_t)| \le \mathbb{E}(|X| \mid \mathscr{F}_t)$. Taking expected values gives $\mathbb{E}(|X_t|) \le \mathbb{E}(|X|) < \infty$. Suppose that $s, t \in T$ with $s \le t$. Using the tower property of conditional expected value,
$$\mathbb{E}(X_t \mid \mathscr{F}_s) = \mathbb{E}[\mathbb{E}(X \mid \mathscr{F}_t) \mid \mathscr{F}_s] = \mathbb{E}(X \mid \mathscr{F}_s) = X_s$$
The martingale in the last theorem is known as Doob's martingale and is named for Joseph Doob who did much of the pioneering work on martingales. It's also known as the Lévy martingale, named for Paul Lévy.
Doob's martingale arises naturally in the statistical context of Bayesian estimation. Suppose that $X = \{X_n: n \in \mathbb{N}_+\}$ is a sequence of independent random variables whose common distribution depends on an unknown real-valued parameter $\theta$, with values in a parameter space $A \subseteq \mathbb{R}$. For each $n \in \mathbb{N}_+$, let $\mathscr{F}_n = \sigma\{X_1, X_2, \ldots, X_n\}$, so that $\mathfrak{F} = \{\mathscr{F}_n: n \in \mathbb{N}_+\}$ is the natural filtration associated with $X$. In Bayesian estimation, we model the unknown parameter with a random variable $\Theta$ taking values in $A$ and having a specified prior distribution. The Bayesian estimator of $\theta$ based on the sample $(X_1, X_2, \ldots, X_n)$ is
$$U_n = \mathbb{E}(\Theta \mid \mathscr{F}_n), \quad n \in \mathbb{N}_+$$
So it follows that the sequence of Bayesian estimators $U = \{U_n: n \in \mathbb{N}_+\}$ is a Doob martingale. The estimation referred to in the discussion of the beta-Bernoulli process above is a special case.
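Here is a concrete sketch of a Doob martingale (our own example): the target variable $X$ is the total number of heads in $N = 10$ fair coin flips, and $X_n = \mathbb{E}(X \mid \mathscr{F}_n)$, given the first $n$ flips, is the heads seen so far plus half the remaining flips. The mean stays constant, and the final term recovers $X$ exactly.

```python
import numpy as np

rng = np.random.default_rng(seed=31)
paths, N = 100_000, 10

flips = rng.integers(0, 2, size=(paths, N))     # ten fair coin flips per path
X = flips.sum(axis=1)                           # target variable: total heads

# Doob martingale: X_n = E(X | first n flips) = heads so far + (N - n)/2.
n = np.arange(N + 1)
S = np.hstack([np.zeros((paths, 1), dtype=np.int64), np.cumsum(flips, axis=1)])
X_n = S + (N - n) / 2

print(np.round(X_n.mean(axis=0), 3))            # constant mean E(X) = 5
print(bool(np.all(X_n[:, -1] == X)))            # X_N recovers X exactly: True
```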
Density Functions
For this example, you may need to review the sections on general measures and density functions. We start with our probability space $(\Omega, \mathscr{F}, \mathbb{P})$ and filtration $\mathfrak{F} = \{\mathscr{F}_n: n \in \mathbb{N}\}$ in discrete time. Suppose now that $\mu$ is a finite measure on the sample space $(\Omega, \mathscr{F})$. For each $n \in \mathbb{N}$, the restriction of $\mu$ to $\mathscr{F}_n$ is a measure on the measurable space $(\Omega, \mathscr{F}_n)$, and similarly the restriction of $\mathbb{P}$ to $\mathscr{F}_n$ is a probability measure on $(\Omega, \mathscr{F}_n)$. To save notation and terminology, we will refer to these as $\mu$ and $\mathbb{P}$ on $\mathscr{F}_n$, respectively. Suppose now that $\mu$ is absolutely continuous with respect to $\mathbb{P}$ on $\mathscr{F}_n$ for each $n \in \mathbb{N}$. Recall that this means that $\mu(A) = 0$ for every $A \in \mathscr{F}_n$ with $\mathbb{P}(A) = 0$. By the Radon-Nikodym theorem, $\mu$ has a density function $X_n$ with respect to $\mathbb{P}$ on $\mathscr{F}_n$ for each $n \in \mathbb{N}$. The density function of a measure with respect to a positive measure is known as a Radon-Nikodym derivative. The theorem and the derivative are named for Johann Radon and Otto Nikodym. Here is our main result.
$X = \{X_n: n \in \mathbb{N}\}$ is a martingale with respect to $\mathfrak{F}$.
Details:
Let $n \in \mathbb{N}$. By definition, $X_n$ is measurable with respect to $\mathscr{F}_n$. Also, $X_n \ge 0$ and $\mathbb{E}(X_n) = \mu(\Omega) < \infty$, since $\mu$ is a finite measure. By definition,
$$\mathbb{E}(X_{n+1}; A) = \mu(A), \quad A \in \mathscr{F}_{n+1}$$
On the other hand, if $A \in \mathscr{F}_n$ then $A \in \mathscr{F}_{n+1}$, and so $\mathbb{E}(X_n; A) = \mu(A) = \mathbb{E}(X_{n+1}; A)$. So to summarize, $X_n$ is $\mathscr{F}_n$-measurable and $\mathbb{E}(X_{n+1}; A) = \mathbb{E}(X_n; A)$ for all $A \in \mathscr{F}_n$. By definition, this means that $\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = X_n$, and so $X$ is a martingale with respect to $\mathfrak{F}$.
Note that $\mu$ may not be absolutely continuous with respect to $\mathbb{P}$ on $\mathscr{F}_\infty = \sigma\left(\bigcup_{n \in \mathbb{N}} \mathscr{F}_n\right)$ or even on $\mathscr{F}$. On the other hand, if $\mu$ is absolutely continuous with respect to $\mathbb{P}$ on $\mathscr{F}$ then $\mu$ has a density function $X$ with respect to $\mathbb{P}$ on $\mathscr{F}$. So a natural question in this case is the relationship between the martingale $\{X_n: n \in \mathbb{N}\}$ and the random variable $X$. You may have already guessed the answer, but at any rate it will be given in the section on convergence.
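A concrete finite sketch may help (our own construction, on a hypothetical four-point sample space with the filtration generated by successively finer partitions). On each cell $A$ of the partition generating $\mathscr{F}_n$, the density is the constant $\mu(A)/\mathbb{P}(A)$, and averaging the finer density over a coarser cell reproduces the coarser density, which is exactly the martingale property.

```python
import numpy as np

# Sample space {0, 1, 2, 3} with uniform P; mu is another finite measure.
P = np.array([0.25, 0.25, 0.25, 0.25])
mu = np.array([0.10, 0.30, 0.20, 0.40])

# Filtration: F_0 trivial, F_1 generated by {0,1} vs {2,3}, F_2 all subsets.
partitions = [[[0, 1, 2, 3]], [[0, 1], [2, 3]], [[0], [1], [2], [3]]]

for n, cells in enumerate(partitions):
    # Radon-Nikodym density X_n: constant mu(A)/P(A) on each cell A of F_n.
    X_n = np.empty(4)
    for A in cells:
        X_n[A] = mu[A].sum() / P[A].sum()
    # E(X_n; A) = mu(A) on every cell of F_n, so E(X_{n+1} | F_n) = X_n.
    print(n, X_n, (X_n * P).sum())     # total E(X_n) = mu(Omega) = 1 for each n
```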