
1. Introduction

Basic Theory

Basic Assumptions

For our basic ingredients, we start with a stochastic process $\boldsymbol{X} = \{X_t: t \in T\}$ on an underlying probability space $(\Omega, \mathscr{F}, \mathbb{P})$, having state space $\mathbb{R}$, and where the index set $T$ (representing time) is either $\mathbb{N}$ (discrete time) or $[0, \infty)$ (continuous time). To review what all this means, $\Omega$ is the sample space, $\mathscr{F}$ the $\sigma$-algebra of events, $\mathbb{P}$ the probability measure on $(\Omega, \mathscr{F})$, and $X_t$ is a random variable with values in $\mathbb{R}$ for each $t \in T$. Next, we have a filtration $\mathfrak{F} = \{\mathscr{F}_t: t \in T\}$, and we assume that $\boldsymbol{X}$ is adapted to $\mathfrak{F}$. To review again, $\mathfrak{F}$ is an increasing family of sub $\sigma$-algebras of $\mathscr{F}$, so that $\mathscr{F}_s \subseteq \mathscr{F}_t \subseteq \mathscr{F}$ for $s, t \in T$ with $s \le t$, and $X_t$ is measurable with respect to $\mathscr{F}_t$ for $t \in T$. We think of $\mathscr{F}_t$ as the collection of events up to time $t \in T$, thus encoding the information available at time $t$. Finally, we assume that $\mathbb{E}(|X_t|) < \infty$, so that the mean of $X_t$ exists as a real number, for each $t \in T$.

There are two important special cases of the basic setup. The simplest case, of course, is when $\mathscr{F}_t = \sigma\{X_s: s \in T, \, s \le t\}$ for $t \in T$, so that $\mathfrak{F}$ is the natural filtration associated with $\boldsymbol{X}$. Another case that arises frequently is when we have a second stochastic process $\boldsymbol{Y} = \{Y_t: t \in T\}$ on $(\Omega, \mathscr{F}, \mathbb{P})$ with values in a general measure space $(S, \mathscr{S})$, and $\mathfrak{F}$ is the natural filtration associated with $\boldsymbol{Y}$. So in this case, our main assumption is that $X_t$ is measurable with respect to $\sigma\{Y_s: s \in T, \, s \le t\}$ for $t \in T$.

The theory of martingales is beautiful, elegant, and mostly accessible in discrete time, when $T = \mathbb{N}$. But as with the theory of Markov processes, martingale theory is technically much more complicated in continuous time, when $T = [0, \infty)$. In this case, additional assumptions about the continuity of the sample paths $t \mapsto X_t$ and the filtration $t \mapsto \mathscr{F}_t$ are often necessary in order to have a nice theory. Specifically, we will assume that the process $\boldsymbol{X}$ is right continuous and has left limits, and that the filtration $\mathfrak{F}$ is right continuous and complete. These are the standard assumptions in continuous time.

Definitions

For the basic definitions that follow, you may need to review conditional expected value with respect to a σ-algebra.

The process $\boldsymbol{X}$ is a martingale with respect to $\mathfrak{F}$ if $\mathbb{E}(X_t \mid \mathscr{F}_s) = X_s$ for all $s, t \in T$ with $s \le t$.

In the special case that $\mathfrak{F}$ is the natural filtration associated with $\boldsymbol{X}$, we simply say that $\boldsymbol{X}$ is a martingale, without reference to the filtration. In the special case that we have a second stochastic process $\boldsymbol{Y} = \{Y_t: t \in T\}$ and $\mathfrak{F}$ is the natural filtration associated with $\boldsymbol{Y}$, we say that $\boldsymbol{X}$ is a martingale with respect to $\boldsymbol{Y}$.

The term martingale originally referred to a portion of the harness of a horse, and was later used to describe gambling strategies, such as the one used in the Petersburg paradox, in which bets are doubled when a game is lost. To interpret the definitions above in terms of gambling, suppose that a gambler is at a casino, and that $X_t$ represents her fortune at time $t \in T$ and $\mathscr{F}_t$ the information available to her at time $t$. Suppose now that $s, t \in T$ with $s < t$ and that we think of $s$ as the current time, so that $t$ is a future time. If $\boldsymbol{X}$ is a martingale with respect to $\mathfrak{F}$ then the games are fair in the sense that the gambler's expected fortune at the future time $t$ is the same as her current fortune at time $s$. To venture a bit from the casino, suppose that $X_t$ is the price of a stock, or the value of a stock index, at time $t \in T$. If $\boldsymbol{X}$ is a martingale, then the expected value at a future time, given all of our information, is the present value.

Figure: Martingale harness. An English-style breastplate with a running martingale attachment. By Danielle M., CC BY 3.0, from Wikipedia.

But as we will see, martingales are useful in probability far beyond the application to gambling and even far beyond financial applications generally. Indeed, martingales are of fundamental importance in modern probability theory. Here are two related definitions, with equality in the martingale condition replaced by inequalities.

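Suppose again that the process $\boldsymbol{X}$ and the filtration $\mathfrak{F}$ satisfy the basic assumptions above.

  1. $\boldsymbol{X}$ is a sub-martingale with respect to $\mathfrak{F}$ if $\mathbb{E}(X_t \mid \mathscr{F}_s) \ge X_s$ for all $s, t \in T$ with $s \le t$.
  2. $\boldsymbol{X}$ is a super-martingale with respect to $\mathfrak{F}$ if $\mathbb{E}(X_t \mid \mathscr{F}_s) \le X_s$ for all $s, t \in T$ with $s \le t$.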

In the gambling setting, a sub-martingale models games that are favorable to the gambler on average, while a super-martingale models games that are unfavorable to the gambler on average. To venture again from the casino, suppose that $X_t$ is the price of a stock, or the value of a stock index, at time $t \in T$. If $\boldsymbol{X}$ is a sub-martingale, the expected value at a future time, given all of our information, is at least the present value, and if $\boldsymbol{X}$ is a super-martingale then the expected value at the future time is at most the present value. One hopes that a stock index is a sub-martingale.

Clearly X is a martingale with respect to F if and only if it is both a sub-martingale and a super-martingale. Finally, recall that the conditional expected value of a random variable with respect to a σ-algebra is itself a random variable, and so the equations and inequalities in the definitions should be interpreted as holding with probability 1. In this section generally, statements involving random variables are assumed to hold with probability 1.

The conditions that define martingale, sub-martingale, and super-martingale make sense if the index set $T$ is any totally ordered set. In some applications that we will consider later, $T = \{0, 1, \ldots, n\}$ for fixed $n \in \mathbb{N}_+$. In the section on backwards martingales, $T = \{-n: n \in \mathbb{N}\}$ or $T = (-\infty, 0]$. In the case of discrete time when $T = \mathbb{N}$, we can simplify the definitions slightly.

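Suppose that $\boldsymbol{X} = \{X_n: n \in \mathbb{N}\}$ satisfies the basic assumptions above.

  1. $\boldsymbol{X}$ is a martingale with respect to $\mathfrak{F}$ if and only if $\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = X_n$ for all $n \in \mathbb{N}$.
  2. $\boldsymbol{X}$ is a sub-martingale with respect to $\mathfrak{F}$ if and only if $\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) \ge X_n$ for all $n \in \mathbb{N}$.
  3. $\boldsymbol{X}$ is a super-martingale with respect to $\mathfrak{F}$ if and only if $\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) \le X_n$ for all $n \in \mathbb{N}$.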
Details:

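The conditions in the definitions clearly imply the conditions here, so we just need to show the opposite implications. Thus, assume that the condition in part 1 holds and suppose that $k, n \in \mathbb{N}$ with $k < n$. Then $k \le n - 1$ so $\mathscr{F}_k \subseteq \mathscr{F}_{n-1}$ and hence, by the tower property,
$$\mathbb{E}(X_n \mid \mathscr{F}_k) = \mathbb{E}[\mathbb{E}(X_n \mid \mathscr{F}_{n-1}) \mid \mathscr{F}_k] = \mathbb{E}(X_{n-1} \mid \mathscr{F}_k)$$
Repeating the argument, we get to
$$\mathbb{E}(X_n \mid \mathscr{F}_k) = \mathbb{E}(X_{k+1} \mid \mathscr{F}_k) = X_k$$
The proof for sub-martingales and super-martingales is analogous, with inequalities replacing the last equality.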

The relations that define martingales, sub-martingales, and super-martingales hold for the ordinary (unconditional) expected values.

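Suppose that $s, t \in T$ with $s \le t$.

  1. If $\boldsymbol{X}$ is a martingale with respect to $\mathfrak{F}$ then $\mathbb{E}(X_s) = \mathbb{E}(X_t)$.
  2. If $\boldsymbol{X}$ is a sub-martingale with respect to $\mathfrak{F}$ then $\mathbb{E}(X_s) \le \mathbb{E}(X_t)$.
  3. If $\boldsymbol{X}$ is a super-martingale with respect to $\mathfrak{F}$ then $\mathbb{E}(X_s) \ge \mathbb{E}(X_t)$.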
Details:

The results follow directly from the definitions, and the critical fact that $\mathbb{E}[\mathbb{E}(X_t \mid \mathscr{F}_s)] = \mathbb{E}(X_t)$ for $s, t \in T$.

So if X is a martingale then X has constant expected value, and this value is referred to as the mean of X.

Examples

The goal for the remainder of this section is to give some classical examples of martingales, and by doing so, to show the wide variety of applications in which martingales occur. We will return to many of these examples in subsequent sections. Without further ado, we assume that all random variables are real-valued, unless otherwise specified, and that all expected values mentioned below exist in R. Be sure to try the proofs yourself before expanding the details.

Constant Sequence

Our first example is rather trivial, but still worth noting.

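Suppose that $\mathfrak{F} = \{\mathscr{F}_t: t \in T\}$ is a filtration on the probability space $(\Omega, \mathscr{F}, \mathbb{P})$ and that $X$ is a random variable that is measurable with respect to $\mathscr{F}_0$ and satisfies $\mathbb{E}(|X|) < \infty$. Let $X_t = X$ for $t \in T$. Then $\boldsymbol{X} = \{X_t: t \in T\}$ is a martingale with respect to $\mathfrak{F}$.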

Details:

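Since $X$ is measurable with respect to $\mathscr{F}_0$, it is measurable with respect to $\mathscr{F}_t$ for all $t \in T$. Hence $\boldsymbol{X}$ is adapted to $\mathfrak{F}$. If $s, t \in T$ with $s \le t$, then $\mathbb{E}(X_t \mid \mathscr{F}_s) = \mathbb{E}(X \mid \mathscr{F}_s) = X = X_s$.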

Partial Sums

For our next discussion, we start with one of the most basic martingales in discrete time, and the one with the simplest interpretation in terms of gambling. Suppose that $\boldsymbol{V} = \{V_n: n \in \mathbb{N}\}$ is a sequence of independent random variables with $\mathbb{E}(|V_k|) < \infty$ for $k \in \mathbb{N}$. Let
$$X_n = \sum_{k=0}^n V_k, \quad n \in \mathbb{N}$$
so that $\boldsymbol{X} = \{X_n: n \in \mathbb{N}\}$ is simply the partial sum process associated with $\boldsymbol{V}$.

For the partial sum process X,

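  1. If $\mathbb{E}(V_n) \ge 0$ for $n \in \mathbb{N}_+$ then $\boldsymbol{X}$ is a sub-martingale.
  2. If $\mathbb{E}(V_n) \le 0$ for $n \in \mathbb{N}_+$ then $\boldsymbol{X}$ is a super-martingale.
  3. If $\mathbb{E}(V_n) = 0$ for $n \in \mathbb{N}_+$ then $\boldsymbol{X}$ is a martingale.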
Details:

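Let $\mathscr{F}_n = \sigma\{X_0, X_1, \ldots, X_n\} = \sigma\{V_0, V_1, \ldots, V_n\}$ for $n \in \mathbb{N}$. Note first that
$$\mathbb{E}(|X_n|) \le \sum_{k=0}^n \mathbb{E}(|V_k|) < \infty, \quad n \in \mathbb{N}$$
Next,
$$\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = \mathbb{E}(X_n + V_{n+1} \mid \mathscr{F}_n) = \mathbb{E}(X_n \mid \mathscr{F}_n) + \mathbb{E}(V_{n+1} \mid \mathscr{F}_n) = X_n + \mathbb{E}(V_{n+1}), \quad n \in \mathbb{N}$$
The last equality holds since $X_n$ is measurable with respect to $\mathscr{F}_n$ and $V_{n+1}$ is independent of $\mathscr{F}_n$. The results now follow from the definitions.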
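The computation in the details can be checked exactly on a small example. The sketch below is only an illustration: the step distributions in `steps` and the helper names `drift` and `drifts` are hypothetical choices, not part of the text. It evaluates the difference between the conditional mean of the next partial sum and the current partial sum over every possible history.

```python
from fractions import Fraction
from itertools import product

# Hypothetical step laws {value: probability} for V_0, V_1, V_2, V_3.
# The steps are independent but not identically distributed.
half = Fraction(1, 2)
steps = [
    {5: Fraction(1)},                         # V_0: initial fortune
    {1: half, -1: half},                      # V_1: mean 0
    {2: Fraction(1, 3), -1: Fraction(2, 3)},  # V_2: mean 0
    {3: Fraction(1, 4), -1: Fraction(3, 4)},  # V_3: mean 0
]

def drift(history):
    """E(X_{n+1} | V_0, ..., V_n = history) - X_n, using independence of V_{n+1}."""
    x_n = sum(history)
    next_step = steps[len(history)]
    return sum(p * (x_n + v) for v, p in next_step.items()) - x_n

def drifts():
    """Collect the drift over every possible history of every length."""
    out = set()
    for n in range(1, len(steps)):
        for history in product(*(s.keys() for s in steps[:n])):
            out.add(drift(history))
    return out

# A single value 0 means the martingale condition holds on every history.
print(drifts())
```

Replacing one of the mean 0 step laws with a law of positive mean makes the drift positive on every history, which is the sub-martingale case.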

In terms of gambling, if $X_0 = V_0$ is the gambler's initial fortune and $V_i$ is the gambler's net winnings on the $i$th game, then $X_n$ is the gambler's net fortune after $n$ games for $n \in \mathbb{N}_+$. But partial sum processes associated with independent sequences are important far beyond gambling. In fact, much of classical probability deals with partial sums of independent and identically distributed variables. The entire chapter on Random Samples explores this setting.

Note that $\mathbb{E}(X_n) = \sum_{k=0}^n \mathbb{E}(V_k)$. Hence condition 1 is equivalent to $n \mapsto \mathbb{E}(X_n)$ increasing, condition 2 is equivalent to $n \mapsto \mathbb{E}(X_n)$ decreasing, and condition 3 is equivalent to $n \mapsto \mathbb{E}(X_n)$ constant. Here is another martingale associated with the partial sum process, known as the second moment martingale.

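Suppose that $\mathbb{E}(V_k) = 0$ for $k \in \mathbb{N}_+$ and $\operatorname{var}(V_k) < \infty$ for $k \in \mathbb{N}$. Let
$$Y_n = X_n^2 - \operatorname{var}(X_n), \quad n \in \mathbb{N}$$
Then $\boldsymbol{Y} = \{Y_n: n \in \mathbb{N}\}$ is a martingale with respect to $\boldsymbol{X}$.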

Details:

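Again, let $\mathscr{F}_n = \sigma\{X_0, X_1, \ldots, X_n\}$ for $n \in \mathbb{N}$. Since the sequence $\boldsymbol{V}$ is independent, note that
$$\operatorname{var}(X_n) = \operatorname{var}\left(\sum_{k=0}^n V_k\right) = \sum_{k=0}^n \operatorname{var}(V_k)$$
Also, $\operatorname{var}(V_k) = \mathbb{E}(V_k^2)$ since $\mathbb{E}(V_k) = 0$ for $k \in \mathbb{N}_+$. In particular, $\mathbb{E}(|Y_n|) < \infty$ for $n \in \mathbb{N}$. Next for $n \in \mathbb{N}$,
$$\begin{aligned}
\mathbb{E}(Y_{n+1} \mid \mathscr{F}_n) &= \mathbb{E}[X_{n+1}^2 - \operatorname{var}(X_{n+1}) \mid \mathscr{F}_n] = \mathbb{E}[(X_n + V_{n+1})^2 - \operatorname{var}(X_{n+1}) \mid \mathscr{F}_n] \\
&= \mathbb{E}[X_n^2 + 2 X_n V_{n+1} + V_{n+1}^2 - \operatorname{var}(X_{n+1}) \mid \mathscr{F}_n] \\
&= X_n^2 + 2 X_n \mathbb{E}(V_{n+1}) + \mathbb{E}(V_{n+1}^2) - \operatorname{var}(X_{n+1})
\end{aligned}$$
since $X_n$ is measurable with respect to $\mathscr{F}_n$ and $V_{n+1}$ is independent of $\mathscr{F}_n$. But $\mathbb{E}(V_{n+1}) = 0$ and $\mathbb{E}(V_{n+1}^2) - \operatorname{var}(X_{n+1}) = -\operatorname{var}(X_n)$. Hence we have $\mathbb{E}(Y_{n+1} \mid \mathscr{F}_n) = X_n^2 - \operatorname{var}(X_n) = Y_n$ for $n \in \mathbb{N}$.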
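As a hedged numerical check of the last display, the following sketch assumes a specific step law that is not in the text: the next step is uniform on the two points $-k$ and $k$, so it has mean 0 and variance $k^2$. The function name `second_moment_gap` is hypothetical.

```python
from fractions import Fraction

def second_moment_gap(x, k, var_n):
    """E(Y_{n+1} | X_n = x) - Y_n when V_{n+1} is uniform on {-k, k}.

    var_n stands for var(X_n). Since V_{n+1} has mean 0 and variance k^2,
    var(X_{n+1}) = var_n + k^2.
    """
    half = Fraction(1, 2)
    # E(X_{n+1}^2 | X_n = x), averaging over the two values of V_{n+1}
    e_next_sq = half * (x + k) ** 2 + half * (x - k) ** 2
    y_next = e_next_sq - (var_n + k ** 2)
    y_now = x ** 2 - var_n
    return y_next - y_now

# The gap should vanish for every current position, step size, and variance.
gaps = {second_moment_gap(x, k, v)
        for x in range(-5, 6) for k in (1, 2, 3) for v in (0, 4, 9)}
print(gaps)
```

The gap is 0 in every case because the cross term $2 X_n V_{n+1}$ averages out and the extra $k^2$ is exactly the growth in variance.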

So under the assumptions in this theorem, both $\boldsymbol{X}$ and $\boldsymbol{Y}$ are martingales. We will generalize the results for partial sum processes below in the discussion on processes with independent increments.

Martingale Difference Sequences

In the last discussion, we saw that the partial sum process associated with a sequence of independent, mean 0 variables is a martingale. Conversely, every martingale in discrete time can be written as a partial sum process of uncorrelated mean 0 variables. This representation gives some significant insight into the theory of martingales generally. Suppose that $\boldsymbol{X} = \{X_n: n \in \mathbb{N}\}$ is a martingale with respect to the filtration $\mathfrak{F} = \{\mathscr{F}_n: n \in \mathbb{N}\}$.

Let $V_0 = X_0$ and $V_n = X_n - X_{n-1}$ for $n \in \mathbb{N}_+$. The process $\boldsymbol{V} = \{V_n: n \in \mathbb{N}\}$ is the martingale difference sequence associated with $\boldsymbol{X}$ and
$$X_n = \sum_{k=0}^n V_k, \quad n \in \mathbb{N}$$

As promised, the martingale difference variables have mean 0, and in fact satisfy a stronger property.

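Suppose that $\boldsymbol{V} = \{V_n: n \in \mathbb{N}\}$ is the martingale difference sequence associated with $\boldsymbol{X}$. Then

  1. $\boldsymbol{V}$ is adapted to $\mathfrak{F}$.
  2. $\mathbb{E}(V_n \mid \mathscr{F}_k) = 0$ for $k, n \in \mathbb{N}$ with $k < n$.
  3. $\mathbb{E}(V_n) = 0$ for $n \in \mathbb{N}_+$.
Details:
  1. Of course $V_0 = X_0$ is measurable with respect to $\mathscr{F}_0$. For $n \in \mathbb{N}_+$, $X_n$ and $X_{n-1}$, and hence $V_n$, are measurable with respect to $\mathscr{F}_n$.
  2. Let $k \in \mathbb{N}$. By the martingale and adapted properties, $\mathbb{E}(V_{k+1} \mid \mathscr{F}_k) = \mathbb{E}(X_{k+1} \mid \mathscr{F}_k) - \mathbb{E}(X_k \mid \mathscr{F}_k) = X_k - X_k = 0$. Next by the tower property, $\mathbb{E}(V_{k+2} \mid \mathscr{F}_k) = \mathbb{E}[\mathbb{E}(V_{k+2} \mid \mathscr{F}_{k+1}) \mid \mathscr{F}_k] = 0$. Continuing (or using induction) gives the general result.
  3. Since $\boldsymbol{X}$ is a martingale, it has constant mean, as noted above. Hence $\mathbb{E}(V_n) = \mathbb{E}(X_n) - \mathbb{E}(X_{n-1}) = 0$ for $n \in \mathbb{N}_+$. We could also use part 2.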

Also as promised, if the martingale variables have finite variance, then the martingale difference variables are uncorrelated.

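Suppose again that $\boldsymbol{V} = \{V_n: n \in \mathbb{N}\}$ is the martingale difference sequence associated with the martingale $\boldsymbol{X}$. Assume that $\operatorname{var}(X_n) < \infty$ for $n \in \mathbb{N}$. Then $\boldsymbol{V}$ is an uncorrelated sequence. Moreover,
$$\operatorname{var}(X_n) = \sum_{k=0}^n \operatorname{var}(V_k) = \operatorname{var}(X_0) + \sum_{k=1}^n \mathbb{E}(V_k^2), \quad n \in \mathbb{N}$$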

Details:

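Let $k, n \in \mathbb{N}$ with $k < n$. To show that $V_k$ and $V_n$ are uncorrelated, we just need to show that $\mathbb{E}(V_k V_n) = 0$ (since $\mathbb{E}(V_n) = 0$). But by the previous result,
$$\mathbb{E}(V_k V_n) = \mathbb{E}[\mathbb{E}(V_k V_n \mid \mathscr{F}_k)] = \mathbb{E}[V_k \mathbb{E}(V_n \mid \mathscr{F}_k)] = 0$$
Finally, the variance of a sum of uncorrelated variables is the sum of the variances. Since $V_k$ has mean 0, $\operatorname{var}(V_k) = \mathbb{E}(V_k^2)$ for $k \in \mathbb{N}_+$. Hence the formula for $\operatorname{var}(X_n)$ holds.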
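Martingale differences are uncorrelated but need not be independent. The following sketch illustrates this with a hypothetical two-step example that is not from the text: the first difference is a fair sign, and the second is an independent fair sign whose size depends on the first. The conditional mean of the second difference given the first is 0, so this is a martingale difference sequence.

```python
from fractions import Fraction
from itertools import product

quarter = Fraction(1, 4)
# Hypothetical example: e1, e2 are independent fair signs,
# V1 = e1 and V2 = e2 * (2 if e1 == 1 else 1).
outcomes = [((e1, e2 * (2 if e1 == 1 else 1)), quarter)
            for e1, e2 in product((1, -1), repeat=2)]

def mean(f):
    """Expected value of f(V1, V2) over the four equally likely outcomes."""
    return sum(p * f(v1, v2) for (v1, v2), p in outcomes)

# Covariance of V1 and V2: zero, as the theorem predicts.
cov = mean(lambda a, b: a * b) - mean(lambda a, b: a) * mean(lambda a, b: b)

# Independence would force E(V1 * V2^2) = E(V1) * E(V2^2); the gap is not zero.
dep_gap = (mean(lambda a, b: a * b * b)
           - mean(lambda a, b: a) * mean(lambda a, b: b * b))

# Variance of X_2 = V1 + V2 versus the sum of the variances.
var_sum = mean(lambda a, b: (a + b) ** 2) - mean(lambda a, b: a + b) ** 2
var_parts = mean(lambda a, b: a * a) + mean(lambda a, b: b * b)
print(cov, dep_gap, var_sum, var_parts)
```

The variance formula only needs zero covariance, which is why it survives even though the two differences are dependent.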

We now know that a discrete-time martingale is the partial sum process associated with a sequence of uncorrelated variables. Hence we might hope that there are martingale versions of the fundamental theorems that hold for a partial sum process associated with an independent sequence. This turns out to be true, and is a basic reason for the importance of martingales.

Discrete-Time Random Walks

Suppose that $\boldsymbol{V} = \{V_n: n \in \mathbb{N}\}$ is a sequence of independent random variables with $\{V_n: n \in \mathbb{N}_+\}$ identically distributed. We assume that $\mathbb{E}(|V_n|) < \infty$ for $n \in \mathbb{N}$ and we let $a$ denote the common mean of $\{V_n: n \in \mathbb{N}_+\}$. Let $\boldsymbol{X} = \{X_n: n \in \mathbb{N}\}$ be the partial sum process associated with $\boldsymbol{V}$ so that
$$X_n = \sum_{i=0}^n V_i, \quad n \in \mathbb{N}$$
This setting is a special case of the more general partial sum process considered above. The process $\boldsymbol{X}$ is sometimes called a (discrete-time) random walk. The initial position $X_0 = V_0$ of the walker can have an arbitrary distribution, but then the steps that the walker takes are independent and identically distributed. In terms of gambling, $X_0 = V_0$ is the initial fortune of the gambler playing a sequence of independent and identical games. If $V_i$ is the amount won (or lost) on game $i \in \mathbb{N}_+$, then $X_n$ is the gambler's net fortune after $n$ games.

For the random walk X,

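  1. $\boldsymbol{X}$ is a martingale if $a = 0$.
  2. $\boldsymbol{X}$ is a sub-martingale if $a \ge 0$.
  3. $\boldsymbol{X}$ is a super-martingale if $a \le 0$.

For the second moment martingale, suppose that $V_n$ has common mean $a = 0$ and common variance $b^2 < \infty$ for $n \in \mathbb{N}_+$, and that $\operatorname{var}(V_0) < \infty$.

Let $Y_n = X_n^2 - \operatorname{var}(V_0) - b^2 n$ for $n \in \mathbb{N}$. Then $\boldsymbol{Y} = \{Y_n: n \in \mathbb{N}\}$ is a martingale with respect to $\boldsymbol{X}$.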

Details:

This follows from the corresponding result for a general partial sum process, above, since
$$\operatorname{var}(X_n) = \sum_{k=0}^n \operatorname{var}(V_k) = \operatorname{var}(V_0) + b^2 n, \quad n \in \mathbb{N}$$

We will generalize the results for discrete-time random walks below, in the discussion on processes with stationary, independent increments.

Partial Products

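Our next discussion is similar to the one on partial sum processes above, but with products instead of sums. So suppose that $\boldsymbol{V} = \{V_n: n \in \mathbb{N}\}$ is an independent sequence of nonnegative random variables with $\mathbb{E}(V_n) < \infty$ for $n \in \mathbb{N}$. Let
$$X_n = \prod_{i=0}^n V_i, \quad n \in \mathbb{N}$$
so that $\boldsymbol{X} = \{X_n: n \in \mathbb{N}\}$ is the partial product process associated with $\boldsymbol{V}$.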

For the partial product process X,

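  1. If $\mathbb{E}(V_n) = 1$ for $n \in \mathbb{N}_+$ then $\boldsymbol{X}$ is a martingale with respect to $\boldsymbol{V}$.
  2. If $\mathbb{E}(V_n) \ge 1$ for $n \in \mathbb{N}_+$ then $\boldsymbol{X}$ is a sub-martingale with respect to $\boldsymbol{V}$.
  3. If $\mathbb{E}(V_n) \le 1$ for $n \in \mathbb{N}_+$ then $\boldsymbol{X}$ is a super-martingale with respect to $\boldsymbol{V}$.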
Details:

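Let $\mathscr{F}_n = \sigma\{V_0, V_1, \ldots, V_n\}$ for $n \in \mathbb{N}$. Since the variables are independent,
$$\mathbb{E}(X_n) = \prod_{i=0}^n \mathbb{E}(V_i) < \infty$$
Next,
$$\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = \mathbb{E}(X_n V_{n+1} \mid \mathscr{F}_n) = X_n \mathbb{E}(V_{n+1} \mid \mathscr{F}_n) = X_n \mathbb{E}(V_{n+1}), \quad n \in \mathbb{N}$$
since $X_n$ is measurable with respect to $\mathscr{F}_n$ and $V_{n+1}$ is independent of $\mathscr{F}_n$. The results now follow from the definitions.

As with random walks, a special case of interest is when $\{V_n: n \in \mathbb{N}_+\}$ is an identically distributed sequence.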

The Simple Random Walk

Suppose now that $\boldsymbol{V} = \{V_n: n \in \mathbb{N}\}$ is a sequence of independent random variables with $\mathbb{P}(V_i = 1) = p$ and $\mathbb{P}(V_i = -1) = 1 - p$ for $i \in \mathbb{N}_+$, where $p \in (0, 1)$. Let $\boldsymbol{X} = \{X_n: n \in \mathbb{N}\}$ be the partial sum process associated with $\boldsymbol{V}$ so that
$$X_n = \sum_{i=0}^n V_i, \quad n \in \mathbb{N}$$
Then $\boldsymbol{X}$ is the simple random walk with parameter $p$, and of course, is a special case of the more general random walk studied above. In terms of gambling, our gambler plays a sequence of independent and identical games, and on each game, wins €1 with probability $p$ and loses €1 with probability $1 - p$. So if $V_0$ is the gambler's initial fortune, then $X_n$ is her net fortune after $n$ games.

For the simple random walk,

  1. If $p > \frac{1}{2}$ then $\boldsymbol{X}$ is a sub-martingale.
  2. If $p < \frac{1}{2}$ then $\boldsymbol{X}$ is a super-martingale.
  3. If $p = \frac{1}{2}$ then $\boldsymbol{X}$ is a martingale.
Details:

Note that $\mathbb{E}(V_n) = p - (1 - p) = 2 p - 1$ for $n \in \mathbb{N}_+$, so the results follow from [11].

So case 1 corresponds to favorable games, case 2 to unfavorable games, and case 3 to fair games.

Open the simulation of the simple symmetric random walk. For various values of the number of trials $n$, run the simulation 1000 times and note the general behavior of the sample paths.

Here is the second moment martingale for the simple, symmetric random walk.

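Consider the simple random walk with parameter $p = \frac{1}{2}$, and let $Y_n = X_n^2 - \operatorname{var}(V_0) - n$ for $n \in \mathbb{N}$. Then $\boldsymbol{Y} = \{Y_n: n \in \mathbb{N}\}$ is a martingale with respect to $\boldsymbol{X}$.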

Details:

Note that $\mathbb{E}(V_i) = 0$ and $\operatorname{var}(V_i) = 1$ for each $i \in \mathbb{N}_+$, so the result follows from the general result above.

But there is another martingale that can be associated with the simple random walk, known as De Moivre's martingale and named for one of the early pioneers of probability theory, Abraham De Moivre.

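For $n \in \mathbb{N}$ define
$$Z_n = \left(\frac{1 - p}{p}\right)^{X_n}$$
Then $\boldsymbol{Z} = \{Z_n: n \in \mathbb{N}\}$ is a martingale with respect to $\boldsymbol{X}$.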

Details:

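Note that
$$Z_n = \prod_{k=0}^n \left(\frac{1 - p}{p}\right)^{V_k}, \quad n \in \mathbb{N}$$
and
$$\mathbb{E}\left[\left(\frac{1 - p}{p}\right)^{V_k}\right] = \left(\frac{1 - p}{p}\right)^{1} p + \left(\frac{1 - p}{p}\right)^{-1} (1 - p) = 1, \quad k \in \mathbb{N}_+$$
So the result follows from [13].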
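The mean computation for each factor of De Moivre's martingale can be verified with exact rational arithmetic. This is a minimal sketch; the function name `de_moivre_factor_mean` and the grid of values of $p$ are illustrative assumptions.

```python
from fractions import Fraction

def de_moivre_factor_mean(p):
    """E[((1 - p) / p) ** V] when P(V = 1) = p and P(V = -1) = 1 - p."""
    r = (1 - p) / p
    return r ** 1 * p + r ** (-1) * (1 - p)

# Evaluate the mean for p = 1/10, 2/10, ..., 9/10 with exact fractions.
means = [de_moivre_factor_mean(Fraction(k, 10)) for k in range(1, 10)]
print(means)
```

Each factor has mean 1 regardless of $p$, so the partial product result applies even when the walk itself is not a martingale.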

The Beta-Bernoulli Process

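Recall that the beta-Bernoulli process is constructed by randomizing the success parameter in a Bernoulli trials process with a beta distribution. Specifically we have a random variable $P$ that has the beta distribution with parameters $a, b \in (0, \infty)$, and a sequence of indicator variables $\boldsymbol{X} = (X_1, X_2, \ldots)$ such that given $P = p \in (0, 1)$, $\boldsymbol{X}$ is a sequence of independent variables with $\mathbb{P}(X_i = 1) = p$ for $i \in \mathbb{N}_+$. As usual, we couch this in reliability terms, so that $X_i = 1$ means success on trial $i$ and $X_i = 0$ means failure. In our study of this process, we showed that the finite-dimensional distributions are given by
$$\mathbb{P}(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = \frac{a^{[k]} b^{[n-k]}}{(a + b)^{[n]}}, \quad n \in \mathbb{N}_+, \; (x_1, x_2, \ldots, x_n) \in \{0, 1\}^n$$
where $k = x_1 + x_2 + \cdots + x_n$ and where we use the ascending power notation $r^{[j]} = r (r + 1) \cdots (r + j - 1)$ for $r \in \mathbb{R}$ and $j \in \mathbb{N}$. Next, let $\boldsymbol{Y} = \{Y_n: n \in \mathbb{N}\}$ denote the partial sum process associated with $\boldsymbol{X}$, so that once again,
$$Y_n = \sum_{i=1}^n X_i, \quad n \in \mathbb{N}$$
Of course $Y_n$ is the number of successes in the first $n$ trials and has the beta-binomial distribution defined by
$$\mathbb{P}(Y_n = k) = \binom{n}{k} \frac{a^{[k]} b^{[n-k]}}{(a + b)^{[n]}}, \quad k \in \{0, 1, \ldots, n\}$$
Now let
$$Z_n = \frac{a + Y_n}{a + b + n}, \quad n \in \mathbb{N}$$
This variable also arises naturally. Let $\mathscr{F}_n = \sigma\{X_1, X_2, \ldots, X_n\}$ for $n \in \mathbb{N}$. Then as shown in the section on the beta-Bernoulli process, $Z_n = \mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = \mathbb{E}(P \mid \mathscr{F}_n)$. In statistical terms, the second equation means that $Z_n$ is the Bayesian estimator of the unknown success probability $p$ in a sequence of Bernoulli trials, when $p$ is modeled by the random variable $P$.

$\boldsymbol{Z} = \{Z_n: n \in \mathbb{N}\}$ is a martingale with respect to $\boldsymbol{X}$.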

Details:

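Note that $0 \le Z_n \le 1$ so $\mathbb{E}(Z_n) < \infty$ for $n \in \mathbb{N}$. Next,
$$\mathbb{E}(Z_{n+1} \mid \mathscr{F}_n) = \mathbb{E}\left[\frac{a + Y_{n+1}}{a + b + n + 1} \,\Big|\, \mathscr{F}_n\right] = \frac{\mathbb{E}[a + (Y_n + X_{n+1}) \mid \mathscr{F}_n]}{a + b + n + 1} = \frac{a + Y_n + \mathbb{E}(X_{n+1} \mid \mathscr{F}_n)}{a + b + n + 1}$$
As noted above, $\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = (a + Y_n) / (a + b + n)$. Substituting into the displayed equation above and doing a bit of algebra we have
$$\mathbb{E}(Z_{n+1} \mid \mathscr{F}_n) = \frac{(a + Y_n) + (a + Y_n) / (a + b + n)}{a + b + n + 1} = \frac{a + Y_n}{a + b + n} = Z_n$$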
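The "bit of algebra" can be double-checked with exact fractions. In this sketch the function names `z` and `cond_mean_next`, and the parameter values used in the loop, are assumptions chosen for illustration. The conditional mean is computed by averaging the two possible values of the next estimator, with the success probability given by the current estimator.

```python
from fractions import Fraction

def z(a, b, n, y):
    """Z_n = (a + Y_n) / (a + b + n) on the event Y_n = y."""
    return Fraction(a + y) / (a + b + n)

def cond_mean_next(a, b, n, y):
    """E(Z_{n+1} | Y_n = y), using P(X_{n+1} = 1 | F_n) = Z_n."""
    p_success = z(a, b, n, y)
    return (p_success * z(a, b, n + 1, y + 1)
            + (1 - p_success) * z(a, b, n + 1, y))

# Hypothetical parameters a = 1/2, b = 3/2; check every (n, y) with n <= 6.
a, b = Fraction(1, 2), Fraction(3, 2)
ok = all(cond_mean_next(a, b, n, y) == z(a, b, n, y)
         for n in range(7) for y in range(n + 1))
print(ok)
```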

Open the beta-Binomial experiment. Run the simulation 1000 times for various values of the parameters, and compare the empirical probability density function with the true probability density function.

Pólya's Urn Process

Recall that in the simplest version of Pólya's urn process, we start with an urn containing $a$ red and $b$ green balls. At each discrete time step, we select a ball at random from the urn and then replace the ball and add $c$ new balls of the same color to the urn. For the parameters, we need $a, b \in \mathbb{N}_+$ and $c \in \mathbb{N}$. For $i \in \mathbb{N}_+$, let $X_i$ denote the color of the ball selected on the $i$th draw, where 1 means red and 0 means green. The process $\boldsymbol{X} = \{X_n: n \in \mathbb{N}_+\}$ is a classical example of a sequence of exchangeable yet dependent variables. Let $\boldsymbol{Y} = \{Y_n: n \in \mathbb{N}\}$ denote the partial sum process associated with $\boldsymbol{X}$, so that once again,
$$Y_n = \sum_{i=1}^n X_i, \quad n \in \mathbb{N}$$
Of course $Y_n$ is the total number of red balls selected in the first $n$ draws. Hence at time $n \in \mathbb{N}$, the total number of red balls in the urn is $a + c Y_n$, while the total number of balls in the urn is $a + b + c n$ and so the proportion of red balls in the urn is
$$Z_n = \frac{a + c Y_n}{a + b + c n}$$

$\boldsymbol{Z} = \{Z_n: n \in \mathbb{N}\}$ is a martingale with respect to $\boldsymbol{X}$.

Details:

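Indirect proof: If $c = 0$ then $Z_n = a / (a + b)$ for $n \in \mathbb{N}$ so $\boldsymbol{Z}$ is a constant martingale. If $c \in \mathbb{N}_+$ then $\boldsymbol{X}$ is equivalent to the beta-Bernoulli process with parameters $a / c$ and $b / c$. Moreover,
$$Z_n = \frac{a + c Y_n}{a + b + c n} = \frac{a / c + Y_n}{a / c + b / c + n}, \quad n \in \mathbb{N}$$
So $\boldsymbol{Z}$ is a martingale by [18].

Direct proof: Trivially, $0 \le Z_n \le 1$ so $\mathbb{E}(Z_n) < \infty$ for $n \in \mathbb{N}$. Let $\mathscr{F}_n = \sigma\{X_1, X_2, \ldots, X_n\}$. For $n \in \mathbb{N}$,
$$\mathbb{E}(Z_{n+1} \mid \mathscr{F}_n) = \mathbb{E}\left[\frac{a + c Y_{n+1}}{a + b + c (n + 1)} \,\Big|\, \mathscr{F}_n\right] = \frac{\mathbb{E}[a + c (Y_n + X_{n+1}) \mid \mathscr{F}_n]}{a + b + c (n + 1)} = \frac{a + c Y_n + c \, \mathbb{E}(X_{n+1} \mid \mathscr{F}_n)}{a + b + c n + c}$$
since $Y_n$ is measurable with respect to $\mathscr{F}_n$. But the probability of selecting a red ball on draw $n + 1$, given the history of the process up to time $n$, is simply the proportion of red balls in the urn at time $n$. That is,
$$\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = \mathbb{P}(X_{n+1} = 1 \mid \mathscr{F}_n) = Z_n = \frac{a + c Y_n}{a + b + c n}$$
Substituting and simplifying gives $\mathbb{E}(Z_{n+1} \mid \mathscr{F}_n) = Z_n$.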
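Since a martingale has constant mean, the expected proportion of red balls should stay at $a / (a + b)$ for every $n$. The sketch below checks this by propagating the exact distribution of the number of red draws. The function names and the parameter values in the usage lines are hypothetical.

```python
from fractions import Fraction

def red_count_distribution(a, b, c, n):
    """Exact distribution of Y_n (red balls drawn in n draws) as {y: probability}."""
    dist = {0: Fraction(1)}
    for step in range(n):
        new = {}
        for y, prob in dist.items():
            # Proportion of red balls in the urn at time `step`, given Y_step = y
            p_red = Fraction(a + c * y, a + b + c * step)
            new[y + 1] = new.get(y + 1, 0) + prob * p_red
            new[y] = new.get(y, 0) + prob * (1 - p_red)
        dist = new
    return dist

def mean_proportion(a, b, c, n):
    """E(Z_n), where Z_n = (a + c Y_n) / (a + b + c n)."""
    dist = red_count_distribution(a, b, c, n)
    return sum(prob * Fraction(a + c * y, a + b + c * n)
               for y, prob in dist.items())

# With a = 2, b = 3, c = 4 the mean proportion should stay at 2/5.
print([mean_proportion(2, 3, 4, n) for n in range(6)])
```

Constant mean is only a necessary condition for the martingale property, so this is a consistency check rather than a proof.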

Open the simulation of Pólya's Urn Experiment. Run the simulation 1000 times for various values of the parameters, and compare the empirical probability density function of the number of red balls selected to the true probability density function.

Processes with Independent Increments

One of our first examples above concerned the partial sum process $\boldsymbol{X}$ associated with a sequence of independent random variables $\boldsymbol{V}$. Such processes are the only ones in discrete time that have independent increments. That is, for $m, n \in \mathbb{N}$ with $m \le n$, $X_n - X_m$ is independent of $(X_0, X_1, \ldots, X_m)$. The random walk process above has the additional property of stationary increments. That is, the distribution of $X_n - X_m$ is the same as the distribution of $X_{n-m} - X_0$ for $m, n \in \mathbb{N}$ with $m \le n$. Let's consider processes in discrete or continuous time with these properties. Thus, suppose that $\boldsymbol{X} = \{X_t: t \in T\}$ satisfies the basic assumptions above relative to the filtration $\mathfrak{F} = \{\mathscr{F}_s: s \in T\}$. Here are the two definitions.

The process X has

  1. Independent increments if $X_t - X_s$ is independent of $\mathscr{F}_s$ for all $s, t \in T$ with $s \le t$.
  2. Stationary increments if $X_t - X_s$ has the same distribution as $X_{t-s} - X_0$ for all $s, t \in T$ with $s \le t$.

Processes with stationary and independent increments were studied in the chapter on Markov processes. In continuous time (with the continuity assumptions we have imposed), such a process is known as a Lévy process, named for Paul Lévy, and also as a continuous-time random walk. For a process with independent increments (not necessarily stationary), the connection with martingales depends on the mean function $m$ given by $m(t) = \mathbb{E}(X_t)$ for $t \in T$.

Suppose that $\boldsymbol{X} = \{X_t: t \in [0, \infty)\}$ has independent increments.

  1. If $m$ is increasing then $\boldsymbol{X}$ is a sub-martingale.
  2. If $m$ is decreasing then $\boldsymbol{X}$ is a super-martingale.
  3. If $m$ is constant then $\boldsymbol{X}$ is a martingale.
Details:

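The proof is just like the one above for partial sum processes. Suppose that $s, t \in [0, \infty)$ with $s < t$. Then
$$\mathbb{E}(X_t \mid \mathscr{F}_s) = \mathbb{E}[X_s + (X_t - X_s) \mid \mathscr{F}_s] = \mathbb{E}(X_s \mid \mathscr{F}_s) + \mathbb{E}(X_t - X_s \mid \mathscr{F}_s)$$
But $X_s$ is measurable with respect to $\mathscr{F}_s$ and $X_t - X_s$ is independent of $\mathscr{F}_s$. So
$$\mathbb{E}(X_t \mid \mathscr{F}_s) = X_s + \mathbb{E}(X_t - X_s) = X_s + m(t) - m(s)$$
The results now follow from the definitions, according to whether $m(t) - m(s)$ is nonnegative, nonpositive, or zero.

Compare this result with the corresponding result for the partial sum process in [6]. Suppose now that $\boldsymbol{X} = \{X_t: t \in [0, \infty)\}$ is a stochastic process as above, with mean function $m$, and let $Y_t = X_t - m(t)$ for $t \in [0, \infty)$. The process $\boldsymbol{Y} = \{Y_t: t \in [0, \infty)\}$ is sometimes called the compensated process associated with $\boldsymbol{X}$ and has mean function 0. If $\boldsymbol{X}$ has independent increments, then clearly so does $\boldsymbol{Y}$. Hence the following result is a trivial corollary to our previous theorem.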

Suppose that X has independent increments. The compensated process Y is a martingale.

Next we give the second moment martingale for a process with independent increments, generalizing the second moment martingale for a partial sum process in [7].

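Suppose that $\boldsymbol{X} = \{X_t: t \in T\}$ has independent increments with constant mean function and with $\operatorname{var}(X_t) < \infty$ for $t \in T$. Then $\boldsymbol{Y} = \{Y_t: t \in T\}$ is a martingale where
$$Y_t = X_t^2 - \operatorname{var}(X_t), \quad t \in T$$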

Details:

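The proof is essentially the same as for the partial sum process in discrete time. Suppose that $s, t \in T$ with $s < t$. Note that $\mathbb{E}(Y_t \mid \mathscr{F}_s) = \mathbb{E}(X_t^2 \mid \mathscr{F}_s) - \operatorname{var}(X_t)$. Next,
$$X_t^2 = [(X_t - X_s) + X_s]^2 = (X_t - X_s)^2 + 2 (X_t - X_s) X_s + X_s^2$$
But $X_t - X_s$ is independent of $\mathscr{F}_s$, $X_s$ is measurable with respect to $\mathscr{F}_s$, and $\mathbb{E}(X_t - X_s) = 0$ so
$$\mathbb{E}(X_t^2 \mid \mathscr{F}_s) = \mathbb{E}[(X_t - X_s)^2] + 2 X_s \mathbb{E}(X_t - X_s) + X_s^2 = \mathbb{E}[(X_t - X_s)^2] + X_s^2$$
But also by independence and since $X_t - X_s$ has mean 0,
$$\operatorname{var}(X_t) = \operatorname{var}[(X_t - X_s) + X_s] = \operatorname{var}(X_s) + \operatorname{var}(X_t - X_s) = \operatorname{var}(X_s) + \mathbb{E}[(X_t - X_s)^2]$$
Putting the pieces together gives
$$\mathbb{E}(Y_t \mid \mathscr{F}_s) = X_s^2 - \operatorname{var}(X_s) = Y_s$$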

Of course, since the mean function is constant, X is also a martingale. For processes with independent and stationary increments (that is, random walks), the last two theorems simplify, because the mean and variance functions simplify.

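Suppose that $\boldsymbol{X} = \{X_t: t \in T\}$ has stationary, independent increments, and let $a = \mathbb{E}(X_1 - X_0)$. Then

  1. $\boldsymbol{X}$ is a martingale if $a = 0$.
  2. $\boldsymbol{X}$ is a sub-martingale if $a \ge 0$.
  3. $\boldsymbol{X}$ is a super-martingale if $a \le 0$.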
Details:

Recall that the mean function $m$ is given by $m(t) = \mathbb{E}(X_0) + a t$ for $t \in T$, so the result follows from the corresponding result in [23] for a process with independent increments.

Compare this result with the corresponding result in [11] for discrete-time random walks. Our next result is the second moment martingale. Compare this with the second moment martingale in [12] for discrete-time random walks.

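Suppose that $\boldsymbol{X} = \{X_t: t \in T\}$ has stationary, independent increments with $\mathbb{E}(X_0) = \mathbb{E}(X_1)$, $\operatorname{var}(X_0) < \infty$, and $b^2 = \mathbb{E}[(X_1 - X_0)^2] < \infty$. Then $\boldsymbol{Y} = \{Y_t: t \in T\}$ is a martingale where
$$Y_t = X_t^2 - \operatorname{var}(X_0) - b^2 t, \quad t \in T$$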

Details:

Recall that if $\mathbb{E}(X_0) = \mathbb{E}(X_1)$ then $\boldsymbol{X}$ has constant mean function. Also, $\operatorname{var}(X_t) = \operatorname{var}(X_0) + b^2 t$, so the result follows from the corresponding result for a process with independent increments.

In discrete time, as we have mentioned several times, all of these results reduce to the earlier results for partial sum processes and random walks. In continuous time, the Poisson processes, named of course for Simeon Poisson, provide examples. The standard (homogeneous) Poisson counting process $\boldsymbol{N} = \{N_t: t \in [0, \infty)\}$ with constant rate $r \in (0, \infty)$ has stationary, independent increments and mean function given by $m(t) = r t$ for $t \in [0, \infty)$. More generally, suppose that $r: [0, \infty) \to (0, \infty)$ is piecewise continuous (and non-constant). The non-homogeneous Poisson counting process $\boldsymbol{N} = \{N_t: t \in [0, \infty)\}$ with rate function $r$ has independent increments and mean function given by
$$m(t) = \int_0^t r(s) \, ds, \quad t \in [0, \infty)$$
The increment $N_t - N_s$ has the Poisson distribution with parameter $m(t) - m(s)$ for $s, t \in [0, \infty)$ with $s < t$, so the process does not have stationary increments. In all cases, $m$ is increasing, so the following results are corollaries of our general results:

Let $\boldsymbol{N} = \{N_t: t \in [0, \infty)\}$ be the Poisson counting process with rate function $r: [0, \infty) \to (0, \infty)$. Then

  1. $\boldsymbol{N}$ is a sub-martingale.
  2. The compensated process $\boldsymbol{X} = \{N_t - m(t): t \in [0, \infty)\}$ is a martingale.
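For the homogeneous case, a small Monte Carlo experiment can illustrate that the compensated process has mean 0. This is only a sketch under stated assumptions: the process is simulated from independent exponential interarrival times, and the function names, seed, and number of runs are arbitrary choices.

```python
import random

def poisson_count(rate, t, rng):
    """Number of arrivals in [0, t] for a homogeneous Poisson process,
    built from independent exponential interarrival times."""
    count, clock = 0, rng.expovariate(rate)
    while clock <= t:
        count += 1
        clock += rng.expovariate(rate)
    return count

def mean_compensated(rate, t, runs=20000, seed=1):
    """Monte Carlo estimate of E(N_t - r t), which should be close to 0."""
    rng = random.Random(seed)
    total = sum(poisson_count(rate, t, rng) - rate * t for _ in range(runs))
    return total / runs

# Estimate for rate r = 2 and time t = 3; the exact mean is 0.
print(mean_compensated(2.0, 3.0))
```

The estimate fluctuates around 0 with standard error roughly $\sqrt{r t / \text{runs}}$, so a sample average of a few hundredths is consistent with the theory.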

Open the simulation of the Poisson counting experiment. For various values of r and t, run the experiment 1000 times and compare the empirical probability density function of the number of arrivals with the true probability density function.

We will see further examples of processes with stationary, independent increments in continuous time (and so also examples of continuous-time martingales) in our study of Brownian motion.

Likelihood Ratio Tests

Suppose that $(S, \mathscr{S}, \mu)$ is a general measure space, and that $\boldsymbol{X} = \{X_n: n \in \mathbb{N}\}$ is a sequence of independent, identically distributed random variables, taking values in $S$. In statistical terms, $\boldsymbol{X}$ corresponds to sampling from the common distribution, which is usually not completely known. Indeed, the central problem in statistics is to draw inferences about the distribution from observations of $\boldsymbol{X}$. Suppose now that the underlying distribution either has probability density function $g_0$ or probability density function $g_1$, with respect to $\mu$. We assume that $g_0$ and $g_1$ are positive on $S$. Of course the common special cases of this setup are

  1. $S$ is a measurable subset of $\mathbb{R}^n$ for some $n \in \mathbb{N}_+$ and $\mu$ is $n$-dimensional Lebesgue measure on $S$.
  2. $S$ is a countable set and $\mu$ is counting measure on $S$.

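The likelihood ratio test is a hypothesis test, where the null and alternative hypotheses are

  1. $H_0$: the probability density function is $g_0$.
  2. $H_1$: the probability density function is $g_1$.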

The test is based on the test statistic
$$L_n = \prod_{i=1}^n \frac{g_0(X_i)}{g_1(X_i)}, \quad n \in \mathbb{N}$$
known as the likelihood ratio test statistic. Small values of the test statistic are evidence in favor of the alternative hypothesis $H_1$. Here is our result.

Under the alternative hypothesis $H_1$, the process $\boldsymbol{L} = \{L_n: n \in \mathbb{N}\}$ is a martingale with respect to $\boldsymbol{X}$, known as the likelihood ratio martingale.

Details:

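Let $\mathscr{F}_n = \sigma\{X_1, X_2, \ldots, X_n\}$. For $n \in \mathbb{N}$,
$$\mathbb{E}(L_{n+1} \mid \mathscr{F}_n) = \mathbb{E}\left[L_n \frac{g_0(X_{n+1})}{g_1(X_{n+1})} \,\Big|\, \mathscr{F}_n\right] = L_n \mathbb{E}\left[\frac{g_0(X_{n+1})}{g_1(X_{n+1})}\right]$$
since $L_n$ is measurable with respect to $\mathscr{F}_n$ and $g_0(X_{n+1}) / g_1(X_{n+1})$ is independent of $\mathscr{F}_n$. But under $H_1$, and using the change of variables formula for expected value, we have
$$\mathbb{E}\left[\frac{g_0(X_{n+1})}{g_1(X_{n+1})}\right] = \int_S \frac{g_0(x)}{g_1(x)} g_1(x) \, d\mu(x) = \int_S g_0(x) \, d\mu(x) = 1$$
This result also follows essentially from the result above on partial products. The sequence $\boldsymbol{Z} = (Z_1, Z_2, \ldots)$ given by $Z_i = g_0(X_i) / g_1(X_i)$ for $i \in \mathbb{N}_+$ is independent and identically distributed, and as just shown, has mean 1 under $H_1$.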
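In the discrete case with counting measure, the integral in the details becomes a finite sum that can be evaluated exactly. The two densities below are hypothetical, chosen only to illustrate the computation.

```python
from fractions import Fraction

# Hypothetical densities on S = {0, 1, 2} with respect to counting measure.
g0 = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
g1 = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}

def ratio_mean(density):
    """E[g0(X) / g1(X)] when X has the given density."""
    return sum(density[x] * g0[x] / g1[x] for x in density)

mean_under_h1 = ratio_mean(g1)  # each factor of L_n has mean 1 under H1
mean_under_h0 = ratio_mean(g0)  # for these densities the mean exceeds 1 under H0
print(mean_under_h1, mean_under_h0)
```

For this particular pair of densities, the factors have mean greater than 1 under the null hypothesis, so by the partial product result the likelihood ratio process is then a sub-martingale rather than a martingale.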

Branching Processes

In the simplest model of a branching process, we have a system of particles each of which can die out or split into new particles of the same type. The fundamental assumption is that the particles act independently, each with the same offspring distribution on $\mathbb{N}$. We will let $f$ denote the (discrete) probability density function of the number of offspring of a particle, $m$ the mean of the distribution, and $\phi$ the probability generating function of the distribution. Thus, if $U$ is the number of children of a particle, then $f(n) = \mathbb{P}(U = n)$ for $n \in \mathbb{N}$, $m = \mathbb{E}(U)$, and $\phi(t) = \mathbb{E}(t^U)$ defined at least for $t \in (-1, 1]$.

Our interest is in generational time rather than absolute time: the original particles are in generation 0, and recursively, the children of a particle in generation $n$ belong to generation $n + 1$. Thus, the stochastic process of interest is $\boldsymbol{X} = \{X_n: n \in \mathbb{N}\}$ where $X_n$ is the number of particles in the $n$th generation for $n \in \mathbb{N}$. The process $\boldsymbol{X}$ is a Markov chain and was studied in the section on discrete-time branching chains. In particular, one of the fundamental problems is to compute the probability $q$ of extinction starting with a single particle:
$$q = \mathbb{P}(X_n = 0 \text{ for some } n \in \mathbb{N} \mid X_0 = 1)$$
Then, since the particles act independently, the probability of extinction starting with $x \in \mathbb{N}$ particles is simply $q^x$. We will assume that $f(0) > 0$ and $f(0) + f(1) < 1$. This is the interesting case, since it means that a particle has a positive probability of dying without children and a positive probability of producing more than 1 child. The fundamental result, you may recall, is that $q$ is the smallest fixed point of $\phi$ (so that $\phi(q) = q$) in the interval $[0, 1]$. Here are two martingales associated with the branching process:

Each of the following is a martingale with respect to X.

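  1. $\boldsymbol{Y} = \{Y_n: n \in \mathbb{N}\}$ where $Y_n = X_n / m^n$ for $n \in \mathbb{N}$.
  2. $\boldsymbol{Z} = \{Z_n: n \in \mathbb{N}\}$ where $Z_n = q^{X_n}$ for $n \in \mathbb{N}$.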
Details:

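Let $\mathscr{F}_n = \sigma\{X_0, X_1, \ldots, X_n\}$. For $n \in \mathbb{N}$, note that $X_{n+1}$ can be written in the form
$$X_{n+1} = \sum_{i=1}^{X_n} U_i$$
where $\boldsymbol{U} = (U_1, U_2, \ldots)$ is a sequence of independent variables, each with PDF $f$ (and hence mean $m$ and PGF $\phi$), and with $\boldsymbol{U}$ independent of $\mathscr{F}_n$. Think of $U_i$ as the number of children of the $i$th particle in generation $n$.

  1. For $n \in \mathbb{N}$, $$\mathbb{E}(Y_{n+1} \mid \mathscr{F}_n) = \mathbb{E}\left(\frac{X_{n+1}}{m^{n+1}} \,\Big|\, \mathscr{F}_n\right) = \frac{1}{m^{n+1}} \mathbb{E}\left(\sum_{i=1}^{X_n} U_i \,\Big|\, \mathscr{F}_n\right) = \frac{1}{m^{n+1}} m X_n = \frac{X_n}{m^n} = Y_n$$
  2. For $n \in \mathbb{N}$, $$\mathbb{E}(Z_{n+1} \mid \mathscr{F}_n) = \mathbb{E}\left(q^{X_{n+1}} \mid \mathscr{F}_n\right) = \mathbb{E}\left(q^{\sum_{i=1}^{X_n} U_i} \,\Big|\, \mathscr{F}_n\right) = [\phi(q)]^{X_n} = q^{X_n} = Z_n$$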
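Both computations can be checked exactly for a small offspring distribution. The distribution `f` below is a hypothetical choice satisfying $f(0) > 0$ and $f(0) + f(1) < 1$; for it, the fixed points of $\phi$ in $[0, 1]$ are $\frac{1}{2}$ and $1$, so $q = \frac{1}{2}$. The helper names are assumptions as well.

```python
from fractions import Fraction

# Hypothetical offspring PDF: 0, 1, or 2 children.
f = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
m = sum(k * p for k, p in f.items())  # offspring mean
q = Fraction(1, 2)                    # smallest root of phi(t) = t for this f

def phi(t):
    """Probability generating function of the offspring distribution."""
    return sum(p * t ** k for k, p in f.items())

def next_generation(x):
    """Distribution of X_{n+1} given X_n = x: the x-fold convolution of f."""
    dist = {0: Fraction(1)}
    for _ in range(x):
        new = {}
        for total, p in dist.items():
            for k, pk in f.items():
                new[total + k] = new.get(total + k, 0) + p * pk
        dist = new
    return dist

for x in range(4):
    dist = next_generation(x)
    mean_next = sum(k * p for k, p in dist.items())        # should equal m * x
    mean_q_pow = sum(p * q ** k for k, p in dist.items())  # should equal q ** x
    print(x, mean_next, mean_q_pow)
```

Dividing the first quantity by $m^{n+1}$ gives $x / m^n$, which is the martingale property of $\boldsymbol{Y}$ on the event $X_n = x$; the second quantity is the martingale property of $\boldsymbol{Z}$.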

Doob's Martingale

Our next example is one of the simplest, but most important. Indeed, as we will see later in the section on convergence, this type of martingale is almost universal in the sense that every uniformly integrable martingale is of this type. The process is constructed by conditioning a fixed random variable on the σ-algebras in a given filtration, and thus accumulating information about the random variable.

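Suppose that $\mathfrak{F} = \{\mathscr{F}_t: t \in T\}$ is a filtration on the probability space $(\Omega, \mathscr{F}, \mathbb{P})$, and that $X$ is a real-valued random variable with $\mathbb{E}(|X|) < \infty$. Define $X_t = \mathbb{E}(X \mid \mathscr{F}_t)$ for $t \in T$. Then $\boldsymbol{X} = \{X_t: t \in T\}$ is a martingale with respect to $\mathfrak{F}$.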

Details:

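For $t \in T$, recall that $|X_t| = |\mathbb{E}(X \mid \mathscr{F}_t)| \le \mathbb{E}(|X| \mid \mathscr{F}_t)$. Taking expected values gives $\mathbb{E}(|X_t|) \le \mathbb{E}(|X|) < \infty$. Suppose that $s, t \in T$ with $s < t$. Using the tower property of conditional expected value,
$$\mathbb{E}(X_t \mid \mathscr{F}_s) = \mathbb{E}[\mathbb{E}(X \mid \mathscr{F}_t) \mid \mathscr{F}_s] = \mathbb{E}(X \mid \mathscr{F}_s) = X_s$$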
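A finite example makes the construction concrete. In the sketch below, which is purely illustrative, the filtration is generated by three fair coin flips and the terminal variable is the square of the number of heads; both choices, and the names `target` and `doob`, are assumptions. The martingale property reduces to the statement that averaging over the next flip returns the current value.

```python
from fractions import Fraction
from itertools import product

N = 3  # number of fair coin flips generating the filtration

def target(flips):
    """Hypothetical terminal variable X: the squared number of heads."""
    return sum(flips) ** 2

def doob(prefix):
    """X_t = E(X | first t flips = prefix) for fair, independent flips."""
    rest = list(product((0, 1), repeat=N - len(prefix)))
    return Fraction(sum(target(prefix + r) for r in rest), len(rest))

# Martingale check: averaging over the next flip returns the current value.
for t in range(N):
    for prefix in product((0, 1), repeat=t):
        average = (doob(prefix + (0,)) + doob(prefix + (1,))) / 2
        print(prefix, doob(prefix), average)
```

At time 0 the value is the unconditional mean of the terminal variable, and at time 3 it is the terminal variable itself, so the process accumulates information about it one flip at a time.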

The martingale in the last theorem is known as Doob's martingale and is named for Joseph Doob who did much of the pioneering work on martingales. It's also known as the Lévy martingale, named for Paul Lévy.

Doob's martingale arises naturally in the statistical context of Bayesian estimation. Suppose that $\boldsymbol{X} = (X_1, X_2, \ldots)$ is a sequence of independent random variables whose common distribution depends on an unknown real-valued parameter $\theta$, with values in a parameter space $A \subseteq \mathbb{R}$. For each $n \in \mathbb{N}_+$, let $\mathscr{F}_n = \sigma\{X_1, X_2, \ldots, X_n\}$ so that $\mathfrak{F} = \{\mathscr{F}_n: n \in \mathbb{N}_+\}$ is the natural filtration associated with $\boldsymbol{X}$. In Bayesian estimation, we model the unknown parameter $\theta$ with a random variable $\Theta$ taking values in $A$ and having a specified prior distribution. The Bayesian estimator of $\theta$ based on the sample $\boldsymbol{X}_n = (X_1, X_2, \ldots, X_n)$ is
$$U_n = \mathbb{E}(\Theta \mid \mathscr{F}_n), \quad n \in \mathbb{N}_+$$
So it follows that the sequence of Bayesian estimators $\boldsymbol{U} = (U_n: n \in \mathbb{N}_+)$ is a Doob martingale. The estimation referred to in the discussion of the beta-Bernoulli process above is a special case.

Density Functions

For this example, you may need to review the sections on general measures and density functions. We start with our probability space $(\Omega, \mathscr{F}, \mathbb{P})$ and filtration $\mathfrak{F} = \{\mathscr{F}_n: n \in \mathbb{N}\}$ in discrete time. Suppose now that $\mu$ is a finite measure on the sample space $(\Omega, \mathscr{F})$. For each $n \in \mathbb{N}$, the restriction of $\mu$ to $\mathscr{F}_n$ is a measure on the measurable space $(\Omega, \mathscr{F}_n)$, and similarly the restriction of $\mathbb{P}$ to $\mathscr{F}_n$ is a probability measure on $(\Omega, \mathscr{F}_n)$. To save notation and terminology, we will refer to these as $\mu$ and $\mathbb{P}$ on $\mathscr{F}_n$, respectively. Suppose now that $\mu$ is absolutely continuous with respect to $\mathbb{P}$ on $\mathscr{F}_n$ for each $n \in \mathbb{N}$. Recall that this means that if $A \in \mathscr{F}_n$ and $\mathbb{P}(A) = 0$ then $\mu(B) = 0$ for every $B \in \mathscr{F}_n$ with $B \subseteq A$. By the Radon-Nikodym theorem, $\mu$ has a density function $X_n: \Omega \to \mathbb{R}$ with respect to $\mathbb{P}$ on $\mathscr{F}_n$ for each $n \in \mathbb{N}$. The density function of a measure with respect to a positive measure is known as a Radon-Nikodym derivative. The theorem and the derivative are named for Johann Radon and Otto Nikodym. Here is our main result.

$\boldsymbol{X} = \{X_n: n \in \mathbb{N}\}$ is a martingale with respect to $\mathfrak{F}$.

Details:

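Let $n \in \mathbb{N}$. By definition, $X_n$ is measurable with respect to $\mathscr{F}_n$. Also, $\mathbb{E}(|X_n|) = \|\mu\|$ (the total variation of $\mu$) for each $n \in \mathbb{N}$. Since $\mu$ is a finite measure, $\|\mu\| < \infty$. By definition,
$$\mu(A) = \int_A X_n \, d\mathbb{P} = \mathbb{E}(X_n; A), \quad A \in \mathscr{F}_n$$
On the other hand, if $A \in \mathscr{F}_n$ then $A \in \mathscr{F}_{n+1}$ and so $\mu(A) = \mathbb{E}(X_{n+1}; A)$. So to summarize, $X_n$ is $\mathscr{F}_n$-measurable and $\mathbb{E}(X_{n+1}; A) = \mathbb{E}(X_n; A)$ for all $A \in \mathscr{F}_n$. By definition, this means that $\mathbb{E}(X_{n+1} \mid \mathscr{F}_n) = X_n$, and so $\boldsymbol{X}$ is a martingale with respect to $\mathfrak{F}$.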
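On a finite sample space, each density is just a ratio of measures on the blocks of a partition, and the argument above can be traced by hand. The following sketch uses a hypothetical four-point sample space with uniform probability, a hypothetical finite measure `mu`, and a filtration given by successively finer partitions; none of these come from the text.

```python
from fractions import Fraction

# Hypothetical finite example: Omega = {0, 1, 2, 3}, P uniform, mu a finite measure.
P = {w: Fraction(1, 4) for w in range(4)}
mu = {0: Fraction(1, 2), 1: Fraction(3, 2), 2: Fraction(0), 3: Fraction(1)}

# Each F_n is generated by a partition of Omega; the partitions get finer with n.
partitions = [
    [{0, 1, 2, 3}],
    [{0, 1}, {2, 3}],
    [{0}, {1}, {2}, {3}],
]

def density(n, w):
    """X_n(w) = mu(B) / P(B), where B is the block of partition n containing w."""
    block = next(b for b in partitions[n] if w in b)
    return sum(mu[v] for v in block) / sum(P[v] for v in block)

def integral(n, event):
    """E(X_n; A) for an event A."""
    return sum(density(n, w) * P[w] for w in event)

# For each block A generating F_n, compare E(X_{n+1}; A), E(X_n; A), and mu(A).
for n in (0, 1):
    for block in partitions[n]:
        print(n, sorted(block), integral(n + 1, block), integral(n, block),
              sum(mu[w] for w in block))
```

The three printed quantities agree on every block, which is exactly the defining property of conditional expected value used in the details.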

Note that $\mu$ may not be absolutely continuous with respect to $\mathbb{P}$ on $\mathscr{F}$ or even on $\mathscr{F}_\infty = \sigma\left(\bigcup_{n=0}^\infty \mathscr{F}_n\right)$. On the other hand, if $\mu$ is absolutely continuous with respect to $\mathbb{P}$ on $\mathscr{F}_\infty$ then $\mu$ has a density function $X$ with respect to $\mathbb{P}$ on $\mathscr{F}_\infty$. So a natural question in this case is the relationship between the martingale $\boldsymbol{X}$ and the random variable $X$. You may have already guessed the answer, but at any rate it will be given in the section on convergence.