
2. Potentials and Generators for General Markov Processes

Our goal in this section is to continue the broad sketch of the general theory of Markov processes. As with the last section, some of the statements are not completely precise and rigorous, because we want to focus on the main ideas without being overly burdened by technicalities. If you are a new student of probability, or are primarily interested in applications, you may want to skip ahead to the study of discrete-time Markov chains.

Preliminaries

Basic Definitions

As usual, our starting point is a probability space $(\Omega, \mathscr{F}, \mathbb{P})$, so that $\Omega$ is the set of outcomes, $\mathscr{F}$ the $\sigma$-algebra of events, and $\mathbb{P}$ the probability measure on the sample space $(\Omega, \mathscr{F})$. The set of times $T$ is either $\mathbb{N}$, discrete time with the discrete topology, or $[0, \infty)$, continuous time with the usual Euclidean topology. The time set $T$ is given the Borel $\sigma$-algebra $\mathscr{T}$, which is just the power set if $T = \mathbb{N}$, and then the time space $(T, \mathscr{T})$ is given the usual measure: counting measure in the discrete case and Lebesgue measure in the continuous case. The set of states $S$ has an LCCB topology (locally compact, Hausdorff, with a countable base), and is also given the Borel $\sigma$-algebra $\mathscr{S}$. Recall that to say that the state space is discrete means that $S$ is countable with the discrete topology, so that $\mathscr{S}$ is the power set of $S$. The topological assumptions mean that the state space $(S, \mathscr{S})$ is nice enough for a rich mathematical theory and general enough to encompass the most important applications. There is often a natural Borel measure $\lambda$ on $(S, \mathscr{S})$: counting measure $\#$ if $S$ is discrete, and, for example, Lebesgue measure if $S = \mathbb{R}^k$ for some $k \in \mathbb{N}_+$.

Recall also that there are several spaces of functions on $S$ that are important. Let $\mathscr{B}$ denote the set of bounded, measurable functions $f: S \to \mathbb{R}$. Let $\mathscr{C}$ denote the set of bounded, continuous functions $f: S \to \mathbb{R}$, and let $\mathscr{C}_0$ denote the set of continuous functions $f: S \to \mathbb{R}$ that vanish at $\infty$ in the sense that for every $\epsilon > 0$, there exists a compact set $K \subseteq S$ such that $|f(x)| < \epsilon$ for $x \in K^c$. These are all vector spaces under the usual (pointwise) addition and scalar multiplication, and $\mathscr{C}_0 \subseteq \mathscr{C} \subseteq \mathscr{B}$. The supremum norm, defined by $\|f\| = \sup\{|f(x)|: x \in S\}$ for $f \in \mathscr{B}$, is the norm that is used on these spaces.

Suppose now that $\boldsymbol{X} = \{X_t: t \in T\}$ is a time-homogeneous Markov process with state space $(S, \mathscr{S})$ defined on the probability space $(\Omega, \mathscr{F}, \mathbb{P})$. As before, we also assume that we have a filtration $\mathfrak{F} = \{\mathscr{F}_t: t \in T\}$, that is, an increasing family of sub $\sigma$-algebras of $\mathscr{F}$, indexed by the time space, with the property that $X_t$ is measurable with respect to $\mathscr{F}_t$ for $t \in T$. Intuitively, $\mathscr{F}_t$ is the collection of events up to time $t \in T$.

As usual, we let $P_t$ denote the transition probability kernel for an increase in time of size $t \in T$. Thus $$P_t(x, A) = \mathbb{P}(X_t \in A \mid X_0 = x), \quad x \in S, \, A \in \mathscr{S}$$ Recall that for $t \in T$, the transition kernel $P_t$ defines two operators, on the left with measures and on the right with functions. So, if $\mu$ is a measure on $(S, \mathscr{S})$ then $\mu P_t$ is the measure on $(S, \mathscr{S})$ given by $$\mu P_t(A) = \int_S \mu(dx) P_t(x, A), \quad A \in \mathscr{S}$$ If $\mu$ is the distribution of $X_0$ then $\mu P_t$ is the distribution of $X_t$ for $t \in T$. If $f \in \mathscr{B}$ then $P_t f \in \mathscr{B}$ is defined by $$P_t f(x) = \int_S P_t(x, dy) f(y) = \mathbb{E}[f(X_t) \mid X_0 = x]$$ Recall that the collection of transition operators $\boldsymbol{P} = \{P_t: t \in T\}$ is a semigroup because $P_s P_t = P_{s+t}$ for $s, t \in T$. Just about everything in this section is defined in terms of the semigroup $\boldsymbol{P}$, which is one of the main analytic tools in the study of Markov processes.

Feller Markov Processes

We make the same assumptions as in the Introduction. Here is a brief review:

We assume that the Markov process $\boldsymbol{X} = \{X_t: t \in T\}$ satisfies the following properties (and hence is a Feller Markov process):

  1. For $t \in T$ and $y \in S$, the distribution of $X_t$ given $X_0 = x$ converges to the distribution of $X_t$ given $X_0 = y$ as $x \to y$.
  2. Given $X_0 = x \in S$, $X_t$ converges in probability to $x$ as $t \downarrow 0$.

Part (a) is an assumption on continuity in space, while part (b) is an assumption on continuity in time. If $S$ is discrete then (a) automatically holds, and if $T$ is discrete then (b) automatically holds. As we will see, the Feller assumptions are sufficient for a very nice mathematical theory, and yet are general enough to encompass the most important continuous-time Markov processes.

The process $\boldsymbol{X} = \{X_t: t \in T\}$ has the following properties:

  1. There is a version of $\boldsymbol{X}$ such that $t \mapsto X_t$ is continuous from the right and has left limits.
  2. $\boldsymbol{X}$ is a strong Markov process relative to $\mathfrak{F}^0_+$, the right-continuous refinement of the natural filtration.

The Feller assumptions on the Markov process have equivalent formulations in terms of the transition semigroup.

The transition semigroup $\boldsymbol{P} = \{P_t: t \in T\}$ has the following properties:

  1. If $f \in \mathscr{C}_0$ and $t \in T$ then $P_t f \in \mathscr{C}_0$.
  2. If $f \in \mathscr{C}_0$ and $x \in S$ then $P_t f(x) \to f(x)$ as $t \downarrow 0$.

As before, part (a) is a condition on continuity in space, while part (b) is a condition on continuity in time. Once again, (a) is trivial if $S$ is discrete, and (b) is trivial if $T$ is discrete. The first condition means that $P_t$ is a linear operator on $\mathscr{C}_0$ (as well as being a linear operator on $\mathscr{B}$). The second condition leads to a stronger continuity result.

For $f \in \mathscr{C}_0$, the mapping $t \mapsto P_t f$ is continuous on $T$. That is, for $t \in T$, $$\|P_s f - P_t f\| = \sup\{|P_s f(x) - P_t f(x)|: x \in S\} \to 0 \text{ as } s \to t$$

Our interest in this section is primarily the continuous time case. However, we start with the discrete time case since the concepts are clearer and simpler, and we can avoid some of the technicalities that inevitably occur in continuous time.

Discrete Time

Suppose that $T = \mathbb{N}$, so that time is discrete. Recall that the transition kernels are just powers of the one-step kernel. That is, we let $P = P_1$, and then $P_n = P^n$ for $n \in \mathbb{N}$.

Potential Operators

For $\alpha \in (0, 1]$, the $\alpha$-potential kernel $R_\alpha$ of $\boldsymbol{X}$ is defined as follows: $$R_\alpha(x, A) = \sum_{n=0}^\infty \alpha^n P_n(x, A), \quad x \in S, \, A \in \mathscr{S}$$

  1. The special case $R = R_1$ is simply the potential kernel of $\boldsymbol{X}$.
  2. For $x \in S$ and $A \in \mathscr{S}$, $R(x, A)$ is the expected number of visits of $\boldsymbol{X}$ to $A$, starting at $x$.
Details:

The function $x \mapsto R_\alpha(x, A)$ from $S$ to $[0, \infty]$ is measurable for $A \in \mathscr{S}$ since $x \mapsto P_n(x, A)$ is measurable for each $n \in \mathbb{N}$. The mapping $A \mapsto R_\alpha(x, A)$ is a positive measure on $\mathscr{S}$ for $x \in S$ since $A \mapsto P_n(x, A)$ is a probability measure for each $n \in \mathbb{N}$. Finally, the interpretation of $R(x, A)$ for $x \in S$ and $A \in \mathscr{S}$ comes from interchanging sum and expected value, which is allowed since the terms are nonnegative: $$R(x, A) = \sum_{n=0}^\infty P_n(x, A) = \sum_{n=0}^\infty \mathbb{E}[\mathbf{1}(X_n \in A) \mid X_0 = x] = \mathbb{E}\left(\sum_{n=0}^\infty \mathbf{1}(X_n \in A) \,\Big|\, X_0 = x\right) = \mathbb{E}[\#\{n \in \mathbb{N}: X_n \in A\} \mid X_0 = x]$$

Note that it's quite possible that $R(x, A) = \infty$ for some $x \in S$ and $A \in \mathscr{S}$. In fact, knowing when this is the case is of considerable importance in the study of Markov processes. As with all kernels, the potential kernel $R_\alpha$ defines two operators, operating on the right on functions, and operating on the left on positive measures. For the right potential operator, if $f: S \to \mathbb{R}$ is measurable then $$R_\alpha f(x) = \sum_{n=0}^\infty \alpha^n P_n f(x) = \sum_{n=0}^\infty \alpha^n \int_S P_n(x, dy) f(y) = \sum_{n=0}^\infty \alpha^n \mathbb{E}[f(X_n) \mid X_0 = x], \quad x \in S$$ assuming as usual that the expected values and the infinite series make sense. This will be the case, in particular, if $f$ is nonnegative or if $\alpha \in (0, 1)$ and $f \in \mathscr{B}$.

If $\alpha \in (0, 1)$, then $R_\alpha(x, S) = \frac{1}{1 - \alpha}$ for all $x \in S$.

Details:

Using geometric series, $$R_\alpha(x, S) = \sum_{n=0}^\infty \alpha^n P_n(x, S) = \sum_{n=0}^\infty \alpha^n = \frac{1}{1 - \alpha}$$

It follows that for $\alpha \in (0, 1)$, the right operator $R_\alpha$ is a bounded, linear operator on $\mathscr{B}$ with $\|R_\alpha\| = \frac{1}{1 - \alpha}$. It also follows that $(1 - \alpha) R_\alpha$ is a probability kernel. There is a nice interpretation of this kernel.

If $\alpha \in (0, 1)$ then $(1 - \alpha) R_\alpha(x, \cdot)$ is the conditional distribution of $X_N$ given $X_0 = x \in S$, where $N$ is independent of $\boldsymbol{X}$ and has the geometric distribution on $\mathbb{N}$ with parameter $1 - \alpha$.

Details:

Suppose that $x \in S$ and $A \in \mathscr{S}$. Conditioning on $N$ gives $$\mathbb{P}(X_N \in A \mid X_0 = x) = \sum_{n=0}^\infty \mathbb{P}(N = n) \mathbb{P}(X_N \in A \mid N = n, X_0 = x)$$ But by the substitution rule and the assumption of independence, $$\mathbb{P}(X_N \in A \mid N = n, X_0 = x) = \mathbb{P}(X_n \in A \mid N = n, X_0 = x) = \mathbb{P}(X_n \in A \mid X_0 = x) = P_n(x, A)$$ Since $N$ has the geometric distribution on $\mathbb{N}$ with parameter $1 - \alpha$ we have $\mathbb{P}(N = n) = (1 - \alpha) \alpha^n$ for $n \in \mathbb{N}$. Substituting gives $$\mathbb{P}(X_N \in A \mid X_0 = x) = \sum_{n=0}^\infty (1 - \alpha) \alpha^n P_n(x, A) = (1 - \alpha) R_\alpha(x, A)$$

So $(1 - \alpha) R_\alpha$ is a transition probability kernel, just as $P_n$ is a transition probability kernel, but corresponding to the random time $N$ (with $\alpha \in (0, 1)$ as a parameter), rather than the deterministic time $n \in \mathbb{N}$. An interpretation of the potential kernel $R_\alpha$ for $\alpha \in (0, 1)$ can also be given in economic terms. Suppose that $A \in \mathscr{S}$ and that we receive one monetary unit each time the process $\boldsymbol{X}$ visits $A$. Then as above, $R(x, A)$ is the expected total amount of money we receive, starting at $x \in S$. However, typically money that we will receive at times distant in the future has less value to us now than money that we will receive soon. Specifically, suppose that a monetary unit received at time $n \in \mathbb{N}$ has a present value of $\alpha^n$, where $\alpha \in (0, 1)$ is an inflation factor (sometimes also called a discount factor). Then $R_\alpha(x, A)$ gives the expected, total, discounted amount we will receive, starting at $x \in S$. A bit more generally, if $f \in \mathscr{B}$ is a reward function, so that $f(x)$ is the reward (or cost, depending on the sign) that we receive when we visit state $x \in S$, then for $\alpha \in (0, 1)$, $R_\alpha f(x)$ is the expected, total, discounted reward, starting at $x \in S$.
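For a finite state space, the discounted-reward reading of $R_\alpha f$ is easy to compute directly from the defining series. The minimal sketch below (the three-state transition matrix and reward function are assumptions for illustration, not from the text) truncates the series $\sum_n \alpha^n P^n f$:

```python
import numpy as np

# Hypothetical 3-state transition matrix P and reward function f (assumptions,
# not from the text): P[x, y] = P(X_1 = y | X_0 = x)
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
f = np.array([1.0, 0.0, 2.0])   # reward f(x) received on each visit to state x
alpha = 0.9                      # discount factor

# R_alpha f(x) = sum_n alpha^n P^n f(x); truncate once the tail is negligible
R_alpha_f = np.zeros(3)
term = f.copy()                  # P^0 f = f
for n in range(2000):
    R_alpha_f += alpha ** n * term
    term = P @ term              # P^{n+1} f = P (P^n f)

print(R_alpha_f)                 # expected total discounted reward from each starting state
```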

For the left potential operator, if $\mu$ is a positive measure on $\mathscr{S}$ then $$\mu R_\alpha(A) = \sum_{n=0}^\infty \alpha^n \mu P_n(A) = \sum_{n=0}^\infty \alpha^n \int_S \mu(dx) P_n(x, A), \quad A \in \mathscr{S}$$ In particular, if $\mu$ is a probability measure and $X_0$ has distribution $\mu$, then $\mu P_n$ is the distribution of $X_n$ for $n \in \mathbb{N}$, so from the last result, $(1 - \alpha) \mu R_\alpha$ is the distribution of $X_N$ where again, $N$ is independent of $\boldsymbol{X}$ and has the geometric distribution on $\mathbb{N}$ with parameter $1 - \alpha$. The family of potential kernels gives the same information as the family of transition kernels.

The potential kernels $\boldsymbol{R} = \{R_\alpha: \alpha \in (0, 1)\}$ completely determine the transition kernels $\boldsymbol{P} = \{P_n: n \in \mathbb{N}\}$.

Details:

Note that for $x \in S$ and $A \in \mathscr{S}$, the function $\alpha \mapsto R_\alpha(x, A)$ is a power series in $\alpha$ with coefficients $n \mapsto P_n(x, A)$. In the language of combinatorics, $\alpha \mapsto R_\alpha(x, A)$ is the ordinary generating function of the sequence $n \mapsto P_n(x, A)$. As noted above, this power series has radius of convergence at least 1, so we can extend the domain to $\alpha \in (-1, 1)$. Thus, given the potential kernels, we can recover the transition kernels by taking derivatives and evaluating at 0: $$P_n(x, A) = \frac{1}{n!} \left[\frac{d^n}{d\alpha^n} R_\alpha(x, A)\right]_{\alpha = 0}$$
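For a small chain this recovery can be carried out symbolically. The sketch below, which assumes a hypothetical two-state transition matrix purely for illustration, builds a truncated version of the power series $\alpha \mapsto R_\alpha$ and reads off $P_n$ by differentiating at $\alpha = 0$:

```python
import sympy as sp

alpha = sp.symbols('alpha')

# Hypothetical two-state transition matrix (assumption for illustration only)
P = sp.Matrix([[sp.Rational(3, 4), sp.Rational(1, 4)],
               [sp.Rational(1, 2), sp.Rational(1, 2)]])

# Truncated potential kernel: sum_{n=0}^{N} alpha^n P^n
N = 10
R_alpha = sum(((alpha ** n) * P ** n for n in range(N + 1)), sp.zeros(2, 2))

# Recover P_n = (1/n!) d^n/d alpha^n R_alpha at alpha = 0, for n well below the truncation
for n in range(4):
    Pn = sp.diff(R_alpha, alpha, n).subs(alpha, 0) / sp.factorial(n)
    assert Pn == P ** n

print("Recovered P_0, ..., P_3 from the potential kernel's power series.")
```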

Of course, it's really only necessary to determine $P$, the one-step transition kernel, since the other transition kernels are powers of $P$. In any event, it follows that the kernels $\boldsymbol{R} = \{R_\alpha: \alpha \in (0, 1)\}$, along with the initial distribution, completely determine the finite dimensional distributions of the Markov process $\boldsymbol{X}$. The potential kernels commute with each other and with the transition kernels.

Suppose that $\alpha, \beta \in (0, 1]$ and $k \in \mathbb{N}$. Then (as kernels)

  1. $P_k R_\alpha = R_\alpha P_k = \sum_{n=0}^\infty \alpha^n P_{n+k}$
  2. $R_\alpha R_\beta = R_\beta R_\alpha = \sum_{m=0}^\infty \sum_{n=0}^\infty \alpha^m \beta^n P_{m+n}$
Details:

Suppose that $f \in \mathscr{B}$ is nonnegative. The interchange of the sums with the kernel operation is allowed since the kernels are nonnegative. The other tool used is the semigroup property.

  1. Directly, $$R_\alpha P_k f = \sum_{n=0}^\infty \alpha^n P_n P_k f = \sum_{n=0}^\infty \alpha^n P_{n+k} f$$ The other direction requires an interchange: $$P_k R_\alpha f = P_k \sum_{n=0}^\infty \alpha^n P_n f = \sum_{n=0}^\infty \alpha^n P_k P_n f = \sum_{n=0}^\infty \alpha^n P_{n+k} f$$
  2. First, $$R_\alpha R_\beta f = \sum_{m=0}^\infty \alpha^m P_m R_\beta f = \sum_{m=0}^\infty \alpha^m P_m \left(\sum_{n=0}^\infty \beta^n P_n f\right) = \sum_{m=0}^\infty \sum_{n=0}^\infty \alpha^m \beta^n P_m P_n f = \sum_{m=0}^\infty \sum_{n=0}^\infty \alpha^m \beta^n P_{m+n} f$$ The other direction is similar.

The same identities hold for the right operators on the entire space $\mathscr{B}$, with the additional restrictions that $\alpha < 1$ and $\beta < 1$. The fundamental equation that relates the potential kernels is given next.

If $\alpha, \beta \in (0, 1]$ with $\alpha \le \beta$ then (as kernels), $$\beta R_\beta = \alpha R_\alpha + (\beta - \alpha) R_\alpha R_\beta$$

Details:

If $\alpha = \beta$ the equation is trivial, so assume $\alpha < \beta$. Suppose that $f \in \mathscr{B}$ is nonnegative. From the previous result, $$R_\alpha R_\beta f = \sum_{j=0}^\infty \sum_{k=0}^\infty \alpha^j \beta^k P_{j+k} f$$ Changing variables to sum over $n = j + k$ and $j$ gives $$R_\alpha R_\beta f = \sum_{n=0}^\infty \sum_{j=0}^n \alpha^j \beta^{n-j} P_n f = \sum_{n=0}^\infty \sum_{j=0}^n \left(\frac{\alpha}{\beta}\right)^j \beta^n P_n f = \sum_{n=0}^\infty \frac{1 - (\alpha/\beta)^{n+1}}{1 - \alpha/\beta} \beta^n P_n f$$ Simplifying gives $$R_\alpha R_\beta f = \frac{1}{\beta - \alpha}\left(\beta R_\beta f - \alpha R_\alpha f\right)$$ Note that since $\alpha < 1$, $R_\alpha f$ is finite, so we don't have to worry about the dreaded indeterminate form $\infty - \infty$.

The same identity holds for the right operators on the entire space $\mathscr{B}$, with the additional restriction that $\beta < 1$.
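The identity is easy to verify numerically for a finite chain. The sketch below assumes a hypothetical two-state transition matrix (not from the text) and approximates each potential kernel by a truncated series:

```python
import numpy as np

# Hypothetical two-state transition matrix (assumption for illustration only)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
I = np.eye(2)

def R(alpha, terms=5000):
    """Truncated potential kernel R_alpha = sum_n alpha^n P^n (alpha < 1)."""
    out, power = np.zeros_like(P), I.copy()
    for n in range(terms):
        out += alpha ** n * power
        power = power @ P
    return out

alpha, beta = 0.5, 0.8
lhs = beta * R(beta)
rhs = alpha * R(alpha) + (beta - alpha) * R(alpha) @ R(beta)
print(np.allclose(lhs, rhs))   # True: beta R_beta = alpha R_alpha + (beta - alpha) R_alpha R_beta
```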

If $\alpha \in (0, 1]$, then (as kernels), $I + \alpha R_\alpha P = I + \alpha P R_\alpha = R_\alpha$.

Details:

Suppose that $f \in \mathscr{B}$ is nonnegative. From the result above, $$(I + \alpha R_\alpha P) f = (I + \alpha P R_\alpha) f = f + \sum_{n=0}^\infty \alpha^{n+1} P_{n+1} f = \sum_{n=0}^\infty \alpha^n P_n f = R_\alpha f$$

The same identity holds for the right operators on the entire space $\mathscr{B}$, with the additional restriction that $\alpha < 1$. This leads to the following important result:

If $\alpha \in (0, 1)$, then as operators on the space $\mathscr{B}$,

  1. $R_\alpha = (I - \alpha P)^{-1}$
  2. $P = \frac{1}{\alpha}\left(I - R_\alpha^{-1}\right)$
Details:

The operators are bounded, so we can subtract. The identity $I + \alpha R_\alpha P = R_\alpha$ leads to $R_\alpha(I - \alpha P) = I$, and the identity $I + \alpha P R_\alpha = R_\alpha$ leads to $(I - \alpha P) R_\alpha = I$. Hence (a) holds. Part (b) follows from (a).

Exercise [12] shows again that the potential operator $R_\alpha$ determines the transition operator $P$.
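In the finite-state case the inverse relationship can be checked directly. The following sketch (reusing a hypothetical two-state transition matrix, an assumption for illustration) compares $(I - \alpha P)^{-1}$ with the defining series and then recovers $P$ from $R_\alpha$:

```python
import numpy as np

# Hypothetical two-state transition matrix (assumption for illustration only)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
I = np.eye(2)
alpha = 0.5

# R_alpha as a truncated series and as the matrix inverse (I - alpha P)^{-1}
series = sum(alpha ** n * np.linalg.matrix_power(P, n) for n in range(500))
R_alpha = np.linalg.inv(I - alpha * P)
print(np.allclose(series, R_alpha))            # True

# Recover the one-step kernel: P = (I - R_alpha^{-1}) / alpha
P_recovered = (I - np.linalg.inv(R_alpha)) / alpha
print(np.allclose(P_recovered, P))             # True
```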

Examples and Applications

Our first example considers the binomial process as a Markov process.

Let $\boldsymbol{I} = \{I_n: n \in \mathbb{N}_+\}$ be a sequence of Bernoulli trials with success parameter $p \in (0, 1)$. Define the Markov process $\boldsymbol{X} = \{X_n: n \in \mathbb{N}\}$ by $X_n = X_0 + \sum_{k=1}^n I_k$ where $X_0$ takes values in $\mathbb{N}$ and is independent of $\boldsymbol{I}$.

  1. For $n \in \mathbb{N}$, show that the transition probability matrix $P_n$ of $\boldsymbol{X}$ is given by $$P_n(x, y) = \binom{n}{y - x} p^{y - x} (1 - p)^{n - y + x}, \quad x \in \mathbb{N}, \, y \in \{x, x + 1, \ldots, x + n\}$$
  2. For $\alpha \in (0, 1]$, show that the potential matrix $R_\alpha$ of $\boldsymbol{X}$ is given by $$R_\alpha(x, y) = \frac{1}{1 - \alpha + \alpha p}\left(\frac{\alpha p}{1 - \alpha + \alpha p}\right)^{y - x}, \quad x \in \mathbb{N}, \, y \in \{x, x + 1, \ldots\}$$
  3. For $\alpha \in (0, 1)$ and $x \in \mathbb{N}$, identify the probability distribution defined by $(1 - \alpha) R_\alpha(x, \cdot)$.
  4. For $x, y \in \mathbb{N}$ with $x \le y$, interpret $R(x, y)$, the expected time in $y$ starting in $x$, in the context of the process $\boldsymbol{X}$.
Details:

Recall that X is a Markov process since it has stationary, independent increments.

  1. Note that for $n, x \in \mathbb{N}$, $P_n(x, \cdot)$ is the (discrete) PDF of $x + \sum_{k=1}^n I_k$. The result follows since the sum of the indicator variables has the binomial distribution with parameters $n$ and $p$.
  2. Let $\alpha \in (0, 1]$ and let $x, y \in \mathbb{N}$ with $x \le y$. Then $$R_\alpha(x, y) = \sum_{n=0}^\infty \alpha^n P_n(x, y) = \sum_{n=y-x}^\infty \alpha^n \binom{n}{y - x} p^{y - x} (1 - p)^{n - y + x} = (\alpha p)^{y - x} \sum_{n=y-x}^\infty \binom{n}{y - x} [\alpha(1 - p)]^{n - y + x} = \frac{(\alpha p)^{y - x}}{[1 - \alpha(1 - p)]^{y - x + 1}}$$ Simplifying gives the result.
  3. For $\alpha \in (0, 1)$, $$(1 - \alpha) R_\alpha(x, y) = \frac{1 - \alpha}{1 - \alpha + \alpha p}\left(\frac{\alpha p}{1 - \alpha + \alpha p}\right)^{y - x}$$ As a function of $y$ for fixed $x$, this is the PDF of $x + Y_\alpha$ where $Y_\alpha$ has the geometric distribution on $\mathbb{N}$ with parameter $\frac{1 - \alpha}{1 - \alpha + \alpha p}$.
  4. Note that $R(x, y) = 1/p$ for $x, y \in \mathbb{N}$ with $x \le y$. Starting in state $x$, the process eventually reaches $y$ with probability 1. The process remains in state $y$ for a geometrically distributed time, with parameter $p$. The mean of this distribution is $1/p$.
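As a quick numerical check of part (b), the series defining $R_\alpha(x, y)$ can be summed directly and compared with the closed form; the parameter values below are arbitrary illustrative choices:

```python
import numpy as np
from math import comb

def R_series(x, y, alpha, p, terms=2000):
    """R_alpha(x, y) for the binomial process, by direct summation of the series."""
    return sum(alpha ** n * comb(n, y - x) * p ** (y - x) * (1 - p) ** (n - y + x)
               for n in range(y - x, terms))

def R_closed(x, y, alpha, p):
    """Closed form: R_alpha(x, y) = (alpha p)^(y-x) / (1 - alpha + alpha p)^(y-x+1)."""
    r = alpha * p / (1 - alpha + alpha * p)
    return r ** (y - x) / (1 - alpha + alpha * p)

# Arbitrary illustrative values
p, alpha, x, y = 0.3, 0.9, 2, 5
print(np.isclose(R_series(x, y, alpha, p), R_closed(x, y, alpha, p)))   # True
```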

Continuous Time

With the discrete-time setting as motivation, we now turn to the more important continuous-time case where $T = [0, \infty)$.

Potential Kernels

For $\alpha \in [0, \infty)$, the $\alpha$-potential kernel $U_\alpha$ of $\boldsymbol{X}$ is defined as follows: $$U_\alpha(x, A) = \int_0^\infty e^{-\alpha t} P_t(x, A) \, dt, \quad x \in S, \, A \in \mathscr{S}$$

  1. The special case $U = U_0$ is simply the potential kernel of $\boldsymbol{X}$.
  2. For $x \in S$ and $A \in \mathscr{S}$, $U(x, A)$ is the expected amount of time that $\boldsymbol{X}$ spends in $A$, starting at $x$.
  3. The family of kernels $\boldsymbol{U} = \{U_\alpha: \alpha \in (0, \infty)\}$ is known as the resolvent of $\boldsymbol{X}$.
Details:

Since $\boldsymbol{P} = \{P_t: t \in T\}$ is a Feller semigroup of transition operators, the mapping $(t, x) \mapsto P_t(x, A)$ from $[0, \infty) \times S$ to $[0, 1]$ is jointly measurable for $A \in \mathscr{S}$. Thus, $U_\alpha(x, A)$ makes sense for $x \in S$ and $A \in \mathscr{S}$, and $x \mapsto U_\alpha(x, A)$ from $S$ to $[0, \infty]$ is measurable for $A \in \mathscr{S}$. That $A \mapsto U_\alpha(x, A)$ is a measure on $\mathscr{S}$ follows from the usual interchange of sum and integral, via Fubini's theorem: Suppose that $\{A_j: j \in J\}$ is a countable collection of disjoint sets in $\mathscr{S}$, and let $A = \bigcup_{j \in J} A_j$. Then $$U_\alpha(x, A) = \int_0^\infty e^{-\alpha t} P_t(x, A) \, dt = \int_0^\infty \left[\sum_{j \in J} e^{-\alpha t} P_t(x, A_j)\right] dt = \sum_{j \in J} \int_0^\infty e^{-\alpha t} P_t(x, A_j) \, dt = \sum_{j \in J} U_\alpha(x, A_j)$$ Finally, the interpretation of $U(x, A)$ for $x \in S$ and $A \in \mathscr{S}$ is another interchange of integrals: $$U(x, A) = \int_0^\infty P_t(x, A) \, dt = \int_0^\infty \mathbb{E}[\mathbf{1}(X_t \in A) \mid X_0 = x] \, dt = \mathbb{E}\left(\int_0^\infty \mathbf{1}(X_t \in A) \, dt \,\Big|\, X_0 = x\right)$$ The inside integral is the Lebesgue measure of $\{t \in [0, \infty): X_t \in A\}$.

As with discrete time, it's quite possible that $U(x, A) = \infty$ for some $x \in S$ and $A \in \mathscr{S}$, and knowing when this is the case is of considerable interest. As with all kernels, the potential kernel $U_\alpha$ defines two operators, operating on the right on functions, and operating on the left on positive measures. If $f: S \to \mathbb{R}$ is measurable then, giving the right potential operator in its many forms, $$U_\alpha f(x) = \int_S U_\alpha(x, dy) f(y) = \int_0^\infty e^{-\alpha t} P_t f(x) \, dt = \int_0^\infty e^{-\alpha t} \int_S P_t(x, dy) f(y) \, dt = \int_0^\infty e^{-\alpha t} \mathbb{E}[f(X_t) \mid X_0 = x] \, dt, \quad x \in S$$ assuming that the various integrals make sense. This will be the case in particular if $f$ is nonnegative, or if $f \in \mathscr{B}$ and $\alpha > 0$.

If $\alpha > 0$, then $U_\alpha(x, S) = \frac{1}{\alpha}$ for all $x \in S$.

Details:

For $x \in S$, $$U_\alpha(x, S) = \int_0^\infty e^{-\alpha t} P_t(x, S) \, dt = \int_0^\infty e^{-\alpha t} \, dt = \frac{1}{\alpha}$$

It follows that for $\alpha \in (0, \infty)$, the right potential operator $U_\alpha$ is a bounded, linear operator on $\mathscr{B}$ with $\|U_\alpha\| = \frac{1}{\alpha}$. It also follows that $\alpha U_\alpha$ is a probability kernel. This kernel has a nice interpretation.

If $\alpha > 0$ then $\alpha U_\alpha(x, \cdot)$ is the conditional distribution of $X_\tau$ given $X_0 = x \in S$, where $\tau$ is independent of $\boldsymbol{X}$ and has the exponential distribution on $[0, \infty)$ with parameter $\alpha$.

Details:

Suppose that $x \in S$ and $A \in \mathscr{S}$. The random time $\tau$ has PDF $f(t) = \alpha e^{-\alpha t}$ for $t \in [0, \infty)$. Hence, conditioning on $\tau$ gives $$\mathbb{P}(X_\tau \in A \mid X_0 = x) = \int_0^\infty \alpha e^{-\alpha t} \mathbb{P}(X_\tau \in A \mid \tau = t, X_0 = x) \, dt$$ But by the substitution rule and the assumption of independence, $$\mathbb{P}(X_\tau \in A \mid \tau = t, X_0 = x) = \mathbb{P}(X_t \in A \mid \tau = t, X_0 = x) = \mathbb{P}(X_t \in A \mid X_0 = x) = P_t(x, A)$$ Substituting gives $$\mathbb{P}(X_\tau \in A \mid X_0 = x) = \int_0^\infty \alpha e^{-\alpha t} P_t(x, A) \, dt = \alpha U_\alpha(x, A)$$

So $\alpha U_\alpha$ is a transition probability kernel, just as $P_t$ is a transition probability kernel, but corresponding to the random time $\tau$ (with $\alpha \in (0, \infty)$ as a parameter), rather than the deterministic time $t \in [0, \infty)$. As in the discrete case, the potential kernel can also be interpreted in economic terms. Suppose that $A \in \mathscr{S}$ and that we receive money at a rate of one unit per unit time whenever the process $\boldsymbol{X}$ is in $A$. Then $U(x, A)$ is the expected total amount of money that we receive, starting in state $x \in S$. But again, money that we receive later is of less value to us now than money that we will receive sooner. Specifically, suppose that one monetary unit at time $t \in [0, \infty)$ has a present value of $e^{-\alpha t}$, where $\alpha \in (0, \infty)$ is the inflation factor or discount factor. Then $U_\alpha(x, A)$ is the total, expected, discounted amount that we receive, starting in $x \in S$. A bit more generally, suppose that $f \in \mathscr{B}$ and that $f(x)$ is the reward (or cost, depending on the sign) per unit time that we receive when the process is in state $x \in S$. Then $U_\alpha f(x)$ is the expected, total, discounted reward, starting in state $x \in S$.

For the left potential operator, if $\mu$ is a positive measure on $\mathscr{S}$ then $$\mu U_\alpha(A) = \int_S \mu(dx) U_\alpha(x, A) = \int_0^\infty e^{-\alpha t} \mu P_t(A) \, dt = \int_0^\infty e^{-\alpha t} \left[\int_S \mu(dx) P_t(x, A)\right] dt = \int_0^\infty e^{-\alpha t} \left[\int_S \mu(dx) \, \mathbb{P}(X_t \in A \mid X_0 = x)\right] dt, \quad A \in \mathscr{S}$$ In particular, suppose that $\alpha > 0$, that $\mu$ is a probability measure, and that $X_0$ has distribution $\mu$. Then $\mu P_t$ is the distribution of $X_t$ for $t \in [0, \infty)$, and hence from the last result, $\alpha \mu U_\alpha$ is the distribution of $X_\tau$, where again, $\tau$ is independent of $\boldsymbol{X}$ and has the exponential distribution on $[0, \infty)$ with parameter $\alpha$. The family of potential kernels gives the same information as the family of transition kernels.

The resolvent $\boldsymbol{U} = \{U_\alpha: \alpha \in (0, \infty)\}$ completely determines the family of transition kernels $\boldsymbol{P} = \{P_t: t \in (0, \infty)\}$.

Details:

Note that for $x \in S$ and $A \in \mathscr{S}$, the function $\alpha \mapsto U_\alpha(x, A)$ on $(0, \infty)$ is the Laplace transform of the function $t \mapsto P_t(x, A)$ on $[0, \infty)$. The Laplace transform of a function determines the function completely.

It follows that the resolvent $\{U_\alpha: \alpha \in (0, \infty)\}$, along with the initial distribution, completely determines the finite dimensional distributions of the Markov process $\boldsymbol{X}$. This is much more important here in the continuous-time case than in the discrete-time case, since the transition kernels $P_t$ cannot be generated from a single transition kernel. The potential kernels commute with each other and with the transition kernels.

Suppose that $\alpha, \beta, t \in [0, \infty)$. Then (as kernels),

  1. $P_t U_\alpha = U_\alpha P_t = \int_0^\infty e^{-\alpha s} P_{s+t} \, ds$
  2. $U_\alpha U_\beta = U_\beta U_\alpha = \int_0^\infty \int_0^\infty e^{-\alpha s} e^{-\beta t} P_{s+t} \, ds \, dt$
Details:

Suppose that $f \in \mathscr{B}$ is nonnegative. The interchanges of operators and integrals below are interchanges of integrals, and are justified since the integrands are nonnegative. The other tool used is the semigroup property of $\boldsymbol{P} = \{P_t: t \in [0, \infty)\}$.

  1. Directly, $$U_\alpha P_t f = \int_0^\infty e^{-\alpha s} P_s P_t f \, ds = \int_0^\infty e^{-\alpha s} P_{s+t} f \, ds$$ The other direction involves an interchange: $$P_t U_\alpha f = P_t \int_0^\infty e^{-\alpha s} P_s f \, ds = \int_0^\infty e^{-\alpha s} P_t P_s f \, ds = \int_0^\infty e^{-\alpha s} P_{s+t} f \, ds$$
  2. First, $$U_\alpha U_\beta f = \int_0^\infty e^{-\alpha s} P_s U_\beta f \, ds = \int_0^\infty e^{-\alpha s} P_s \int_0^\infty e^{-\beta t} P_t f \, dt \, ds = \int_0^\infty \int_0^\infty e^{-\alpha s} e^{-\beta t} P_s P_t f \, dt \, ds = \int_0^\infty \int_0^\infty e^{-\alpha s} e^{-\beta t} P_{s+t} f \, dt \, ds$$ The other direction is similar.

The same identities hold for the right operators on the entire space $\mathscr{B}$ under the additional restrictions that $\alpha > 0$ and $\beta > 0$. The fundamental equation that relates the potential kernels, known as the resolvent equation, is given in the next theorem:

If $\alpha, \beta \in [0, \infty)$ with $\alpha \le \beta$ then (as kernels) $$U_\alpha = U_\beta + (\beta - \alpha) U_\alpha U_\beta$$

Details:

If $\alpha = \beta$ the equation is trivial, so assume $\alpha < \beta$. Suppose that $f \in \mathscr{B}$ is nonnegative. From the previous result, $$U_\alpha U_\beta f = \int_0^\infty \int_0^\infty e^{-\alpha s} e^{-\beta t} P_{s+t} f \, dt \, ds$$ The transformation $u = s + t$, $v = s$ maps $[0, \infty)^2$ one-to-one onto $\{(u, v) \in [0, \infty)^2: u \ge v\}$. The inverse transformation is $s = v$, $t = u - v$, with Jacobian 1. Hence we have $$U_\alpha U_\beta f = \int_0^\infty \int_0^u e^{-\alpha v} e^{-\beta(u - v)} P_u f \, dv \, du = \int_0^\infty \left(\int_0^u e^{(\beta - \alpha) v} \, dv\right) e^{-\beta u} P_u f \, du = \frac{1}{\beta - \alpha} \int_0^\infty \left[e^{(\beta - \alpha) u} - 1\right] e^{-\beta u} P_u f \, du = \frac{1}{\beta - \alpha}\left(\int_0^\infty e^{-\alpha u} P_u f \, du - \int_0^\infty e^{-\beta u} P_u f \, du\right) = \frac{1}{\beta - \alpha}\left(U_\alpha f - U_\beta f\right)$$ Simplifying gives the result. Note that $U_\beta f$ is finite since $\beta > 0$.

The same identity holds for the right potential operators on the entire space $\mathscr{B}$, under the additional restriction that $\alpha > 0$. For $\alpha \in (0, \infty)$, $U_\alpha$ is also an operator on the space $\mathscr{C}_0$.

If $\alpha \in (0, \infty)$ and $f \in \mathscr{C}_0$ then $U_\alpha f \in \mathscr{C}_0$.

Details:

Suppose that $f \in \mathscr{C}_0$ and that $(x_1, x_2, \ldots)$ is a sequence in $S$. Then $P_t f \in \mathscr{C}_0$ for $t \in [0, \infty)$. Hence if $x_n \to x \in S$ as $n \to \infty$ then $e^{-\alpha t} P_t f(x_n) \to e^{-\alpha t} P_t f(x)$ as $n \to \infty$ for each $t \in [0, \infty)$. By the dominated convergence theorem, $$U_\alpha f(x_n) = \int_0^\infty e^{-\alpha t} P_t f(x_n) \, dt \to \int_0^\infty e^{-\alpha t} P_t f(x) \, dt = U_\alpha f(x) \text{ as } n \to \infty$$ Hence $U_\alpha f$ is continuous. Next suppose that $x_n \to \infty$ as $n \to \infty$. This means that for every compact $C \subseteq S$, there exists $m \in \mathbb{N}_+$ such that $x_n \notin C$ for $n > m$. Then $e^{-\alpha t} P_t f(x_n) \to 0$ as $n \to \infty$ for each $t \in [0, \infty)$. Again by the dominated convergence theorem, $$U_\alpha f(x_n) = \int_0^\infty e^{-\alpha t} P_t f(x_n) \, dt \to 0 \text{ as } n \to \infty$$ So $U_\alpha f \in \mathscr{C}_0$.

If $f \in \mathscr{C}_0$ then $\alpha U_\alpha f \to f$ as $\alpha \to \infty$.

Details:

Convergence is with respect to the supremum norm on $\mathscr{C}_0$, of course. Suppose that $f \in \mathscr{C}_0$. Note first that with a change of variables $s = \alpha t$, $$\alpha U_\alpha f = \int_0^\infty \alpha e^{-\alpha t} P_t f \, dt = \int_0^\infty e^{-s} P_{s/\alpha} f \, ds$$ and hence $$|\alpha U_\alpha f - f| = \left|\int_0^\infty e^{-s}\left(P_{s/\alpha} f - f\right) ds\right| \le \int_0^\infty e^{-s} \left|P_{s/\alpha} f - f\right| ds \le \int_0^\infty e^{-s} \left\|P_{s/\alpha} f - f\right\| ds$$ So it follows that $$\left\|\alpha U_\alpha f - f\right\| \le \int_0^\infty e^{-s} \left\|P_{s/\alpha} f - f\right\| ds$$ But $\|P_{s/\alpha} f - f\| \to 0$ as $\alpha \to \infty$ and hence by the dominated convergence theorem, $\int_0^\infty e^{-s} \|P_{s/\alpha} f - f\| \, ds \to 0$ as $\alpha \to \infty$.

Infinitesimal Generator

In continuous time, it's not at all clear how we could construct a Markov process with desired properties, say to model a real system of some sort. Stated mathematically, the existential problem is how to construct the family of transition kernels $\{P_t: t \in [0, \infty)\}$ so that the semigroup property $P_s P_t = P_{s+t}$ is satisfied for all $s, t \in [0, \infty)$. The answer, as for similar problems in the deterministic world, comes essentially from calculus, from a type of derivative.

The infinitesimal generator of the Markov process $\boldsymbol{X}$ is the operator $G: \mathscr{D} \to \mathscr{C}_0$ defined by $$G f = \lim_{t \downarrow 0} \frac{P_t f - f}{t}$$ on the domain $\mathscr{D} \subseteq \mathscr{C}_0$ for which the limit exists.

As usual, the limit is with respect to the supremum norm on $\mathscr{C}_0$, so $f \in \mathscr{D}$ and $G f = g$ means that $f, g \in \mathscr{C}_0$ and $$\left\|\frac{P_t f - f}{t} - g\right\| = \sup\left\{\left|\frac{P_t f(x) - f(x)}{t} - g(x)\right|: x \in S\right\} \to 0 \text{ as } t \downarrow 0$$ So in particular, $$G f(x) = \lim_{t \downarrow 0} \frac{P_t f(x) - f(x)}{t} = \lim_{t \downarrow 0} \frac{\mathbb{E}[f(X_t) \mid X_0 = x] - f(x)}{t}, \quad x \in S$$

The domain $\mathscr{D}$ is a subspace of $\mathscr{C}_0$ and the generator $G$ is a linear operator on $\mathscr{D}$:

  1. If $f \in \mathscr{D}$ and $c \in \mathbb{R}$ then $c f \in \mathscr{D}$ and $G(c f) = c G f$.
  2. If $f, g \in \mathscr{D}$ then $f + g \in \mathscr{D}$ and $G(f + g) = G f + G g$.
Details:

These are simple results that depend on the linearity of $P_t$ for $t \in [0, \infty)$ and basic results on convergence.

  1. If $f \in \mathscr{D}$ then $$\frac{P_t(c f) - (c f)}{t} = c \frac{P_t f - f}{t} \to c G f \text{ as } t \downarrow 0$$
  2. If $f, g \in \mathscr{D}$ then $$\frac{P_t(f + g) - (f + g)}{t} = \frac{P_t f - f}{t} + \frac{P_t g - g}{t} \to G f + G g \text{ as } t \downarrow 0$$

Note that $G f$ is the (right) derivative at 0 of the function $t \mapsto P_t f$. Because of the semigroup property, this differentiability property at 0 implies differentiability at arbitrary $t \in [0, \infty)$. Moreover, the infinitesimal operator and the transition operators commute:

If $f \in \mathscr{D}$ and $t \in [0, \infty)$, then $P_t f \in \mathscr{D}$ and the following derivative rules hold with respect to the supremum norm.

  1. $\frac{d}{dt} P_t f = P_t G f$, the Kolmogorov forward equation
  2. $\frac{d}{dt} P_t f = G P_t f$, the Kolmogorov backward equation
Details:

Let $f \in \mathscr{D}$. All limits and statements about derivatives and continuity are with respect to the supremum norm.

  1. By assumption, $$\frac{1}{h}\left(P_h f - f\right) \to G f \text{ as } h \downarrow 0$$ Since $P_t$ is a bounded, linear operator on the space $\mathscr{C}_0$, it preserves limits, so $$\frac{1}{h}\left(P_t P_h f - P_t f\right) = \frac{1}{h}\left(P_{t+h} f - P_t f\right) \to P_t G f \text{ as } h \downarrow 0$$ This proves the result for the derivative from the right. But since $t \mapsto P_t f$ is continuous, the result is also true for the two-sided derivative.
  2. From part (a), we now know that $$\frac{1}{h}\left(P_h P_t f - P_t f\right) = \frac{1}{h}\left(P_{t+h} f - P_t f\right) \to P_t G f \text{ as } h \downarrow 0$$ By definition, this means that $P_t f \in \mathscr{D}$ and $G P_t f = P_t G f = \frac{d}{dt} P_t f$.

Exercise [24] gives a possible solution to the dilemma that motivated this discussion in the first place. If we want to construct a Markov process with desired properties, to model a real system for example, we can start by constructing an appropriate generator $G$ and then solve the initial value problem $$\frac{d}{dt} P_t = G P_t, \quad P_0 = I$$ to obtain the transition operators $\boldsymbol{P} = \{P_t: t \in [0, \infty)\}$. The next theorem gives the relationship between the potential operators and the infinitesimal operator, which in some ways is better. This relationship is analogous to the relationship between the potential operators and the one-step operator in discrete time.
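On a finite state space the initial value problem has the familiar matrix solution $P_t = e^{t G}$. A minimal sketch, assuming a hypothetical two-state generator matrix (not from the text), computes the transition matrices this way and checks the semigroup property:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical generator matrix on a two-state space (rows sum to 0; assumption for illustration)
G = np.array([[-2.0,  2.0],
              [ 1.0, -1.0]])

def P(t):
    """Transition matrix P_t = exp(t G), the solution of d/dt P_t = G P_t, P_0 = I."""
    return expm(t * G)

s, t = 0.3, 1.2
print(np.allclose(P(s) @ P(t), P(s + t)))   # semigroup property P_s P_t = P_{s+t}: True
print(P(t).sum(axis=1))                      # each row sums to 1 (stochastic matrix)
```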

Suppose $\alpha \in (0, \infty)$.

  1. If $f \in \mathscr{D}$ then $G f \in \mathscr{C}_0$ and $f + U_\alpha G f = \alpha U_\alpha f$.
  2. If $f \in \mathscr{C}_0$ then $U_\alpha f \in \mathscr{D}$ and $f + G U_\alpha f = \alpha U_\alpha f$.
Details:
  1. By definition, if $f \in \mathscr{D}$ then $G f \in \mathscr{C}_0$. Hence, using the previous result, $$f + U_\alpha G f = f + \int_0^\infty e^{-\alpha t} G P_t f \, dt = f + \int_0^\infty e^{-\alpha t} \frac{d}{dt} P_t f \, dt$$ Integrating by parts (with $u = e^{-\alpha t}$ and $dv = \frac{d}{dt} P_t f \, dt$) gives $$f + U_\alpha G f = f + \left[e^{-\alpha t} P_t f\right]_0^\infty + \alpha \int_0^\infty e^{-\alpha t} P_t f \, dt$$ But $e^{-\alpha t} P_t f \to 0$ as $t \to \infty$ while $P_0 f = f$. The last term is $\alpha U_\alpha f$.
  2. Suppose that $f \in \mathscr{C}_0$. From the result above and the substitution $u = s + t$, $$P_t U_\alpha f = \int_0^\infty e^{-\alpha s} P_{s+t} f \, ds = \int_t^\infty e^{-\alpha(u - t)} P_u f \, du = e^{\alpha t} \int_t^\infty e^{-\alpha u} P_u f \, du$$ Hence $$\frac{P_t U_\alpha f - U_\alpha f}{t} = \frac{1}{t}\left[e^{\alpha t} \int_t^\infty e^{-\alpha u} P_u f \, du - U_\alpha f\right]$$ Adding and subtracting $e^{\alpha t} U_\alpha f$ and combining integrals gives $$\frac{P_t U_\alpha f - U_\alpha f}{t} = \frac{1}{t}\left[e^{\alpha t} \int_t^\infty e^{-\alpha u} P_u f \, du - e^{\alpha t} \int_0^\infty e^{-\alpha u} P_u f \, du\right] + \frac{e^{\alpha t} - 1}{t} U_\alpha f = -\frac{e^{\alpha t}}{t} \int_0^t e^{-\alpha s} P_s f \, ds + \frac{e^{\alpha t} - 1}{t} U_\alpha f$$ Since $s \mapsto P_s f$ is continuous, the first term converges to $-f$ as $t \downarrow 0$. The second term converges to $\alpha U_\alpha f$ as $t \downarrow 0$.

For $\alpha > 0$, the operators $U_\alpha$ and $G$ have an inverse relationship.

Suppose again that $\alpha \in (0, \infty)$.

  1. $U_\alpha = (\alpha I - G)^{-1}: \mathscr{C}_0 \to \mathscr{D}$
  2. $G = \alpha I - U_\alpha^{-1}: \mathscr{D} \to \mathscr{C}_0$
Details:

Recall that $U_\alpha: \mathscr{C}_0 \to \mathscr{D}$ and $G: \mathscr{D} \to \mathscr{C}_0$.

  1. By part (a) of the previous result we have $\alpha U_\alpha - U_\alpha G = I$, so $U_\alpha(\alpha I - G) = I$. By part (b) we have $\alpha U_\alpha - G U_\alpha = I$, so $(\alpha I - G) U_\alpha = I$. Hence (a) holds.
  2. This follows from (a).

So, from the generator $G$ we can determine the potential operators $\boldsymbol{U} = \{U_\alpha: \alpha \in (0, \infty)\}$, which in turn determine the transition operators $\boldsymbol{P} = \{P_t: t \in (0, \infty)\}$. In continuous time, the transition operators $\boldsymbol{P} = \{P_t: t \in [0, \infty)\}$ can be obtained from the single, infinitesimal operator $G$, in a way that is reminiscent of the fact that in discrete time, the transition operators $\boldsymbol{P} = \{P_n: n \in \mathbb{N}\}$ can be obtained from the single, one-step operator $P$.
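This chain of relationships can be checked numerically in the finite-state case. The sketch below, again assuming a hypothetical two-state generator matrix, compares the resolvent computed as $(\alpha I - G)^{-1}$ with the Laplace transform $\int_0^\infty e^{-\alpha t} e^{t G} \, dt$ evaluated by quadrature:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

# Hypothetical two-state generator matrix (assumption for illustration only)
G = np.array([[-2.0,  2.0],
              [ 1.0, -1.0]])
I = np.eye(2)
alpha = 1.5

# Resolvent as a matrix inverse: U_alpha = (alpha I - G)^{-1}
U_inverse = np.linalg.inv(alpha * I - G)

# Resolvent as a Laplace transform: U_alpha = integral_0^inf e^{-alpha t} P_t dt, with P_t = expm(t G)
U_integral = np.array([[quad(lambda t: np.exp(-alpha * t) * expm(t * G)[i, j], 0, np.inf)[0]
                        for j in range(2)] for i in range(2)])

print(np.allclose(U_inverse, U_integral))   # True
```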

Examples and Applications

Our first example is essentially deterministic.

Consider the Markov process $\boldsymbol{X} = \{X_t: t \in [0, \infty)\}$ on $\mathbb{R}$ satisfying the ordinary differential equation $$\frac{d}{dt} X_t = g(X_t), \quad t \in [0, \infty)$$ where $g: \mathbb{R} \to \mathbb{R}$ is Lipschitz continuous. The infinitesimal generator $G$ is given by $G f(x) = f'(x) g(x)$ for $x \in \mathbb{R}$, on the domain $\mathscr{D}$ of functions $f: \mathbb{R} \to \mathbb{R}$ where $f \in \mathscr{C}_0$ and $f' \in \mathscr{C}_0$.

Details:

Recall that the only source of randomness in this process is the initial state $X_0$. By the continuity assumptions on $g$, there exists a unique solution $X_t(x)$ to the differential equation with initial value $X_0 = x$, defined for all $t \in [0, \infty)$. The transition operator $P_t$ for $t \in [0, \infty)$ is defined on $\mathscr{B}$ by $P_t f(x) = f[X_t(x)]$ for $x \in \mathbb{R}$. By the ordinary chain rule, if $f$ is differentiable, $$\frac{P_t f(x) - f(x)}{t} = \frac{f[X_t(x)] - f(x)}{t} \to f'(x) g(x) \text{ as } t \downarrow 0$$

Our next example considers the Poisson process as a Markov process. Compare this with the binomial process in [13].

Let $\boldsymbol{N} = \{N_t: t \in [0, \infty)\}$ denote the Poisson process on $\mathbb{N}$ with rate $\beta \in (0, \infty)$. Define the Markov process $\boldsymbol{X} = \{X_t: t \in [0, \infty)\}$ by $X_t = X_0 + N_t$ where $X_0$ takes values in $\mathbb{N}$ and is independent of $\boldsymbol{N}$.

  1. For $t \in [0, \infty)$, show that the probability transition matrix $P_t$ of $\boldsymbol{X}$ is given by $$P_t(x, y) = e^{-\beta t} \frac{(\beta t)^{y - x}}{(y - x)!}, \quad x, y \in \mathbb{N}, \, y \ge x$$
  2. For $\alpha \in [0, \infty)$, show that the potential matrix $U_\alpha$ of $\boldsymbol{X}$ is given by $$U_\alpha(x, y) = \frac{1}{\alpha + \beta}\left(\frac{\beta}{\alpha + \beta}\right)^{y - x}, \quad x, y \in \mathbb{N}, \, y \ge x$$
  3. For $\alpha > 0$ and $x \in \mathbb{N}$, identify the probability distribution defined by $\alpha U_\alpha(x, \cdot)$.
  4. Show that the infinitesimal matrix $G$ of $\boldsymbol{X}$ is given by $G(x, x) = -\beta$, $G(x, x + 1) = \beta$ for $x \in \mathbb{N}$.
Details:
  1. Note that for $t \in [0, \infty)$ and $x \in \mathbb{N}$, $P_t(x, \cdot)$ is the (discrete) PDF of $x + N_t$, since $N_t$ has the Poisson distribution with parameter $\beta t$.
  2. Let $\alpha \in [0, \infty)$ and let $x, y \in \mathbb{N}$ with $x \le y$. Then $$U_\alpha(x, y) = \int_0^\infty e^{-\alpha t} P_t(x, y) \, dt = \int_0^\infty e^{-\alpha t} e^{-\beta t} \frac{(\beta t)^{y - x}}{(y - x)!} \, dt = \frac{\beta^{y - x}}{(y - x)!} \int_0^\infty e^{-(\alpha + \beta) t} t^{y - x} \, dt$$ The change of variables $s = (\alpha + \beta) t$ gives $$U_\alpha(x, y) = \frac{\beta^{y - x}}{(y - x)! \, (\alpha + \beta)^{y - x + 1}} \int_0^\infty e^{-s} s^{y - x} \, ds$$ But the last integral is $\Gamma(y - x + 1) = (y - x)!$. Simplifying gives the result.
  3. For $\alpha > 0$, $$\alpha U_\alpha(x, y) = \frac{\alpha}{\alpha + \beta}\left(\frac{\beta}{\alpha + \beta}\right)^{y - x}, \quad x, y \in \mathbb{N}, \, y \ge x$$ As a function of $y$ for fixed $x$, this is the PDF of $x + Y_\alpha$ where $Y_\alpha$ has the geometric distribution on $\mathbb{N}$ with parameter $\frac{\alpha}{\alpha + \beta}$.
  4. Note that for $x, y \in \mathbb{N}$, $G(x, y) = \frac{d}{dt} P_t(x, y) \big|_{t=0}$. By simple calculus, this is $-\beta$ if $y = x$, $\beta$ if $y = x + 1$, and 0 otherwise.
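As a final numerical check, the potential matrix of the Poisson process in part (b) can be computed by quadrature from the integral definition and compared with the closed form; the parameter values below are arbitrary illustrative choices:

```python
import numpy as np
from math import factorial
from scipy.integrate import quad

beta, alpha, x, y = 2.0, 1.5, 1, 4    # arbitrary illustrative values

def P_t(t, x, y):
    """Transition matrix of the process: P_t(x, y) = e^{-beta t} (beta t)^{y-x} / (y-x)!."""
    return np.exp(-beta * t) * (beta * t) ** (y - x) / factorial(y - x)

# U_alpha(x, y) = integral_0^inf e^{-alpha t} P_t(x, y) dt, by quadrature
U_quad = quad(lambda t: np.exp(-alpha * t) * P_t(t, x, y), 0, np.inf)[0]

# Closed form: U_alpha(x, y) = (1/(alpha + beta)) * (beta/(alpha + beta))^(y - x)
U_closed = (beta / (alpha + beta)) ** (y - x) / (alpha + beta)

print(np.isclose(U_quad, U_closed))   # True
```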