Our goal in this section is to continue the broad sketch of the general theory of Markov processes. As with the last section, some of the statements are not completely precise and rigorous, because we want to focus on the main ideas without being overly burdened by technicalities. If you are a new student of probability, or are primarily interested in applications, you may want to skip ahead to the study of discrete-time Markov chains.
Preliminaries
Basic Definitions
As usual, our starting point is a probability space , so that is the set of outcomes, the -algebra of events, and the probability measure on the sample space . The set of times is either , discrete time with the discrete topology, or , continuous time with the usual Euclidean topology. The time set is given the Borel -algebra , which is just the power set if , and then the time space is given the usual measure, counting measure in the discrete case and Lebesgue measure in the continuous case. The set of states has an LCCB topology (locally compact, Hausdorff, with a countable base), and is also given the Borel -algebra . Recall that to say that the state space is discrete means that is countable with the discrete topology, so that is the power set of . The topological assumptions mean that the state space is nice enough for a rich mathematical theory and general enough to encompass the most important applications. There is often a natural Borel measure on , counting measure if is discrete, and for example, Lebesgue measure if for some .
Recall also that there are several spaces of functions on that are important. Let denote the set of bounded, measurable functions . Let denote the set of bounded, continuous functions , and let denote the set of continuous functions that vanish at in the sense that for every , there exists a compact set such for . These are all vector spaces under the usual (pointwise) addition and scalar multiplication, and . The supremum norm, defined by for is the norm that is used on these spaces.
Suppose now that is a time-homogeneous Markov process with state space defined on the probability space . As before, we also assume that we have a filtration , that is, an increasing family of sub -algebras of , indexed by the time space, with the properties that is measurable with repsect to for . Intuitively, is the collection of events up to time .
As usual, we let denote the transition probability kernel for an increase in time of size . Thus
Recall that for , the transition kernel defines two operators, on the left with measures and on the right with functions. So, if is a measure on then is the measure on given by
If is the distribution of then is the distribution of for . If then is defined by
Recall that the collection of transition operators is a semigroup because for . Just about everything in this section is defined in terms of the semigroup , which is one of the main analytic tools in the study of Markov processes.
Feller Markov Processes
We make the same assumptions as in the Introduction. Here is a brief review:
We assume that the Markov process satisfies the following properties (and hence is a Feller Markov process):
- For and , the distribution of given converges to the distribution of given as .
- Given , converges in probability to as .
Part (a) is an assumption on continuity in space, while part (b) is an assumption on continuity in time. If is discrete then (a) automatically holds, and if is discrete then (b) automatically holds. As we will see, the Feller assumptions are sufficient for a very nice mathematical theory, and yet are general enough to encompass the most important continuous-time Markov processes.
The process has the following properties:
- There is a version of such that is continuous from the right and has left limits.
- is a strong Markov process relative to the , the right-continuous refinement of the natural filtration.
The Feller assumptions on the Markov process have equivalent formulations in terms of the transition semigroup.
The transition semigroup has the following properties:
- If and then
- If and then as .
As before, part (a) is a condition on continuity in space, while part (b) is a condition on continuity in time. Once again, (a) is trivial if is discrete, and (b) trivial if is discrete. The first condition means that is a linear operator on (as well as being a linear operator on ). The second condition leads to a stronger continuity result.
For , the mapping is continuous on . That is, for ,
Our interest in this section is primarily the continuous time case. However, we start with the discrete time case since the concepts are clearer and simpler, and we can avoid some of the technicalities that inevitably occur in continuous time.
Discrete Time
Suppose that , so that time is discrete. Recall that the transition kernels are just powers of the one-step kernel. That is, we let and then for .
Potential Operators
For , the -potential kernel of is defined as follows:
- The special case is simply the potential kernel of .
- For and , is the expected number of visits of to , starting at .
Details:
The function from to is measurable for since is measurable for each . The mapping is a positive measure on for since is a probability measure for each . Finally, the interpretation of for and comes from interchanging sum and expected value, which is allowed since the terms are nonnegative:
Note that it's quite possible that for some and . In fact, knowing when this is the case is of considerable importance in the study of Markov processes. As with all kernels, the potential kernel defines two operators, operating on the right on functions, and operating on the left on positive measures. For the right potential operator, if is measurable then
assuming as usual that the expected values and the infinite series make sense. This will be the case, in particular, if is nonnegative or if and .
If , then for all .
Details:
Using geometric series,
It follows that for , the right operator is a bounded, linear operator on with . It also follows that is a probability kernel. There is a nice interpretation of this kernel.
If then is the conditional distribution of given , where is independent of and has the geometric distribution on with parameter .
Details:
Suppose that and . Conditioning on gives
But by the substitution rule and the assumption of independence,
Since has the geometric distribution on with parameter we have for . Substituting gives
So is a transition probability kernel, just as is a transition probability kernel, but corresponding to the random time (with as a parameter), rather than the deterministic time . An interpretation of the potential kernel for can be also given in economic terms. Suppose that and that we receive one monetary unit each time the process visits . Then as above, is the expected total amount of money we receive, starting at . However, typically money that we will receive at times distant in the future has less value to us now than money that we will receive soon. Specifically suppose that a monetary unit received at time has a present value of , where is an inflation factor (sometimes also called a discount factor). Then gives the expected, total, discounted amount we will receive, starting at . A bit more generally, if is a reward function, so that is the reward (or cost, depending on the sign) that we receive when we visit state , then for , is the expected, total, discounted reward, starting at .
For the left potential operator, if is a positive measure on then
In particular, if is a probability measure and has distribution then is the distribution of for , so from the last result, is the distribution of where again, is independent of and has the geometric distribution on with parameter . The family of potential kernels gives the same information as the family of transition kernels.
The potential kernels completely determine the transition kernels .
Details:
Note that for and , the function is a power series in with coefficients . In the language of combinatorics, is the ordinary generating function of the sequence . As noted above, this power series has radius of convergence at least 1, so we can extend the domain to . Thus, given the potential kernels, we can recover the transition kernels by taking derivatives and evaluating at 0:
Of course, it's really only necessary to determine , the one step transition kernel, since the other transition kernels are powers of . In any event, it follows that the kernels , along with the initial distribution, completely determine the finite dimensional distributions of the Markov process . The potential kernels commute with each other and with the transition kernels.
Suppose that and . Then (as kernels)
Details:
Suppose that is nonnegative. The interchange of the sums with the kernel operation is allowed since the kernels are nonnegative. The other tool used is the semigroup property.
- Directly
The other direction requires an interchange.
- First,
The other direction is similar.
The same identities hold for the right operators on the entire space , with the additional restrictions that and . The fundamental equation that relates the potential kernels is given next.
If with then (as kernels),
Details:
If the equation is trivial, so assume . Suppose that is nonnegative. From the previous result,
Changing variables to sum over and gives
Simplifying gives
Note that since , is a finite, so we don't have to worry about the dreaded indeterminate form .
The same identity holds holds for the right operators on the entire space , with the additional restriction that .
If , then (as kernels), .
Details:
Suppose that is nonnegative. From the result above,
The same identity holds for the right operators on the entire space , with the additional restriction that . This leads to the following important result:
If , then as operators on the space ,
Details:
The operators are bounded, so we can subtract. The identity leads to and the identity leads to . Hence (a) holds. Part (b) follows from (a).
Exercise [12] shows again that the potential operator determines the transition operator .
Examples and Applications
Our first example considers the binomial process as a Markov process.
Let be a sequence of Bernoulli Trials with success parameter . Define the Markov process by where takes values in and is independent of .
- For , show that the transition probability matrix of is given by
- For , show that the potential matrix of is given by
- For and , identify the probability distribution defined by .
- For with , interpret , the expected time in starting in , in the context of the process .
Details:
Recall that is a Markov process since it has stationary, independent increments.
- Note that for , is the (discrete) PDF of . The result follows since the sum of the indicator variables has the binomial distribution with parameters and .
- Let and let with . Then
Simplifying gives the result.
- For ,
As a function of for fixed , this is the PDF of where has the geometric distribution on with parameter .
- Note that for with . Starting in state , the process eventually reaches with probability 1. The process remains in state for a geometrically distributed time, with parameter . The mean of this distribution is .
Continuous Time
With the discrete-time setting as motivation, we now turn the more important continuous-time case where .
Potential Kernels
For , the -potential kernel of is defined as follows:
- The special case is simply the potential kerenl of .
- For and , is the expected amount of time that spends in , starting at .
- The family of kernels is known as the reolvent of .
Details:
Since is a Feller semigroup of transition operators, the mapping from to is jointly measurable for . Thus, makes sense for and and from to is measurable for . That is a measure on follows from the usual interchange of sum and integral, via Fubini's theorem: Suppose that is a countable collection of disjoint sets in , and let
Finally, the interpretation of for and is another interchange of integrals:
The inside integral is the Lebesgue measure of .
As with discrete time, it's quite possible that for some and , and knowing when this is the case is of considerable interest. As with all kernels, the potential kernel defines two operators, operating on the right on functions, and operating on the left on positive measures. If is measurable then, giving the right potential operator in its many forms,
assuming that the various integrals make sense. This will be the case in particular if is nonnegative, or if and .
If , then for all .
Details:
For ,
It follows that for , the right potential operator is a bounded, linear operator on with . It also follows that is a probability kernel. This kernel has a nice interpretation.
If then is the conditional distribution of where is independent of and has the exponential distribution on with parameter .
Details:
Suppose that and . The random time has PDF for . Hence, conditioning on gives
But by the substitution rule and the assumption of independence,
Substituting gives
So is a transition probability kernel, just as is a transition probability kernel, but corresponding to the random time (with as a parameter), rather than the deterministic time . As in the discrete case, the potential kernel can also be interpreted in economic terms. Suppose that and that we receive money at a rate of one unit per unit time whenever the process is in . Then is the expected total amount of money that we receive, starting in state . But again, money that we receive later is of less value to us now than money that we will receive sooner. Specifically, suppose that one monetary unit at time has a present value of where is the inflation factor or discount factor. The is the total, expected, discounted amount that we receive, starting in . A bit more generally, suppose that and that is the reward (or cost, depending on the sign) per unit time that we receive when the process is in state . Then is the expected, total, discounted reward, starting in state .
For the left potential operator, if is a positive measure on then
In particular, suppose that and that is a probability measure and has distribution. Then is the distribution of for , and hence from the last result, is the distribution of , where again, is independent of and has the exponential distribution on with parameter . The family of potential kernels gives the same information as the family of transition kernels.
The resolvent completely determines the family of transition kernels .
Details:
Note that for and , the function on is the Laplace transform of the function on . The Laplace transform of a function determines the function completely.
It follows that the resolvent , along with the initial distribution, completely determine the finite dimensional distributions of the Markov process . This is much more important here in the continuous-time case than in the discrete-time case, since the transition kernels cannot be generated from a single transition kernel. The potential kernels commute with each other and with the transition kernels.
Suppose that . Then (as kernels),
Details:
Suppose that is nonnegative. The interchanges of operators and integrals below are interchanges of integrals, and are justified since the integrands are nonnegative. The other tool used is the semigroup property of .
- Directly,
The other direction involves an interchange.
- First
The other direction is similar.
The same identities hold for the right operators on the entire space under the additional restriction that and . The fundamental equation that relates the potential kernels, known as the resolvent equation, is given in the next theorem:
If with then (as kernels) .
Details:
If the equation is trivial, so assume . Suppose that is nonnegative. From the previous result,
The transformation maps one-to-one onto . The inverse transformation is with Jacobian . Hence we have
Simplifying gives the result. Note that is finite since .
The same identity holds for the right potential operators on the entire space , under the additional restriction that . For , is also an operator on the space .
If and then .
Details:
Suppose that and that is a sequence in . Then for . Hence if as then as for each . By the dominated convergence theorem,
Hence is continuous. Next suppose that as . This means that for every compact , there exist such that for . Them as for each . Again by the dominated convergence theorem,
So .
If then as .
Details:
Convergence is with respect to the supremum norm on , of course. Suppose that . Note first that with a change of variables ,
and hence
So it follows that
But as and hence by the dominated convergence theorem, as .
Infinitesimal Generator
In continuous time, it's not at all clear how we could construct a Markov process with desired properties, say to model a real system of some sort. Stated mathematically, the existential problem is how to construct the family of transition kernels so that the semigroup property is satisfied for all . The answer, as for similar problems in the deterministic world, comes essentially from calculus, from a type of derivative.
The infinitesimal generator of the Markov process is the operator defined by
on the domain for which the limit exists.
As usual, the limit is with respect to the supremum norm on , so and means that and
So in particular,
The domain is a subspace of and the generator is a linear operator on
- If and then and .
- If then and .
Details:
These are simple results that depend on the linearity of for and basic results on convergence.
- If then
- If then
Note is the (right) derivative at 0 of the function . Because of the semigroup property, this differentiability property at implies differentiability at arbitrary . Moreover, the infinitesimal operator and the transition operators commute:
If and , then and the following derivative rules hold with respect to the supremum norm.
- , the Kolmogorov forward equation
- , the Kolmogorov backward equation
Details:
Let . All limits and statements about derivatives and continuity are with respect to the supremum norm.
- By assumption,
Since is a bounded, linear operator on the space , it preserves limits, so
This proves the result for the derivative from the right. But since is continuous, the the result is also true for the two-sided derivative.
- From part (a), we now know that
By definition, this means that and .
Exercise [24] gives a possible solution to the dilema that motivated this discussion in the first place. If we want to construct a Markov process with desired properties, to model a a real system for example, we can start by constructing an appropriate generator and then solve the initial value problem
to obtain the transition operators . The next theorem gives the relationship between the potential operators and the infinitesimal operator, which in some ways is better. This relationship is analogous to the relationship between the potential operators and the one-step operator in for discrete time
Suppose .
- If the and
- If then and .
Details:
- By definition, if then . Hence using the previous result,
Integrating by parts (with and ) gives
But as while . The last term is .
- Suppose that . From the result above and the substitution ,
Hence
Adding and subtracting and combining integrals gives
Since is continuous, the first term converges to as . The second term converges to as .
For , the operators and have an inverse relationship.
Suppose again that .
Details:
Recall that and
- By part(a) the previous result we have so . By part (b) we have so .
- This follows from (a).
So, from the generator we can determine the potential operators , which in turn determine the transition operators . In continuous time, transition operators can be obtained from the single, infinitesimal operator in a way that is reminiscent of the fact that in discrete time, the transition operators can be obtained from the single, one-step operator .
Examples and Applications
Our first example is essentially deterministic.
Consider the Markov process on satisfying the ordinary differential equation
where is Lipschitz continuous. The infinitesimal operator is given by for on the domain of functions where and .
Details:
Recall that the only source of randomness in this process is the initial sate . By the continuity assumptions on , there exists a unique solution to the differential equation with initial value , defined for all . The transition operator for is defined on by for . By the ordinary chain rule, if is differentiable,
Our next example considers the Poisson process as a Markov process. Compare this with the binomial process in [13].
Let denote the Poisson process on with rate . Define the Markov process by where takes values in and is independent of .
- For , show that the probability transition matrix of is given by
- For , show that the potential matrix of is given by
- For and , identify the probability distribution defined by .
- Show that the infinitesimal matrix of is given by , for .
Details:
- Note that for and , is the (discrete) PDF of since has the Poisson distribution with parameter .
- Let and let with . Then
The change of variables gives
But the last integral is . Simplifying gives the result.
- For ,
As a function of for fixed , this is the PDF of where has the geometric distribution with parameter .
- Note that for , . By simple calculus, this is if , if , and 0 otherwise.