Potentials and Generators

Our goal in this section is to continue the broad sketch of the general theory of Markov processes. As with the last section, some of the statements are not completely precise and rigorous, because we want to focus on the main ideas without being overly burdened by technicalities. If you are a new student of probability, or are primarily interested in applications, you may want to skip ahead to the study of discrete-time Markov chains.

Preliminaries

Basic Definitions

As usual, our starting point is a probability space

(Ω, F, P)

, so that

Ω

is the set of outcomes,

F

the

σ

-algebra of events, and

P

the probability measure on the sample space

(Ω, F)

. The set of times

T

is either

N

, discrete time with the discrete topology, or

[0, \infty)

, continuous time with the usual Euclidean topology. The time set

T

is given the Borel

σ

-algebra

T

, which is just the power set if

T = N

, and then the time space

(T, T)

is given the usual measure, counting measure in the discrete case and Lebesgue measure in the continuous case. The set of states

S

has an LCCB topology (locally compact, Hausdorff, with a countable base), and is also given the Borel

σ

-algebra

S

. Recall that to say that the state space is discrete means that

S

is countable with the discrete topology, so that

S

is the power set of

S

. The topological assumptions mean that the state space

(S, S)

is nice enough for a rich mathematical theory and general enough to encompass the most important applications. There is often a natural Borel measure $λ$ on $(S, S)$: counting measure $#$ if $S$ is discrete and, for example, Lebesgue measure if $S = R^{k}$ for some $k \in N_{+}$.

Recall also that there are several spaces of functions on

S

that are important. Let

B

denote the set of bounded, measurable functions

f : S \to R

. Let

C

denote the set of bounded, continuous functions

f : S \to R

, and let

C_{0}

denote the set of continuous functions

f : S \to R

that vanish at

\infty

in the sense that for every

ϵ > 0

, there exists a compact set

K \subseteq S

such that

| f (x) | < ϵ

for

x \in K^{c}

. These are all vector spaces under the usual (pointwise) addition and scalar multiplication, and

C_{0} \subseteq C \subseteq B

. The supremum norm, defined by

∥ f ∥ = sup {| f (x) | : x \in S}

for

f \in B

is the norm that is used on these spaces.

Suppose now that

X = {X_{t} : t \in T}

is a time-homogeneous Markov process with state space

(S, S)

defined on the probability space

(Ω, F, P)

. As before, we also assume that we have a filtration

F = {F_{t} : t \in T}

, that is, an increasing family of sub

σ

-algebras of

F

, indexed by the time space, with the property that $X_{t}$ is measurable with respect to $F_{t}$ for $t \in T$. Intuitively, $F_{t}$ is the collection of events up to time $t \in T$.

As usual, we let

P_{t}

denote the transition probability kernel for an increase in time of size

t \in T

. Thus

P_{t} (x, A) = P (X_{t} \in A ∣ X_{0} = x), x \in S, A \in S

Recall that for

t \in T

, the transition kernel

P_{t}

defines two operators, on the left with measures and on the right with functions. So, if

μ

is a measure on

(S, S)

then

μ P_{t}

is the measure on

(S, S)

given by

μ P_{t} (A) = \int_{S} μ (d x) P_{t} (x, A), A \in S

If $μ$ is the distribution of $X_{0}$ then

μ P_{t}

is the distribution of

X_{t}

for

t \in T

. If

f \in B

then

P_{t} f \in B

is defined by

P_{t} f (x) = \int_{S} P_{t} (x, d y) f (y) = E [f (X_{t}) ∣ X_{0} = x]

Recall that the collection of transition operators

P = {P_{t} : t \in T}

is a semigroup because

P_{s} P_{t} = P_{s + t}

for

s, t \in T

. Just about everything in this section is defined in terms of the semigroup

P

, which is one of the main analytic tools in the study of Markov processes.

Feller Markov Processes

We assume that the Markov process $X = {X_{t} : t \in T}$ satisfies the following properties (and hence is a Feller Markov process):

For $t \in T$ and $y \in S$ , the distribution of $X_{t}$ given $X_{0} = x$ converges to the distribution of $X_{t}$ given $X_{0} = y$ as $x \to y$ .
Given $X_{0} = x \in S$ , $X_{t}$ converges in probability to $x$ as $t ↓ 0$ .

Part (a) is an assumption on continuity in space, while part (b) is an assumption on continuity in time. If

S

is discrete then (a) automatically holds, and if

T

is discrete then (b) automatically holds. As we will see, the Feller assumptions are sufficient for a very nice mathematical theory, and yet are general enough to encompass the most important continuous-time Markov processes.

The process $X = {X_{t} : t \in T}$ has the following properties:

There is a version of $X$ such that $t \mapsto X_{t}$ is continuous from the right and has left limits.
$X$ is a strong Markov process relative to $F_{+}^{0}$ , the right-continuous refinement of the natural filtration.

The Feller assumptions on the Markov process have equivalent formulations in terms of the transition semigroup.

The transition semigroup $P = {P_{t} : t \in T}$ has the following properties:

If $f \in C_{0}$ and $t \in T$ then $P_{t} f \in C_{0}$
If $f \in C_{0}$ and $x \in S$ then $P_{t} f (x) \to f (x)$ as $t ↓ 0$ .

As before, part (a) is a condition on continuity in space, while part (b) is a condition on continuity in time. Once again, (a) is trivial if

S

is discrete, and (b) trivial if

T

is discrete. The first condition means that

P_{t}

is a linear operator on

C_{0}

(as well as being a linear operator on

B

). The second condition leads to a stronger continuity result.

For $f \in C_{0}$ , the mapping $t \mapsto P_{t} f$ is continuous on $T$ . That is, for $t \in T$ , $∥ P_{s} f - P_{t} f ∥ = sup {| P_{s} f (x) - P_{t} f (x) | : x \in S} \to 0$ as $s \to t$ .

Our interest in this section is primarily the continuous time case. However, we start with the discrete time case since the concepts are clearer and simpler, and we can avoid some of the technicalities that inevitably occur in continuous time.

Discrete Time

Suppose that

T = N

, so that time is discrete. Recall that the transition kernels are just powers of the one-step kernel. That is, we let

P = P_{1}

and then

P_{n} = P^{n}

for

n \in N

Potential Operators

For $α \in (0, 1]$ , the $α$ -potential kernel $R_{α}$ of $X$ is defined as follows: $R_{α} (x, A) = \sum_{n = 0}^{\infty} α^{n} P^{n} (x, A), x \in S, A \in S$

The special case $R = R_{1}$ is simply the potential kernel of $X$ .
For $x \in S$ and $A \in S$ , $R (x, A)$ is the expected number of visits of $X$ to $A$ , starting at $x$ .

Details:

The function $x \mapsto R_{α} (x, A)$ from $S$ to $[0, \infty)$ is measurable for $A \in S$ since $x \mapsto P^{n} (x, A)$ is measurable for each $n \in N$ . The mapping $A \mapsto R_{α} (x, A)$ is a positive measure on $S$ for $x \in S$ since $A \mapsto P^{n} (x, A)$ is a probability measure for each $n \in N$ . Finally, the interpretation of $R (x, A)$ for $x \in S$ and $A \in S$ comes from interchanging sum and expected value, which is allowed since the terms are nonnegative: $R (x, A) = \sum_{n = 0}^{\infty} P^{n} (x, A) = \sum_{n = 0}^{\infty} E [1 (X_{n} \in A) ∣ X_{0} = x] = E (\sum_{n = 0}^{\infty} 1 (X_{n} \in A) | X_{0} = x) = E [# {n \in N : X_{n} \in A} ∣ X_{0} = x]$

Note that it's quite possible that

R (x, A) = \infty

for some

x \in S

and

A \in S

. In fact, knowing when this is the case is of considerable importance in the study of Markov processes. As with all kernels, the potential kernel

R_{α}

defines two operators, operating on the right on functions, and operating on the left on positive measures. For the right potential operator, if

f : S \to R

is measurable then

R_{α} f (x) = \sum_{n = 0}^{\infty} α^{n} P^{n} f (x) = \sum_{n = 0}^{\infty} α^{n} \int_{S} P^{n} (x, d y) f (y) = \sum_{n = 0}^{\infty} α^{n} E [f (X_{n}) ∣ X_{0} = x], x \in S

assuming as usual that the expected values and the infinite series make sense. This will be the case, in particular, if

f

is nonnegative or if

$α \in (0, 1)$ and $f \in B$.

If $α \in (0, 1)$ , then $R_{α} (x, S) = \frac{1}{1 - α}$ for all $x \in S$ .

Details:

Using geometric series, $R_{α} (x, S) = \sum_{n = 0}^{\infty} α^{n} P^{n} (x, S) = \sum_{n = 0}^{\infty} α^{n} = \frac{1}{1 - α}$
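The definition of the potential kernel and the row-sum identity can be checked numerically on a small example. The sketch below assumes a hypothetical three-state transition matrix `P` (not from the text) and approximates $R_{α}$ by truncating the series; the helper names are illustrative only.

```python
# Hypothetical 3-state transition matrix (each row sums to 1); for illustration only.
P = [
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
]


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def potential_matrix(p, alpha, terms=2000):
    """Truncated series sum_{n < terms} alpha^n p^n, approximating R_alpha (alpha < 1)."""
    n = len(p)
    total = [[0.0] * n for _ in range(n)]
    # power holds alpha^k p^k; it starts at the identity (k = 0).
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(terms):
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = [[alpha * v for v in row] for row in mat_mul(power, p)]
    return total


alpha = 0.9
R = potential_matrix(P, alpha)
row_sums = [sum(row) for row in R]
print(row_sums)  # each entry should be close to 1 / (1 - alpha)
```

Truncation is harmless here because the neglected tail is of order $α^{2000}$; for $α = 1$ the series may diverge and this approach does not apply.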

It follows that for

α \in (0, 1)

, the right operator

R_{α}

is a bounded, linear operator on

B

with

∥ R_{α} ∥ = \frac{1}{1 - α}

. It also follows that

(1 - α) R_{α}

is a probability kernel. There is a nice interpretation of this kernel.

If $α \in (0, 1)$ then $(1 - α) R_{α} (x, \cdot)$ is the conditional distribution of $X_{N}$ given $X_{0} = x \in S$ , where $N$ is independent of $X$ and has the geometric distribution on $N$ with parameter $1 - α$ .

Details:

Suppose that $x \in S$ and $A \in S$ . Conditioning on $N$ gives $P (X_{N} \in A ∣ X_{0} = x) = \sum_{n = 0}^{\infty} P (N = n) P (X_{N} \in A ∣ N = n, X_{0} = x)$ But by the substitution rule and the assumption of independence, $P (X_{N} \in A ∣ N = n, X_{0} = x) = P (X_{n} \in A ∣ N = n, X_{0} = x) = P (X_{n} \in A ∣ X_{0} = x) = P^{n} (x, A)$ Since $N$ has the geometric distribution on $N$ with parameter $1 - α$ we have $P (N = n) = (1 - α) α^{n}$ for $n \in N$ . Substituting gives $P (X_{N} \in A ∣ X_{0} = x) = \sum_{n = 0}^{\infty} (1 - α) α^{n} P^{n} (x, A) = (1 - α) R_{α} (x, A)$
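The interpretation of $(1 - α) R_{α}$ as the distribution of the chain at an independent geometric time can be illustrated by simulation. This is only a sketch: the transition matrix `P`, the seed, and the number of trials are assumptions made for the example, and the agreement is statistical rather than exact.

```python
import random

# Hypothetical 3-state transition matrix; for illustration only.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]
alpha = 0.5
rng = random.Random(12345)  # fixed seed so the run is reproducible


def step(state):
    """One transition of the chain from `state`, using row `state` of P."""
    return rng.choices(range(len(P)), weights=P[state])[0]


def sample_geometric():
    """Sample N with P(N = n) = (1 - alpha) * alpha**n for n = 0, 1, 2, ..."""
    n = 0
    while rng.random() < alpha:
        n += 1
    return n


def sample_state_at_geometric_time(start):
    """Run the chain for an independent geometric number of steps."""
    state = start
    for _ in range(sample_geometric()):
        state = step(state)
    return state


trials = 20000
counts = [0, 0, 0]
for _ in range(trials):
    counts[sample_state_at_geometric_time(0)] += 1
empirical = [count / trials for count in counts]

# Row (1 - alpha) * R_alpha(0, .) from the truncated series.
row = [1.0, 0.0, 0.0]     # row 0 of P^0
exact = [0.0, 0.0, 0.0]
weight = 1.0 - alpha      # (1 - alpha) * alpha^n, starting at n = 0
for _ in range(200):
    exact = [e + weight * r for e, r in zip(exact, row)]
    row = [sum(row[k] * P[k][j] for k in range(3)) for j in range(3)]
    weight *= alpha

print(empirical, exact)
```

The two printed vectors should agree up to sampling error, which shrinks like the reciprocal square root of the number of trials.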

(1 - α) R_{α}

is a transition probability kernel, just as

P_{n}

is a transition probability kernel, but corresponding to the random time

N

(with

α \in (0, 1)

as a parameter), rather than the deterministic time

n \in N

. An interpretation of the potential kernel

R_{α}

for

α \in (0, 1)

can be also given in economic terms. Suppose that

A \in S

and that we receive one monetary unit each time the process

X

visits

A

. Then as above,

R (x, A)

is the expected total amount of money we receive, starting at

x \in S

. However, typically money that we will receive at times distant in the future has less value to us now than money that we will receive soon. Specifically suppose that a monetary unit received at time

n \in N

has a present value of

α^{n}

, where

α \in (0, 1)

is an inflation factor (sometimes also called a discount factor). Then

R_{α} (x, A)

gives the expected, total, discounted amount we will receive, starting at

x \in S

. A bit more generally, if

f \in B

is a reward function, so that

f (x)

is the reward (or cost, depending on the sign) that we receive when we visit state

x \in S

, then for $α \in (0, 1)$, $R_{α} f (x)$ is the expected, total, discounted reward, starting at $x \in S$.

For the left potential operator, if

μ

is a positive measure on

S

then

μ R_{α} (A) = \sum_{n = 0}^{\infty} α^{n} μ P^{n} (A) = \sum_{n = 0}^{\infty} α^{n} \int_{S} μ (d x) P^{n} (x, A), A \in S

In particular, if

μ

is a probability measure and

X_{0}

has distribution

μ

then

μ P^{n}

is the distribution of

X_{n}

for

n \in N

, so from the last result,

(1 - α) μ R_{α}

is the distribution of

X_{N}

where again,

N

is independent of

X

and has the geometric distribution on

N

with parameter

1 - α

. The family of potential kernels gives the same information as the family of transition kernels.

The potential kernels $R = {R_{α} : α \in (0, 1)}$ completely determine the transition kernels $P = {P_{n} : n \in N}$ .

Details:

Note that for $x \in S$ and $A \in S$ , the function $α \mapsto R_{α} (x, A)$ is a power series in $α$ with coefficients $n \mapsto P^{n} (x, A)$ . In the language of combinatorics, $α \mapsto R_{α} (x, A)$ is the ordinary generating function of the sequence $n \mapsto P^{n} (x, A)$ . As noted above, this power series has radius of convergence at least 1, so we can extend the domain to $α \in (- 1, 1)$ . Thus, given the potential kernels, we can recover the transition kernels by taking derivatives and evaluating at 0: $P^{n} (x, A) = \frac{1}{n!} {[\frac{d^{n}}{d α^{n}} R_{α} (x, A)]}_{α = 0}$

Of course, it's really only necessary to determine

P

, the one step transition kernel, since the other transition kernels are powers of

P

. In any event, it follows that the kernels

R = {R_{α} : α \in (0, 1)}

, along with the initial distribution, completely determine the finite dimensional distributions of the Markov process

X

. The potential kernels commute with each other and with the transition kernels.

Suppose that $α, β \in (0, 1]$ and $k \in N$ . Then (as kernels)

$P^{k} R_{α} = R_{α} P^{k} = \sum_{n = 0}^{\infty} α^{n} P^{n + k}$
$R_{α} R_{β} = R_{β} R_{α} = \sum_{m = 0}^{\infty} \sum_{n = 0}^{\infty} α^{m} β^{n} P^{m + n}$

Details:

Suppose that $f \in B$ is nonnegative. The interchange of the sums with the kernel operation is allowed since the kernels are nonnegative. The other tool used is the semigroup property.

Directly $R_{α} P^{k} f = \sum_{n = 0}^{\infty} α^{n} P^{n} P^{k} f = \sum_{n = 0}^{\infty} α^{n} P^{n + k} f$ The other direction requires an interchange. $P^{k} R_{α} f = P^{k} \sum_{n = 0}^{\infty} α^{n} P^{n} f = \sum_{n = 0}^{\infty} α^{n} P^{k} P^{n} f = \sum_{n = 0}^{\infty} α^{n} P^{n + k} f$
First, $R_{α} R_{β} f = \sum_{m = 0}^{\infty} α^{m} P^{m} R_{β} f = \sum_{m = 0}^{\infty} α^{m} P^{m} (\sum_{n = 0}^{\infty} β^{n} P^{n} f) = \sum_{m = 0}^{\infty} \sum_{n = 0}^{\infty} α^{m} β^{n} P^{m} P^{n} f = \sum_{m = 0}^{\infty} \sum_{n = 0}^{\infty} α^{m} β^{n} P^{m + n} f$ The other direction is similar.

The same identities hold for the right operators on the entire space

B

, with the additional restrictions that

α < 1

and

β < 1

. The fundamental equation that relates the potential kernels is given next.

If $α, β \in (0, 1]$ with $α \leq β$ then (as kernels), $β R_{β} = α R_{α} + (β - α) R_{α} R_{β}$

Details:

If $α = β$ the equation is trivial, so assume $α < β$ . Suppose that $f \in B$ is nonnegative. From the previous result, $R_{α} R_{β} f = \sum_{j = 0}^{\infty} \sum_{k = 0}^{\infty} α^{j} β^{k} P^{j + k} f$ Changing variables to sum over $n = j + k$ and $j$ gives $R_{α} R_{β} f = \sum_{n = 0}^{\infty} \sum_{j = 0}^{n} α^{j} β^{n - j} P^{n} f = \sum_{n = 0}^{\infty} \sum_{j = 0}^{n} {(\frac{α}{β})}^{j} β^{n} P^{n} f = \sum_{n = 0}^{\infty} \frac{1 - {(\frac{α}{β})}^{n + 1}}{1 - \frac{α}{β}} β^{n} P^{n} f$ Since $\frac{1 - {(\frac{α}{β})}^{n + 1}}{1 - \frac{α}{β}} β^{n} = \frac{β^{n + 1} - α^{n + 1}}{β - α}$ , simplifying gives $R_{α} R_{β} f = \frac{1}{β - α} (β R_{β} f - α R_{α} f)$ Note that since $α < 1$ , $R_{α} f$ is finite, so we don't have to worry about the dreaded indeterminate form $\infty - \infty$ .
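As a sanity check, the fundamental equation $β R_{β} = α R_{α} + (β - α) R_{α} R_{β}$ can be verified numerically for a finite chain. The sketch below assumes a hypothetical three-state matrix `P` and approximates the potential matrices by truncated series; none of the names come from the text.

```python
# Hypothetical 3-state transition matrix; for illustration only.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def potential_matrix(p, alpha, terms=600):
    """Truncated series sum_{n < terms} alpha^n p^n for alpha < 1."""
    n = len(p)
    total = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(terms):
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = [[alpha * v for v in row] for row in mat_mul(power, p)]
    return total


alpha, beta = 0.5, 0.8
R_a = potential_matrix(P, alpha)
R_b = potential_matrix(P, beta)
R_ab = mat_mul(R_a, R_b)
R_ba = mat_mul(R_b, R_a)

# Largest entrywise gap between the two sides of the fundamental equation.
equation_gap = max(
    abs(beta * R_b[i][j] - alpha * R_a[i][j] - (beta - alpha) * R_ab[i][j])
    for i in range(3) for j in range(3)
)
# Largest entrywise gap between R_alpha R_beta and R_beta R_alpha.
commute_gap = max(abs(R_ab[i][j] - R_ba[i][j]) for i in range(3) for j in range(3))
print(equation_gap, commute_gap)  # both should be at rounding-error level
```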

The same identity holds for the right operators on the entire space

B

, with the additional restriction that

β < 1

If $α \in (0, 1]$ , then (as kernels), $I + α R_{α} P = I + α P R_{α} = R_{α}$ .

Details:

Suppose that $f \in B$ is nonnegative. From the result above, $(I + α R_{α} P) f = (I + α P R_{α}) f = f + \sum_{n = 0}^{\infty} α^{n + 1} P^{n + 1} f = \sum_{n = 0}^{\infty} α^{n} P^{n} f = R_{α} f$

The same identity holds for the right operators on the entire space

B

, with the additional restriction that

α < 1

. This leads to the following important result:

If $α \in (0, 1)$ , then as operators on the space $B$ ,

$R_{α} = (I - α P)^{- 1}$
$P = \frac{1}{α} (I - R_{α}^{- 1})$

Details:

The operators are bounded, so we can subtract. The identity $I + α R_{α} P = R_{α}$ leads to $R_{α} (I - α P) = I$ and the identity $I + α P R_{α} = R_{α}$ leads to $(I - α P) R_{α} = I$ . Hence (a) holds. Part (b) follows from (a).
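The identity $R_{α} = (I - α P)^{- 1}$ says that $v = R_{α} f$ is the unique bounded solution of $v = f + α P v$. The sketch below, with a hypothetical three-state matrix `P` and a hypothetical reward vector `f`, computes $R_{α} f$ both from the series and by iterating this fixed-point equation. It is an illustration under those assumptions, not a recommended numerical method.

```python
# Hypothetical 3-state transition matrix and reward function; for illustration only.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]
f = [1.0, 0.0, -2.0]
alpha = 0.9


def apply(p, v):
    """Right operator: (P v)(x) = sum_y P(x, y) v(y)."""
    return [sum(p[i][k] * v[k] for k in range(len(v))) for i in range(len(v))]


# Series definition: R_alpha f = sum_n alpha^n P^n f (truncated).
series = [0.0, 0.0, 0.0]
term = f[:]
for _ in range(2000):
    series = [s + t for s, t in zip(series, term)]
    term = [alpha * t for t in apply(P, term)]

# Fixed-point iteration v <- f + alpha P v, a contraction since alpha < 1.
v = [0.0, 0.0, 0.0]
for _ in range(2000):
    pv = apply(P, v)
    v = [f[i] + alpha * pv[i] for i in range(3)]

# Residual of (I - alpha P) v = f, and gap between the two computations.
pv = apply(P, v)
residual = max(abs(v[i] - alpha * pv[i] - f[i]) for i in range(3))
gap = max(abs(v[i] - series[i]) for i in range(3))
print(v, residual, gap)
```

In the economic interpretation above, `v[x]` is the expected total discounted reward starting in state `x`.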

Exercise [12] shows again that the potential operator

R_{α}

determines the transition operator

P

Examples and Applications

Let $I = {I_{n} : n \in N_{+}}$ be a sequence of Bernoulli Trials with success parameter $p \in (0, 1)$ . Define the Markov process $X = {X_{n} : n \in N}$ by $X_{n} = X_{0} + \sum_{k = 1}^{n} I_{k}$ where $X_{0}$ takes values in $N$ and is independent of $I$ .

For $n \in N$ , show that the transition probability matrix $P^{n}$ of $X$ is given by $P^{n} (x, y) = (\binom{n}{y - x}) p^{y - x} (1 - p)^{n - y + x}, x \in N, y \in {x, x + 1, \dots, x + n}$
For $α \in (0, 1]$ , show that the potential matrix $R_{α}$ of $X$ is given by $R_{α} (x, y) = \frac{1}{1 - α + α p} {(\frac{α p}{1 - α + α p})}^{y - x}, x \in N, y \in {x, x + 1, \dots}$
For $α \in (0, 1)$ and $x \in N$ , identify the probability distribution defined by $(1 - α) R_{α} (x, \cdot)$ .
For $x, y \in N$ with $x \leq y$ , interpret $R (x, y)$ , the expected time in $y$ starting in $x$ , in the context of the process $X$ .

Details:

Recall that $X$ is a Markov process since it has stationary, independent increments.

Note that for $n, x \in N$ , $P^{n} (x, \cdot)$ is the (discrete) PDF of $x + \sum_{k = 1}^{n} I_{k}$ . The result follows since the sum of the indicator variables has the binomial distribution with parameters $n$ and $p$ .
Let $α \in (0, 1]$ and let $x, y \in N$ with $x \leq y$ . Then $\begin{aligned} R_{α} (x, y) & = \sum_{n = 0}^{\infty} α^{n} P^{n} (x, y) = \sum_{n = y - x}^{\infty} α^{n} (\binom{n}{y - x}) p^{y - x} (1 - p)^{n - y + x} \\ & = (α p)^{y - x} \sum_{n = y - x}^{\infty} (\binom{n}{y - x}) [α (1 - p)]^{n - y + x} = \frac{(α p)^{y - x}}{[1 - α (1 - p)]^{y - x + 1}} \end{aligned}$ Simplifying gives the result.
For $α \in (0, 1)$ , $(1 - α) R_{α} (x, y) = \frac{1 - α}{1 - α + α p} {(\frac{α p}{1 - α + α p})}^{y - x}$ As a function of $y$ for fixed $x$ , this is the PDF of $x + Y_{α}$ where $Y_{α}$ has the geometric distribution on $N$ with parameter $\frac{1 - α}{1 - α + α p}$ .
Note that $R (x, y) = 1 / p$ for $x, y \in N$ with $x \leq y$ . Starting in state $x$ , the process eventually reaches $y$ with probability 1. The process remains in state $y$ for a geometrically distributed time, with parameter $p$ . The mean of this distribution is $1 / p$ .
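The closed form for the potential matrix of the Bernoulli partial-sum process can be compared with the defining series. The following sketch uses illustrative parameter values (`p = 0.3`, and a few choices of `alpha`, `x`, `y`) that are assumptions of the example.

```python
from math import comb


def transition(n, x, y, p):
    """n-step transition probability P^n(x, y): binomial in the increment y - x."""
    k = y - x
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)


def potential_series(alpha, x, y, p, terms=400):
    """Truncated series sum_n alpha^n P^n(x, y)."""
    return sum(alpha**n * transition(n, x, y, p) for n in range(terms))


def potential_closed(alpha, x, y, p):
    """Closed form (1 / c) * (alpha p / c)^(y - x) with c = 1 - alpha + alpha p."""
    c = 1 - alpha + alpha * p
    return (1 / c) * (alpha * p / c)**(y - x)


p = 0.3
checks = [(0.6, 0, 2), (0.9, 3, 3), (1.0, 1, 4)]
gaps = [abs(potential_series(al, x, y, p) - potential_closed(al, x, y, p))
        for al, x, y in checks]
print(gaps)  # each gap should be at rounding-error level
```

The last case uses $α = 1$, where the closed form reduces to $1 / p$, the mean time spent in a state before the next success.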

Continuous Time

With the discrete-time setting as motivation, we now turn to the more important continuous-time case where $T = [0, \infty)$.

Potential Kernels

For $α \in [0, \infty)$ , the $α$ -potential kernel $U_{α}$ of $X$ is defined as follows: $U_{α} (x, A) = \int_{0}^{\infty} e^{- α t} P_{t} (x, A) d t, x \in S, A \in S$

The special case $U = U_{0}$ is simply the potential kernel of $X$ .
For $x \in S$ and $A \in S$ , $U (x, A)$ is the expected amount of time that $X$ spends in $A$ , starting at $x$ .
The family of kernels $U = {U_{α} : α \in (0, \infty)}$ is known as the resolvent of $X$ .

Details:

Since $P = {P_{t} : t \in T}$ is a Feller semigroup of transition operators, the mapping $(t, x) \mapsto P_{t} (x, A)$ from $[0, \infty) \times S$ to $[0, 1]$ is jointly measurable for $A \in S$ . Thus, $U_{α} (x, A)$ makes sense for $x \in S$ and $A \in S$ and $x \mapsto U_{α} (x, A)$ from $S$ to $[0, \infty)$ is measurable for $A \in S$ . That $A \mapsto U_{α} (x, A)$ is a measure on $S$ follows from the usual interchange of sum and integral, via Fubini's theorem: Suppose that ${A_{j} : j \in J}$ is a countable collection of disjoint sets in $S$ , and let $A = ⋃_{j \in J} A_{j}$ . Then $\begin{aligned} U_{α} (x, A) & = \int_{0}^{\infty} e^{- α t} P_{t} (x, A) d t = \int_{0}^{\infty} [\sum_{j \in J} e^{- α t} P_{t} (x, A_{j})] d t \\ & = \sum_{j \in J} \int_{0}^{\infty} e^{- α t} P_{t} (x, A_{j}) d t = \sum_{j \in J} U_{α} (x, A_{j}) \end{aligned}$ Finally, the interpretation of $U (x, A)$ for $x \in S$ and $A \in S$ is another interchange of integrals: $U (x, A) = \int_{0}^{\infty} P_{t} (x, A) d t = \int_{0}^{\infty} E [1 (X_{t} \in A) ∣ X_{0} = x] d t = E (\int_{0}^{\infty} 1 (X_{t} \in A) d t | X_{0} = x)$ The inside integral is the Lebesgue measure of ${t \in [0, \infty) : X_{t} \in A}$ .

As with discrete time, it's quite possible that

U (x, A) = \infty

for some

x \in S

and

A \in S

, and knowing when this is the case is of considerable interest. As with all kernels, the potential kernel

U_{α}

defines two operators, operating on the right on functions, and operating on the left on positive measures. If

f : S \to R

is measurable then, giving the right potential operator in its many forms,

\begin{aligned} U_{α} f (x) & = \int_{S} U_{α} (x, d y) f (y) = \int_{0}^{\infty} e^{- α t} P_{t} f (x) d t \\ = \int_{0}^{\infty} e^{- α t} \int_{S} P_{t} (x, d y) f (y) = \int_{0}^{\infty} e^{- α t} E [f (X_{t}) ∣ X_{0} = x] d t, x \in S \end{aligned}

assuming that the various integrals make sense. This will be the case in particular if

f

is nonnegative, or if

f \in B

and

α > 0

If $α > 0$ , then $U_{α} (x, S) = \frac{1}{α}$ for all $x \in S$ .

Details:

For $x \in S$ , $U_{α} (x, S) = \int_{0}^{\infty} e^{- α t} P_{t} (x, S) d t = \int_{0}^{\infty} e^{- α t} d t = \frac{1}{α}$

It follows that for

α \in (0, \infty)

, the right potential operator

U_{α}

is a bounded, linear operator on

B

with

∥ U_{α} ∥ = \frac{1}{α}

. It also follows that

α U_{α}

is a probability kernel. This kernel has a nice interpretation.

If $α > 0$ then $α U_{α} (x, \cdot)$ is the conditional distribution of $X_{τ}$ given $X_{0} = x \in S$ , where $τ$ is independent of $X$ and has the exponential distribution on $[0, \infty)$ with parameter $α$ .

Details:

Suppose that $x \in S$ and $A \in S$ . The random time $τ$ has PDF $f (t) = α e^{- α t}$ for $t \in [0, \infty)$ . Hence, conditioning on $τ$ gives $P (X_{τ} \in A ∣ X_{0} = x) = \int_{0}^{\infty} α e^{- α t} P (X_{τ} \in A ∣ τ = t, X_{0} = x) d t$ But by the substitution rule and the assumption of independence, $P (X_{τ} \in A ∣ τ = t, X_{0} = x) = P (X_{t} \in A ∣ τ = t, X_{0} = x) = P (X_{t} \in A ∣ X_{0} = x) = P_{t} (x, A)$ Substituting gives $P (X_{τ} \in A ∣ X_{0} = x) = \int_{0}^{\infty} α e^{- α t} P_{t} (x, A) d t = α U_{α} (x, A)$

α U_{α}

is a transition probability kernel, just as

P_{t}

is a transition probability kernel, but corresponding to the random time

τ

(with

α \in (0, \infty)

as a parameter), rather than the deterministic time

t \in [0, \infty)

. As in the discrete case, the potential kernel can also be interpreted in economic terms. Suppose that

A \in S

and that we receive money at a rate of one unit per unit time whenever the process

X

is in

A

. Then

U (x, A)

is the expected total amount of money that we receive, starting in state

x \in S

. But again, money that we receive later is of less value to us now than money that we will receive sooner. Specifically, suppose that one monetary unit at time

t \in [0, \infty)

has a present value of

e^{- α t}

where

α \in (0, \infty)

is the inflation factor or discount factor. Then

U_{α} (x, A)

is the total, expected, discounted amount that we receive, starting in

x \in S

. A bit more generally, suppose that

f \in B

and that

f (x)

is the reward (or cost, depending on the sign) per unit time that we receive when the process is in state

x \in S

. Then

U_{α} f (x)

is the expected, total, discounted reward, starting in state

x \in S

For the left potential operator, if

μ

is a positive measure on

S

then

\begin{aligned} μ U_{α} (A) & = \int_{S} μ (d x) U_{α} (x, A) = \int_{0}^{\infty} e^{- α t} μ P_{t} (A) d t \\ & = \int_{0}^{\infty} e^{- α t} [\int_{S} μ (d x) P_{t} (x, A)] d t = \int_{0}^{\infty} e^{- α t} [\int_{S} μ (d x) P (X_{t} \in A ∣ X_{0} = x)] d t, A \in S \end{aligned}

In particular, suppose that

α > 0

and that

μ

is a probability measure and

X_{0}

has distribution

μ

. Then

μ P_{t}

is the distribution of

X_{t}

for

t \in [0, \infty)

, and hence from the last result,

α μ U_{α}

is the distribution of

X_{τ}

, where again,

τ

is independent of

X

and has the exponential distribution on

[0, \infty)

with parameter

α

. The family of potential kernels gives the same information as the family of transition kernels.

The resolvent $U = {U_{α} : α \in (0, \infty)}$ completely determines the family of transition kernels $P = {P_{t} : t \in (0, \infty)}$ .

Details:

Note that for $x \in S$ and $A \in S$ , the function $α \mapsto U_{α} (x, A)$ on $(0, \infty)$ is the Laplace transform of the function $t \mapsto P_{t} (x, A)$ on $[0, \infty)$ . The Laplace transform of a function determines the function completely.

It follows that the resolvent

{U_{α} : α \in (0, \infty)}

, along with the initial distribution, completely determines the finite dimensional distributions of the Markov process

X

. This is much more important here in the continuous-time case than in the discrete-time case, since the transition kernels

P_{t}

cannot be generated from a single transition kernel. The potential kernels commute with each other and with the transition kernels.

Suppose that $α, β, t \in [0, \infty)$ . Then (as kernels),

$P_{t} U_{α} = U_{α} P_{t} = \int_{0}^{\infty} e^{- α s} P_{s + t} d s$
$U_{α} U_{β} = \int_{0}^{\infty} \int_{0}^{\infty} e^{- α s} e^{- β t} P_{s + t} d s d t$

Details:

Suppose that $f \in B$ is nonnegative. The interchanges of operators and integrals below are interchanges of integrals, and are justified since the integrands are nonnegative. The other tool used is the semigroup property of $P = {P_{t} : t \in [0, \infty)}$ .

Directly, $U_{α} P_{t} f = \int_{0}^{\infty} e^{- α s} P_{s} P_{t} f d s = \int_{0}^{\infty} e^{- α s} P_{s + t} f d s$ The other direction involves an interchange. $P_{t} U_{α} f = P_{t} \int_{0}^{\infty} e^{- α s} P_{s} f d s = \int_{0}^{\infty} e^{- α s} P_{t} P_{s} f d s = \int_{0}^{\infty} e^{- α s} P_{s + t} f d s$
First $\begin{aligned} U_{α} U_{β} f & = \int_{0}^{\infty} e^{- α s} P_{s} U_{β} f d s = \int_{0}^{\infty} e^{- α s} P_{s} \int_{0}^{\infty} e^{- β t} P_{t} f d t \\ = \int_{0}^{\infty} e^{- α s} \int_{0}^{\infty} e^{- β t} P_{s} P_{t} f d s d t = \int_{0}^{\infty} \int_{0}^{\infty} e^{- α s} e^{- β t} P_{s + t} f d s d t \end{aligned}$ The other direction is similar.

The same identities hold for the right operators on the entire space

B

under the additional restriction that

α > 0

and

β > 0

. The fundamental equation that relates the potential kernels, known as the resolvent equation, is given in the next theorem:

If $α, β \in [0, \infty)$ with $α \leq β$ then (as kernels) $U_{α} = U_{β} + (β - α) U_{α} U_{β}$ .

Details:

If $α = β$ the equation is trivial, so assume $α < β$ . Suppose that $f \in B$ is nonnegative. From the previous result, $U_{α} U_{β} f = \int_{0}^{\infty} \int_{0}^{\infty} e^{- α s} e^{- β t} P_{s + t} f d t d s$ The transformation $u = s + t, v = s$ maps $[0, \infty)^{2}$ one-to-one onto ${(u, v) \in [0, \infty)^{2} : u \geq v}$ . The inverse transformation is $s = v, t = u - v$ with Jacobian $- 1$ . Hence we have $\begin{aligned} U_{α} U_{β} f & = \int_{0}^{\infty} \int_{0}^{u} e^{- α v} e^{- β (u - v)} P_{u} f d v d u = \int_{0}^{\infty} (\int_{0}^{u} e^{(β - α) v} d v) e^{- β u} P_{u} f d u \\ = \frac{1}{β - α} \int_{0}^{\infty} [e^{(β - α) u} - 1] e^{- β u} P_{u} f d u \\ = \frac{1}{β - α} (\int_{0}^{\infty} e^{- α u} P_{u} f d u - \int_{0}^{\infty} e^{- β u} P_{u} f d u) = \frac{1}{β - α} (U_{α} f - U_{β} f) \end{aligned}$ Simplifying gives the result. Note that $U_{β} f$ is finite since $β > 0$ .
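The resolvent equation can be checked on the simplest continuous-time example. The sketch below assumes a two-state chain on ${0, 1}$ with hypothetical jump rates `a` (from 0 to 1) and `b` (from 1 to 0), which is not part of the text. For this chain the transition matrix is known in closed form, and integrating $e^{- α t} P_{t}$ entrywise gives the formula used in `resolvent`.

```python
# Hypothetical jump rates for a two-state chain: 0 -> 1 at rate a, 1 -> 0 at rate b.
a, b = 2.0, 1.0
c = a + b


def resolvent(alpha):
    """Closed-form U_alpha, the integral of exp(-alpha t) P_t over t >= 0, for alpha > 0."""
    s = 1.0 / alpha          # Laplace transform of the constant 1
    d = 1.0 / (alpha + c)    # Laplace transform of exp(-c t)
    return [[(b * s + a * d) / c, (a * s - a * d) / c],
            [(b * s - b * d) / c, (a * s + b * d) / c]]


def mat_mul(m1, m2):
    """Product of two 2x2 matrices."""
    return [[sum(m1[i][k] * m2[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


alpha, beta = 0.5, 2.0
U_a, U_b = resolvent(alpha), resolvent(beta)
product = mat_mul(U_a, U_b)

# Gap in U_alpha = U_beta + (beta - alpha) U_alpha U_beta, entry by entry.
equation_gap = max(
    abs(U_a[i][j] - U_b[i][j] - (beta - alpha) * product[i][j])
    for i in range(2) for j in range(2)
)
row_sums = [sum(row) for row in U_a]
print(equation_gap, row_sums)  # gap near 0; row sums equal to 1 / alpha
```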

The same identity holds for the right potential operators on the entire space

B

, under the additional restriction that

α > 0

. For

α \in (0, \infty)

U_{α}

is also an operator on the space

C_{0}

If $α \in (0, \infty)$ and $f \in C_{0}$ then $U_{α} f \in C_{0}$ .

Details:

Suppose that $f \in C_{0}$ and that $(x_{1}, x_{2}, \dots)$ is a sequence in $S$ . Then $P_{t} f \in C_{0}$ for $t \in [0, \infty)$ . Hence if $x_{n} \to x \in S$ as $n \to \infty$ then $e^{- α t} P_{t} f (x_{n}) \to e^{- α t} P_{t} f (x)$ as $n \to \infty$ for each $t \in [0, \infty)$ . By the dominated convergence theorem (the integrands are bounded in absolute value by $e^{- α t} ∥ f ∥$ , which is integrable since $α > 0$ ), $U_{α} f (x_{n}) = \int_{0}^{\infty} e^{- α t} P_{t} f (x_{n}) d t \to \int_{0}^{\infty} e^{- α t} P_{t} f (x) d t = U_{α} f (x)$ as $n \to \infty$ . Hence $U_{α} f$ is continuous. Next suppose that $x_{n} \to \infty$ as $n \to \infty$ . This means that for every compact $C \subseteq S$ , there exists $m \in N_{+}$ such that $x_{n} \notin C$ for $n > m$ . Then $e^{- α t} P_{t} f (x_{n}) \to 0$ as $n \to \infty$ for each $t \in [0, \infty)$ . Again by the dominated convergence theorem, $U_{α} f (x_{n}) = \int_{0}^{\infty} e^{- α t} P_{t} f (x_{n}) d t \to 0$ as $n \to \infty$ . So $U_{α} f \in C_{0}$ .

If $f \in C_{0}$ then $α U_{α} f \to f$ as $α \to \infty$ .

Details:

Convergence is with respect to the supremum norm on $C_{0}$ , of course. Suppose that $f \in C_{0}$ . Note first that with a change of variables $s = α t$ , $α U_{α} f = \int_{0}^{\infty} α e^{- α t} P_{t} f d t = \int_{0}^{\infty} e^{- s} P_{s / α} f d s$ and hence $| α U_{α} f - f | = | \int_{0}^{\infty} e^{- s} (P_{s / α} f - f) d s | \leq \int_{0}^{\infty} e^{- s} | P_{s / α} f - f | d s \leq \int_{0}^{\infty} e^{- s} ∥ P_{s / α} f - f ∥ d s$ So it follows that $∥ α U_{α} f - f ∥ \leq \int_{0}^{\infty} e^{- s} ∥ P_{s / α} f - f ∥ d s$ But $∥ P_{s / α} f - f ∥ \to 0$ as $α \to \infty$ and hence by the dominated convergence theorem, $\int_{0}^{\infty} e^{- s} ∥ P_{s / α} f - f ∥ d s \to 0$ as $α \to \infty$ .
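The convergence $α U_{α} f \to f$ can be observed numerically. The sketch below again assumes a hypothetical two-state chain with jump rates `a` and `b` and a hypothetical function `f` on the two states; the closed form for $U_{α}$ comes from integrating $e^{- α t} P_{t}$ entrywise for this particular chain.

```python
# Hypothetical two-state chain: 0 -> 1 at rate a, 1 -> 0 at rate b.
a, b = 2.0, 1.0
c = a + b
f = [1.0, -2.0]  # hypothetical function on the state space {0, 1}


def resolvent(alpha):
    """Closed-form U_alpha for the two-state chain, alpha > 0."""
    s = 1.0 / alpha
    d = 1.0 / (alpha + c)
    return [[(b * s + a * d) / c, (a * s - a * d) / c],
            [(b * s - b * d) / c, (a * s + b * d) / c]]


def sup_error(alpha):
    """Supremum norm of alpha * U_alpha f - f over the two states."""
    U = resolvent(alpha)
    return max(abs(alpha * (U[i][0] * f[0] + U[i][1] * f[1]) - f[i])
               for i in range(2))


alphas = [1.0, 10.0, 100.0, 1000.0]
errors = [sup_error(alpha) for alpha in alphas]
print(errors)  # should decrease toward 0 as alpha grows
```

For this chain the error decays like $1 / α$, consistent with $f$ being in the domain of the generator, which is automatic when the state space is finite.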

Infinitesimal Generator

In continuous time, it's not at all clear how we could construct a Markov process with desired properties, say to model a real system of some sort. Stated mathematically, the existential problem is how to construct the family of transition kernels

{P_{t} : t \in [0, \infty)}

so that the semigroup property

P_{s} P_{t} = P_{s + t}

is satisfied for all

s, t \in [0, \infty)

. The answer, as for similar problems in the deterministic world, comes essentially from calculus, from a type of derivative.

The infinitesimal generator of the Markov process $X$ is the operator $G : D \to C_{0}$ defined by $G f = lim_{t ↓ 0} \frac{P_{t} f - f}{t}$ on the domain $D \subseteq C_{0}$ for which the limit exists.

As usual, the limit is with respect to the supremum norm on

C_{0}

, so

f \in D

and

G f = g

means that

f, g \in C_{0}

and

∥ \frac{P_{t} f - f}{t} - g ∥ = sup {| \frac{P_{t} f (x) - f (x)}{t} - g (x) | : x \in S} \to 0 as t ↓ 0

So in particular,

G f (x) = lim_{t ↓ 0} \frac{P_{t} f (x) - f (x)}{t} = lim_{t ↓ 0} \frac{E [f (X_{t}) ∣ X_{0} = x] - f (x)}{t}, x \in S

The domain $D$ is a subspace of $C_{0}$ and the generator $G$ is a linear operator on $D$

If $f \in D$ and $c \in R$ then $c f \in D$ and $G (c f) = c G f$ .
If $f, g \in D$ then $f + g \in D$ and $G (f + g) = G f + G g$ .

Details:

These are simple results that depend on the linearity of $P_{t}$ for $t \in [0, \infty)$ and basic results on convergence.

If $f \in D$ then $\frac{P_{t} (c f) - (c f)}{t} = c \frac{P_{t} f - f}{t} \to c G f$ as $t ↓ 0$ .
If $f, g \in D$ then $\frac{P_{t} (f + g) - (f + g)}{t} = \frac{P_{t} f - f}{t} + \frac{P_{t} g - g}{t} \to G f + G g$ as $t ↓ 0$ .

Note that

G f

is the (right) derivative at 0 of the function

t \mapsto P_{t} f

. Because of the semigroup property, this differentiability property at

0

implies differentiability at arbitrary

t \in [0, \infty)

. Moreover, the infinitesimal operator and the transition operators commute:

If $f \in D$ and $t \in [0, \infty)$ , then $P_{t} f \in D$ and the following derivative rules hold with respect to the supremum norm.

$P_{t}^{'} f = P_{t} G f$ , the Kolmogorov forward equation
$P_{t}^{'} f = G P_{t} f$ , the Kolmogorov backward equation

Details:

Let $f \in D$ . All limits and statements about derivatives and continuity are with respect to the supremum norm.

By assumption, $\frac{1}{h} (P_{h} f - f) \to G f$ as $h ↓ 0$ . Since $P_{t}$ is a bounded, linear operator on the space $C_{0}$ , it preserves limits, so $\frac{1}{h} (P_{t} P_{h} f - P_{t} f) = \frac{1}{h} (P_{t + h} f - P_{t} f) \to P_{t} G f$ as $h ↓ 0$ . This proves the result for the derivative from the right. But since $t \mapsto P_{t} f$ is continuous, the result is also true for the two-sided derivative.
From part (a), we now know that $\frac{1}{h} (P_{h} P_{t} f - P_{t} f) = \frac{1}{h} (P_{t + h} f - P_{t} f) \to P_{t} G f$ as $h \to 0$ . By definition, this means that $P_{t} f \in D$ and $G P_{t} f = P_{t} G f = P_{t}^{'} f$ .
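The forward and backward equations can be illustrated with a finite-difference check. The sketch below assumes a hypothetical two-state chain with jump rates `a` and `b`, whose generator is the matrix `G` and whose transition matrix has the closed form in `transition`. These specifics are assumptions of the example, not part of the general theory.

```python
from math import exp

# Hypothetical two-state chain: 0 -> 1 at rate a, 1 -> 0 at rate b.
a, b = 2.0, 1.0
c = a + b
G = [[-a, a], [b, -b]]  # generator matrix of the chain


def transition(t):
    """Closed-form transition matrix P_t of the two-state chain."""
    e = exp(-c * t)
    return [[(b + a * e) / c, (a - a * e) / c],
            [(b - b * e) / c, (a + b * e) / c]]


def mat_mul(m1, m2):
    """Product of two 2x2 matrices."""
    return [[sum(m1[i][k] * m2[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


t, h = 0.7, 1e-5
P_t, P_plus, P_minus = transition(t), transition(t + h), transition(t - h)

# Central-difference approximation of the derivative of t -> P_t.
derivative = [[(P_plus[i][j] - P_minus[i][j]) / (2 * h) for j in range(2)]
              for i in range(2)]
forward = mat_mul(P_t, G)    # P_t G
backward = mat_mul(G, P_t)   # G P_t

forward_gap = max(abs(derivative[i][j] - forward[i][j])
                  for i in range(2) for j in range(2))
backward_gap = max(abs(derivative[i][j] - backward[i][j])
                   for i in range(2) for j in range(2))
print(forward_gap, backward_gap)  # both should be small, of order h^2
```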

Exercise [24] gives a possible solution to the dilemma that motivated this discussion in the first place. If we want to construct a Markov process with desired properties, to model a real system for example, we can start by constructing an appropriate generator

G

and then solve the initial value problem

P_{t}^{'} = G P_{t}, P_{0} = I

to obtain the transition operators

P = {P_{t} : t \in [0, \infty)}

. The next theorem gives the relationship between the potential operators and the infinitesimal operator, which in some ways is better. This relationship is analogous to the relationship between the potential operators and the one-step operator in for discrete time

Suppose $α \in (0, \infty)$ .

If $f \in D$ then $G f \in C_{0}$ and $f + U_{α} G f = α U_{α} f$.
If $f \in C_{0}$ then $U_{α} f \in D$ and $f + G U_{α} f = α U_{α} f$ .

Details:

By definition, if $f \in D$ then $G f \in C_{0}$. Hence, using the previous result, $f + U_{α} G f = f + \int_{0}^{\infty} e^{- α t} P_{t} G f \, d t = f + \int_{0}^{\infty} e^{- α t} P_{t}^{'} f \, d t$. Integrating by parts (with $u = e^{- α t}$ and $d v = P_{t}^{'} f \, d t$) gives $f + U_{α} G f = f + e^{- α t} P_{t} f \Big|_{0}^{\infty} + α \int_{0}^{\infty} e^{- α t} P_{t} f \, d t$. But $e^{- α t} P_{t} f \to 0$ as $t \to \infty$ while $P_{0} f = f$, so the boundary term is $- f$ and cancels the first term. The last term is $α U_{α} f$.
Suppose that $f \in C_{0}$. From the result above and the substitution $u = s + t$, $P_{t} U_{α} f = \int_{0}^{\infty} e^{- α s} P_{s + t} f \, d s = \int_{t}^{\infty} e^{- α (u - t)} P_{u} f \, d u = e^{α t} \int_{t}^{\infty} e^{- α u} P_{u} f \, d u$. Hence $\frac{P_{t} U_{α} f - U_{α} f}{t} = \frac{1}{t} \left[e^{α t} \int_{t}^{\infty} e^{- α u} P_{u} f \, d u - U_{α} f\right]$. Adding and subtracting $e^{α t} U_{α} f$ and combining integrals gives $\begin{aligned} \frac{P_{t} U_{α} f - U_{α} f}{t} & = \frac{1}{t} \left[e^{α t} \int_{t}^{\infty} e^{- α u} P_{u} f \, d u - e^{α t} \int_{0}^{\infty} e^{- α u} P_{u} f \, d u\right] + \frac{e^{α t} - 1}{t} U_{α} f \\ & = - e^{α t} \frac{1}{t} \int_{0}^{t} e^{- α s} P_{s} f \, d s + \frac{e^{α t} - 1}{t} U_{α} f \end{aligned}$ Since $s \mapsto P_{s} f$ is continuous, the first term converges to $- f$ as $t \downarrow 0$. The second term converges to $α U_{α} f$ as $t \downarrow 0$. Hence $U_{α} f \in D$ and $G U_{α} f = α U_{α} f - f$.

Suppose again that $α \in (0, \infty)$ .

$U_{α} = (α I - G)^{- 1} : C_{0} \to D$
$G = α I - U_{α}^{- 1} : D \to C_{0}$

Details:

Recall that $U_{α} : C_{0} \to D$ and $G : D \to C_{0}$.

By part (a) of the previous result we have $α U_{α} - U_{α} G = I$ so $U_{α} (α I - G) = I$. By part (b) we have $α U_{α} - G U_{α} = I$ so $(α I - G) U_{α} = I$.
This follows from (a).
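The identity $U_{α} = (α I - G)^{- 1}$ can be illustrated numerically when the state space is finite, so that all of the operators are matrices. The sketch below assumes a hypothetical two-state chain with rates `a` and `b` and a particular value of `alpha` (none of which come from the text). It approximates $U_{α} = \int_{0}^{\infty} e^{- α t} P_{t} \, d t$ by Simpson's rule on a truncated interval and compares the result with the matrix inverse.

```python
import math

# Hypothetical two-state chain (assumed for illustration only).
a, b = 2.0, 3.0
alpha = 1.5

def P(t):
    """Closed-form transition matrix of the two-state chain at time t."""
    r = a + b
    e = math.exp(-r * t)
    return [[(b + a * e) / r, (a - a * e) / r],
            [(b - b * e) / r, (a + b * e) / r]]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def potential(alpha, T=40.0, n=40000):
    """Simpson's rule for the integral of e^{-alpha t} P_t over [0, T].

    T is chosen large enough that the discarded tail is negligible;
    n must be even.
    """
    h = T / n
    U = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n + 1):
        t = k * h
        w = 1 if k in (0, n) else (4 if k % 2 else 2)
        c = w * math.exp(-alpha * t) * h / 3
        Pt = P(t)
        for i in range(2):
            for j in range(2):
                U[i][j] += c * Pt[i][j]
    return U

# alpha I - G, with G = [[-a, a], [b, -b]]
resolvent = inverse_2x2([[alpha + a, -a], [-b, alpha + b]])
U_alpha = potential(alpha)

print(resolvent)
print(U_alpha)
```

The two printed matrices should agree to within the quadrature error. Note also that each row of $α U_{α}$ sums to 1, as it must for a probability kernel.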

So, from the generator $G$ we can determine the potential operators $U = \{U_{α} : α \in (0, \infty)\}$, which in turn determine the transition operators $P = \{P_{t} : t \in [0, \infty)\}$. In continuous time, the transition operators $P = \{P_{t} : t \in [0, \infty)\}$ can be obtained from the single, infinitesimal operator $G$ in a way that is reminiscent of the fact that in discrete time, the transition operators $P = \{P^{n} : n \in N\}$ can be obtained from the single, one-step operator $P$.
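As a rough illustration of recovering the transition operators from the generator, the following sketch integrates the backward equation $P_{t}^{'} = G P_{t}$, $P_{0} = I$ numerically. It assumes a hypothetical two-state chain with rates `a` and `b`, and uses the classical fourth-order Runge-Kutta method as one convenient choice of solver; neither the example nor the solver is prescribed by the text. The result is compared with the known closed form for this chain.

```python
import math

# Hypothetical two-state chain (assumed for illustration only).
a, b = 2.0, 3.0
G = [[-a, a], [b, -b]]

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def axpy(A, B, c):
    """Return A + c * B entrywise."""
    return [[x + c * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def solve_backward(G, t, steps=1000):
    """Integrate P' = G P, P_0 = I on [0, t] with classical Runge-Kutta."""
    n = len(G)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    h = t / steps
    for _ in range(steps):
        k1 = matmul(G, P)
        k2 = matmul(G, axpy(P, k1, h / 2))
        k3 = matmul(G, axpy(P, k2, h / 2))
        k4 = matmul(G, axpy(P, k3, h))
        # Weighted slope k1 + 2 k2 + 2 k3 + k4
        incr = axpy(axpy(axpy(k1, k2, 2.0), k3, 2.0), k4, 1.0)
        P = axpy(P, incr, h / 6)
    return P

P1 = solve_backward(G, 1.0)

# Closed form for the two-state chain at t = 1, for comparison.
r = a + b
e = math.exp(-r)
exact = [[(b + a * e) / r, (a - a * e) / r],
         [(b - b * e) / r, (a + b * e) / r]]

print(P1)
print(exact)
```

Because the rows of $G$ sum to 0, the rows of the computed matrix continue to sum to 1 (up to rounding), so the numerical solution is still a transition matrix.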

Examples and Applications

Consider the Markov process $X = \{X_{t} : t \in [0, \infty)\}$ on $R$ satisfying the ordinary differential equation $\frac{d}{d t} X_{t} = g (X_{t})$ for $t \in [0, \infty)$, where $g : R \to R$ is Lipschitz continuous. The infinitesimal operator $G$ is given by $G f (x) = f^{'} (x) g (x)$ for $x \in R$ on the domain $D$ of functions $f : R \to R$ where $f \in C_{0}$ and $f^{'} \in C_{0}$.

Details:

Recall that the only source of randomness in this process is the initial state $X_{0}$. By the continuity assumptions on $g$, there exists a unique solution $X_{t} (x)$ to the differential equation with initial value $X_{0} = x$, defined for all $t \in [0, \infty)$. The transition operator $P_{t}$ for $t \in [0, \infty)$ is defined on $B$ by $P_{t} f (x) = f [X_{t} (x)]$ for $x \in R$. By the ordinary chain rule, if $f$ is differentiable, $\frac{P_{t} f (x) - f (x)}{t} = \frac{f [X_{t} (x)] - f (x)}{t} \to f^{'} (x) g (x)$ as $t \downarrow 0$.
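The limit in the details can be checked numerically for a concrete choice of $g$ and $f$. The sketch below assumes $g (x) = - x$, so that the flow is $X_{t} (x) = x e^{- t}$, and the test function $f (x) = e^{- x^{2}}$, which satisfies $f \in C_{0}$ and $f^{'} \in C_{0}$. Both choices are assumptions made for illustration and do not appear in the text.

```python
import math

def g(x):
    """Assumed vector field g(x) = -x (Lipschitz continuous)."""
    return -x

def flow(t, x):
    """Solution X_t(x) = x e^{-t} of dX/dt = g(X) with X_0 = x."""
    return x * math.exp(-t)

def f(x):
    """Assumed test function in C_0."""
    return math.exp(-x * x)

def f_prime(x):
    """Derivative of the test function, also in C_0."""
    return -2.0 * x * math.exp(-x * x)

def P(t, func, x):
    """Transition operator of the deterministic flow: P_t f(x) = f[X_t(x)]."""
    return func(flow(t, x))

def generator(x):
    """Claimed generator G f(x) = f'(x) g(x)."""
    return f_prime(x) * g(x)

def quotient(t, x):
    """Difference quotient (P_t f(x) - f(x)) / t."""
    return (P(t, f, x) - f(x)) / t

points = [-2.0, -0.5, 0.0, 1.0, 3.0]
errors = [abs(quotient(1e-6, x) - generator(x)) for x in points]
print(errors)
```

For small $t$ the difference quotient should be close to $f^{'} (x) g (x) = 2 x^{2} e^{- x^{2}}$ at each of the sample points, with an error of order $t$.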

Our next example considers the Poisson process as a Markov process. Compare this with the binomial process in [13].

Let $N = \{N_{t} : t \in [0, \infty)\}$ denote the Poisson process on $N$ with rate $β \in (0, \infty)$. Define the Markov process $X = \{X_{t} : t \in [0, \infty)\}$ by $X_{t} = X_{0} + N_{t}$ where $X_{0}$ takes values in $N$ and is independent of $N$.

For $t \in [0, \infty)$ , show that the probability transition matrix $P_{t}$ of $X$ is given by $P_{t} (x, y) = e^{- β t} \frac{(β t)^{y - x}}{(y - x)!}, x, y \in N, y \geq x$
For $α \in [0, \infty)$ , show that the potential matrix $U_{α}$ of $X$ is given by $U_{α} (x, y) = \frac{1}{α + β} {(\frac{β}{α + β})}^{y - x}, x, y \in N, y \geq x$
For $α > 0$ and $x \in N$ , identify the probability distribution defined by $α U_{α} (x, \cdot)$ .
Show that the infinitesimal matrix $G$ of $X$ is given by $G (x, x) = - β$ , $G (x, x + 1) = β$ for $x \in N$ .

Details:

Note that for $t \in [0, \infty)$ and $x \in N$ , $P_{t} (x, \cdot)$ is the (discrete) PDF of $x + N_{t}$ since $N_{t}$ has the Poisson distribution with parameter $β t$ .
Let $α \in [0, \infty)$ and let $x, y \in N$ with $x \leq y$. Then $\begin{aligned} U_{α} (x, y) & = \int_{0}^{\infty} e^{- α t} P_{t} (x, y) \, d t = \int_{0}^{\infty} e^{- α t} e^{- β t} \frac{(β t)^{y - x}}{(y - x)!} \, d t \\ & = \frac{β^{y - x}}{(y - x)!} \int_{0}^{\infty} e^{- (α + β) t} t^{y - x} \, d t \end{aligned}$ The change of variables $s = (α + β) t$ gives $U_{α} (x, y) = \frac{β^{y - x}}{(y - x)! (α + β)^{y - x + 1}} \int_{0}^{\infty} e^{- s} s^{y - x} \, d s$. But the last integral is $Γ (y - x + 1) = (y - x)!$. Simplifying gives the result.
For $α > 0$, $α U_{α} (x, y) = \frac{α}{α + β} {(\frac{β}{α + β})}^{y - x}$ for $x, y \in N$ with $y \geq x$. As a function of $y$ for fixed $x$, this is the PDF of $x + Y_{α}$ where $Y_{α}$ has the geometric distribution on $N$ with parameter $\frac{α}{α + β}$.
Note that for $x, y \in N$ , $G (x, y) = \frac{d}{d t} P_{t} (x, y) |_{t = 0}$ . By simple calculus, this is $- β$ if $y = x$ , $β$ if $y = x + 1$ , and 0 otherwise.
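The formulas in this exercise can be sanity-checked numerically. The following sketch assumes particular values of the rate `beta` and the parameter `alpha` (chosen only for illustration). It compares the closed form of $U_{α} (x, y)$ with a Simpson's rule approximation of $\int_{0}^{\infty} e^{- α t} P_{t} (x, y) \, d t$, checks that $α U_{α} (x, \cdot)$ sums to 1, and approximates the entries of $G$ by a difference quotient at $t = 0$.

```python
import math

# Assumed parameter values, for illustration only.
beta = 2.0
alpha = 0.5

def P(t, x, y):
    """Transition probability P_t(x, y) of X_t = X_0 + N_t."""
    if y < x:
        return 0.0
    n = y - x
    return math.exp(-beta * t) * (beta * t) ** n / math.factorial(n)

def U_formula(x, y):
    """Closed form of the potential matrix U_alpha(x, y)."""
    if y < x:
        return 0.0
    return (1.0 / (alpha + beta)) * (beta / (alpha + beta)) ** (y - x)

def U_numeric(x, y, T=40.0, n=20000):
    """Simpson's rule for the integral of e^{-alpha t} P_t(x, y) over [0, T].

    T is large enough that the discarded tail is negligible; n must be even.
    """
    h = T / n
    total = 0.0
    for k in range(n + 1):
        t = k * h
        w = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += w * math.exp(-alpha * t) * P(t, x, y)
    return total * h / 3

def G_numeric(x, y, h=1e-6):
    """Difference quotient (P_h(x, y) - P_0(x, y)) / h."""
    identity = 1.0 if x == y else 0.0
    return (P(h, x, y) - identity) / h

potential_errors = [abs(U_numeric(0, y) - U_formula(0, y)) for y in range(4)]
total_mass = sum(alpha * U_formula(3, y) for y in range(3, 300))
G_row = [G_numeric(2, y) for y in range(2, 5)]  # entries (2,2), (2,3), (2,4)

print(potential_errors)
print(total_mass)
print(G_row)
```

The row of difference quotients should be close to $(- β, β, 0)$, in agreement with the infinitesimal matrix found in part (d).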