Stable Distributions

This section discusses a theoretical topic that you may want to skip if you are a new student of probability.

Basic Theory

Stable distributions are an important general class of probability distributions on \( \R \) that are defined in terms of location-scale transformations. Stable distributions occur as limits (in distribution) of scaled and centered sums of independent, identically distributed variables. Such limits generalize the central limit theorem, and so stable distributions generalize the normal distribution in a sense. The pioneering work on stable distributions was done by Paul Lévy.

Definition

In this section, we consider real-valued random variables whose distributions are not degenerate (that is, not concentrated at a single value). After all, a random variable with a degenerate distribution is not really random, and so is not of much interest.

Random variable \( X \) has a stable distribution if the following condition holds: If \( n \in \N_+ \) and \( (X_1, X_2, \ldots, X_n) \) is a sequence of independent copies of \( X \), then \( X_1 + X_2 + \cdots + X_n \) has the same distribution as \( a_n + b_n X \) for some \( a_n \in \R \) and \( b_n \in (0, \infty) \). If \( a_n = 0 \) for \( n \in \N_+ \) then the distribution of \( X \) is strictly stable.

The parameters \( a_n \) for \( n \in \N_+ \) are the centering parameters.
The parameters \( b_n \) for \( n \in \N_+ \) are the norming parameters.

Details:

Since the distribution of \( X \) is not degenerate, note that if the distribution of \( a + b X \) is the same as the distribution of \( c + d X \) for some \( a, \, c \in \R \) and \( b, \, d \in (0, \infty) \), then \( a = c \) and \( b = d \). Thus, the centering parameters \( a_n \) and the norming parameters \( b_n \) are uniquely defined for \( n \in \N_+ \).

Recall that two distributions on \( \R \) that are related by a location-scale transformation are said to be of the same type, and that being of the same type defines an equivalence relation on the class of distributions on \( \R \). With this terminology, the definition of stability has a more elegant expression: \( X \) has a stable distribution if the sum of a finite number of independent copies of \( X \) is of the same type as \( X \). As we will see, the norming parameters are more important than the centering parameters, and in fact, only certain norming parameters can occur.
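The definition can be checked empirically in a case where the parameters are known. The following is a minimal sketch, assuming the standard normal distribution (for which \( a_n = 0 \) and \( b_n = \sqrt{n} \)); the function names, sample size, and seed are illustrative choices, not part of the theory. It compares empirical quantiles of \( X_1 + \cdots + X_n \) with those of \( \sqrt{n} X \).

```python
import random

PROBS = [0.1, 0.25, 0.5, 0.75, 0.9]

def quantiles_of(sample, probs):
    """Empirical quantiles of a sample at the given probabilities."""
    ordered = sorted(sample)
    last = len(ordered) - 1
    return [ordered[round(p * last)] for p in probs]

def compare_sum_to_scaled_copy(n, size=20000, seed=1):
    """Quantiles of X_1 + ... + X_n and of sqrt(n) X, for standard normal X."""
    rng = random.Random(seed)
    sums = [sum(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(size)]
    scaled = [n ** 0.5 * rng.gauss(0.0, 1.0) for _ in range(size)]
    return quantiles_of(sums, PROBS), quantiles_of(scaled, PROBS)

sum_q, scaled_q = compare_sum_to_scaled_copy(n=4)
for p, s, t in zip(PROBS, sum_q, scaled_q):
    print(f"p={p:.2f}  sum: {s:+.3f}  scaled copy: {t:+.3f}")
```

The two columns should agree up to sampling error; a distribution that is not stable would generally show a systematic mismatch for every choice of centering and norming constants.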

Basic Properties

We start with some very simple results that follow easily from the definition, before moving on to the deeper results.

Suppose that \( X \) has a stable distribution with mean \( \mu \) and finite variance. Then the norming parameters are \( \sqrt{n} \) and the centering parameters are \( \left(n - \sqrt{n}\right) \mu \) for \( n \in \N_+ \).

Details:

As usual, let \( a_n \) and \( b_n \) denote the centering and norming parameters of \( X \) for \( n \in \N_+ \), and let \( \sigma^2 \) denote the (finite) variance of \( X \). Suppose that \( n \in \N_+ \) and that \( (X_1, X_2, \ldots, X_n) \) is a sequence of independent copies of \( X \). Then \( X_1 + X_2 + \cdots + X_n \) has the same distribution as \( a_n + b_n X \). Taking variances gives \( n \sigma^2 = b_n^2 \sigma^2 \) and hence \( b_n = \sqrt{n} \). Taking expected values now gives \( n \mu = a_n + \sqrt{n} \mu \).
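As a quick numerical check of this result (the values \( \mu = 3 \) and \( n = 4 \) are chosen purely for illustration), the norming parameter is \( b_4 = \sqrt{4} = 2 \) and the centering parameter is \[ a_4 = \left(4 - \sqrt{4}\right) \cdot 3 = 6 \] so that \( a_4 + b_4 \mu = 6 + 2 \cdot 3 = 12 = 4 \mu \), in agreement with the expected value of the sum of four independent copies.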

It will turn out that the only stable distribution with finite variance is the normal distribution, but the result above is useful as an intermediate step. Next, it seems fairly clear from the definition that the family of stable distributions is itself a location-scale family.

Suppose that the distribution of \( X \) is stable, with centering parameters \( a_n \in \R \) and norming parameters \( b_n \in (0, \infty) \) for \( n \in \N_+ \). If \( c \in \R \) and \( d \in (0, \infty) \), then the distribution of \( Y = c + d X \) is also stable, with centering parameters \( d a_n + (n - b_n) c \) and norming parameters \( b_n \) for \( n \in \N_+ \).

Details:

Suppose that \( n \in \N_+ \) and that \( (Y_1, Y_2, \ldots, Y_n) \) is a sequence of independent copies of \( Y \). Then \( Y_1 + Y_2 + \cdots + Y_n \) has the same distribution as \( n c + d(X_1 + X_2 + \cdots + X_n) \) where \( (X_1, X_2, \ldots, X_n) \) is a sequence of independent copies of \( X \). By stability, \( X_1 + X_2 + \cdots + X_n \) has the same distribution as \( a_n + b_n X \). Hence \( Y_1 + Y_2 + \cdots + Y_n \) has the same distribution as \( (n c + d a_n) + d b_n X \), which in turn has the same distribution as \( [d a_n + (n - b_n) c] + b_n Y \).
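The formula for the transformed centering parameters is easy to evaluate directly. Here is a small sketch; the function name `transformed_centering` and the numerical values of \( c \) and \( d \) are hypothetical, chosen only for illustration. It takes \( X \) to be standard normal, so that \( a_n = 0 \) and \( b_n = \sqrt{n} \), and compares the result with the finite variance formula \( \left(n - \sqrt{n}\right) \mu \) applied to \( Y \), whose mean is \( c \).

```python
import math

def transformed_centering(a_n, b_n, n, c, d):
    """Centering parameter of Y = c + d X, from the formula d a_n + (n - b_n) c."""
    return d * a_n + (n - b_n) * c

# Illustration: X standard normal, so a_n = 0 and b_n = sqrt(n).
c, d = 3.0, 2.0  # hypothetical location and scale; Y is normal with mean 3
for n in (1, 4, 9):
    b_n = math.sqrt(n)
    centering = transformed_centering(0.0, b_n, n, c, d)
    # The finite variance result gives (n - sqrt(n)) * mean for Y directly.
    print(n, centering, (n - b_n) * c)
```

The last two columns agree, and the scale \( d \) does not enter in this case because the centering parameters of \( X \) are all zero.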

An important point is that the norming parameters are unchanged under a location-scale transformation.

Suppose that the distribution of \( X \) is stable, with centering parameters \( a_n \in \R\) and norming parameters \( b_n \in (0, \infty)\) for \( n \in \N_+ \). Then the distribution of \( -X \) is stable, with centering parameters \( -a_n \) and norming parameters \( b_n \) for \( n \in \N_+ \).

Details:

If \( n \in \N_+ \) and \( (X_1, X_2, \ldots, X_n) \) is a sequence of independent copies of \( X \) then \( (-X_1, -X_2, \ldots, -X_n) \) is a sequence of independent copies of \( -X \). By stability, \( -\sum_{i=1}^n X_i \) has the same distribution as \( -(a_n + b_n X) = - a_n + b_n (-X) \).

From the previous two results, if \( X \) has a stable distribution, then so does \( c + d X \), with the same norming parameters, for every \( c, \, d \in \R \) with \( d \neq 0 \). Stable distributions are also closed under convolution (corresponding to sums of independent variables) if the norming parameters are the same.

Suppose that \( X \) and \( Y \) are independent variables. Assume also that \( X \) has a stable distribution with centering parameters \( a_n \in \R \) and norming parameters \( b_n \in (0, \infty)\) for \( n \in \N_+ \), and that \( Y \) has a stable distribution with centering parameters \( c_n \in \R \) and the same norming parameters \( b_n \) for \( n \in \N_+ \). Then \( Z = X + Y \) has a stable distribution with centering parameters \( a_n + c_n \) and norming parameters \( b_n \) for \( n \in \N_+ \).

Details:

Suppose that \( n \in \N_+ \) and that \( (Z_1, Z_2, \ldots, Z_n) \) is a sequence of independent copies of \( Z \). Then \( \sum_{i=1}^n Z_i \) has the same distribution as \( \sum_{i=1}^n X_i + \sum_{i=1}^n Y_i \) where \( \bs{X} = (X_1, X_2, \ldots, X_n) \) is a sequence of independent copies of \( X \), and \( \bs{Y} = (Y_1, Y_2, \ldots, Y_n) \) is a sequence of independent copies of \( Y \), and where \( \bs{X} \) and \( \bs{Y} \) are independent. By stability, this is the same as the distribution of \( (a_n + b_n X) + (c_n + b_n Y) = (a_n + c_n) + b_n (X + Y) \).

Random variable \( X \) has a stable distribution if and only if the following condition holds: If \( X_1, \, X_2 \) are independent copies of \( X \) and \( d_1, d_2 \in (0, \infty) \) then \( d_1 X_1 + d_2 X_2 \) has the same distribution as \( a + b X \) for some \( a \in \R \) and \( b \in (0, \infty) \).

Details:

Clearly the condition in the definition implies the condition here. Conversely, suppose that the condition here holds. We will show by induction that the condition in the definition holds. The case \( n = 1 \) is trivial, with \( a_1 = 0 \) and \( b_1 = 1 \). For \( n = 2 \), the definition is a special case of the condition in this theorem, with \( d_1 = d_2 = 1 \). Suppose that the condition in the definition holds for a given \( n \in \N_+ \). Suppose that \( (X_1, X_2, \ldots, X_n, X_{n + 1}) \) is a sequence of independent copies of \( X \). By the induction hypothesis, \( Y_n = X_1 + X_2 + \cdots + X_n \) has the same distribution as \( a_n + b_n X \) for some \( a_n \in \R \) and \( b_n \in (0, \infty) \). By independence, \( Y_{n + 1} = X_1 + X_2 + \cdots + X_n + X_{n+1} \) has the same distribution as \( a_n + b_n X_1 + X_{n + 1} \). By another application of the condition above, \( b_n X_1 + X_{n+1} \) has the same distribution as \( c + b_{n + 1} X \) for some \( c \in \R \) and \( b_{n + 1} \in (0, \infty) \). But then \( Y_{n+1} \) has the same distribution as \( (a_n + c) + b_{n + 1} X \).
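For a concrete instance of this condition, suppose that \( X \) has the standard normal distribution (an illustrative choice). If \( X_1, \, X_2 \) are independent copies of \( X \) and \( d_1, \, d_2 \in (0, \infty) \), then \( d_1 X_1 + d_2 X_2 \) has the normal distribution with mean 0 and variance \( d_1^2 + d_2^2 \), so the condition holds with \[ a = 0, \quad b = \sqrt{d_1^2 + d_2^2} \] For example, \( 3 X_1 + 4 X_2 \) has the same distribution as \( 5 X \).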

Suppose that \( X \) and \( Y \) are independent with the same stable distribution. Then the distribution of \( X - Y \) is stable, with the same norming parameters.

Note that the distribution of \( X - Y \) is symmetric (about 0). The last result is useful because it allows us to get rid of the centering parameters when proving facts about the norming parameters. Here is the most important of those facts:
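The symmetrization step can be illustrated by simulation. The sketch below uses the standard Lévy distribution, which is concentrated on \( (0, \infty) \) and so is far from symmetric. It relies on an assumption that is not established in this section, namely that \( 1 / N^2 \) has the standard Lévy distribution when \( N \) is standard normal; the function name, sample size, and seed are illustrative.

```python
import random

def standard_levy(rng):
    """One draw from the standard Levy distribution, as 1 / N^2 with N standard normal."""
    z = rng.gauss(0.0, 1.0)
    return 1.0 / (z * z)

rng = random.Random(7)
size = 20000
# Differences X - Y of independent copies, sorted so quantiles can be read off.
diffs = sorted(standard_levy(rng) - standard_levy(rng) for _ in range(size))

frac_positive = sum(v > 0 for v in diffs) / size
lower, upper = diffs[size // 4], diffs[3 * size // 4]
print(frac_positive)   # should be close to 1/2
print(lower, upper)    # quartiles should be roughly opposite in sign
```

Although each term is positive and heavily skewed, the difference has roughly half of its mass on each side of 0, with quartiles that are approximately negatives of each other.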

Suppose that \( X \) has a stable distribution. Then the norming parameters have the form \( b_n = n^{1/\alpha} \) for \( n \in \N_+ \), for some \( \alpha \in (0, 2] \). The parameter \( \alpha \) is known as the index or characteristic exponent of the distribution.

Details:

The proof is in several steps, and is based on the proof in An Introduction to Probability Theory and Its Applications, Volume II, by William Feller. The proof uses the basic trick of writing a sum of independent copies of \( X \) in different ways in order to obtain relationships between the norming constants \( b_n \).

First we can assume from the previous result that the distribution of \( X \) is symmetric and strictly stable. Let \( (X_1, X_2, \ldots) \) be a sequence of independent copies of \( X \). Let \( Y_n = \sum_{i=1}^n X_i \) for \( n \in \N_+ \). Now let \( n, \, m \in \N_+ \) and consider \( Y_{m n} \). Directly from stability, \( Y_{m n} \) has the same distribution as \( b_{m n} X \). On the other hand, \( Y_{m n} \) can be thought of as a sum of \( m \) blocks, where each block is a sum of \( n \) independent copies of \( X \). Each block has the same distribution as \( b_n X \), and since the blocks are independent, it follows that \( Y_{m n} \) has the same distribution as \[ b_n X_1 + b_n X_2 + \cdots + b_n X_m = b_n (X_1 + X_2 + \cdots + X_m) \] But by another application of stability, the random variable on the right has the same distribution as \( b_n b_m X \). It then follows that \( b_{m n} = b_m b_n \) for all \( m, \, n \in \N_+ \) which in turn leads to \( b_{n^k} = b_n^k \) for all \( n, \, k \in \N_+ \).
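The passage from \( b_{m n} = b_m b_n \) to \( b_{n^k} = b_n^k \) is a short induction on \( k \), spelled out here for completeness. The case \( k = 1 \) is trivial, and if \( b_{n^k} = b_n^k \) for a given \( k \in \N_+ \), then \[ b_{n^{k+1}} = b_{n^k \cdot n} = b_{n^k} b_n = b_n^k b_n = b_n^{k+1} \]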

We use the same trick again, this time with a sum. Let \( m, \, n \in \N_+ \) and consider \( Y_{m+n} \). Directly from stability, \( Y_{m + n} \) has the same distribution as \( b_{m+n} X \). On the other hand, \( Y_{m+n} \) can be thought of as the sum of two blocks. The first is the sum of \( m \) independent copies of \( X \) and hence has the same distribution as \( b_m X \), while the second is the sum of \( n \) independent copies of \( X \) and hence has the same distribution as \( b_n X \). Since the blocks are independent, it follows that \( b_{m+n} X \) has the same distribution as \( b_m X_1 + b_n X_2 \), or equivalently, \( X \) has the same distribution as \[ U = \frac{b_m}{b_{m+n}} X_1 + \frac{b_n}{b_{m+n}} X_2 \] Next note that for \( x \gt 0 \), \[ \left\{X_1 \ge 0, X_2 \gt \frac{b_{m+n}}{b_n} x\right\} \subseteq \{U \gt x\} \] and so by independence, \[ \P(U \gt x) \ge \P\left(X_1 \ge 0, X_2 \gt \frac{b_{m + n}}{b_n} x \right) = \P(X_1 \ge 0) \P\left(X_2 \gt \frac{b_{m+n}}{b_n} x \right) \] But by symmetry, \( \P(X_1 \ge 0) \ge \frac{1}{2} \). Also \( X_2 \) and \( U \) have the same distribution as \( X \), so we conclude that \[ \P(X \gt x) \ge \frac{1}{2} \P\left(X \gt \frac{b_{m+n}}{b_n} x\right), \quad x \gt 0\] It follows that the ratios \( b_n \big/ b_{m+n} \) are bounded for \( m, \, n \in \N_+ \). If that were not the case, we could find a sequence of integers \( m, \, n \) with \( b_{m+n} \big/ b_n \to 0 \), in which case the displayed equation above would give the contradiction \( \P(X \gt x) \ge \frac{1}{4} \) for all \( x \gt 0 \). Restating, the ratios \( b_k / b_n \) are bounded for \( k, \, n \in \N_+ \) with \( k \lt n \).

Fix \( r \in \N_+ \). There exists a unique \( \alpha \in (0, \infty) \) with \( b_r = r^{1/\alpha} \). It then follows from step 1 above that \( b_n = n^{1/\alpha} \) for every \( n = r^j \) with \( j \in \N_+ \). Similarly, if \( s \in \N_+ \), there exists \( \beta \in (0, \infty) \) with \( b_s = s^{1/\beta} \) and then \( b_m = m^{1/\beta} \) for every \( m = s^k \) with \( k \in \N_+ \). For our next step, we show that \( \alpha = \beta \) and it then follows that \( b_n = n^{1/\alpha} \) for every \( n \in \N_+ \). Towards that end, note that if \( m = s^k \) with \( k \in \N_+ \) there exists \( n = r^j \) with \( j \in \N_+ \) with \(n \le m \le r n\). Hence \[ b_m = m^{1/\beta} \le (r n)^{1/\beta} = r^{1/\beta} b_n^{\alpha / \beta} \] Therefore \[ \frac{b_m}{b_n} \le r^{1/\beta} b_n^{\alpha/ \beta - 1} \] Since the coefficients \( b_n \) are unbounded in \( n \in \N_+ \), but the ratios \( b_n / b_m \) are bounded for \( m, \, n \in \N_+ \) with \( m \gt n \), the last inequality implies that \( \beta \le \alpha \). Reversing the roles of \( m \) and \( n \) then gives \( \alpha \le \beta \) and hence \( \alpha = \beta \).

All that remains to show is that \( \alpha \le 2 \). We will do this by showing that if \( \alpha \gt 2 \), then \( X \) must have finite variance, in which case the finite variance property above leads to the contradiction \( \alpha = 2 \). Since \( X^2 \) is nonnegative, \[ \E\left(X^2\right) = \int_0^\infty \P\left(X^2 \gt x\right) dx = \int_0^\infty \P\left(\left|X\right| \gt \sqrt{x}\right) dx = \int_0^1 \P\left(\left|X\right| \gt \sqrt{x}\right) dx + \sum_{k=1}^\infty \int_{2^{k-1}}^{2^k} \P\left(\left|X\right| \gt \sqrt{x}\right) dx \] The first integral on the right is at most 1, so the idea is to find bounds on the remaining integrals so that the sum converges. Towards that end, note that for \( t \gt 0 \) and \( n \in \N_+ \) \[ \P(\left|Y_n\right| \gt t b_n) = \P(b_n \left|X\right| \gt t b_n) = \P(\left|X\right| \gt t) \] Hence we can choose \( t \) so that \( \P(\left|Y_n\right| \gt t b_n) \le \frac{1}{4} \). On the other hand, using a special inequality for symmetric distributions, \[ \frac{1}{2}\left(1 - \exp\left[-n \P\left(\left|X\right| \gt t b_n\right)\right]\right) \le \P(\left|Y_n\right| \gt t b_n)\] This implies that \( n \P\left(\left|X\right| \gt t b_n\right) \) is bounded in \( n \) or otherwise the two inequalities together would lead to \( \frac{1}{2} \le \frac{1}{4} \). Substituting \( x = t b_n = t n^{1/\alpha} \) leads to \( \P(\left|X\right| \gt x) \le M x^{-\alpha} \) for some \( M \gt 0 \). Since the integrand is at most \( M 2^{-(k-1) \alpha / 2} \) on an interval of length \( 2^{k-1} \), it then follows that \[ \int_{2^{k-1}}^{2^k} \P\left(\left|X\right| \gt \sqrt{x}\right) dx \le M 2^{(k-1)(1 - \alpha / 2)} \] If \( \alpha \gt 2 \), the series with the terms on the right converges and we have \( \E(X^2) \lt \infty \).
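The tail bound \( \P(\left|X\right| \gt x) \le M x^{-\alpha} \) from the last step can be seen concretely for the standard Cauchy distribution, which has index \( \alpha = 1 \). The sketch below assumes the standard Cauchy tail formula \( \P(\left|X\right| \gt x) = \frac{2}{\pi} \arctan(1 / x) \) for \( x \gt 0 \), which is not derived in this section. The constant \( M = 2 / \pi \) is one that happens to work in this particular case, since \( \arctan(y) \le y \) for \( y \ge 0 \).

```python
import math

def cauchy_tail(x):
    """P(|X| > x) for the standard Cauchy distribution, x > 0."""
    return (2.0 / math.pi) * math.atan(1.0 / x)

alpha = 1.0        # index of the Cauchy distribution
M = 2.0 / math.pi  # a constant that works in this particular case

for x in (0.5, 1.0, 10.0, 100.0, 1000.0):
    # The tail probability never exceeds M * x^(-alpha), and the two
    # quantities become close for large x.
    print(x, cauchy_tail(x), M * x ** (-alpha))
```

With \( \alpha = 1 \) the bound decays too slowly for the variance integral to converge, which is consistent with the fact that the Cauchy distribution does not have finite variance.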

Every stable distribution is continuous.

Details:

As in the proof of the previous result, suppose that \( X \) has a symmetric stable distribution with norming parameters \( b_n \) for \( n \in \N_+ \). As a special case of the last proof, for \( n \in \N_+ \), \( X \) has the same distribution as \[ \frac{1}{b_{n + 1}} X_1 + \frac{b_n}{b_{n + 1}} X_2 \] where \( X_1 \) and \( X_2 \) are independent and also have this distribution. Suppose now that \(\P(X = x) = p \) for some \( x \ne 0 \) where \( p \gt 0 \). Then \[ \P\left(X = \frac{1 + b_n}{b_{n + 1}} x\right) \ge \P(X_1 = x) \P(X_2 = x) = p^2 \gt 0 \] If the index \( \alpha \ne 1 \), the points \[ \frac{1 + b_n}{b_{n + 1}} x = \frac{1 + n^{1/\alpha}}{(1 + n)^{1/\alpha}} x, \quad n \in \N_+ \] are distinct, which gives us infinitely many atoms, each with probability at least \( p^2 \), clearly a contradiction.

Next, suppose that the only atom is \( x = 0 \) and that \( \P(X = 0) = p \) where \( p \in (0, 1) \). Then \( X_1 + X_2 \) has the same distribution as \( b_2 X \). But \( \P(X_1 + X_2 = 0) = \P(X_1 = 0) \P(X_2 = 0) = p^2 \) while \( \P(b_2 X = 0) = \P(X = 0) = p \), another contradiction.

The next result is a precise statement of the limit theorem alluded to in the introductory paragraph.

Suppose that \( (X_1, X_2, \ldots) \) is a sequence of independent, identically distributed random variables, and let \( Y_n = \sum_{i=1}^n X_i \) for \( n \in \N_+ \). If there exist constants \( a_n \in \R \) and \( b_n \in (0, \infty) \) for \( n \in \N_+ \) such that \( (Y_n - a_n) \big/ b_n \) has a (non-degenerate) limiting distribution as \( n \to \infty \), then the limiting distribution is stable.
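The most familiar instance of this theorem is the central limit theorem, where the limit is normal. The following sketch is only an illustration under stated assumptions: the terms are taken to be uniform on \( (-1, 1) \), which have mean 0 and variance \( \frac{1}{3} \), so the constants are \( a_n = 0 \) and \( b_n = \sqrt{n / 3} \). The function name, the value \( n = 50 \), the sample size, and the seed are arbitrary choices.

```python
import math
import random

def normalized_sum(n, rng):
    """(Y_n - a_n) / b_n for uniform(-1, 1) terms, with a_n = 0 and b_n = sqrt(n / 3)."""
    total = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
    return total / math.sqrt(n / 3.0)

rng = random.Random(11)
sample = [normalized_sum(50, rng) for _ in range(10000)]

# Fraction of normalized sums within one unit of 0.
within_one = sum(abs(v) <= 1.0 for v in sample) / len(sample)

# For the standard normal limit, P(|Z| <= 1) = erf(1 / sqrt(2)), about 0.6827.
print(within_one, math.erf(1.0 / math.sqrt(2.0)))
```

Terms with infinite variance require different norming constants and can lead to non-normal stable limits, but simulating those cases takes more care and is not attempted here.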

The following theorem completely characterizes stable distributions in terms of the characteristic function.

Suppose that \( X \) has a stable distribution. The characteristic function of \( X \) has the following form, for some \( \alpha \in (0, 2] \), \( \beta \in [-1, 1] \), \( c \in \R \), and \( d \in (0, \infty) \) \[ \chi(t) = \E\left(e^{i t X}\right) = \exp\left(i t c - d^\alpha \left|t\right|^\alpha \left[1 + i \beta \sgn(t) u_\alpha(t)\right]\right), \quad t \in \R \] where \( \sgn \) is the usual sign function, and where \[ u_\alpha(t) = \begin{cases} \tan\left(\frac{\pi \alpha}{2}\right), & \alpha \ne 1 \\ \frac{2}{\pi} \ln(|t|), & \alpha = 1 \end{cases} \]

The parameter \( \alpha \) is the index, as before.
The parameter \( \beta \) is the skewness parameter.
The parameter \( c \) is the location parameter.
The parameter \( d \) is the scale parameter.

Thus, the family of stable distributions is a four-parameter family. The index parameter \( \alpha \) and the skewness parameter \( \beta \) can be considered shape parameters. When the location parameter \( c = 0 \) and the scale parameter \( d = 1 \), we get the standard form of the stable distributions, with characteristic function \[ \chi(t) = \E\left(e^{i t X}\right) = \exp\left(-\left|t\right|^\alpha \left[1 + i \beta \sgn(t) u_\alpha(t)\right]\right), \quad t \in \R \]
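The characteristic function is straightforward to evaluate numerically. The sketch below transcribes the formula exactly as written in this section, including its sign convention; other references and software libraries use different conventions and parametrizations, so parameter values are not necessarily interchangeable. The function name `stable_cf` is hypothetical, and the value at \( t = 0 \) is handled separately because \( \ln(|t|) \) is undefined there.

```python
import cmath
import math

def stable_cf(t, alpha, beta, c=0.0, d=1.0):
    """Characteristic function of a stable distribution, in the form given above."""
    if t == 0:
        return 1.0 + 0.0j  # every characteristic function equals 1 at t = 0
    if alpha == 1:
        u = (2.0 / math.pi) * math.log(abs(t))
    else:
        u = math.tan(math.pi * alpha / 2.0)
    sign = 1.0 if t > 0 else -1.0
    # d^alpha |t|^alpha is computed as (d |t|)^alpha.
    exponent = 1j * t * c - (d * abs(t)) ** alpha * (1.0 + 1j * beta * sign * u)
    return cmath.exp(exponent)

# With alpha = 2 the skewness term vanishes (up to rounding) and chi(t) = exp(-t^2).
print(stable_cf(1.5, 2, 0.7), math.exp(-1.5 ** 2))
# With alpha = 1 and beta = 0, chi(t) = exp(-|t|).
print(stable_cf(-2.0, 1, 0.0), math.exp(-2.0))
```

In floating point, \( \tan(\pi) \) is a tiny nonzero number rather than exactly 0, so the first value carries a negligible imaginary part.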

The characteristic function gives another proof that stable distributions are closed under convolution (corresponding to sums of independent variables), if the index is fixed.

Suppose that \( X_1 \) and \( X_2 \) are independent random variables, and that \( X_k \) has the stable distribution with common index \( \alpha \in (0, 2] \), skewness parameter \( \beta_k \in [-1, 1] \), location parameter \( c_k \in \R \), and scale parameter \( d_k \in (0, \infty)\) for \( k \in \{1, 2\} \). Then \( X_1 + X_2 \) has the stable distribution with index \( \alpha \), location parameter \( c = c_1 + c_2 \), scale parameter \( d = \left(d_1^\alpha + d_2^\alpha\right)^{1/\alpha} \), and skewness parameter \[ \beta = \frac{\beta_1 d_1^\alpha + \beta_2 d_2^\alpha}{d_1^\alpha + d_2^\alpha} \]

Details:

Let \( \chi_k \) denote the characteristic function of \( X_k \) for \( k \in \{1, 2\} \). Then \( X_1 + X_2 \) has characteristic function \( \chi = \chi_1 \chi_2 \). The result follows from using the form of the characteristic function given above and some algebra.
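The algebra can be checked numerically. The following sketch uses hypothetical names throughout (`stable_cf`, `sum_parameters`) and arbitrary parameter values. It computes the parameters of the sum from the formulas in the theorem and confirms that the product \( \chi_1 \chi_2 \) matches the characteristic function with the combined parameters, using the form of the characteristic function given in this section.

```python
import cmath
import math

def stable_cf(t, alpha, beta, c, d):
    """Characteristic function in the form used in this section."""
    if t == 0:
        return 1.0 + 0.0j
    if alpha == 1:
        u = (2.0 / math.pi) * math.log(abs(t))
    else:
        u = math.tan(math.pi * alpha / 2.0)
    sign = 1.0 if t > 0 else -1.0
    return cmath.exp(1j * t * c - (d * abs(t)) ** alpha * (1.0 + 1j * beta * sign * u))

def sum_parameters(alpha, first, second):
    """Parameters (beta, c, d) of X_1 + X_2 from (beta_k, c_k, d_k), common index alpha."""
    (beta1, c1, d1), (beta2, c2, d2) = first, second
    w1, w2 = d1 ** alpha, d2 ** alpha
    beta = (beta1 * w1 + beta2 * w2) / (w1 + w2)
    return beta, c1 + c2, (w1 + w2) ** (1.0 / alpha)

alpha = 1.5
first, second = (0.5, 1.0, 2.0), (-1.0, 0.5, 1.0)  # hypothetical (beta, c, d) triples
beta, c, d = sum_parameters(alpha, first, second)
for t in (-2.0, 0.3, 1.7):
    product = stable_cf(t, alpha, *first) * stable_cf(t, alpha, *second)
    # The difference should be zero up to rounding error.
    print(t, abs(product - stable_cf(t, alpha, beta, c, d)))
```

Note that the new skewness parameter is a weighted average of \( \beta_1 \) and \( \beta_2 \), with weights proportional to \( d_1^\alpha \) and \( d_2^\alpha \), so it always stays in \( [-1, 1] \).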

Special Cases

Three special parametric families of distributions studied in this chapter are stable. In the proofs in this subsection, we use the definition of stability and various important properties of the distributions. These properties, in turn, are verified in the sections devoted to the distributions. We also give proofs based on the characteristic function, which allows us to identify the skewness parameter.

The normal distribution is stable with index \( \alpha = 2 \). There is no skewness parameter.

Details:

Suppose that \(Z\) has the standard normal distribution. If \( n \in \N_+ \) and \( (Z_1, Z_2, \ldots, Z_n) \) is a sequence of independent copies of \(Z\), then \( Z_1 + Z_2 + \cdots + Z_n \) has the normal distribution with mean 0 and variance \( n \). But this is also the distribution of \( \sqrt{n} Z \). Hence the standard normal distribution is strictly stable, with index \( \alpha = 2 \). The normal distribution with mean \( \mu \in \R \) and standard deviation \( \sigma \in (0, \infty) \) is the distribution of \( \mu + \sigma Z \). From our basic properties above, this distribution is stable with index \( \alpha = 2 \) and centering parameters \( \left(n - \sqrt{n}\right) \mu \) for \( n \in \N_+ \).

In terms of the characteristic function, note that if \( \alpha = 2 \) then \( u_\alpha(t) = \tan(\pi) = 0 \) so the skewness parameter \( \beta \) drops out completely. The characteristic function in standard form is \( \chi(t) = e^{-t^2} \) for \( t \in \R \), which is the characteristic function of the normal distribution with mean 0 and variance 2.

Of course, the normal distribution has finite variance, so once we know that it is stable, it follows from the finite variance property that the index must be 2. Moreover, the characteristic function shows that the normal distribution is the only stable distribution with index 2, and hence the only stable distribution with finite variance.

Open the special distribution simulator and select the normal distribution. Vary the parameters and note the shape and location of the probability density function. For various values of the parameters, run the simulation 1000 times and compare the empirical density function to the probability density function.

The Cauchy distribution is stable with index \( \alpha = 1 \) and skewness parameter \( \beta = 0 \).

Details:

Suppose that \(Z\) has the standard Cauchy distribution. If \( n \in \N_+ \) and \( (Z_1, Z_2, \ldots, Z_n) \) is a sequence of independent copies of \(Z\), then \( Z_1 + Z_2 + \cdots + Z_n \) has the Cauchy distribution with scale parameter \( n \). By definition this is the same as the distribution of \( n Z \). Hence the standard Cauchy distribution is strictly stable, with index \( \alpha = 1 \). The Cauchy distribution with location parameter \( a \in \R \) and scale parameter \( b \in (0, \infty) \) is the distribution of \( a + b Z \). From our basic properties above, this distribution is strictly stable with index \( \alpha = 1 \).

When \( \alpha = 1 \) and \( \beta = 0 \) the characteristic function in standard form is \( \chi(t) = \exp\left(-\left|t\right|\right) \) for \( t \in \R \), which is the characteristic function of the standard Cauchy distribution.

Open the special distribution simulator and select the Cauchy distribution. Vary the parameters and note the shape and location of the probability density function. For various values of the parameters, run the simulation 1000 times and compare the empirical density function to the probability density function.
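The stability of the Cauchy distribution can also be checked with a few lines of code. Since \( Z_1 + Z_2 + \cdots + Z_n \) has the same distribution as \( n Z \), the sample mean of \( n \) independent standard Cauchy variables is again standard Cauchy. The sketch below assumes two standard facts that are not derived in this section: the quantile function \( \tan\left[\pi \left(p - \frac{1}{2}\right)\right] \) and the distribution function \( \frac{1}{2} + \frac{1}{\pi} \arctan(x) \). The names, sample sizes, and seed are illustrative.

```python
import math
import random

def standard_cauchy(rng):
    """One draw from the standard Cauchy distribution via its quantile function."""
    return math.tan(math.pi * (rng.random() - 0.5))

def cauchy_cdf(x):
    """Distribution function of the standard Cauchy distribution."""
    return 0.5 + math.atan(x) / math.pi

rng = random.Random(3)
n = 25
# Sample means of n independent standard Cauchy variables.
means = [sum(standard_cauchy(rng) for _ in range(n)) / n for _ in range(10000)]

for x in (-1.0, 0.0, 2.0):
    empirical = sum(m <= x for m in means) / len(means)
    print(x, empirical, cauchy_cdf(x))
```

The empirical and theoretical values should agree up to sampling error, showing that averaging does not concentrate the distribution at all, in sharp contrast to the finite variance case.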

The Lévy distribution is stable with index \( \alpha = \frac{1}{2} \) and skewness parameter \( \beta = 1 \).

Details:

If \( n \in \N_+ \) and \( (Z_1, Z_2, \ldots, Z_n) \) is a sequence of independent variables, each with the standard Lévy distribution, then \( Z_1 + Z_2 + \cdots + Z_n \) has the Lévy distribution with scale parameter \( n^2 \). By definition this is the same as the distribution of \( n^2 Z \) where \( Z \) has the standard Lévy distribution. Hence the standard Lévy distribution is strictly stable, with index \( \alpha = \frac{1}{2} \). The Lévy distribution with location parameter \( a \in \R \) and scale parameter \( b \in (0, \infty) \) is the distribution of \( a + b Z \). From our basic properties above, this distribution is stable with index \( \alpha = \frac{1}{2} \) and centering parameters \( (n - n^2) a \) for \( n \in \N_+ \).

When \( \alpha = \frac{1}{2} \) note that \( u_\alpha(t) = \tan\left(\frac{\pi}{4}\right) = 1 \). So the characteristic function in standard form with \( \alpha = \frac{1}{2} \) and \( \beta = 1 \) is \[ \chi(t) = \exp\left(-\left|t\right|^{1/2}\left[1 + i \sgn(t)\right]\right) \] which is the characteristic function of the standard Lévy distribution.

Open the special distribution simulator and select the Lévy distribution. Vary the parameters and note the shape and location of the probability density function. For various values of the parameters, run the simulation 1000 times and compare the empirical density function to the probability density function.
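A similar check is possible for the Lévy distribution, where the norming parameters \( b_n = n^2 \) grow much faster than in the normal or Cauchy cases. The sketch below assumes two facts that are not derived in this section: \( 1 / N^2 \) has the standard Lévy distribution when \( N \) is standard normal, and the standard Lévy distribution function is \( \operatorname{erfc}\left(1 \big/ \sqrt{2 x}\right) \) for \( x \gt 0 \). The names, the value \( n = 10 \), the sample size, and the seed are illustrative.

```python
import math
import random

def standard_levy(rng):
    """One draw from the standard Levy distribution, as 1 / N^2 with N standard normal."""
    z = rng.gauss(0.0, 1.0)
    return 1.0 / (z * z)

def levy_cdf(x):
    """P(Z <= x) for the standard Levy distribution, x > 0."""
    return math.erfc(1.0 / math.sqrt(2.0 * x))

rng = random.Random(5)
n = 10
# Sums of n independent standard Levy variables, divided by b_n = n^2.
scaled = [sum(standard_levy(rng) for _ in range(n)) / n ** 2 for _ in range(10000)]

for x in (0.5, 2.0, 10.0):
    empirical = sum(s <= x for s in scaled) / len(scaled)
    print(x, empirical, levy_cdf(x))
```

The agreement between the two columns reflects the fact that the sum of \( n \) copies is of the order \( n^2 \), not \( n \), because the sum is typically dominated by its largest terms.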

The normal, Cauchy, and Lévy distributions are the only stable distributions for which the probability density function is known in closed form.