Basic Theory
As usual, our starting point is a random experiment with an underlying sample space and a probability measure $\mathbb{P}$. In the basic statistical model, we have an observable random variable $\mathbf{X}$ taking values in a set $S$. In general, $\mathbf{X}$ can have quite a complicated structure. For example, if the experiment is to sample $n$ objects from a population and record various measurements of interest, then
$$\mathbf{X} = (X_1, X_2, \ldots, X_n)$$
where $X_i$ is the vector of measurements for the $i$th object.
Suppose also that the distribution of $\mathbf{X}$ depends on a parameter $\theta$ with values in a set $T$. The parameter may also be vector valued, in which case $T \subseteq \mathbb{R}^k$ for some $k \in \mathbb{N}_+$ and the parameter has the form $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_k)$.
The Bayesian Formulation
Recall that in Bayesian analysis, named for the famous Thomas Bayes, the unknown parameter $\theta$ is treated as the observed value of a random variable $\Theta$ with values in $T$. Here is a brief review:
The Bayesian formulation
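- The conditional probability density function of the data vector $\mathbf{X}$ given $\Theta = \theta$ is denoted $f(\mathbf{x} \mid \theta)$ for $\mathbf{x} \in S$.
- The random parameter $\Theta$ is given a prior distribution with probability density function $h$ on $T$.
- The joint probability density function of $(\mathbf{X}, \Theta)$ is $(\mathbf{x}, \theta) \mapsto h(\theta) f(\mathbf{x} \mid \theta)$ for $\mathbf{x} \in S$ and $\theta \in T$.
- The (unconditional) probability density function of $\mathbf{X}$ is the function $f$ given by $f(\mathbf{x}) = \sum_{\theta \in T} h(\theta) f(\mathbf{x} \mid \theta)$ for $\mathbf{x} \in S$ if $\Theta$ has a discrete distribution, or by $f(\mathbf{x}) = \int_T h(\theta) f(\mathbf{x} \mid \theta) \, d\theta$ for $\mathbf{x} \in S$ if $\Theta$ has a continuous distribution.
- By Bayes' theorem, the posterior probability density function of $\Theta$ given $\mathbf{X} = \mathbf{x} \in S$ is $h(\theta \mid \mathbf{x}) = \frac{h(\theta) f(\mathbf{x} \mid \theta)}{f(\mathbf{x})}$ for $\theta \in T$.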
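The review above can be sketched numerically for a parameter with a discrete prior. The following is a minimal illustration, not a definitive implementation: the function names, the three candidate parameter values, and the data are all hypothetical, and the Bernoulli likelihood is used only as a convenient example.

```python
def posterior(prior, likelihood, x):
    """Return the posterior pmf {theta: h(theta | x)} for a discrete prior.

    prior      -- dict mapping each parameter value theta to h(theta)
    likelihood -- function (x, theta) -> f(x | theta)
    """
    # Joint density h(theta) * f(x | theta) for each theta.
    joint = {theta: h * likelihood(x, theta) for theta, h in prior.items()}
    # Unconditional density f(x): the sum of the joint density over theta.
    fx = sum(joint.values())
    # Bayes' theorem: divide by the normalizing constant f(x).
    return {theta: j / fx for theta, j in joint.items()}


def bernoulli_likelihood(x, p):
    """f(x | p) for a sequence x of 0/1 outcomes with success probability p."""
    successes = sum(x)
    return p ** successes * (1 - p) ** (len(x) - successes)


# Hypothetical example: three candidate values of p, equally likely a priori.
prior = {0.25: 1 / 3, 0.5: 1 / 3, 0.75: 1 / 3}
data = (1, 1, 0, 1)
post = posterior(prior, bernoulli_likelihood, data)
```

Because three of the four hypothetical outcomes are successes, the posterior shifts mass toward the largest candidate value of the parameter.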
The prior distribution is often subjective, and is chosen to reflect our knowledge, if any, of the parameter. In some cases, we can recognize the posterior distribution from the functional form of the prior density times the likelihood, $\theta \mapsto h(\theta) f(\mathbf{x} \mid \theta)$, without having to actually compute the normalizing constant $f(\mathbf{x})$, which reduces the computational burden significantly. In particular, this is often the case when we have a conjugate parametric family of distributions of $\Theta$. Recall that this means that when the prior distribution of $\Theta$ belongs to the family, so does the posterior distribution of $\Theta$ given $\mathbf{X} = \mathbf{x}$.
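The most important special case arises when we have a basic variable $X$ with values in a set $R$, and given $\Theta = \theta$, the data vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from $X$. That is, given $\Theta = \theta$, $\mathbf{X}$ is a sequence of independent, identically distributed variables, each with the same distribution as $X$ given $\Theta = \theta$. Thus $S = R^n$ and if $X$ has conditional probability density function $g(x \mid \theta)$, then
$$f(\mathbf{x} \mid \theta) = g(x_1 \mid \theta) g(x_2 \mid \theta) \cdots g(x_n \mid \theta), \quad \mathbf{x} = (x_1, x_2, \ldots, x_n) \in S$$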
Confidence Sets
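Now let $C(\mathbf{X})$ be a confidence set (that is, a subset of the parameter set $T$ that depends on the data variable $\mathbf{X}$ but no unknown parameters).
One possible definition of a $1 - \alpha$ level Bayesian confidence set requires that
$$\mathbb{P}\left[\Theta \in C(\mathbf{x}) \mid \mathbf{X} = \mathbf{x}\right] = 1 - \alpha \tag{2}$$
In definition [2], only $\Theta$ is random and thus the probability above is computed using the posterior probability density function $\theta \mapsto h(\theta \mid \mathbf{x})$.
Another possible definition requires that
$$\mathbb{P}\left[\Theta \in C(\mathbf{X})\right] = 1 - \alpha \tag{3}$$
In definition [3], $\mathbf{X}$ and $\Theta$ are both random, and so the probability above would be computed using the joint probability density function $(\mathbf{x}, \theta) \mapsto h(\theta) f(\mathbf{x} \mid \theta)$. Whatever the philosophical arguments may be, definition [2] is certainly the easier one from a computational viewpoint, and hence is the one most commonly used.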
Let us compare the classical and Bayesian approaches. In the classical approach, the parameter $\theta$ is deterministic, but unknown. Before the data are collected, the confidence set $C(\mathbf{X})$ (which is random by virtue of $\mathbf{X}$) will contain the parameter with probability $1 - \alpha$. After the data are collected, the computed confidence set $C(\mathbf{x})$ either contains $\theta$ or does not, and we will usually never know which. By contrast, in a Bayesian confidence set, the random parameter $\Theta$ falls in the computed, deterministic confidence set $C(\mathbf{x})$ with probability $1 - \alpha$.
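Suppose that $\theta$ is real valued, so that $T \subseteq \mathbb{R}$. For $r \in (0, 1)$, a $1 - \alpha$ level Bayesian confidence interval is $\left[U_{(1 - r)\alpha}(\mathbf{x}), U_{1 - r\alpha}(\mathbf{x})\right]$ where $U_p(\mathbf{x})$ is the quantile of order $p$ for the posterior distribution of $\Theta$ given $\mathbf{X} = \mathbf{x}$.
As in past sections, $r$ is the fraction of $\alpha$ in the right tail of the posterior distribution and $1 - r$ is the fraction of $\alpha$ in the left tail of the posterior distribution. As usual, $r = \frac{1}{2}$ gives the symmetric, two-sided confidence interval; letting $r \to 0$ gives the confidence lower bound; and letting $r \to 1$ gives the confidence upper bound.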
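The quantile construction can be sketched as a small helper that takes any posterior quantile function. This is an illustration under stated assumptions: the helper name `bayesian_interval` is hypothetical, and a standard normal distribution stands in for a posterior purely so that the example runs.

```python
from statistics import NormalDist


def bayesian_interval(quantile, alpha, r=0.5):
    """Return the 1 - alpha interval from a posterior quantile function.

    quantile -- function p -> posterior quantile of order p, for 0 < p < 1
    r        -- fraction of alpha placed in the right tail, with 0 < r < 1
    """
    lower = quantile((1 - r) * alpha)  # left tail holds (1 - r) * alpha
    upper = quantile(1 - r * alpha)    # right tail holds r * alpha
    return lower, upper


# Stand-in posterior (an assumption for illustration only).
posterior = NormalDist(mu=0.0, sigma=1.0)

# Symmetric, two-sided 95% interval (r = 1/2).
lo, hi = bayesian_interval(posterior.inv_cdf, alpha=0.05)

# Asymmetric interval: 20% of alpha in the right tail, 80% in the left.
lo2, hi2 = bayesian_interval(posterior.inv_cdf, alpha=0.05, r=0.2)
```

Whatever the value of $r$, the posterior probability of the interval is $1 - \alpha$; only the split between the two tails changes.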
Applications
The Bernoulli Distribution
Suppose that $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the Bernoulli distribution with unknown success parameter $p \in (0, 1)$. In the usual language of reliability, $X_i = 1$ means success on trial $i$ and $X_i = 0$ means failure on trial $i$. The distribution is named for Jacob Bernoulli. Recall that the Bernoulli distribution has probability density function (given $p$)
$$g(x \mid p) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\}$$
Note that the number of successes in the $n$ trials is $Y = \sum_{i=1}^n X_i$. Given $p$, random variable $Y$ has the binomial distribution with parameters $n$ and $p$.
In our previous discussion of Bayesian estimation, we modeled the parameter $p$ with a random variable $P$ that has a beta distribution. This family of distributions is conjugate for $P$. Specifically, if the prior distribution of $P$ is beta with left parameter $a > 0$ and right parameter $b > 0$, then the posterior distribution of $P$ given $\mathbf{X}$ is beta with left parameter $a + Y$ and right parameter $b + (n - Y)$; the left parameter is increased by the number of successes and the right parameter by the number of failures. It follows that a $1 - \alpha$ level Bayesian confidence interval for $p$ is $\left[U_{\alpha/2}(y), U_{1 - \alpha/2}(y)\right]$ where $U_r(y)$ is the quantile of order $r$ for the posterior beta distribution. In the special case $a = b = 1$ the prior distribution is uniform on $(0, 1)$ and reflects a lack of previous knowledge about $p$.
Suppose that we have a coin with an unknown probability $p$ of heads, and that we give $p$ the uniform prior, reflecting our lack of knowledge about $p$. We then toss the coin 50 times, observing 30 heads.
- Find the posterior distribution of $p$ given the data.
- Construct the 95% Bayesian confidence interval.
- Construct the classical Wald confidence interval at the 95% level.
Details:
- Beta with left parameter 31 and right parameter 21.
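The three parts of the coin exercise can be checked numerically. The sketch below uses only the Python standard library; since that library has no beta quantile function, `beta_quantile` is a hypothetical helper that approximates the quantile by accumulating midpoint-rule area under the density, which is an assumption of this illustration rather than a method taken from the text.

```python
import math
from statistics import NormalDist


def beta_pdf(x, a, b):
    """Beta density at x in (0, 1), computed on the log scale for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_quantile(p, a, b, steps=20000):
    """Approximate the quantile of order p by accumulating midpoint-rule area."""
    width = 1.0 / steps
    area = 0.0
    for i in range(steps):
        piece = beta_pdf((i + 0.5) * width, a, b) * width
        if area + piece >= p:
            # Interpolate linearly inside the cell where the area crosses p.
            return i * width + width * (p - area) / piece
        area += piece
    return 1.0


n, y = 50, 30               # tosses and observed heads
a, b = 1 + y, 1 + (n - y)   # uniform prior is beta(1, 1); posterior is beta(31, 21)
alpha = 0.05

# Bayesian interval: posterior quantiles of orders alpha / 2 and 1 - alpha / 2.
bayes = (beta_quantile(alpha / 2, a, b), beta_quantile(1 - alpha / 2, a, b))

# Classical Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
p_hat = y / n
z = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - half_width, p_hat + half_width)
```

The Wald interval is centered at the sample proportion 0.6, while the Bayesian interval is built around the posterior distribution, whose mean $31/52$ is pulled slightly toward the prior mean $1/2$.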
The Poisson Distribution
Suppose that $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the Poisson distribution with parameter $\lambda \in (0, \infty)$. Recall that the Poisson distribution is often used to model the number of random points in a region of time or space, particularly in the context of the Poisson process. The distribution is named for the inimitable Simeon Poisson and given $\lambda$, has probability density function
$$g(x \mid \lambda) = e^{-\lambda} \frac{\lambda^x}{x!}, \quad x \in \mathbb{N}$$
As usual, we will denote the sum of the sample values by $Y = \sum_{i=1}^n X_i$. Given $\lambda$, random variable $Y$ also has a Poisson distribution, but with parameter $n \lambda$.
In our previous discussion of Bayesian estimation, we modeled $\lambda$ with a random variable $\Lambda$ that has a gamma distribution. This family of distributions is conjugate for $\Lambda$. Specifically, if the prior distribution of $\Lambda$ is gamma with shape parameter $k > 0$ and rate parameter $r > 0$ (so that the scale parameter is $1 / r$), then the posterior distribution of $\Lambda$ given $\mathbf{X}$ is gamma with shape parameter $k + Y$ and rate parameter $r + n$. It follows that a $1 - \alpha$ level Bayesian confidence interval for $\lambda$ is $\left[U_{\alpha/2}(y), U_{1 - \alpha/2}(y)\right]$ where $U_p(y)$ is the quantile of order $p$ for the posterior gamma distribution.
Consider the alpha emissions data, which we believe come from a Poisson distribution with unknown parameter $\lambda$. Suppose that a priori, we believe that $\lambda$ is about 5, so we give $\lambda$ a prior gamma distribution with shape parameter 5 and rate parameter 1. (Thus the mean is 5 and the standard deviation $\sqrt{5} \approx 2.236$.)
- Find the posterior distribution of $\lambda$ given the data.
- Construct the 95% Bayesian confidence interval.
- Construct the classical confidence interval at the 95% level.
Details:
- Gamma with shape parameter 10104 and rate parameter 1208.
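Given the posterior stated above, the 95% Bayesian interval can be approximated by simulation. The sketch below is one possible approach, not the method of the text: it draws from the gamma posterior with the standard library and reads off empirical quantiles. The function name, the number of draws, and the seed are arbitrary assumptions.

```python
import random


def gamma_interval_mc(shape, rate, alpha=0.05, draws=100_000, seed=0):
    """Approximate the symmetric 1 - alpha interval of a gamma posterior
    using empirical quantiles of simulated draws."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), and scale = 1 / rate.
    sample = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    lower = sample[int((alpha / 2) * draws)]
    upper = sample[int((1 - alpha / 2) * draws) - 1]
    return lower, upper


# Posterior from the details above: shape 10104, rate 1208.
lo, hi = gamma_interval_mc(10104, 1208)
```

With a shape parameter this large the posterior is nearly normal, so the interval is close to the posterior mean $10104 / 1208$ plus or minus about two posterior standard deviations $\sqrt{10104} / 1208$.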
The Normal Distribution
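Suppose that $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the normal distribution with unknown mean $\mu \in \mathbb{R}$ and known variance $\sigma^2 \in (0, \infty)$. Of course, the normal distribution plays an especially important role in statistics, in part because of the central limit theorem. The normal distribution is widely used to model physical quantities subject to numerous small, random errors. Recall that the normal probability density function (given $\mu$) is
$$g(x \mid \mu) = \frac{1}{\sqrt{2 \pi} \, \sigma} \exp\left[-\frac{1}{2} \left(\frac{x - \mu}{\sigma}\right)^2\right], \quad x \in \mathbb{R}$$
We denote the sum of the sample values by $Y = \sum_{i=1}^n X_i$. Recall that $Y$ also has a normal distribution (given $\mu$), but with mean $n \mu$ and variance $n \sigma^2$.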
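In our previous discussion of Bayesian estimation, we modeled $\mu$ with a random variable $\Psi$ that also has a normal distribution. This family is conjugate for $\Psi$ (with $\sigma$ known). Specifically, if the prior distribution of $\Psi$ is normal with mean $a \in \mathbb{R}$ and standard deviation $b > 0$, then the posterior distribution of $\Psi$ given $\mathbf{X}$ is also normal, with
$$\mathbb{E}(\Psi \mid \mathbf{X}) = \frac{Y b^2 + a \sigma^2}{\sigma^2 + n b^2}, \quad \operatorname{var}(\Psi \mid \mathbf{X}) = \frac{\sigma^2 b^2}{\sigma^2 + n b^2}$$
It follows that a $1 - \alpha$ level Bayesian confidence interval for $\mu$ is $\left[U_{\alpha/2}(y), U_{1 - \alpha/2}(y)\right]$ where $U_p(y)$ is the quantile of order $p$ for the posterior normal distribution. An interesting special case is when $b = \sigma$, so that the standard deviation of the prior distribution of $\Psi$ is the same as the standard deviation of the sampling distribution. In this case, the posterior mean is $(Y + a) / (n + 1)$ and the posterior variance is $\sigma^2 / (n + 1)$.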
The length of a certain machined part is supposed to be 10 centimeters but due to imperfections in the manufacturing process, the actual length is normally distributed with mean $\mu$ and variance $\sigma^2$. The variance is due to inherent factors in the process, which remain fairly stable over time. From historical data, the value of $\sigma$ is known. On the other hand, $\mu$ may be set by adjusting various parameters in the process and hence may change to an unknown value fairly frequently. Thus, suppose that we give $\mu$ a prior normal distribution with mean 10 and standard deviation 0.03. A sample of 100 parts has mean 10.2.
- Find the posterior distribution of $\mu$ given the data.
- Construct the 95% Bayesian confidence interval.
- Construct the classical confidence interval at the 95% level.
Details:
- Normal with mean 10.198 and standard deviation 0.0299.
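The normal conjugate update can be sketched in a few lines: the posterior mean is $(Y b^2 + a \sigma^2) / (\sigma^2 + n b^2)$ and the posterior variance is $\sigma^2 b^2 / (\sigma^2 + n b^2)$, where $a$ and $b$ are the prior mean and standard deviation and $Y$ is the sum of the sample. The inputs below are assumptions made for illustration: $\sigma = 0.3$ and $b = 0.3$ (the special case $b = \sigma$). Note that $b = 0.3$ differs from the prior standard deviation 0.03 quoted in the exercise; it is the value under which the listed posterior is recovered when $\sigma = 0.3$. The function name `normal_posterior` is hypothetical.

```python
import math
from statistics import NormalDist


def normal_posterior(a, b, sigma, n, sample_mean):
    """Posterior mean and standard deviation of a normal mean with known sigma.

    a, b -- mean and standard deviation of the normal prior
    """
    y = n * sample_mean                   # sum of the sample values
    denom = sigma ** 2 + n * b ** 2
    mean = (y * b ** 2 + a * sigma ** 2) / denom
    sd = math.sqrt(sigma ** 2 * b ** 2 / denom)
    return mean, sd


# Assumed inputs (see the lead-in): sigma = 0.3 and prior sd b = 0.3.
sigma, n, sample_mean = 0.3, 100, 10.2
mean, sd = normal_posterior(a=10, b=0.3, sigma=sigma, n=n, sample_mean=sample_mean)

# 95% Bayesian interval from the posterior normal quantiles.
post = NormalDist(mean, sd)
bayes = (post.inv_cdf(0.025), post.inv_cdf(0.975))

# Classical interval: sample mean +/- z * sigma / sqrt(n).
z = NormalDist().inv_cdf(0.975)
half_width = z * sigma / math.sqrt(n)
classical = (sample_mean - half_width, sample_mean + half_width)
```

Under these assumed inputs the posterior mean is $1030 / 101 \approx 10.198$ and the posterior standard deviation is $0.3 / \sqrt{101} \approx 0.0299$, so the two intervals nearly coincide: with 100 observations the data dominate the prior.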
The Beta Distribution
Suppose that $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the beta distribution with unknown left shape parameter $a \in (0, \infty)$ and right shape parameter $b = 1$. The beta distribution is widely used to model random proportions and probabilities and other variables that take values in bounded intervals. Recall that the probability density function (given $a$) is
$$g(x \mid a) = a x^{a - 1}, \quad x \in (0, 1)$$
We denote the product of the sample values by $W = X_1 X_2 \cdots X_n$.
In our previous discussion of Bayesian estimation, we modeled $a$ with a random variable $A$ that has a gamma distribution. This family of distributions is conjugate for $A$. Specifically, if the prior distribution of $A$ is gamma with shape parameter $k > 0$ and rate parameter $r > 0$, then the posterior distribution of $A$ given $\mathbf{X}$ is also gamma, with shape parameter $k + n$ and rate parameter $r - \ln(W)$. It follows that a $1 - \alpha$ level Bayesian confidence interval for $a$ is $\left[U_{\alpha/2}(w), U_{1 - \alpha/2}(w)\right]$ where $U_p(w)$ is the quantile of order $p$ for the posterior gamma distribution. In the special case that $k = 1$, the prior distribution of $A$ is exponential with rate parameter $r$.
Suppose that the resistance of an electrical component (in Ohms) has the beta distribution with unknown left parameter $a$ and right parameter $b = 1$. We believe that $a$ may be about 10, so we give $a$ the prior gamma distribution with shape parameter 10 and rate parameter 1. We sample 20 components and observe the data
- Find the posterior distribution of $a$.
- Construct the 95% Bayesian confidence interval for $a$.
Details:
- Gamma with shape parameter 30 and rate parameter 2.424.
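The gamma posterior can be recognized from the functional form of the prior density times the likelihood, without computing the normalizing constant. The derivation below is a sketch that assumes the right parameter is 1, so that the sampling density is $a x^{a-1}$ on $(0, 1)$; here $k$ and $r$ are the prior shape and rate, and $w$ is the product of the observed values.

```latex
\begin{align*}
h(a \mid \mathbf{x}) &\propto h(a) \, f(\mathbf{x} \mid a) \\
  &\propto a^{k-1} e^{-r a} \cdot a^{n} (x_1 x_2 \cdots x_n)^{a-1} \\
  &\propto a^{k+n-1} e^{-a [r - \ln(w)]}, \qquad w = x_1 x_2 \cdots x_n
\end{align*}
```

The last line is the kernel of a gamma density with shape parameter $k + n$ and rate parameter $r - \ln(w)$. In the exercise, the prior shape 10 and sample size 20 give shape 30, and the stated rate 2.424 with prior rate 1 corresponds to $\ln(w) \approx -1.424$.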
The Pareto Distribution
Suppose that $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the Pareto distribution with shape parameter $a \in (0, \infty)$ and scale parameter $b = 1$. The Pareto distribution is used to model certain financial variables and other variables with heavy-tailed distributions, and is named for Vilfredo Pareto. Recall that the probability density function (given $a$) is
$$g(x \mid a) = \frac{a}{x^{a + 1}}, \quad x \in [1, \infty)$$
We denote the product of the sample values by $W = X_1 X_2 \cdots X_n$.
In our previous discussion of Bayesian estimation, we modeled $a$ with a random variable $A$ that has a gamma distribution. This family of distributions is conjugate for $A$. Specifically, if the prior distribution of $A$ is gamma with shape parameter $k > 0$ and rate parameter $r > 0$, then the posterior distribution of $A$ given $\mathbf{X}$ is also gamma, with shape parameter $k + n$ and rate parameter $r + \ln(W)$. It follows that a $1 - \alpha$ level Bayesian confidence interval for $a$ is $\left[U_{\alpha/2}(w), U_{1 - \alpha/2}(w)\right]$ where $U_p(w)$ is the quantile of order $p$ for the posterior gamma distribution. In the special case that $k = 1$, the prior distribution of $A$ is exponential with rate parameter $r$.
Suppose that a financial variable has the Pareto distribution with unknown shape parameter $a$ and scale parameter $b = 1$. We believe that $a$ may be about 4, so we give $a$ the prior gamma distribution with shape parameter 4 and rate parameter 1. A random sample of size 20 from the variable gives the data
- Find the posterior distribution of $a$.
- Construct the 95% Bayesian confidence interval for $a$.
Details:
- Gamma with shape parameter 24 and rate parameter 5.223.
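Because the posterior shape parameter here is a positive integer, the gamma distribution function has a closed form (the Erlang case), and the posterior quantiles can be found by bisection. The sketch below takes the posterior parameters from the details above; the helper names and the tolerance are assumptions of this illustration.

```python
import math


def erlang_cdf(x, shape, rate):
    """CDF of the gamma distribution with positive integer shape:
    1 - sum over j < shape of exp(-rate * x) * (rate * x)^j / j!"""
    if x <= 0:
        return 0.0
    t = rate * x
    term = math.exp(-t)      # the j = 0 term
    total = term
    for j in range(1, shape):
        term *= t / j        # build the j-th term from the previous one
        total += term
    return 1.0 - total


def erlang_quantile(p, shape, rate, tol=1e-10):
    """Quantile of order p, found by bisection on the distribution function."""
    lo, hi = 0.0, shape / rate
    while erlang_cdf(hi, shape, rate) < p:   # widen until the quantile is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if erlang_cdf(mid, shape, rate) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Posterior from the details above: shape 24, rate 5.223.
shape, rate = 24, 5.223
interval = (erlang_quantile(0.025, shape, rate),
            erlang_quantile(0.975, shape, rate))
```

The same two functions apply to any gamma posterior with integer shape, including the posterior with shape 30 in the beta example.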