Bayesian statistics
The Bayesian idea is to use a likelihood function $L_n(\theta)$ weighted by prior knowledge.
The Bayesian approach uses sample data to update prior beliefs into posterior beliefs. To do this, we model the parameter as a random variable, even though it is a fixed (unknown) quantity.
The prior distribution is the distribution of this parameter “random variable”; the posterior distribution is its distribution given the sample data.
Conjugate prior
A prior distribution is conjugate to the data model if the posterior distribution belongs to the same family as the prior. Roughly, pairing a flexible prior family with a more specific likelihood model makes conjugacy more likely to hold. Some examples of priors conjugate to data models are (a numerical sketch of the first pairing follows the list):
- Gamma prior with exponential data model
- Beta prior with Bernoulli data model
- Gaussian prior with Gaussian data model
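As a numerical illustration of the first pairing, here is a minimal sketch (assuming a shape/rate parametrization of the Gamma distribution and made-up data; numpy and scipy are used for simulation and the posterior distribution):

```python
import numpy as np
from scipy import stats

# Gamma(a, b) prior (shape a, rate b) on the rate lambda of an Exponential
# data model.  The conjugate update gives posterior Gamma(a + n, b + sum(x)).
rng = np.random.default_rng(0)
a, b = 2.0, 1.0                                       # assumed prior hyperparameters
x = rng.exponential(scale=1.0 / 3.0, size=50)         # simulated Exp(lambda = 3) sample

a_post = a + len(x)                                   # updated shape
b_post = b + x.sum()                                  # updated rate

posterior = stats.gamma(a=a_post, scale=1.0 / b_post)
print("posterior mean of lambda:", posterior.mean())  # should be near 3
```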
Setup of Bayesian statistics problem
$\pi(\cdot)$: prior distribution. It could be uniform, exponential, Gaussian, etc.
$X_1, \ldots, X_n$: sample of $n$ random variables
$L_n(\cdot | \theta)$: joint pdf of $X_1, \ldots, X_n$ conditionally on $\theta$, where $\theta \sim \pi$. It is equal to the likelihood from the frequentist approach.
Applying Bayes' formula, we have:
$$\pi(\theta|X_1, \ldots, X_n) \propto L_n(X_1, \ldots, X_n|\theta) \pi(\theta)$$
$$\pi(\theta|X_1, \ldots, X_n) = \frac{L_n(X_1, \ldots, X_n|\theta) \pi(\theta)}{\int_\Theta L_n(X_1, \ldots, X_n|\theta) \pi(\theta)d\theta}$$
From this updated pdf, we can read off the new parameters (hyperparameters) of the parameter's distribution.
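The normalizing integral rarely needs a closed form; it can be approximated numerically. Below is a minimal sketch (assuming a Bernoulli data model on $\Theta = [0, 1]$, a flat prior, and made-up data) that evaluates Bayes' formula on a grid:

```python
import numpy as np

# Grid approximation of pi(theta | X_1..X_n) for a Bernoulli data model.
# A Riemann sum stands in for the normalizing integral over Theta = [0, 1].
theta = np.linspace(0.001, 0.999, 999)
step = theta[1] - theta[0]
prior = np.ones_like(theta)                       # e.g. a flat prior
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])         # hypothetical sample
k, n = data.sum(), data.size

likelihood = theta**k * (1 - theta)**(n - k)      # L_n(X_1..X_n | theta)
unnormalized = likelihood * prior
posterior = unnormalized / (unnormalized.sum() * step)  # Bayes' formula

print("grid MAP estimate:", theta[np.argmax(posterior)])
```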
Bernoulli experiment with Beta prior
Let $X_i \sim {\rm Ber}(\theta)$.
Select a Beta prior for the parameter $\theta$; that is, $\theta \sim {\rm Beta}(a, b)$.
First, calculate the joint pdf, or the likelihood function.
$$L_n(X_1, \ldots, X_n | \theta) = p_n(X_1, \ldots, X_n | \theta) = \theta^{\sum_{i=1}^n X_i} (1-\theta)^{n-\sum_{i=1}^n X_i}$$
Then, update the distribution.
$$\pi(\theta|X_1, \ldots, X_n) \propto L_n(X_1, \ldots, X_n | \theta) \pi(\theta) $$ $$= \theta^{a-1}(1-\theta)^{b-1}\theta^{\sum_{i=1}^n X_i} (1-\theta)^{n-\sum_{i=1}^n X_i}$$ $$= \theta^{a+\sum_{i=1}^n X_i-1}(1-\theta)^{b+n-\sum_{i=1}^n X_i-1}$$
So the new parameters (for the Beta distribution describing the parameter as a random variable) are:
$$a' = a+\sum_{i=1}^n X_i$$ $$b' = b+n-\sum_{i=1}^n X_i$$
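A quick numerical check of this update (a sketch with assumed hyperparameters and simulated data; scipy.stats.beta gives the posterior):

```python
import numpy as np
from scipy import stats

# Conjugate Beta-Bernoulli update: prior Beta(a, b), data X_1..X_n ~ Ber(theta).
rng = np.random.default_rng(1)
a, b = 2.0, 2.0                          # assumed prior hyperparameters
x = rng.binomial(1, 0.7, size=100)       # simulated Ber(0.7) sample

a_post = a + x.sum()                     # a' = a + sum X_i
b_post = b + len(x) - x.sum()            # b' = b + n - sum X_i

posterior = stats.beta(a_post, b_post)
print("posterior mean:", posterior.mean())   # concentrates near 0.7 as n grows
```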
Noninformative prior
If we have no prior information about the parameter, we can choose a prior with constant pdf on $\Theta$.
- If $\Theta$ is bounded, the distribution is uniform on $\Theta$.
- If $\Theta$ is unbounded, the prior is an improper prior. Formally, $\pi(\theta) \equiv 1$.
- In general, a prior is improper iff $\int \pi(\theta) d\theta = \infty$.
- Bayes' formula still works, as the sketch below illustrates.
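As a sketch of this point (assuming a Gaussian $\mathcal{N}(\theta, 1)$ data model with the improper flat prior $\pi(\theta) \equiv 1$ and made-up data), the posterior is still a proper distribution; here it is checked on a grid:

```python
import numpy as np

# Improper flat prior pi(theta) = 1 with an N(theta, 1) data model:
# the posterior is proper (it is N(mean(X), 1/n)).  Grid check.
rng = np.random.default_rng(2)
x = rng.normal(loc=1.5, scale=1.0, size=30)      # simulated sample

theta = np.linspace(-5, 5, 2001)
step = theta[1] - theta[0]
log_lik = -0.5 * ((x[:, None] - theta[None, :]) ** 2).sum(axis=0)
unnormalized = np.exp(log_lik - log_lik.max())   # flat prior: posterior proportional to likelihood
posterior = unnormalized / (unnormalized.sum() * step)

post_mean = (theta * posterior).sum() * step
print("posterior mean:", post_mean, "sample mean:", x.mean())
```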
Bayesian confidence region
A Bayesian confidence region with level $\alpha$ is a random subset $\mathcal{R}$ of $\Theta$ such that:
$$\mathbb{P}[\theta \in \mathcal{R} | X_1, \ldots, X_n] = 1 - \alpha$$
The probability is computed under the posterior distribution of $\theta$; the randomness of $\theta$ comes from modeling it as a random variable via the prior.
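For the Beta posterior of the Bernoulli example above, an equal-tailed region can be read directly from posterior quantiles. A minimal sketch (the posterior hyperparameters below are assumed values):

```python
from scipy import stats

# Equal-tailed Bayesian confidence (credible) region of level alpha for a
# Beta(a', b') posterior: the interval between the alpha/2 and 1 - alpha/2
# posterior quantiles contains theta with posterior probability 1 - alpha.
alpha = 0.05
a_post, b_post = 73.0, 31.0                      # assumed posterior hyperparameters
posterior = stats.beta(a_post, b_post)

lower = posterior.ppf(alpha / 2)
upper = posterior.ppf(1 - alpha / 2)
print(f"{1 - alpha:.0%} credible interval: ({lower:.3f}, {upper:.3f})")
```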
Bayesian estimation
One Bayes estimator is the posterior mean:
$$\hat{\theta}^{(\pi)}=\int_\Theta \theta \pi(\theta | X_1, \ldots, X_n) d\theta$$
Another estimator is the point that maximizes the posterior density, called the MAP (maximum a posteriori) estimator:
$$\hat{\theta}^{\rm MAP} = {\rm argmax}_{\theta \in \Theta}\, \pi(\theta | X_1, \ldots, X_n) = {\rm argmax}_{\theta \in \Theta}\, L_n (X_1, \ldots, X_n | \theta)\, \pi(\theta)$$
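For a Beta$(a', b')$ posterior, both estimators have closed forms: the posterior mean is $a'/(a'+b')$ and the MAP is $(a'-1)/(a'+b'-2)$ when $a', b' > 1$. A small sketch comparing the closed forms with a numerical maximization (the hyperparameter values are assumed):

```python
from scipy import stats
from scipy.optimize import minimize_scalar

# Posterior mean and MAP for a Beta(a', b') posterior (assumed values).
a_post, b_post = 73.0, 31.0
posterior = stats.beta(a_post, b_post)

post_mean = posterior.mean()                          # a' / (a' + b')
map_closed_form = (a_post - 1) / (a_post + b_post - 2)

# Numerical MAP: maximize the posterior pdf (minimize its negative).
res = minimize_scalar(lambda t: -posterior.pdf(t), bounds=(0, 1), method="bounded")
print("posterior mean:", post_mean)
print("MAP (closed form):", map_closed_form, "MAP (numeric):", res.x)
```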