Subset of statistical decision theory. We have a loss function $L: \Theta \times \mathcal{A} \to \mathbb{R}$, where $\Theta$ is some parameter space (each $\theta \in \Theta$ corresponds to some distribution $P_\theta$) and $\mathcal{A}$ is an action set (often $\mathcal{A} = \Theta$). We have a prior $\pi$ over $\Theta$, as one does in Bayesian statistics.

The Bayes risk of a rule $\delta$, which takes an action $\delta(X) \in \mathcal{A}$ given data $X \sim P_\theta$, is

$$r(\pi, \delta) = \mathbb{E}_{\theta \sim \pi}\!\left[\mathbb{E}_{X \sim P_\theta}\!\left[L(\theta, \delta(X))\right]\right].$$

We call $\delta^\pi$ a Bayes estimator if it minimizes the Bayes risk, i.e., $\delta^\pi \in \arg\min_\delta r(\pi, \delta)$.
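To make the definition concrete, here is a toy numerical computation of a Bayes risk. The setup (a coin bias $\theta \in \{0.2, 0.8\}$ with uniform prior, $X$ the number of heads in $n = 3$ flips, squared-error loss) is my own illustrative choice, not from the text:

```python
import math

# Illustrative setup (assumed, not from the text): theta is a coin bias,
# prior is uniform on {0.2, 0.8}, X = heads in n = 3 flips,
# squared-error loss L(theta, a) = (theta - a)^2.
thetas = [0.2, 0.8]
prior = [0.5, 0.5]
n = 3

def lik(x, t):
    # P(X = x | theta = t), binomial likelihood
    return math.comb(n, x) * t**x * (1 - t)**(n - x)

def bayes_risk(delta):
    # r(pi, delta) = E_theta [ E_{X|theta} [ L(theta, delta(X)) ] ]
    return sum(p * sum(lik(x, t) * (t - delta(x))**2 for x in range(n + 1))
               for p, t in zip(prior, thetas))

print(bayes_risk(lambda x: x / n))  # plug-in rule delta(x) = x/n
print(bayes_risk(lambda x: 0.5))    # constant rule delta(x) = 0.5
```

The plug-in rule has lower Bayes risk than the constant rule here, since the latter ignores the data entirely.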

Another view on optimality and the Bayes estimator is via the posterior expected loss. Given data $x$, take the action which minimizes expected loss under the posterior $\pi(\theta \mid x)$:

$$\delta^*(x) = \arg\min_{a \in \mathcal{A}} \mathbb{E}_{\theta \sim \pi(\cdot \mid x)}\!\left[L(\theta, a)\right].$$

Then the rule $\delta^*$ defined by this pointwise minimization is a Bayes estimator. This is a central result in Bayesian decision theory, going back to Wald and Blackwell; a standard textbook treatment appears in Lehmann and Casella.
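We can check this result by brute force in a tiny discrete problem (an assumed toy setup: $\theta \in \{0.2, 0.8\}$ with uniform prior, $X$ the number of heads in $n = 3$ flips, actions restricted to $\{0.2, 0.8\}$, squared-error loss). Enumerating all $|\mathcal{A}|^{n+1} = 16$ decision rules and taking the one with the smallest Bayes risk gives the same risk as the rule that minimizes posterior expected loss pointwise in $x$:

```python
import math
from itertools import product

# Assumed toy setup: theta in {0.2, 0.8}, uniform prior,
# X = heads in n = 3 flips, finite action set, squared-error loss.
thetas = [0.2, 0.8]
prior = [0.5, 0.5]
actions = [0.2, 0.8]
n = 3

def lik(x, t):
    return math.comb(n, x) * t**x * (1 - t)**(n - x)

def bayes_risk(rule):
    # rule is a tuple: rule[x] is the action taken on observing x
    return sum(p * sum(lik(x, t) * (t - rule[x])**2 for x in range(n + 1))
               for p, t in zip(prior, thetas))

# 1) Global minimum of the Bayes risk over all 16 decision rules.
best_risk = min(bayes_risk(rule) for rule in product(actions, repeat=n + 1))

# 2) The rule minimizing posterior expected loss separately for each x.
def posterior(x):
    w = [p * lik(x, t) for p, t in zip(prior, thetas)]
    z = sum(w)
    return [wi / z for wi in w]

post_rule = tuple(
    min(actions, key=lambda a: sum(q * (t - a)**2
                                   for q, t in zip(posterior(x), thetas)))
    for x in range(n + 1)
)

print(best_risk, bayes_risk(post_rule))  # the two risks coincide
```

Pointwise minimization over actions thus recovers the globally optimal rule without searching the (exponentially large) space of decision rules.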

The proof is straightforward. Let $\rho(x) = \mathbb{E}_{\theta \sim \pi(\cdot \mid x)}\!\left[L(\theta, \delta(x))\right]$ be the expected posterior loss when playing $\delta(x)$. Then write

$$r(\pi, \delta) = \int_\Theta \int_{\mathcal{X}} L(\theta, \delta(x))\, p(x \mid \theta)\, \pi(\theta)\, dx\, d\theta = \int_{\mathcal{X}} \rho(x)\, m(x)\, dx$$

by Fubini's theorem, where $m(x) = \int_\Theta p(x \mid \theta)\, \pi(\theta)\, d\theta$ is the marginal density of the data and $\pi(\theta \mid x) = p(x \mid \theta)\, \pi(\theta) / m(x)$. Since $m(x) \geq 0$, if we minimize $\rho(x)$ for each $x$ (which is exactly what minimizing the posterior expected loss does), then we minimize $r(\pi, \delta)$.
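The Fubini step can be verified numerically in a discrete example (an assumed toy setup: $\theta \in \{0.2, 0.5, 0.8\}$ with uniform prior, $X$ the number of heads in $n = 4$ flips, squared-error loss, $\delta$ the posterior mean): summing $L(\theta, \delta(x))$ in either order gives the same number.

```python
import math

# Assumed toy setup: theta in {0.2, 0.5, 0.8}, uniform prior,
# X = heads in n = 4 flips, squared-error loss.
thetas = [0.2, 0.5, 0.8]
prior = [1/3, 1/3, 1/3]
n = 4

def lik(x, t):
    return math.comb(n, x) * t**x * (1 - t)**(n - x)

def delta(x):
    # posterior mean, the Bayes rule for squared-error loss
    w = [p * lik(x, t) for p, t in zip(prior, thetas)]
    return sum(wi * t for wi, t in zip(w, thetas)) / sum(w)

# Order 1: E_theta [ E_{X|theta} [ L(theta, delta(X)) ] ] -- the Bayes risk.
r_theta_first = sum(p * sum(lik(x, t) * (t - delta(x))**2 for x in range(n + 1))
                    for p, t in zip(prior, thetas))

# Order 2: integrate rho(x), the posterior expected loss, against m(x).
r_x_first = 0.0
for x in range(n + 1):
    w = [p * lik(x, t) for p, t in zip(prior, thetas)]
    m = sum(w)                        # marginal m(x)
    post = [wi / m for wi in w]       # posterior pi(theta | x)
    rho = sum(q * (t - delta(x))**2 for q, t in zip(post, thetas))
    r_x_first += m * rho

print(r_theta_first, r_x_first)  # equal up to floating-point error
```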