Subset of statistical decision theory. We have a loss function $L: \Theta \times \mathcal{A} \to \mathbb{R}$, where $\Theta$ is some parameter space (each $\theta \in \Theta$ corresponds to some distribution $P_\theta$) and $\mathcal{A}$ is an action set (often $\mathcal{A} = \Theta$). We have a prior $\pi$ over $\Theta$, as one does in Bayesian statistics.

The Bayes risk of a rule $\delta$, which takes an action $\delta(X) \in \mathcal{A}$ given data $X \sim P_\theta$, is

$$r(\pi, \delta) = \mathbb{E}_{\theta \sim \pi}\!\left[\mathbb{E}_{X \sim P_\theta}\!\left[L(\theta, \delta(X))\right]\right].$$

We call $\delta^\pi$ a Bayes estimator if it minimizes the Bayes risk, i.e., $\delta^\pi \in \arg\min_\delta r(\pi, \delta)$.
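To make the definition concrete, here is a toy numerical computation of a Bayes risk. The setup (a coin bias $\theta \in \{0.2, 0.8\}$ with uniform prior, $X$ the number of heads in $n = 3$ flips, squared-error loss) is my own illustrative choice, not from the text:

```python
import math

# Illustrative setup (assumed, not from the text): theta is a coin bias,
# prior is uniform on {0.2, 0.8}, X = heads in n = 3 flips,
# squared-error loss L(theta, a) = (theta - a)^2.
thetas = [0.2, 0.8]
prior = [0.5, 0.5]
n = 3

def lik(x, t):
    # P(X = x | theta = t), binomial likelihood
    return math.comb(n, x) * t**x * (1 - t)**(n - x)

def bayes_risk(delta):
    # r(pi, delta) = E_theta [ E_{X|theta} [ L(theta, delta(X)) ] ]
    return sum(p * sum(lik(x, t) * (t - delta(x))**2 for x in range(n + 1))
               for p, t in zip(prior, thetas))

print(bayes_risk(lambda x: x / n))  # plug-in rule delta(x) = x/n
print(bayes_risk(lambda x: 0.5))    # constant rule delta(x) = 0.5
```

The plug-in rule has lower Bayes risk than the constant rule here, since the latter ignores the data entirely.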

Another view on optimality and the Bayes estimator is via the posterior expected loss. Given data $x$, take the action which minimizes expected loss under the posterior $\pi(\theta \mid x)$:

$$\delta^*(x) = \arg\min_{a \in \mathcal{A}} \mathbb{E}_{\theta \sim \pi(\cdot \mid x)}\!\left[L(\theta, a)\right].$$

Then the rule $\delta^*$ defined by this pointwise minimization is a Bayes estimator. This is a central result in Bayesian decision theory, going back to Wald and Blackwell; a standard textbook treatment appears in Lehmann and Casella.
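We can check this result by brute force in a tiny discrete problem (an assumed toy setup: $\theta \in \{0.2, 0.8\}$ with uniform prior, $X$ the number of heads in $n = 3$ flips, actions restricted to $\{0.2, 0.8\}$, squared-error loss). Enumerating all $|\mathcal{A}|^{n+1} = 16$ decision rules and taking the one with the smallest Bayes risk gives the same risk as the rule that minimizes posterior expected loss pointwise in $x$:

```python
import math
from itertools import product

# Assumed toy setup: theta in {0.2, 0.8}, uniform prior,
# X = heads in n = 3 flips, finite action set, squared-error loss.
thetas = [0.2, 0.8]
prior = [0.5, 0.5]
actions = [0.2, 0.8]
n = 3

def lik(x, t):
    return math.comb(n, x) * t**x * (1 - t)**(n - x)

def bayes_risk(rule):
    # rule is a tuple: rule[x] is the action taken on observing x
    return sum(p * sum(lik(x, t) * (t - rule[x])**2 for x in range(n + 1))
               for p, t in zip(prior, thetas))

# 1) Global minimum of the Bayes risk over all 16 decision rules.
best_risk = min(bayes_risk(rule) for rule in product(actions, repeat=n + 1))

# 2) The rule minimizing posterior expected loss separately for each x.
def posterior(x):
    w = [p * lik(x, t) for p, t in zip(prior, thetas)]
    z = sum(w)
    return [wi / z for wi in w]

post_rule = tuple(
    min(actions, key=lambda a: sum(q * (t - a)**2
                                   for q, t in zip(posterior(x), thetas)))
    for x in range(n + 1)
)

print(best_risk, bayes_risk(post_rule))  # the two risks coincide
```

Pointwise minimization over actions thus recovers the globally optimal rule without searching the (exponentially large) space of decision rules.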

The proof is straightforward. Let $\rho(x) = \mathbb{E}_{\theta \sim \pi(\cdot \mid x)}\!\left[L(\theta, \delta(x))\right]$ be the expected posterior loss when playing $\delta(x)$. Then write

$$r(\pi, \delta) = \int_\Theta \int_{\mathcal{X}} L(\theta, \delta(x))\, p(x \mid \theta)\, \pi(\theta)\, dx\, d\theta = \int_{\mathcal{X}} \rho(x)\, m(x)\, dx$$

by Fubini's theorem, where $m(x) = \int_\Theta p(x \mid \theta)\, \pi(\theta)\, d\theta$ is the marginal density of the data and $\pi(\theta \mid x) = p(x \mid \theta)\, \pi(\theta) / m(x)$. Since $m(x) \geq 0$, if we minimize $\rho(x)$ for each $x$ (which is exactly what minimizing the posterior expected loss does), then we minimize $r(\pi, \delta)$.
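The Fubini step can be verified numerically in a discrete example (an assumed toy setup: $\theta \in \{0.2, 0.5, 0.8\}$ with uniform prior, $X$ the number of heads in $n = 4$ flips, squared-error loss, $\delta$ the posterior mean): summing $L(\theta, \delta(x))$ in either order gives the same number.

```python
import math

# Assumed toy setup: theta in {0.2, 0.5, 0.8}, uniform prior,
# X = heads in n = 4 flips, squared-error loss.
thetas = [0.2, 0.5, 0.8]
prior = [1/3, 1/3, 1/3]
n = 4

def lik(x, t):
    return math.comb(n, x) * t**x * (1 - t)**(n - x)

def delta(x):
    # posterior mean, the Bayes rule for squared-error loss
    w = [p * lik(x, t) for p, t in zip(prior, thetas)]
    return sum(wi * t for wi, t in zip(w, thetas)) / sum(w)

# Order 1: E_theta [ E_{X|theta} [ L(theta, delta(X)) ] ] -- the Bayes risk.
r_theta_first = sum(p * sum(lik(x, t) * (t - delta(x))**2 for x in range(n + 1))
                    for p, t in zip(prior, thetas))

# Order 2: integrate rho(x), the posterior expected loss, against m(x).
r_x_first = 0.0
for x in range(n + 1):
    w = [p * lik(x, t) for p, t in zip(prior, thetas)]
    m = sum(w)                        # marginal m(x)
    post = [wi / m for wi in w]       # posterior pi(theta | x)
    rho = sum(q * (t - delta(x))**2 for q, t in zip(post, thetas))
    r_x_first += m * rho

print(r_theta_first, r_x_first)  # equal up to floating-point error
```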