Variational inference (VI) is a general approach to the problem of approximate inference in Bayesian statistics and Bayesian deep learning. It’s faster, though typically less accurate, than MCMC.

VI solves this via an optimization problem. We estimate the posterior $p(\theta \mid x)$ with

$$q^* = \operatorname*{arg\,min}_{q \in \mathcal{Q}} \, \mathrm{KL}\big(q(\theta) \,\|\, p(\theta \mid x)\big),$$

where $\mathcal{Q}$ is some tractable set of densities over $\theta$. Of course, this objective is also intractable, since it depends on $p(\theta \mid x)$, which is what is difficult to compute in the first place. Instead we define

$$\mathrm{ELBO}(q) = \mathbb{E}_{q}\big[\log p(x, \theta)\big] - \mathbb{E}_{q}\big[\log q(\theta)\big]$$

and, using $p(\theta \mid x) = p(x, \theta)/p(x)$, write

$$\mathrm{KL}\big(q(\theta) \,\|\, p(\theta \mid x)\big) = \mathbb{E}_{q}\big[\log q(\theta)\big] - \mathbb{E}_{q}\big[\log p(x, \theta)\big] + \log p(x) = \log p(x) - \mathrm{ELBO}(q).$$
So, minimizing the KL divergence is equivalent to maximizing $\mathrm{ELBO}(q)$, up to a constant which doesn’t depend on $q$. The objective $\mathrm{ELBO}(q)$ is called the “evidence lower bound”, or ELBO. This name comes from the fact that

$$\log p(x) = \mathrm{ELBO}(q) + \mathrm{KL}\big(q(\theta) \,\|\, p(\theta \mid x)\big) \geq \mathrm{ELBO}(q),$$

since the KL divergence is nonnegative, so $\mathrm{ELBO}(q)$ is a lower bound for the log-evidence $\log p(x)$.
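To make the objective concrete, here is a minimal sketch of VI in Python on a toy conjugate Gaussian model, where the exact posterior and log-evidence are available in closed form for comparison. The model, the Gaussian variational family, and every name in the code (`log_joint`, `neg_elbo`, and so on) are illustrative assumptions rather than anything from the text above; the ELBO is estimated by Monte Carlo and maximized with a generic optimizer instead of the gradient-based schemes usually used in practice.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy conjugate model (an assumption for illustration):
#   prior:       theta ~ N(0, 1)
#   likelihood:  x_i | theta ~ N(theta, 1),  i = 1, ..., n
# Conjugacy gives the exact posterior and log-evidence in closed form,
# so we can check how close the maximized ELBO gets to log p(x).
x = rng.normal(loc=2.0, scale=1.0, size=20)
n = len(x)

def log_joint(theta):
    """log p(x, theta) = log p(x | theta) + log p(theta), vectorized over theta."""
    log_lik = -0.5 * np.sum((x[:, None] - theta) ** 2, axis=0) - 0.5 * n * np.log(2 * np.pi)
    log_prior = -0.5 * theta ** 2 - 0.5 * np.log(2 * np.pi)
    return log_lik + log_prior

# Variational family Q: Gaussians q(theta) = N(mu, sigma^2), parameterized by (mu, log sigma).
eps = rng.normal(size=5000)  # fixed base samples (common random numbers) to tame MC noise

def neg_elbo(params):
    """Monte Carlo estimate of -ELBO(q) = -(E_q[log p(x, theta)] - E_q[log q(theta)])."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    theta = mu + sigma * eps  # samples theta ~ q
    log_q = -0.5 * ((theta - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)
    return -np.mean(log_joint(theta) - log_q)

# Maximize the ELBO over the variational parameters.
res = minimize(neg_elbo, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu_hat, log_sigma_hat = res.x
elbo_hat = -res.fun

# Exact posterior N(mu_post, sigma_post^2) and exact log-evidence, for comparison.
sigma_post2 = 1.0 / (n + 1.0)
mu_post = sigma_post2 * np.sum(x)
# log p(x) = log p(x, theta) - log p(theta | x), evaluated at theta = mu_post.
log_evidence = log_joint(np.array([mu_post]))[0] + 0.5 * np.log(2 * np.pi * sigma_post2)

print(f"maximized ELBO:     {elbo_hat:.4f}")
print(f"exact log-evidence: {log_evidence:.4f}")
print(f"q mean / posterior mean: {mu_hat:.4f} / {mu_post:.4f}")
print(f"q std  / posterior std:  {np.exp(log_sigma_hat):.4f} / {np.sqrt(sigma_post2):.4f}")
```

Because the chosen family happens to contain the exact posterior here, the maximized ELBO should land close to the log-evidence (the KL term is driven toward zero, up to Monte Carlo error); with a more restrictive family a gap would remain, which is exactly the approximation error VI accepts in exchange for speed.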