Variational inference (VI) is a general approach to the problem of approximate inference in Bayesian statistics and Bayesian deep learning. It’s faster, though typically less accurate, than MCMC.

VI solves this via an optimization problem. We estimate the posterior $p(\theta \mid x)$ with

$$q^* = \operatorname*{arg\,min}_{q \in \mathcal{Q}} \, \mathrm{KL}\big(q(\theta) \,\|\, p(\theta \mid x)\big),$$

where $\mathcal{Q}$ is some tractable set of densities over $\theta$. Of course, this objective is also intractable, since it depends on $p(\theta \mid x)$, which is what is difficult to compute in the first place. Instead we define

$$\mathrm{ELBO}(q) = \mathbb{E}_{q}\big[\log p(x, \theta)\big] - \mathbb{E}_{q}\big[\log q(\theta)\big]$$

and, using $p(\theta \mid x) = p(x, \theta)/p(x)$, write

$$\mathrm{KL}\big(q(\theta) \,\|\, p(\theta \mid x)\big) = \mathbb{E}_{q}\big[\log q(\theta)\big] - \mathbb{E}_{q}\big[\log p(x, \theta)\big] + \log p(x) = \log p(x) - \mathrm{ELBO}(q).$$
So, minimizing the KL divergence is equivalent to maximizing $\mathrm{ELBO}(q)$, up to a constant which doesn’t depend on $q$. The objective $\mathrm{ELBO}(q)$ is called the “evidence lower bound”, or ELBO. This name comes from the fact that

$$\log p(x) = \mathrm{ELBO}(q) + \mathrm{KL}\big(q(\theta) \,\|\, p(\theta \mid x)\big) \geq \mathrm{ELBO}(q),$$

since the KL divergence is nonnegative, so $\mathrm{ELBO}(q)$ is a lower bound for the log-evidence $\log p(x)$.
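To make the objective concrete, here is a minimal sketch of VI in Python on a toy conjugate Gaussian model, where the exact posterior and log-evidence are available in closed form for comparison. The model, the Gaussian variational family, and every name in the code (`log_joint`, `neg_elbo`, and so on) are illustrative assumptions rather than anything from the text above; the ELBO is estimated by Monte Carlo and maximized with a generic optimizer instead of the gradient-based schemes usually used in practice.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy conjugate model (an assumption for illustration):
#   prior:       theta ~ N(0, 1)
#   likelihood:  x_i | theta ~ N(theta, 1),  i = 1, ..., n
# Conjugacy gives the exact posterior and log-evidence in closed form,
# so we can check how close the maximized ELBO gets to log p(x).
x = rng.normal(loc=2.0, scale=1.0, size=20)
n = len(x)

def log_joint(theta):
    """log p(x, theta) = log p(x | theta) + log p(theta), vectorized over theta."""
    log_lik = -0.5 * np.sum((x[:, None] - theta) ** 2, axis=0) - 0.5 * n * np.log(2 * np.pi)
    log_prior = -0.5 * theta ** 2 - 0.5 * np.log(2 * np.pi)
    return log_lik + log_prior

# Variational family Q: Gaussians q(theta) = N(mu, sigma^2), parameterized by (mu, log sigma).
eps = rng.normal(size=5000)  # fixed base samples (common random numbers) to tame MC noise

def neg_elbo(params):
    """Monte Carlo estimate of -ELBO(q) = -(E_q[log p(x, theta)] - E_q[log q(theta)])."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    theta = mu + sigma * eps  # samples theta ~ q
    log_q = -0.5 * ((theta - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)
    return -np.mean(log_joint(theta) - log_q)

# Maximize the ELBO over the variational parameters.
res = minimize(neg_elbo, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu_hat, log_sigma_hat = res.x
elbo_hat = -res.fun

# Exact posterior N(mu_post, sigma_post^2) and exact log-evidence, for comparison.
sigma_post2 = 1.0 / (n + 1.0)
mu_post = sigma_post2 * np.sum(x)
# log p(x) = log p(x, theta) - log p(theta | x), evaluated at theta = mu_post.
log_evidence = log_joint(np.array([mu_post]))[0] + 0.5 * np.log(2 * np.pi * sigma_post2)

print(f"maximized ELBO:     {elbo_hat:.4f}")
print(f"exact log-evidence: {log_evidence:.4f}")
print(f"q mean / posterior mean: {mu_hat:.4f} / {mu_post:.4f}")
print(f"q std  / posterior std:  {np.exp(log_sigma_hat):.4f} / {np.sqrt(sigma_post2):.4f}")
```

Because the chosen family happens to contain the exact posterior here, the maximized ELBO should land close to the log-evidence (the KL term is driven toward zero, up to Monte Carlo error); with a more restrictive family a gap would remain, which is exactly the approximation error VI accepts in exchange for speed.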