Suppose we want to do parametric density estimation, but our parametric class is a family of complicated non-negative functions $f_\theta$ that may not be proper probability distributions. For instance, think of deep neural networks.

One approach is to normalize as

$$p_\theta(x) = \frac{f_\theta(x)}{\int f_\theta(x')\,dx'}.$$

The log-likelihood is then

$$\log p_\theta(x) = \log f_\theta(x) - \log \int f_\theta(x')\,dx'.$$

In order to maximize this, we must solve (or approximate) the integral $\int f_\theta(x')\,dx'$, which is highly non-trivial for complex function families. This is the problem of approximate inference in deep learning, which can be tackled by, e.g., variational inference.
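To make the bottleneck concrete, here is a minimal sketch that brute-forces the normalizing integral on a 1D grid. The function `f_theta` and its parameters are toy stand-ins invented for illustration (in practice $f_\theta$ would be a neural net); the grid approach is hopeless in high dimensions, which is exactly why approximate inference is needed.

```python
import numpy as np

# Toy stand-in for f_theta: any non-negative function of x.
# The Laplace-like shape and parameters are invented purely for
# illustration; in practice this would be a deep neural network.
def f_theta(x, mu=0.0, s=1.0):
    return np.exp(-np.abs(x - mu) / s)  # non-negative, but not normalized

# Brute-force Z = \int f_theta(x) dx with a Riemann sum on a 1D grid.
# This only works in low dimensions -- in high dimensions the grid
# blows up exponentially, which is the problem described above.
xs = np.linspace(-10.0, 10.0, 10_001)
dx = xs[1] - xs[0]
Z = f_theta(xs).sum() * dx

# Normalized log-likelihood of a data point: log f_theta(x) - log Z.
x0 = 0.5
log_lik = np.log(f_theta(x0)) - np.log(Z)
print(f"Z ~ {Z:.4f}, log p(x0) ~ {log_lik:.4f}")
```

For this toy choice the true $Z$ is $2$ (the integral of $e^{-|x|}$), so the grid estimate can be sanity-checked by eye.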

Energy-based models have a similar idea, but they define the probability distribution as

$$p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z(\theta)}, \qquad Z(\theta) = \int e^{-E_\theta(x')}\,dx'.$$

This also requires computing the normalizing constant $Z(\theta)$, as above, though the exponential form lets $E_\theta$ be any real-valued network, since $e^{-E_\theta(x)}$ is automatically positive. These are called energy-based because $E_\theta$ is a neural net that represents the "energy," borrowing from statistical physics, where lower energy corresponds to higher probability.
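As a minimal sketch, assuming PyTorch and a toy one-dimensional energy network (the architecture is invented for illustration), the unnormalized log-density is just $-E_\theta(x)$, while $Z(\theta)$ can only be brute-forced here because we are in 1D:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy energy network E_theta: R -> R. Any real-valued network works,
# since e^{-E} is automatically positive; this architecture is an
# arbitrary choice for the demo.
energy_net = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),
    nn.Linear(32, 1),
)

def log_unnormalized(x):
    # log e^{-E_theta(x)} = -E_theta(x): cheap to evaluate pointwise.
    return -energy_net(x).squeeze(-1)

# The expensive part is Z(theta) = \int e^{-E_theta(x)} dx. In 1D we
# can brute-force it on a grid (truncating the domain to [-10, 10]);
# in high dimensions we cannot, which is why EBM training typically
# approximates the gradient of log Z (e.g. with MCMC samples) rather
# than computing Z itself.
xs = torch.linspace(-10.0, 10.0, 10_001).unsqueeze(-1)
dx = xs[1, 0] - xs[0, 0]
with torch.no_grad():
    Z = torch.exp(log_unnormalized(xs)).sum() * dx
    log_p = log_unnormalized(torch.tensor([[0.5]])) - torch.log(Z)
print(f"Z ~ {Z.item():.4f}, log p(0.5) ~ {log_p.item():.4f}")
```

Note that evaluating $-E_\theta(x)$ at a point is cheap; everything hard about EBMs is concentrated in $Z(\theta)$ and its gradient.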

There are other approaches that sidestep the intractable normalizing constant, such as normalizing flows, autoregressive models, and GANs.