Suppose we want to do parametric density estimation but our parametric class is a set of complicated functions which may not be proper probability distributions. For instance, think of deep neural networks.
If is a well-behaved family of probability distributions (i.e., integrate to 1, non-negative) then a natural approach to density estimation is MLE. Thus, one approach is to normalize as
The log-likelihood is then
In order to maximize this, we must solve (or approximate) the integral ,which is highly non-trivial for complex function families.
There are several ways around this problem. variational autoencoders are one practical and useful method. #todo sort out how variational inference relates to this.