A technique for multivariate concentration. Let $(S_t)_{t\geq 1}$ be some stochastic process, say in $\mathbb{R}^d$. For instance, $S_t = \sum_{i\leq t}(X_i - \mu)$ for multivariate observations $X_1, X_2, \dots \in \mathbb{R}^d$ with mean $\mu$. We are aiming to generate a high probability bound on

$$\|S_t\| = \sup_{\theta\in\mathbb{S}^{d-1}}\langle\theta, S_t\rangle.$$

The idea is to use a PAC-Bayes approach (which is itself based on variational inequalities, hence the name), in order to simultaneously bound $\langle\theta, S_t\rangle$ in each direction $\theta\in\mathbb{S}^{d-1}$. Recall that a PAC-Bayes bound has the form

$$\Pr\left(\mathbb{E}_{\theta\sim\rho}[f(\theta)]\leq \mathrm{KL}(\rho\|\nu)+\log\tfrac{1}{\delta}\ \text{ for all posteriors }\rho\right)\geq 1-\delta,$$

where $\nu$ is a prior and $f$ is some data-dependent function of the parameter $\theta$.
That is, a PAC-Bayes bound provides a high probability bound simultaneously over all posteriors. The variational approach to concentration translates this into a high probability bound over all directions.
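Schematically, the translation works as follows (a sketch; here $\rho_u$ denotes a posterior with mean $u$, and the relevant functional is linear in $\theta$, as it will be in the examples below): on the high probability event,

$$\langle u, S_t\rangle = \mathbb{E}_{\theta\sim\rho_u}\langle\theta, S_t\rangle \leq \mathrm{KL}(\rho_u\|\nu) + \log\frac{1}{\delta} + (\text{correction terms}),\quad\text{simultaneously for all } u\in\mathbb{S}^{d-1},$$

and taking the supremum over $u$ on the left hand side yields a bound on $\|S_t\|$. The examples below make the correction terms precise.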

This approach was pioneered by Catoni and Giulini (here and here), and has now been used by a few authors to prove bounds in a variety of settings.

Recall a very general PAC-Bayes inequality:

Master theorem

Let $(M_t(\theta))_{t\geq 0}$ be a nonnegative supermartingale with initial value 1 for all $\theta\in\Theta$. Let $\nu$ be a data-free prior over $\Theta$. Then,

$$\Pr\left(\exists t\geq 1,\ \exists\rho:\ \mathbb{E}_{\theta\sim\rho}[\log M_t(\theta)] \geq \mathrm{KL}(\rho\|\nu) + \log\frac{1}{\delta}\right)\leq\delta.$$
This is the time-uniform version of the master theorem, but we can also state a fixed-time version. This reads: Let $M(\theta)$ be nonnegative and have expected value at most 1 (i.e., it is an e-value) for all $\theta\in\Theta$. Then

$$\Pr\left(\exists\rho:\ \mathbb{E}_{\theta\sim\rho}[\log M(\theta)] \geq \mathrm{KL}(\rho\|\nu) + \log\frac{1}{\delta}\right)\leq\delta.$$
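For intuition, here is the standard argument behind the fixed-time version (a sketch; the ingredients are Markov's inequality and the Donsker–Varadhan variational formula). Since $\nu$ is data-free, Fubini gives $\mathbb{E}\,\mathbb{E}_{\theta\sim\nu}[M(\theta)]\leq 1$, so by Markov's inequality, with probability at least $1-\delta$, $\mathbb{E}_{\theta\sim\nu}[M(\theta)]\leq 1/\delta$. On that event, the Donsker–Varadhan formula gives, for every $\rho$,

$$\mathbb{E}_{\theta\sim\rho}[\log M(\theta)]\leq\mathrm{KL}(\rho\|\nu)+\log\mathbb{E}_{\theta\sim\nu}[M(\theta)]\leq\mathrm{KL}(\rho\|\nu)+\log\frac{1}{\delta}.$$

The time-uniform version follows the same way, with Markov's inequality replaced by Ville's inequality applied to the nonnegative supermartingale $t\mapsto\mathbb{E}_{\theta\sim\nu}[M_t(\theta)]$.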

To see how this works, let’s consider two use cases.

Example 1: Sub-Gaussian random vectors

This comes from our paper on time-uniform confidence spheres. Consider iid copies $X_1,\dots,X_n$ of a $\Sigma$-sub-Gaussian random vector $X$ with mean $\mu$ (see sub-Gaussian distributions). That is,

$$\mathbb{E}\exp\left(\lambda\langle\theta, X-\mu\rangle - \frac{\lambda^2}{2}\theta^\top\Sigma\theta\right)\leq 1,$$

for all $\lambda\in\mathbb{R}$ and $\theta\in\mathbb{S}^{d-1}$. This implies that

$$M_n(\theta) = \exp\left(\lambda\sum_{i=1}^n\langle\theta, X_i-\mu\rangle - \frac{n\lambda^2}{2}\theta^\top\Sigma\theta\right)$$

has expectation at most 1 for any fixed $\lambda$ (i.e., it is an e-value). Let $\nu$ be a Gaussian with mean 0 and covariance $\beta^{-1}I_d$ for some $\beta>0$. Consider the family of distributions $\{\rho_u : u\in\mathbb{S}^{d-1}\}$, where $\rho_u$ is a Gaussian with mean $u$ and covariance $\beta^{-1}I_d$. Then the KL divergence between $\rho_u$ and $\nu$ is $\beta\|u\|^2/2 = \beta/2$. Using the master theorem above, we obtain that, with probability $1-\delta$, simultaneously for all distributions $\rho_u$,

$$\lambda\sum_{i=1}^n\mathbb{E}_{\theta\sim\rho_u}\langle\theta, X_i-\mu\rangle - \frac{n\lambda^2}{2}\mathbb{E}_{\theta\sim\rho_u}[\theta^\top\Sigma\theta] \leq \frac{\beta}{2} + \log\frac{1}{\delta}.$$
Now, for $\theta\sim\rho_u$, $\mathbb{E}_{\theta\sim\rho_u}\langle\theta, X_i-\mu\rangle = \langle u, X_i-\mu\rangle$, and

$$\mathbb{E}_{\theta\sim\rho_u}[\theta^\top\Sigma\theta] = u^\top\Sigma u + \frac{\operatorname{tr}(\Sigma)}{\beta} \leq \|\Sigma\|_{op} + \frac{\operatorname{tr}(\Sigma)}{\beta},$$

using basic properties of the expectation of quadratic forms under Gaussian distributions (see eg here), and the definition of the operator norm as $\|\Sigma\|_{op} = \sup_{\|u\|=1}u^\top\Sigma u$. Since this holds simultaneously for all $u\in\mathbb{S}^{d-1}$, we obtain that, with probability $1-\delta$,

$$\sup_{u\in\mathbb{S}^{d-1}}\ \lambda\sum_{i=1}^n\langle u, X_i-\mu\rangle \leq \frac{n\lambda^2}{2}\left(\|\Sigma\|_{op} + \frac{\operatorname{tr}(\Sigma)}{\beta}\right) + \frac{\beta}{2} + \log\frac{1}{\delta}.$$

The left hand side is equal to $\lambda\|\sum_{i=1}^n(X_i-\mu)\|$ (for $\lambda>0$), which gives us our concentration result. One can then optimize $\lambda$ and $\beta$ using some calculus.
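For concreteness, here is that calculus step for $\lambda$, carried out on the display above (a sketch based on the reconstruction of the constants given here; note that $\lambda$ must be fixed in advance, which is fine since the minimizer below depends only on $n$, $\Sigma$, $\beta$ and $\delta$). Dividing by $\lambda>0$ and minimizing the right hand side over $\lambda$,

$$\left\|\sum_{i=1}^n(X_i-\mu)\right\| \leq \inf_{\lambda>0}\left\{\frac{n\lambda}{2}\left(\|\Sigma\|_{op}+\frac{\operatorname{tr}(\Sigma)}{\beta}\right)+\frac{1}{\lambda}\left(\frac{\beta}{2}+\log\frac{1}{\delta}\right)\right\} = \sqrt{2n\left(\|\Sigma\|_{op}+\frac{\operatorname{tr}(\Sigma)}{\beta}\right)\left(\frac{\beta}{2}+\log\frac{1}{\delta}\right)}.$$

For this form of the bound, the optimal $\beta$ is $\sqrt{2\operatorname{tr}(\Sigma)\log(1/\delta)/\|\Sigma\|_{op}}$. The snippet below is a numerical sanity check of the fixed-$\beta$ bound for Gaussian data (which is exactly $\Sigma$-sub-Gaussian); the parameter choices are arbitrary illustrations, not recommendations.

```python
import numpy as np

# Empirical check of the (reconstructed) bound from Example 1:
# ||sum_i (X_i - mu)|| <= sqrt(2 n (||Sigma||_op + tr(Sigma)/beta) (beta/2 + log(1/delta)))
# should fail on at most a delta-fraction of runs.
rng = np.random.default_rng(0)
d, n, delta, beta = 5, 500, 0.05, 10.0   # arbitrary illustrative choices
Sigma = np.diag(np.linspace(0.5, 2.0, d))

bound = np.sqrt(2 * n * (np.linalg.norm(Sigma, 2) + np.trace(Sigma) / beta)
                * (beta / 2 + np.log(1 / delta)))

trials = 2000
norms = np.empty(trials)
for t in range(trials):
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # mean mu = 0
    norms[t] = np.linalg.norm(X.sum(axis=0))

print(f"bound = {bound:.1f}")
print(f"exceedance rate = {np.mean(norms > bound):.4f} (should be <= {delta})")
```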

Example 2: Random matrices with finite Orlicz-norm

This example is adapted from Zhivotovskiy (2024). Let $A_1,\dots,A_n$ be iid copies of a zero-mean random matrix $A\in\mathbb{R}^{d\times d}$ with finite sub-exponential Orlicz norm, in the sense that, for some $\alpha>0$,

$$\|u^\top A v\|_{\psi_1}\leq\alpha$$

for all $u,v\in\mathbb{S}^{d-1}$, where $\|Y\|_{\psi_1} := \inf\{t>0 : \mathbb{E}\exp(|Y|/t)\leq 2\}$.

Here we highlight the main ingredients that are used in Zhivotovskiy’s results. For the details, see his paper or my blog post.

We take our parameter space in the master theorem above to be $\Theta = \mathbb{R}^d\times\mathbb{R}^d$ (pairs of directions), with product prior $\nu\otimes\nu$. Let $\nu$ again be Gaussian with mean 0 and covariance $\beta^{-1}I_d$, and let $\tilde\rho_u$ be a truncated Gaussian with mean $u$, covariance $\beta^{-1}I_d$, and radius $r$ (i.e., $N(u,\beta^{-1}I_d)$ conditioned on the ball $B(u,r)$). For a vector $u\in\mathbb{S}^{d-1}$, the KL-divergence between the truncated normal $\tilde\rho_u$ and $\nu$ is $\frac{\beta}{2}+\log\frac{1}{Z}$, where $Z = \Pr(\|g\|\leq r)$ where $g\sim N(0,\beta^{-1}I_d)$. Equivalently, $Z = \rho_u(B(u,r))$ where $\rho_u$ is a normal with mean $u$ and covariance $\beta^{-1}I_d$. Hence, by Chebyshev's inequality, $Z\geq 1-\frac{d}{\beta r^2}$. Thus, taking $r=\sqrt{2d/\beta}$ yields $Z\geq\frac{1}{2}$, and we obtain

$$\mathrm{KL}(\tilde\rho_u\|\nu)\leq\frac{\beta}{2}+\log 2.$$
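To see where this formula comes from (a short derivation, under the reconstruction above, of the KL divergence of a ball-truncated Gaussian from the prior): writing $\phi_m$ for the $N(m,\beta^{-1}I_d)$ density, the density of $\tilde\rho_u$ is $\phi_u\,\mathbf{1}_{B(u,r)}/Z$, so

$$\mathrm{KL}(\tilde\rho_u\|\nu) = \mathbb{E}_{\theta\sim\tilde\rho_u}\left[\log\frac{\phi_u(\theta)}{\phi_0(\theta)}\right]+\log\frac{1}{Z} = \mathbb{E}_{\theta\sim\tilde\rho_u}\left[\beta\langle\theta,u\rangle-\frac{\beta\|u\|^2}{2}\right]+\log\frac{1}{Z} = \frac{\beta}{2}+\log\frac{1}{Z},$$

where the last step uses $\mathbb{E}_{\theta\sim\tilde\rho_u}[\theta]=u$ (the truncation is symmetric about $u$) and $\|u\|=1$.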
Now it remains to construct a relevant quantity to use in the PAC-Bayes theorem. Consider

$$M(\theta_1,\theta_2) = \prod_{i=1}^n\frac{\exp\left(\lambda\,\theta_1^\top A_i\theta_2\right)}{\mathbb{E}\exp\left(\lambda\,\theta_1^\top A\theta_2\right)},$$

where the expectation is over $A$ (an independent copy of the $A_i$). It's easy to see this has expectation at most 1 (it can be written as the product of terms, each with expectation exactly one). Apply the master theorem with the product distribution $\tilde\rho_u\otimes\tilde\rho_v$ for $u,v$ in the unit sphere. Then, to bound the moment generating function in the denominator, use one of the exponential inequalities. In particular, for a zero-mean random variable $Y$ with $\|Y\|_{\psi_1}\leq\alpha$,

$$\mathbb{E}\exp(\lambda Y)\leq\exp\left(C\lambda^2\alpha^2\right)\quad\text{for all }|\lambda|\leq\frac{c}{\alpha},$$

for absolute constants $c,C>0$.
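For instance, here is one way to get such an inequality from the Orlicz condition, with explicit (non-optimized) constants; this is a standard computation, not necessarily the version used in the paper. If $\mathbb{E}Y=0$ and $\mathbb{E}\exp(|Y|/\alpha)\leq 2$, then $\mathbb{E}|Y|^k\leq 2\alpha^k k!$ for every $k\geq 1$ (compare the $k$-th term of the exponential series), so for $|\lambda|\leq\frac{1}{2\alpha}$,

$$\mathbb{E}\,e^{\lambda Y}\leq 1+\sum_{k\geq 2}\frac{|\lambda|^k\,\mathbb{E}|Y|^k}{k!}\leq 1+2\sum_{k\geq 2}(|\lambda|\alpha)^k = 1+\frac{2(|\lambda|\alpha)^2}{1-|\lambda|\alpha}\leq\exp\left(4\lambda^2\alpha^2\right).$$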
A bunch of algebra then gives the result.
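To give a flavour of that algebra (a sketch under the reconstruction above, not the paper's exact bookkeeping), the master theorem with prior $\nu\otimes\nu$ gives, with probability $1-\delta$, simultaneously for all $u,v\in\mathbb{S}^{d-1}$,

$$\mathbb{E}_{(\theta_1,\theta_2)\sim\tilde\rho_u\otimes\tilde\rho_v}\left[\lambda\sum_{i=1}^n\theta_1^\top A_i\theta_2 - n\log\mathbb{E}\,e^{\lambda\,\theta_1^\top A\theta_2}\right]\leq 2\left(\frac{\beta}{2}+\log 2\right)+\log\frac{1}{\delta}.$$

Since $\theta_1$ and $\theta_2$ are independent with means $u$ and $v$, the first term equals $\lambda\sum_i u^\top A_i v$. The second term is handled by the exponential inequality: on the truncated support, $\|\theta_1^\top A\theta_2\|_{\psi_1}\leq(1+r)^2\alpha$, so the moment generating function is at most $\exp(C\lambda^2(1+r)^4\alpha^2)$ for $|\lambda|$ small enough. Taking the supremum over $u,v$ turns the left hand side into $\lambda\,\|\sum_i A_i\|_{op}$ minus these error terms, and optimizing over $\lambda$ and $\beta$ gives the final bound.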