A technique for multivariate concentration. Let $(S_t)$ be a stochastic process, say in $\mathbb{R}^d$. For instance, $S_n = \sum_{i=1}^n X_i$ for multivariate observations $X_i \in \mathbb{R}^d$. We are aiming to generate a high-probability bound on

$$\|S_n\| = \sup_{v : \|v\| = 1} \langle v, S_n \rangle.$$

The idea is to use a PAC-Bayes approach (which is itself based on variational inequalities, hence the name) in order to simultaneously bound $\langle v, S_t \rangle$ in every direction $v$. Recall that a PAC-Bayes bound has the form

$$\mathbb{P}\big(\forall \rho \in \mathcal{M}(\Theta) : \text{something holds}\big) \ge 1 - \delta.$$

That is, a PAC-Bayes bound provides a high-probability bound *simultaneously* over all posteriors. The variational approach to concentration translates this into a high-probability bound over all directions.
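As a quick sanity check on the dual representation $\|S_n\| = \sup_{\|v\| = 1} \langle v, S_n \rangle$, here is a small NumPy sketch; the dimension, seed, and number of random directions are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=5)  # a sample value of S_n in R^5

# The supremum over unit vectors is attained at v = S / ||S||,
# so the dual representation can be checked directly:
v_star = S / np.linalg.norm(S)
assert np.isclose(np.linalg.norm(S), v_star @ S)

# Any collection of unit directions gives a lower bound, never more:
V = rng.normal(size=(10_000, 5))
V /= np.linalg.norm(V, axis=1, keepdims=True)
assert (V @ S).max() <= np.linalg.norm(S) + 1e-12
```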

This approach was pioneered by Catoni and Giulini (here and here), and has now been used by a few authors to prove bounds in a variety of settings:

- Zhivotovskiy for bounding the singular values of random matrices;
- Nakakita et al. for bounding the mean of high-dimensional random matrices under heavy tails;
- Giulini for estimating the Gram operator in Hilbert spaces;
- myself and others for estimating the mean of random vectors.

Recall a very general PAC-Bayes inequality:

## Master theorem

Let $M_{t}(θ)$ be a nonnegative supermartingale with initial value 1 for all $θ∈Θ$. Let $ν$ be a data-free prior. Then,

$$\mathbb{P}\big(\forall t, \forall \rho \in \mathcal{M}(\Theta) : \mathbb{E}_\rho \log M_t(\theta) \le D_{\mathrm{KL}}(\rho \| \nu) + \log(1/\delta)\big) \ge 1 - \delta.$$

This is the time-uniform version of the master theorem, but we can also state a fixed-time version. This reads: let $N(\theta)$ be nonnegative with expected value at most 1 (i.e., an e-value) for all $\theta \in \Theta$. Then

$$\mathbb{P}\big(\forall \rho \in \mathcal{M}(\Theta) : \mathbb{E}_\rho \log N(\theta) \le D_{\mathrm{KL}}(\rho \| \nu) + \log(1/\delta)\big) \ge 1 - \delta.$$
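To see the fixed-time statement in its simplest form, here is a minimal numerical sketch with a point-mass posterior (so the KL term vanishes and the bound reduces to Markov's inequality applied to an e-value). The Gaussian e-value and the parameter choices below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sim, delta, lam = 200_000, 0.05, 0.7

# For Y ~ N(0, 1), N = exp(lam * Y - lam^2 / 2) is an e-value: E[N] = 1.
Y = rng.normal(size=n_sim)
N = np.exp(lam * Y - lam**2 / 2)
assert abs(N.mean() - 1.0) < 0.05

# With a point mass rho, the KL term is zero and the master theorem is
# just Markov's inequality: P(log N >= log(1/delta)) <= delta.
viol = (np.log(N) >= np.log(1 / delta)).mean()
assert viol <= delta
```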

To see how this works, let’s consider two use cases.

# Example 1: Sub-Gaussian random vectors

This comes from our paper on time-uniform confidence spheres. Consider $n$ iid copies $X_1, \dots, X_n$ of a $\Sigma$-sub-Gaussian random vector $X \in \mathbb{R}^d$ (see sub-Gaussian distributions). That is,

$$\mathbb{E} \exp(\lambda \langle \theta, X \rangle) \le \exp\left(\frac{\lambda^2}{2} \langle \theta, \Sigma \theta \rangle\right),$$

for all $\lambda \in \mathbb{R}$ and $\theta \in \mathbb{R}^d$. This implies that

$$N(\theta) = \exp\left\{\lambda \sum_{i \le n} \langle \theta, X_i \rangle - \frac{n \lambda^2}{2} \langle \theta, \Sigma \theta \rangle\right\}$$

has expectation at most 1 (i.e., it is an e-value). Let $\nu$ be a Gaussian with mean 0 and covariance $\beta^{-1} I$ for some $\beta > 0$. Consider the family of distributions $\{\rho_u : \|u\| = 1\}$, where $\rho_u$ is a Gaussian with mean $u$ and covariance $\beta^{-1} I$. Then the KL divergence between $\rho_u$ and $\nu$ is $D_{\mathrm{KL}}(\rho_u \| \nu) = \beta/2$. Using the master theorem above, we obtain that, with probability $1 - \delta$, *simultaneously for all distributions $\rho$*,

$$\lambda \sum_{i \le n} \mathbb{E}_\rho \langle \theta, X_i \rangle - \frac{n \lambda^2}{2} \mathbb{E}_\rho \langle \theta, \Sigma \theta \rangle \le D_{\mathrm{KL}}(\rho \| \nu) + \log(1/\delta).$$
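As a sanity check on the e-value claim, one can verify numerically that $N(\theta)$ has expectation (at most) 1 when the $X_i$ are themselves Gaussian with covariance $\Sigma$, a case where the sub-Gaussian mgf bound holds with equality. The dimensions and parameters below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, lam, n_sim = 3, 10, 0.1, 50_000

A = rng.normal(size=(d, d))
Sigma = A @ A.T / d             # an arbitrary covariance matrix
theta = rng.normal(size=d)      # a fixed direction
theta /= np.linalg.norm(theta)

# Draw n_sim independent samples of (X_1, ..., X_n) and average N(theta).
X = rng.multivariate_normal(np.zeros(d), Sigma, size=(n_sim, n))
lin = lam * (X @ theta).sum(axis=1)
N = np.exp(lin - n * lam**2 / 2 * (theta @ Sigma @ theta))
assert abs(N.mean() - 1.0) < 0.05   # e-value: expectation is (at most) 1
```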

Now, for $\rho = \rho_u$, $\mathbb{E}_\rho \langle \theta, X_i \rangle = \langle u, X_i \rangle$ and

$$\mathbb{E}_\rho \langle \theta, \Sigma \theta \rangle = \langle u, \Sigma u \rangle + \beta^{-1} \operatorname{Tr}(\Sigma) \le \|\Sigma\|_{\mathrm{op}} + \beta^{-1} \operatorname{Tr}(\Sigma),$$

using basic properties of the expectation of quadratic forms under Gaussian distributions (see e.g. here), and the definition of the operator norm as $\|A\|_{\mathrm{op}} = \sup_{u, v : \|u\| = \|v\| = 1} \langle u, A v \rangle$. Since this holds simultaneously for all $\rho_u$, we obtain that, with probability $1 - \delta$,
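The Gaussian quadratic-form identity used here is easy to check numerically. This sketch draws $\theta \sim N(u, \beta^{-1} I)$ and compares the Monte Carlo average of $\langle \theta, \Sigma \theta \rangle$ against $\langle u, \Sigma u \rangle + \beta^{-1} \operatorname{Tr}(\Sigma)$; all parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
d, beta, n_sim = 4, 2.0, 200_000

A = rng.normal(size=(d, d))
Sigma = A @ A.T / d
u = rng.normal(size=d)
u /= np.linalg.norm(u)

# theta ~ N(u, beta^{-1} I); then
# E<theta, Sigma theta> = <u, Sigma u> + beta^{-1} Tr(Sigma).
theta = u + rng.normal(size=(n_sim, d)) / np.sqrt(beta)
empirical = np.einsum('ij,jk,ik->i', theta, Sigma, theta).mean()
exact = u @ Sigma @ u + np.trace(Sigma) / beta
assert abs(empirical - exact) / exact < 0.02
```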

$$\sup_{u : \|u\| = 1} \lambda \sum_{i \le n} \langle u, X_i \rangle \le \frac{n \lambda^2}{2} \left(\|\Sigma\|_{\mathrm{op}} + \beta^{-1} \operatorname{Tr}(\Sigma)\right) + \frac{\beta}{2} + \log(1/\delta).$$

The left-hand side is equal to $\lambda \left\|\sum_{i \le n} X_i\right\|$, which gives us our concentration result. One can then optimize $\lambda$ using some calculus.
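For the record, the calculus step: abbreviating $A = \|\Sigma\|_{\mathrm{op}} + \beta^{-1} \operatorname{Tr}(\Sigma)$ and $B = \beta/2 + \log(1/\delta)$, dividing through by $\lambda$ and minimizing the right-hand side over $\lambda > 0$ gives

```latex
% The bound reads || sum_i X_i || <= (n*lambda/2) A + B / lambda.
% The right-hand side is minimized at lambda* = sqrt(2B / (nA)):
\left\| \sum_{i \le n} X_i \right\|
  \le \inf_{\lambda > 0} \left( \frac{n \lambda}{2} A + \frac{B}{\lambda} \right)
  = \sqrt{2 n A B},
\qquad \text{attained at } \lambda^* = \sqrt{\frac{2B}{nA}}.
```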

# Example 2: Random matrices with finite Orlicz-norm

This example is adapted from Zhivotovskiy (2024). Let $M_1, \dots, M_n$ be iid copies of a random matrix $M$ with finite sub-exponential Orlicz norm, in the sense that, for some $C > 0$,

$$\|\langle \theta, M \phi \rangle\|_{\psi_1} \le C \langle \theta, \Sigma \phi \rangle,$$

for all $\theta, \phi \in \mathbb{R}^d$, where $\Sigma = \mathbb{E} M$.

Here we highlight the main ingredients that are used in Zhivotovskiy’s results. For the details, see his paper or my blog post.

We take our parameter space in the master theorem above to be $\Theta = \mathbb{R}^d \times \mathbb{R}^d$. Let $\nu$ again be Gaussian with mean 0 and covariance $\beta^{-1} \Sigma$, and let $\mu_u$ be a *truncated* Gaussian with mean $u$, covariance $\beta^{-1} \Sigma$, and radius $r$. For a vector $u \in \Sigma^{1/2} S^{d-1}$, the KL divergence between the truncated normal $\mu_u$ and $\nu$ is at most $\log(1/Z) + \beta/2$, where $Z = \mathbb{P}(\|\theta - u\| \le r)$ for $\theta$ drawn from the untruncated Gaussian with mean $u$ and covariance $\beta^{-1} \Sigma$. Equivalently, $Z = \mathbb{P}(\|Y\| \le r)$ where $Y$ is normal with mean 0 and covariance $\beta^{-1} \Sigma$. Hence $1 - Z = \mathbb{P}(\|Y\| > r) \le \mathbb{E} \|Y\|^2 / r^2 = \beta^{-1} \operatorname{Tr}(\Sigma) / r^2$. Thus, taking $r = \sqrt{2 \beta^{-1} \operatorname{Tr}(\Sigma)}$ yields $Z \ge 1/2$, and we obtain

$$D_{\mathrm{KL}}(\mu_u \| \nu) \le \log 2 + \frac{\beta}{2}.$$
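The Chebyshev-style choice of truncation radius is easy to verify by simulation. The sketch below draws $Y \sim N(0, \beta^{-1} \Sigma)$ for an arbitrary covariance and checks that $r = \sqrt{2 \beta^{-1} \operatorname{Tr}(\Sigma)}$ captures at least half the mass:

```python
import numpy as np

rng = np.random.default_rng(4)
d, beta, n_sim = 5, 3.0, 200_000

A = rng.normal(size=(d, d))
Sigma = A @ A.T / d

# Y ~ N(0, beta^{-1} Sigma), so E||Y||^2 = Tr(Sigma) / beta.
Y = rng.multivariate_normal(np.zeros(d), Sigma / beta, size=n_sim)
r = np.sqrt(2 * np.trace(Sigma) / beta)

# Markov on ||Y||^2: P(||Y|| > r) <= E||Y||^2 / r^2 = 1/2, so Z >= 1/2.
Z = (np.linalg.norm(Y, axis=1) <= r).mean()
assert Z >= 0.5
```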

Now it remains to construct a relevant quantity to use in the PAC-Bayes theorem. Consider

$$N(\theta, \phi) = \exp\left\{\lambda \sum_{i \le n} \langle \theta, \Sigma^{-1/2} M_i \Sigma^{-1/2} \phi \rangle - n \log \mathbb{E} \exp\left(\lambda \langle \theta, \Sigma^{-1/2} M \Sigma^{-1/2} \phi \rangle\right)\right\},$$

where the expectation is over $M$. It is easy to see this has expectation at most 1 (it can be written as a product of terms, each with expectation exactly one). Apply the master theorem with the product distribution $\mu_u \times \mu_v$ for $u, v$ in the unit sphere. Then, to bound the $\log \mathbb{E} \exp(\cdot)$ term, use one of the exponential inequalities for sub-exponential random variables. In particular, for a random variable $Y$,

$$\mathbb{E}\left[\exp\big(\lambda (Y - \mathbb{E} Y)\big)\right] \le \exp\left(4 \lambda^2 \|Y - \mathbb{E} Y\|_{\psi_1}^2\right), \qquad \forall \, |\lambda| \le \frac{1}{2 \|Y - \mathbb{E} Y\|_{\psi_1}}.$$

A bunch of algebra then gives the result.
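As a concrete instance of this inequality (with the constants as stated), one can check it for a centered $\mathrm{Exp}(1)$ variable, whose mgf is available in closed form and whose $\psi_1$ norm can be computed by a one-dimensional root-find. This is an illustrative sketch, not part of the original argument:

```python
import math

# Y ~ Exp(1), so E[Y] = 1 and the mgf of Y - 1 is exp(-lam)/(1 - lam), lam < 1.
def mgf_centered(lam):
    return math.exp(-lam) / (1 - lam)

# E exp(|Y - 1| / t) in closed form (valid for t > 1); the psi_1 norm is
# the smallest t making this at most 2.
def orlicz_moment(t):
    first = math.exp(1 / t) * (1 - math.exp(-(1 + 1 / t))) / (1 + 1 / t)
    second = math.exp(-1) / (1 - 1 / t)
    return first + second

lo, hi = 1.01, 10.0
for _ in range(100):  # bisection: orlicz_moment is decreasing in t
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if orlicz_moment(mid) > 2 else (lo, mid)
K = hi  # psi_1 norm of Y - E[Y]

# Check E exp(lam (Y - EY)) <= exp(4 lam^2 K^2) over |lam| <= 1 / (2K).
for s in [x / 100 for x in range(-100, 101)]:
    lam = s / (2 * K)
    assert mgf_centered(lam) <= math.exp(4 * lam**2 * K**2)
```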