Median of means: an approach to heavy-tailed concentration in scalar and multivariate settings.

The overall idea is straightforward: Split the $n$ data points into $k$ buckets and compute the sample mean $\bar X_j$ of each bucket. Then the overall estimator is the median of the bucket means, $\hat\mu_{\mathrm{MoM}} = \mathrm{Med}(\bar X_1, \dots, \bar X_k)$.

There are several questions to answer to implement this in practice. First, how to choose $k$? Second (especially relevant in multivariate settings), how do we define the median?
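A minimal sketch of the estimator in code (illustrative; the function and variable names are my own):

```python
import statistics

def median_of_means(data, k):
    """Median-of-means: split `data` into k buckets, average each
    bucket, and return the median of the bucket means."""
    m = len(data) // k  # bucket size (leftover points are dropped for simplicity)
    bucket_means = [
        sum(data[j * m : (j + 1) * m]) / m for j in range(k)
    ]
    return statistics.median(bucket_means)

# Example: one wild outlier corrupts only its own bucket,
# so it barely moves the estimate.
values = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0, 2.0, 1000.0]
print(median_of_means(values, 3))  # → 2.0
```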

Scalar case

By Chebyshev, each bucket mean $\bar X_j$ is not too far from the true mean with probability at least $3/4$. So for the median to be far from the mean, many (at least half) of $k$ independent Bernoulli events, each with success probability $3/4$, must fail to occur.
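The argument can be sketched with standard constants (these particular constants are one common choice, not the only one):

```latex
% Split n samples into k buckets of m = n/k points each.
% A bucket mean \bar X_j has variance \sigma^2 / m, so by Chebyshev
\Pr\!\left( |\bar X_j - \mu| > \tfrac{2\sigma}{\sqrt{m}} \right) \le \tfrac{1}{4}.
% The median exceeds this deviation only if at least k/2 of the k
% independent failure events occur; by Hoeffding,
\Pr\!\left( \mathrm{Bin}(k, 1/4) \ge k/2 \right) \le e^{-k/8}.
```

Setting $e^{-k/8} = \delta$ motivates the choice of $k$ below.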

If we choose $k \approx 8\log(1/\delta)$, then with probability at least $1-\delta$ the median-of-means estimator satisfies $|\hat\mu_{\mathrm{MoM}} - \mu| \lesssim \sigma\sqrt{\log(1/\delta)/n}$, a sub-Gaussian rate.
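A quick, self-contained sanity check of this choice of $k$ (a hypothetical demo with its own copy of the estimator; the contamination setup is mine):

```python
import math
import random
import statistics

def median_of_means(data, k):
    m = len(data) // k
    return statistics.median(
        sum(data[j * m : (j + 1) * m]) / m for j in range(k)
    )

random.seed(0)
delta = 0.01
n = 10_000

# Standard Gaussians plus a handful of huge outliers (true mean ~ 0).
data = [random.gauss(0.0, 1.0) for _ in range(n)] + [1000.0] * 10
random.shuffle(data)

k = math.ceil(8 * math.log(1 / delta))  # k ≈ 8 log(1/δ) buckets
naive = sum(data) / len(data)           # dragged up by the outliers
robust = median_of_means(data, k)       # outliers corrupt few buckets

print(f"naive mean:      {naive:.3f}")
print(f"median of means: {robust:.3f}")
```

The ten outliers can corrupt at most ten of the roughly 37 buckets, so the median of the bucket means is essentially unaffected, while the naive mean is shifted by about 1.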

The MoM estimator can also be extended to situations where the distribution has no variance. If only a $(1+\epsilon)$-th central moment exists, i.e. $\mathbb{E}|X - \mu|^{1+\epsilon} \le v$ for some $\epsilon \in (0, 1]$, then with probability at least $1-\delta$, $|\hat\mu_{\mathrm{MoM}} - \mu| \lesssim v^{1/(1+\epsilon)} \left( \log(1/\delta)/n \right)^{\epsilon/(1+\epsilon)}$.

This was originally proved by Bubeck, Cesa-Bianchi, and Lugosi in the context of heavy-tailed bandits. I have a post with the proofs here.

You can sequentialize the MoM estimator using the Dubins-Savage inequality. I have a post about that here.

Finite-dimensional vector case

Defining the median in $\mathbb{R}^d$ is slightly trickier.

One option is the geometric median, which is studied by Minsker (see below). This achieves rate $\sqrt{\mathrm{Tr}(\Sigma)\log(1/\delta)/n}$, which is not quite sub-Gaussian.

Lugosi and Mendelson define a specialized version of the median and get sub-Gaussian rates. Blog post here. Hopkins made this computationally tractable in 2020.

Unlike the scalar setting, MoM in $\mathbb{R}^d$ has not been extended to settings where the variance doesn’t exist.

Infinite-dimensional case

MoM with the geometric median was studied in Banach spaces by Minsker and in Polish spaces by Yun and Park:

Heavy-tailed concentration

In 2015, Minsker proposed the geometric median-of-means for Banach spaces. This is a general method for boosting weak (polynomial rate) estimators into an estimator with exponential rate. The idea is similar to the Lugosi-Mendelson median-of-means (see multivariate heavy-tailed concentration), but the weak estimators are aggregated using the geometric median. This can be computed in polynomial time (in the number of bucket means) using Weiszfeld’s algorithm, since the objective is convex. This estimator was proposed simultaneously by Hsu and Sabato.
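A sketch of Weiszfeld's iteration for the geometric median of the bucket means (my own illustrative implementation; the classical algorithm needs care when an iterate lands exactly on a data point, which this sketch sidesteps with a small epsilon):

```python
def geometric_median(points, n_iter=100, eps=1e-10):
    """Weiszfeld's algorithm: an iteratively re-weighted average that
    converges to the minimizer of the sum of Euclidean distances."""
    d = len(points[0])
    # Start from the coordinate-wise mean.
    y = [sum(p[i] for p in points) / len(points) for i in range(d)]
    for _ in range(n_iter):
        # Weights are inverse distances to the current iterate.
        weights = []
        for p in points:
            dist = sum((pi - yi) ** 2 for pi, yi in zip(p, y)) ** 0.5
            weights.append(1.0 / max(dist, eps))
        total = sum(weights)
        y = [
            sum(w * p[i] for w, p in zip(weights, points)) / total
            for i in range(d)
        ]
    return y

# Three corrupted bucket means out of eleven: the geometric median
# stays at the uncorrupted value [0, 0], since a majority of the
# points pull toward the origin.
means = [[0.0, 0.0]] * 8 + [[100.0, 100.0]] * 3
print(geometric_median(means))
```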

In 2022, Yun and Park extended geometric median-of-means to Polish spaces, which include separable Banach spaces. They seem to get the same rates as Minsker, which are not quite sub-Gaussian in the confidence parameter $\delta$.
