An approach to heavy-tailed concentration and multivariate heavy-tailed mean estimation.
The overall idea is straightforward: Split the data into buckets and compute the sample mean of each bucket. Then the overall estimator is
There are several questions to answer to implement this in practice. First, how to choose ? Second (especially relevant in multivariate settings), how do we define the median?
Scalar case
With probability at least by Chebyshev, is not too far from the true mean. So for the median to be far from the mean, many (at least half) independent Bernoulli events with probability 3/4 must fail to occur.
If we choose then the median-of-means estimator satisfies
The MoM estimator can also be extended to situations where the distribution has no variance. If , then
This was originally proved by Bubeck, Cesa-Bianchi, and Lugosi in the content of heavy-tailed bandits. I have a post with the proofs here.
You can sequentialize the MoM estimator using the Dubins-Savage inequality. I have a post about that here.
Finite-dimensional vector case
Defining the median in is slightly trickier.
One option is the geometric median, which is studied by Minsker (see below). This achieves rate , which is not quite sub-Gaussian. However, Minsker’s approach has the added benefit that it does not require finite variance. Instead, it’s a general recipe for boosting weak learners into strong ones. So even if you only have a finite -th moment for , you could use Markov’s inequality to obtain a bound on the empirical mean (say) and apply geometric MoM.
Lugosi and Mendelson define a specialized version of the median and get sub-Gaussian rates when the norm is the Euclidean norm and the variance exists. This is sometimes called median-of-means tournament. I have a blog post about this here. Hopkins made this computationally tractable in 2020. In 2019, Lugosi and Mendelson also studied median-of-means under general norms in and obtain sub-Gaussian rates. However, the estimator is computationally inefficient and I don’t think anyone has figured out how to improve it.
Infinite-dimensional case
MoM with the geometric median was studied in Banach spaces by Minsker and in Polish spaces by Yun and Park:
Heavy-tailed concentration
In 2015, Minsker proposed the geometric median-of-means for Banach spaces. This is a general method for boosting weak (polynomial rate) estimators into an estimator with exponential rate. The idea is similar to the Lugosi-Mendelson median-of-means (see multivariate heavy-tailed mean estimation), but the weak estimators are aggregated using the geometric median. This can be computed in polynomial time (in ) using Weisfeld’s algorithm, since the objective is convex. This estimator was proposed simultaneously by Hsu and Sabato.
In 2022, Yun and Park extended geometric median-of-means to Polish spaces, which include separable Banach spaces. They seem to get the same rates as Minsker, which are not quite sub-Gaussian in .
Whitehouse et al. (2024) propose extending Catoni and Giulini’s truncation-based estimator (truncation-based estimators) to Banach spaces. This estimator can handle infinite variance.
Link to original