An approach to heavy-tailed scalar concentration and multivariate heavy-tailed concentration.

The overall idea is straightforward: split the data into $k$ buckets and compute the sample mean $\hat{\mu}_i$ of each bucket. Then the overall estimator is

$\hat{\mu} = \mathrm{Median}(\hat{\mu}_1, \ldots, \hat{\mu}_k).$

There are several questions to answer to implement this in practice. First, how do we choose $k$? Second (especially relevant in multivariate settings), how do we define the median?

## Scalar case

By Chebyshev's inequality, each $\hat{\mu}_i$ is close to the true mean with probability at least $3/4$. So for the median to be far from the mean, at least half of $k$ independent Bernoulli events, each with success probability $3/4$, must fail to occur, which is exponentially unlikely in $k$.
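To make the first step concrete (a sketch using the same constants as the bound below): each bucket holds $n/k$ samples, so $\mathrm{Var}(\hat{\mu}_i) = \sigma^2 k / n$, and Chebyshev gives

$P\left(|\hat{\mu}_i - \mu| \ge 2\sigma\sqrt{k/n}\right) \le \frac{\sigma^2 k/n}{4\sigma^2 k/n} = \frac{1}{4}.$

The median exceeds this threshold only if at least $k/2$ of the $k$ independent events occur; by Hoeffding's inequality, this binomial tail is at most $e^{-k/8}$, which is what drives the choice $k = \lceil 8\log(1/\delta)\rceil$.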

If we choose $k = \lceil 8\log(1/\delta)\rceil$, then the median-of-means estimator $\hat{\mu}$ satisfies

$P\left(|\hat{\mu} - \mu| \ge \sigma\sqrt{\frac{32\log(1/\delta)}{n}}\right) \le \delta.$

The MoM estimator can also be extended to situations where the distribution has no variance. If $E|X - \mu|^{1+\epsilon} \le v$, then

$P\left(|\hat{\mu} - \mu| \ge (12v)^{\frac{1}{1+\epsilon}}\left(\frac{8\log(1/\delta)}{n}\right)^{\frac{\epsilon}{1+\epsilon}}\right) \le \delta.$

This was originally proved by Bubeck, Cesa-Bianchi, and Lugosi in the context of heavy-tailed bandits. I have a post with the proofs here.
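A minimal sketch of the scalar estimator in Python. The function name and the shuffling step are my own choices (for i.i.d. data the shuffle is optional); the number of buckets matches $k = \lceil 8\log(1/\delta)\rceil$ from the bound above.

```python
import math

import numpy as np


def median_of_means(x, delta=0.05, seed=0):
    """Median-of-means estimate of E[X] from the samples in x.

    Uses k = ceil(8 * log(1/delta)) buckets, matching the bound above.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = min(n, math.ceil(8 * math.log(1.0 / delta)))
    # Shuffle so the bucket assignment is independent of the input order.
    perm = np.random.default_rng(seed).permutation(n)
    buckets = np.array_split(x[perm], k)
    # Median of the per-bucket sample means.
    return float(np.median([b.mean() for b in buckets]))
```

On heavy-tailed data this is far less sensitive to a few extreme samples than the plain sample mean, since an outlier can corrupt only its own bucket.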

You can sequentialize the MoM estimator using the Dubins-Savage inequality. I have a post about that here.

## Finite-dimensional vector case

Defining the median in $\mathbb{R}^d$ is slightly trickier.

One option is the geometric median, which is studied by Minsker (see below). This achieves rate $\sqrt{\mathrm{Tr}(\Sigma)\log(1/\delta)/n}$, which is not quite sub-Gaussian. However, Minsker's approach has the added benefit that it does not require finite variance. Instead, it's a general recipe for boosting weak estimators into strong ones. So even if you only have a finite $p$-th moment for $p \in (1,2)$, you could use Markov's inequality to obtain a bound on the empirical mean (say) and apply geometric MoM.

Lugosi and Mendelson define a specialized version of the median and get sub-Gaussian rates when the norm is the Euclidean norm and the variance exists. This is sometimes called the median-of-means tournament. I have a blog post about this here. Hopkins made this computationally tractable in 2020. In 2019, Lugosi and Mendelson also studied median-of-means under general norms in $\mathbb{R}^d$ and obtained sub-Gaussian rates. However, that estimator is computationally inefficient, and I don't think anyone has figured out how to improve it.

## Infinite-dimensional case

MoM with the geometric median was studied in Banach spaces by Minsker and in Polish spaces by Yun and Park.

## Heavy-tailed concentration

In 2015, Minsker proposed the geometric median-of-means for Banach spaces. This is a general method for boosting weak (polynomial-rate) estimators into an estimator with an exponential rate. The idea is similar to the Lugosi-Mendelson median-of-means (see multivariate heavy-tailed concentration), but the weak estimators are aggregated using the geometric median. This can be computed in polynomial time (in $\mathbb{R}^d$) using Weiszfeld's algorithm, since the objective is convex. This estimator was proposed simultaneously by Hsu and Sabato.
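A sketch of this construction in $\mathbb{R}^d$, assuming i.i.d. rows; the function names are mine, and Weiszfeld's iteration is written in its plain form (a small clamp avoids division by zero when the iterate lands on a data point):

```python
import numpy as np


def weiszfeld(points, n_iter=500, tol=1e-9):
    """Geometric median of the rows of `points` via Weiszfeld's iteration."""
    y = points.mean(axis=0)
    for _ in range(n_iter):
        # Distances from the current iterate to each point, clamped away from 0.
        d = np.maximum(np.linalg.norm(points - y, axis=1), tol)
        w = 1.0 / d
        # Fixed-point update: distance-weighted average of the points.
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            break
        y = y_new
    return y_new


def geometric_median_of_means(X, k):
    """Bucket the rows of X, then aggregate the bucket means by geometric median."""
    means = np.stack([b.mean(axis=0) for b in np.array_split(X, k)])
    return weiszfeld(means)
```

Each Weiszfeld step is a distance-weighted average, so points far from the current iterate get down-weighted; this is what makes the aggregate robust to a few corrupted bucket means.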

In 2022, Yun and Park extended geometric median-of-means to Polish spaces, which include separable Banach spaces. They seem to get the same rates as Minsker, which are not quite sub-Gaussian in $\mathbb{R}^d$.
