Unlike light-tailed settings, the sample mean is not well-behaved in heavy-tailed settings. Since heavy-tailed distributions may not have finite MGFs, the Chernoff method is not applicable. Catoni gives an example demonstrating the bound achieved via Markov’s inequality (basic inequalities), i.e.,

$P(∣X_{n}−μ∣)≥nδ σ )≤δ,$is essentially tight (where we receive observations $X_{1},…,X_{n}$ and $X_{n}$ is the sample mean). The issue is that outliers can have devastating effects on the sample mean, and heavy-tailed distributions can have many extreme observations. See this discussion by Lugosi and Mendelson for more details, or Sub-Gaussian mean estimators by Devroye et al.

Ideally we want estimators with sub-Gaussian like behavior, i.e.,

$P(∣μ −μ∣≳σng(1/δ) )≤δ.$This is an exponential improvement in the dependence on $1/δ$. These are called sub-Gaussian estimators.

There are several approaches to heavy-tailed mean estimation in scalar settings:

- median-of-means
- trimmed-mean (below)
- Catoni-Giulini M-estimator

## Trimmed mean estimator

Since the empirical mean is ruined by outliers, a natural idea is to remove some fraction of the points and then compute the empirical mean on the remaining points. These are known as trimmed mean estimators.

Oliveira and Orenstein prove that if you remove $γn$ of the largest and smallest points for $γ=Θ(g(1/δ)/n)$, then the trimmed mean is sub-Gaussian. See here for a discussion.

A related idea is proposed by Lee and Valiant who don’t remove entire points but instead weighted fractions of each observations:

At the highest level: in order to return a $δ$-robust estimate of the mean, our estimator “throws out the $31 g(1/δ)$ most extreme points in the samples”, and returns the mean of what remains. More specifically, outliers are thrown out in a weighted manner, where we throw out a fraction of each data point, with the fraction proportional to the square of its distance from a median-of-means initial guess for the mean, where the fraction is capped at 1, and the proportionality constant is chosen so that the total weight thrown out equals exactly $31 g(1/δ)$.

Their estimator achieves a bound of $σ(1+o(1))2g(1/δ)/n $.