As in heavy-tailed scalar concentration, the sample mean is too easily influenced by outliers and does not have good concentration properties. Also as in the scalar setting, the polynomial rate (in terms of the confidence level $\delta$) given by Chebyshev's inequality is the best one can hope for when working with the sample mean. (See the discussion in Hopkins or by Catoni, Section 6.)

What can we hope for in the multivariate setting? In $\mathbb{R}^d$, the empirical mean $\bar{X}_n$ of $n$ draws from a Gaussian with mean $\mu$ and covariance $\Sigma$ behaves as

$$\|\bar{X}_n - \mu\| \leq \sqrt{\frac{\mathrm{Tr}(\Sigma)}{n}} + \sqrt{\frac{2\|\Sigma\|_{\mathrm{op}}\log(1/\delta)}{n}},$$

with probability at least $1 - \delta$.

This is called sub-Gaussian performance of an estimator, and is what we’re trying to achieve even in heavy-tailed settings. See the recent survey by Lugosi and Mendelson for more.
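As a sanity check on this benchmark, here is a minimal Python simulation comparing the empirical mean's error against the sub-Gaussian bound above; the dimension, sample size, and covariance are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, delta = 10, 1000, 0.05

# Arbitrary illustrative covariance: diagonal with decaying eigenvalues.
Sigma = np.diag(1.0 / np.arange(1, d + 1))
mu = np.zeros(d)

X = rng.multivariate_normal(mu, Sigma, size=n)
err = np.linalg.norm(X.mean(axis=0) - mu)

# Sub-Gaussian benchmark: sqrt(Tr(Sigma)/n) + sqrt(2 ||Sigma||_op log(1/delta) / n).
bound = np.sqrt(np.trace(Sigma) / n) + np.sqrt(
    2 * np.linalg.norm(Sigma, 2) * np.log(1 / delta) / n
)
print(f"empirical-mean error = {err:.4f}, sub-Gaussian bound = {bound:.4f}")
```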

Approaches include:

Truncation-based estimators

In 2018, Catoni and Giulini proposed a simple estimator in $\mathbb{R}^d$: simply truncate the observations and take the empirical mean of the result. In particular, they consider

$$\hat{\mu}_\lambda = \frac{1}{n}\sum_{i=1}^n \mathrm{th}_\lambda(X_i),$$

where

$$\mathrm{th}_\lambda(x) = \frac{x}{\|x\|}\min(\|x\|, \lambda) = x\min\left(1, \frac{\lambda}{\|x\|}\right),$$

for some $\lambda > 0$. Estimators of this type are also called thresholding estimators.
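To make the definition concrete, here is a minimal sketch in Python (not the authors' code); the threshold choice $\lambda = \sqrt{nv/\log(1/\delta)}$, with $v \geq \mathbb{E}\|X\|^2$, is one standard way to balance truncation bias against deviations, and is an assumption on my part rather than the authors' exact tuning.

```python
import numpy as np

def catoni_giulini(X, v, delta):
    """Truncation (thresholding) estimator of the mean of heavy-tailed X.

    X: (n, d) array of observations.
    v: upper bound on the raw second moment E||X||^2 (assumed known).
    delta: confidence level.
    """
    n = X.shape[0]
    # One standard threshold choice (an assumption here), balancing
    # truncation bias against deviations: lambda ~ sqrt(n v / log(1/delta)).
    lam = np.sqrt(n * v / np.log(1 / delta))
    norms = np.linalg.norm(X, axis=1)
    # th_lambda(x) = x * min(1, lambda / ||x||): shrink long vectors onto
    # the ball of radius lambda, leave short ones untouched.
    scale = np.minimum(1.0, lam / np.maximum(norms, 1e-12))
    return (X * scale[:, None]).mean(axis=0)
```

Note that $\mathrm{th}_\lambda$ shrinks any observation with norm above $\lambda$ back onto the ball of radius $\lambda$, so no single heavy-tailed draw can move the average by more than $\lambda/n$.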

This has a rate of $O\left(\sqrt{v\log(1/\delta)/n}\right)$, where $v \geq \mathbb{E}\|X\|^2$. The rate is therefore not quite sub-Gaussian, but close: the entire bound picks up the $\log(1/\delta)$ factor, rather than only the (typically smaller) operator-norm term. See the discussion by Lugosi and Mendelson (Section 3.3) for a clear exposition of the precise rates of the Catoni-Giulini estimator. We gave a sequential version of this estimator. The guarantees for the original Catoni-Giulini estimator rely on proofs using PAC-Bayes. A blog post on this estimator is here.

The Catoni-Giulini truncation estimator requires a bound on the raw second moment $\mathbb{E}\|X\|^2$. This is less desirable than a central moment assumption, i.e., a bound on $\mathbb{E}\|X - \mu\|^2$, since the latter is immune to translations of the data. As noted by Lugosi and Mendelson, this lack of translation invariance can be overcome by sample splitting: use some logarithmic number of points to construct a naive estimate $\hat{\mu}_0$ of the mean, and then center the Catoni-Giulini estimator around $\hat{\mu}_0$.
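Here is a minimal sketch of that sample-splitting fix, reusing the `catoni_giulini` sketch above; the logarithmic split size and the naive empirical-mean centering step are assumptions for illustration.

```python
import numpy as np

def centered_catoni_giulini(X, v_central, delta):
    """Center with a naive estimate from log-many points, then apply the
    truncation estimator to the recentered remainder.

    v_central: upper bound on the central second moment E||X - mu||^2.
    """
    n = X.shape[0]
    k = max(1, int(np.ceil(np.log(n))))  # logarithmic split size (assumed)
    mu0 = X[:k].mean(axis=0)             # naive centering estimate
    # After centering, the raw second moment of X - mu0 is (roughly)
    # controlled by the central moment bound v_central, so the raw-moment
    # requirement of the truncation estimator is no longer a restriction.
    return mu0 + catoni_giulini(X[k:] - mu0, v_central, delta)
```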