As in heavy-tailed concentration, the sample mean is too easily influenced by outliers and does not have good concentration properties. Also as in the scalar setting, the polynomial rate (in terms of the confidence level, $\delta$) given by Chebyshev's inequality is the best one can hope for when working with the sample mean. (See the discussion in Hopkins or by Catoni, section 6.)

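What can we hope for in the multivariate setting? In $\mathbb{R}^d$, the empirical mean $\hat{\mu}_n$ of $n$ i.i.d. samples from a Gaussian $\mathcal{N}(\mu, \Sigma)$ behaves as follows: with probability at least $1 - \delta$,

$$\|\hat{\mu}_n - \mu\| \le \sqrt{\frac{\operatorname{Tr}(\Sigma)}{n}} + \sqrt{\frac{2 \lambda_{\max}(\Sigma) \log(1/\delta)}{n}}.$$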

This is called sub-Gaussian performance of an estimator, and is what we’re trying to achieve even in heavy-tailed settings. See the recent survey by Lugosi and Mendelson for more.
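To make the gap between the two rates concrete, here is a minimal simulation sketch. It assumes a diagonal covariance (so each coordinate can be sampled independently) and compares the sub-Gaussian radius $\sqrt{\operatorname{Tr}(\Sigma)/n} + \sqrt{2\lambda_{\max}(\Sigma)\log(1/\delta)/n}$ with the Chebyshev-type radius $\sqrt{\operatorname{Tr}(\Sigma)/(n\delta)}$. The dimension, eigenvalues, sample size, and the helper name `sample_mean_error` are illustrative choices, not taken from the references above.

```python
import math
import random

def sample_mean_error(rng, mu, variances, n):
    """Euclidean error of the sample mean of n draws from N(mu, diag(variances))."""
    d = len(mu)
    sums = [0.0] * d
    for _ in range(n):
        for j in range(d):
            sums[j] += rng.gauss(mu[j], math.sqrt(variances[j]))
    return math.sqrt(sum((sums[j] / n - mu[j]) ** 2 for j in range(d)))

rng = random.Random(0)                   # fixed seed for reproducibility
mu = [0.0] * 5                           # true mean (illustrative)
variances = [1.0, 2.0, 3.0, 4.0, 5.0]    # eigenvalues of the diagonal Sigma (illustrative)
n, delta, trials = 100, 0.05, 500

trace, lam_max = sum(variances), max(variances)

# Sub-Gaussian radius: depends on delta only through log(1/delta).
subgaussian_bound = (math.sqrt(trace / n)
                     + math.sqrt(2 * lam_max * math.log(1 / delta) / n))

# Chebyshev radius: P(error >= t) <= Tr(Sigma) / (n t^2), solved for t at level delta.
chebyshev_bound = math.sqrt(trace / (n * delta))

errors = [sample_mean_error(rng, mu, variances, n) for _ in range(trials)]
failure_rate = sum(e > subgaussian_bound for e in errors) / trials

print(f"sub-Gaussian radius: {subgaussian_bound:.3f}")
print(f"Chebyshev radius:    {chebyshev_bound:.3f}")
print(f"fraction of trials exceeding the sub-Gaussian radius: {failure_rate:.3f}")
```

For Gaussian data, the fraction of trials exceeding the sub-Gaussian radius should stay below $\delta$, while the Chebyshev radius is noticeably larger at the same confidence level. The difference grows as $\delta$ shrinks, since one radius scales as $\sqrt{\log(1/\delta)}$ and the other as $\sqrt{1/\delta}$.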

Approaches include: