Multivariate Heavy-Tailed Mean Estimation

As in heavy-tailed concentration, the sample mean $\overline{X}_{n}$ is too easily influenced by outliers and does not have good concentration properties. Also as in the scalar setting, the polynomial rate (in terms of the confidence level $δ$) given by Chebyshev’s inequality is the best one can hope for when working with the sample mean (see the discussion in Hopkins, or in Catoni, Section 6).
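To make the polynomial rate concrete, here is the standard Chebyshev-type calculation. It is a sketch that assumes only that $X_{1}, \dots, X_{n}$ are i.i.d. in $\mathbb{R}^{d}$ with mean $μ$ and covariance $Σ$ (notation introduced here for illustration), with $\|\cdot\|$ the Euclidean norm:

$$
E \left\| \overline{X}_{n} - \mu \right\|^{2} = \frac{\mathrm{Tr}(\Sigma)}{n}
\quad \Longrightarrow \quad
P\left( \left\| \overline{X}_{n} - \mu \right\| \geq \sqrt{\frac{\mathrm{Tr}(\Sigma)}{n \delta}} \right) \leq \delta .
$$

The second statement is Markov's inequality applied to the squared error. The radius grows like $1/\sqrt{δ}$, which is polynomial in $1/δ$ rather than logarithmic.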

What can we hope for in the multivariate setting? In $\mathbb{R}^{d}$, the empirical mean $\overline{X}_{n}$ of a Gaussian sample with mean $μ$ and covariance $Σ$ behaves as

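$$
P\left( \left\| \overline{X}_{n} - \mu \right\| \geq O\left( \sqrt{\frac{\mathrm{Tr}(\Sigma)}{n}} + \sqrt{\frac{\|\Sigma\| \log(1/\delta)}{n}} \right) \right) \leq \delta .
$$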

This is called sub-Gaussian performance of an estimator, and is what we’re trying to achieve even in heavy-tailed settings. See the recent survey by Lugosi and Mendelson for more.
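To get a feel for the gap between the two rates, the following sketch compares the Chebyshev radius $\sqrt{\mathrm{Tr}(Σ)/(nδ)}$ with the Gaussian-style radius $\sqrt{\mathrm{Tr}(Σ)/n} + \sqrt{2\|Σ\|\log(1/δ)/n}$. The function names and the example values are illustrative assumptions, not part of the original note. The constant 2 is the one given by Gaussian concentration; the bound in the text hides constants inside $O(\cdot)$.

```python
import math

def chebyshev_radius(trace, n, delta):
    """Radius from Chebyshev/Markov: sqrt(Tr(Sigma) / (n * delta))."""
    return math.sqrt(trace / (n * delta))

def subgaussian_radius(trace, op_norm, n, delta):
    """Gaussian-style radius: sqrt(Tr/n) + sqrt(2 * ||Sigma|| * log(1/delta) / n)."""
    return math.sqrt(trace / n) + math.sqrt(2.0 * op_norm * math.log(1.0 / delta) / n)

# Illustrative values (assumed): identity covariance in d = 50 dimensions,
# so Tr(Sigma) = d and the operator norm ||Sigma|| = 1.
d, n = 50, 1000
for delta in (1e-1, 1e-3, 1e-6):
    cheb = chebyshev_radius(d, n, delta)
    subg = subgaussian_radius(d, 1.0, n, delta)
    print(f"delta={delta:g}  chebyshev={cheb:.3f}  sub-Gaussian={subg:.3f}")
```

As $δ$ shrinks, the Chebyshev radius blows up like $1/\sqrt{δ}$, while the sub-Gaussian radius grows only through the $\sqrt{\log(1/δ)}$ term, and that term is multiplied by $\|Σ\|$ rather than by $\mathrm{Tr}(Σ)$.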

Approaches include:

median-of-means
truncation-based estimators
Catoni-Giulini M-estimator
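As an illustration of the first approach, here is a minimal sketch of one multivariate median-of-means variant: split the sample into $k$ blocks, average each block, and take the geometric median of the block means (computed with Weiszfeld iterations). This is only one of several ways to define a multivariate median, and this simple variant is known to give a weaker guarantee than the full sub-Gaussian rate. The function names, the choice of $k$, and the test distribution are assumptions made for the example; $k$ is usually taken on the order of $\log(1/δ)$.

```python
import math
import random

def block_means(points, k):
    """Split the sample into k consecutive blocks and return each block's mean."""
    n, d = len(points), len(points[0])
    size = n // k  # points beyond k * size are ignored
    if size == 0:
        raise ValueError("need at least one point per block")
    means = []
    for b in range(k):
        block = points[b * size:(b + 1) * size]
        means.append([sum(p[j] for p in block) / size for j in range(d)])
    return means

def geometric_median(points, iters=200, eps=1e-12):
    """Weiszfeld iterations for the point minimizing the sum of Euclidean distances."""
    d = len(points[0])
    y = [sum(p[j] for p in points) / len(points) for j in range(d)]  # start at the average
    for _ in range(iters):
        # Clamp distances at eps to avoid dividing by zero at a data point.
        weights = [1.0 / max(math.dist(p, y), eps) for p in points]
        total = sum(weights)
        new_y = [sum(w * p[j] for w, p in zip(weights, points)) / total for j in range(d)]
        if math.dist(new_y, y) < eps:
            return new_y
        y = new_y
    return y

def median_of_means(points, k):
    """Geometric median of the k block means."""
    return geometric_median(block_means(points, k))

# Usage on an assumed heavy-tailed sample: symmetrized Pareto coordinates
# (shape 2.5, so the variance is finite) with true mean zero.
random.seed(0)
sample = [[random.choice((-1, 1)) * random.paretovariate(2.5) for _ in range(3)]
          for _ in range(900)]
print(median_of_means(sample, k=9))
```

Because each block mean is an independent estimate of $μ$, a few blocks contaminated by extreme observations cannot move the geometric median far, which is what gives the estimator its robustness.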
