todo concentration inequalities
- Abbasi-Yadkori et al. provide what is possibly the most famous self-normalized bound, which they apply to linear contextual bandits.
- Ziemann shows how the variational approach to concentration (based on PAC-Bayes) can be used to recover the Abbasi-Yadkori bound, and also gives a Bernstein-style self-normalized bound when boundedness assumptions are imposed.
- Whitehouse et al. give a general framework for deriving self-normalized bounds for sub-psi processes in $\mathbb{R}^d$. Instead of depending on the determinant of the variance process, their bounds depend on its eigenvalues. Neither type of bound uniformly dominates the other.
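
For reference, the Abbasi-Yadkori et al. bound is usually stated as follows. The notation is an assumption made here, not taken from these notes: $\eta_t$ is conditionally $R$-sub-Gaussian noise, $X_t \in \mathbb{R}^d$ is predictable, $V$ is a fixed positive definite matrix, $S_t = \sum_{s=1}^t \eta_s X_s$, and $V_t = V + \sum_{s=1}^t X_s X_s^\top$. Then, for any $\delta \in (0, 1)$, with probability at least $1 - \delta$,

$$
\|S_t\|_{V_t^{-1}}^2 \;\le\; 2R^2 \log\!\left(\frac{\det(V_t)^{1/2}\,\det(V)^{-1/2}}{\delta}\right) \quad \text{for all } t \ge 0 .
$$

The right-hand side shows the dependence on the determinant of the variance process $V_t$ that the eigenvalue-based bounds replace.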
All of the above bounds are time-uniform: they hold simultaneously for all times, and hence remain valid at arbitrary stopping times.
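
Time-uniformity can be checked numerically. The following is a minimal sketch, not anything from the cited papers: it assumes dimension 2, Gaussian noise with standard deviation `R`, covariates drawn uniformly from the unit circle, and regularizer `V = lam * I`, and it tracks the ratio of $\|S_t\|_{V_t^{-1}}^2$ to the determinant-based bound $2R^2 \log(\det(V_t)^{1/2} \det(V)^{-1/2} / \delta)$, where $S_t = \sum_{s \le t} \eta_s X_s$ and $V_t = V + \sum_{s \le t} X_s X_s^\top$. The function name and all parameters are hypothetical.

```python
import math
import random


def max_ratio_along_path(T, R=1.0, lam=1.0, delta=0.05, rng=None):
    """Simulate one path and return max over t <= T of lhs_t / rhs_t, where
    lhs_t = ||S_t||^2 in the V_t^{-1} norm and rhs_t is the determinant-based
    bound. A return value above 1 means the bound failed at some time t."""
    rng = rng or random.Random()
    # V_t = [[a, b], [b, c]] starts at lam * I; S_t = (s1, s2) starts at 0.
    a, b, c = lam, 0.0, lam
    s1, s2 = 0.0, 0.0
    det_v0 = lam * lam
    worst = 0.0
    for _ in range(T):
        # Covariate is drawn before the noise, so it is predictable.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x1, x2 = math.cos(angle), math.sin(angle)
        eta = rng.gauss(0.0, R)  # Gaussian noise is R-sub-Gaussian
        s1 += eta * x1
        s2 += eta * x2
        a += x1 * x1
        b += x1 * x2
        c += x2 * x2
        det_v = a * c - b * b
        # S^T V^{-1} S via the closed-form inverse of a 2x2 matrix.
        lhs = (c * s1 * s1 - 2.0 * b * s1 * s2 + a * s2 * s2) / det_v
        rhs = 2.0 * R * R * math.log(math.sqrt(det_v / det_v0) / delta)
        worst = max(worst, lhs / rhs)
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    ratios = [max_ratio_along_path(200, rng=rng) for _ in range(400)]
    failures = sum(r > 1.0 for r in ratios)
    print(f"paths where the bound failed at some t: {failures} of {len(ratios)}")
```

Because the maximum is taken over the whole path, the fraction of failing paths estimates the probability that the bound is violated at any time up to `T`, which the theorem caps at `delta`. A bound that held only at a fixed time would not control this quantity.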