Minimal penalties and the slope heuristics: a survey
Abstract
Birgé and Massart proposed in 2001 the slope heuristics as a way to choose optimally from data an unknown
multiplicative constant in front of a penalty. It is built upon the notion of minimal penalty, and it has since been
generalized to a family of “minimal-penalty algorithms”. This article reviews the theoretical results obtained for such algorithms,
with a self-contained proof in the simplest framework, precise proof ideas for further generalizations, and a few new
results. Explicit connections are made with residual-variance estimators (with an original contribution on this
topic, showing that for this task the slope heuristics performs almost as well as a residual-based estimator with the
best model choice) and with some classical algorithms such as the L-curve and elbow heuristics, Mallows’ $C_p$, and Akaike’s FPE.
Practical issues are also addressed, including two new practical definitions of minimal-penalty algorithms that are
compared on synthetic data to previously proposed definitions. Finally, several conjectures and open problems are
suggested as future research directions.
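
As a hedged pointer for readers new to the topic: the core relation behind the slope heuristics can be sketched as follows, assuming the Gaussian least-squares framework of Birgé and Massart (the notation $D_m$ for the dimension of model $m$, $\sigma^2$ for the noise level, $n$ for the sample size, and $\widehat{\kappa}$ for the estimated slope is introduced here for illustration and is not part of the abstract itself):
\[
\operatorname{pen}_{\min}(m) \approx \frac{\sigma^2 D_m}{n},
\qquad
\operatorname{pen}_{\mathrm{opt}}(m) \approx 2\,\operatorname{pen}_{\min}(m) = \frac{2\sigma^2 D_m}{n},
\]
so that estimating $\sigma^2$ by the negative slope $\widehat{\kappa}$ of the empirical risk plotted against $D_m/n$ over the largest models, and then penalizing with $2\widehat{\kappa} D_m/n$, yields a data-driven analogue of Mallows’ $C_p$.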