Quantization/Clustering: when and why does k-means work?
RésuméThough mostly used as a clustering algorithm, k-means is originally designed as a quantization algorithm. Namely, it aims at providing a compression of a probability distribution with k points. Building upon Levrard (2015); Tang and Monteleoni (2016), we try to investigate how and when these two approaches are compatible. Namely, we show that provided the sample distribution satisfies a margin like condition (in the sense of Mammen and Tsybakov, 1999 for supervised learning), both the associated empirical risk minimizer and the output of Lloyd’s algorithm provide almost optimal classification in certain cases (in the sense of Azizyan et al., 2013). Besides, we also show that they achieved fast and optimal convergence rates in terms of sample size and compression risk.
Clément Levrard est lauréat du prix Marie-Jeanne Laurent Duhamel 2017.