International Journal of applied mathematics and computer science

online read us now

Paper details

Number 1 - March 2024
Volume 34 - 2024

Wide gaps and Kleinberg’s clustering axioms for k-means

Mieczysław A. Kłopotek

Abstract
The widely applied k-means algorithm produces clusterings that violate our expectations with respect to high/low similarity/density within/between clusters and is in conflict with Kleinberg’s axiomatic system for distance based clustering algorithms that formalizes those expectations. In particular, k-means violates the consistency axiom. We hypothesise that this clash is due to the unexplained expectation that the data themselves should have the property of being clusterable in order to expect the algorithm clustering them to fit a clustering axiomatic system. To demonstrate this, we introduce two new clusterability properties, i.e., variational k-separability and residual k-separability, and show that then Kleinberg’s consistency axiom holds for k-means operating in the Euclidean or non-Euclidean space. Furthermore, we propose extensions of the k-means algorithm that fit approximately Kleinberg’s richness axiom, which does not hold for k-means. In this way, we reconcile k-means with Kleinberg’s axiomatic framework in Euclidean and non-Euclidean settings. Besides contribution to the theory of axiomatic frameworks of clustering and to clusterability theory, the practical benefit is the possibility to construct datasets for testing purposes of algorithms optimizing the k-means cost function. This includes a method of construction of clusterable data with a global optimum known in advance.

Keywords
clustering theory, clustering axioms, clusterability

DOI
10.61822/amcs-2024-0010