Memetic Differential Evolution Methods for Semi-Supervised Clustering
Abstract
In this paper, we propose an extension of MDEClust, a memetic framework based on the Differential Evolution paradigm for unsupervised clustering, to semi-supervised Minimum Sum-of-Squares Clustering (MSSC) problems. In semi-supervised MSSC, background knowledge is available in the form of (instance-level) "must-link" and "cannot-link" constraints, each indicating whether two dataset points should be assigned to the same cluster or to different clusters, respectively. The presence of such constraints makes the problem at least as hard as its unsupervised version and, as a consequence, some framework operations need to be carefully designed to handle this additional complexity: for instance, it is no longer true that each point is assigned to its nearest cluster center. As far as we know, our new framework, called S-MDEClust, represents the first memetic methodology designed to generate a (hopefully) optimal feasible solution for semi-supervised MSSC problems. Results of thorough computational experiments on a set of well-known as well as synthetic datasets show the effectiveness and efficiency of our proposal.
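To make the remark about nearest-center assignment concrete, the following is a minimal sketch, not the S-MDEClust algorithm itself. It assumes the standard MSSC objective (sum of squared Euclidean distances from each point to the centroid of its cluster) and represents constraints as pairs of point indices. All function names and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]  # indices of two dataset points


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids(points: List[Point], labels: List[int], k: int) -> List[List[float]]:
    """Mean of the points in each cluster (an empty cluster keeps a zero vector)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return [[s / max(counts[c], 1) for s in sums[c]] for c in range(k)]


def mssc_objective(points: List[Point], labels: List[int], k: int) -> float:
    """Sum of squared distances from each point to its cluster centroid."""
    cents = centroids(points, labels, k)
    return sum(sq_dist(p, cents[c]) for p, c in zip(points, labels))


def is_feasible(labels: List[int], must_link: List[Pair], cannot_link: List[Pair]) -> bool:
    """Check every must-link and cannot-link constraint against an assignment."""
    same = all(labels[i] == labels[j] for i, j in must_link)
    different = all(labels[i] != labels[j] for i, j in cannot_link)
    return same and different


def nearest_center_labels(points: List[Point], centers: List[Point]) -> List[int]:
    """Unconstrained assignment: each point goes to its closest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c])) for p in points]


# Toy data (hypothetical): two well-separated pairs on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
centers = [(0.5,), (10.5,)]
must_link: List[Pair] = []
cannot_link: List[Pair] = [(0, 1)]  # points 0 and 1 may not share a cluster

nearest = nearest_center_labels(points, centers)  # groups points 0 and 1 together
feasible = [0, 1, 1, 1]                           # one assignment that separates them

nearest_ok = is_feasible(nearest, must_link, cannot_link)
feasible_ok = is_feasible(feasible, must_link, cannot_link)
nearest_cost = mssc_objective(points, nearest, 2)
feasible_cost = mssc_objective(points, feasible, 2)
```

In this toy instance the nearest-center assignment reaches the lowest objective (1.0) but violates the cannot-link constraint, while the feasible assignment pays a higher objective. This is the situation the abstract alludes to: in the semi-supervised setting the assignment step has to account for the constraints rather than rely on distances alone.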