Clustering, Coding, and the Concept of Similarity
Abstract
This paper develops a theory of clustering and coding which combines a geometric model with a probabilistic model in a principled way. The geometric model is a Riemannian manifold with a Riemannian metric, g_ij(x), which we interpret as a measure of dissimilarity. The probabilistic model consists of a stochastic process with an invariant probability measure which matches the density of the sample input data. The link between the two models is a potential function, U(x), and its gradient, ∇U(x). We use the gradient to define the dissimilarity metric, which guarantees that our measure of dissimilarity will depend on the probability measure. Finally, we use the dissimilarity metric to define a coordinate system on the embedded Riemannian manifold, which gives us a low-dimensional encoding of our original data.
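To make the link between a potential function and an invariant probability measure concrete, here is a minimal sketch. The abstract does not say which stochastic process the paper uses, so everything below is an assumption chosen for illustration: an overdamped Langevin diffusion dX = ∇U(X) dt + dW, whose invariant density is proportional to exp(2U(x)), a hypothetical one-dimensional potential U(x) = -x^2/2, and the helper names `grad_u`, `simulate_langevin`, and `sample_variance`. With this potential the invariant density is proportional to exp(-x^2), a Gaussian with variance 1/2.

```python
import math
import random


def grad_u(x):
    """Gradient of the assumed potential U(x) = -x**2 / 2 (illustrative only)."""
    return -x


def simulate_langevin(grad, n_steps=200_000, dt=0.01, burn_in=5_000, seed=0):
    """Euler-Maruyama discretization of dX = grad U(X) dt + dW.

    This is an assumed process, not necessarily the one used in the paper.
    Returns the states visited after the burn-in period.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(dt)  # standard deviation of one Brownian increment
    x = 0.0
    samples = []
    for step in range(n_steps + burn_in):
        # Drift along the gradient of U, plus Gaussian noise.
        x += grad(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        if step >= burn_in:
            samples.append(x)
    return samples


def sample_variance(samples):
    """Population variance of a list of floats."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


if __name__ == "__main__":
    samples = simulate_langevin(grad_u)
    # For this potential the invariant density is Gaussian with variance 1/2,
    # so the estimate should land near 0.5, up to sampling and step-size error.
    print(sample_variance(samples))
```

Under these assumptions the long-run histogram of the simulated path approximates the density determined by U, which is the sense in which a potential and its gradient can tie a drift term to a probability measure. How the paper turns ∇U(x) into the metric g_ij(x) is not stated in the abstract and is not sketched here.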