A comparison of Gap statistic definitions with and without logarithm function

Abstract

The Gap statistic is a standard method for determining the number of clusters in a set of data. The Gap statistic standardizes the graph of log(W_k), where W_k is the within-cluster dispersion, by comparing it to its expectation under an appropriate null reference distribution of the data. We suggest using W_k instead of log(W_k), and comparing it to the expectation of W_k under a null reference distribution. In fact, whenever a number of clusters satisfies the original Gap statistic inequality, it also satisfies the inequality of the Gap statistic that uses W_k, but not vice versa. The two definitions of the Gap function are evaluated on several simulated data sets and on a real data set of DCE-MR images.
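The two definitions can be compared with a minimal Python sketch. The abstract does not spell out formulas, so the sketch assumes the standard formulation: Gap(k) is the mean of log(W_k) over B reference data sets minus log(W_k) on the observed data, and the chosen k is the smallest one with Gap(k) >= Gap(k+1) - s_{k+1}, where s_k is the standard deviation of the reference values times sqrt(1 + 1/B). The variant without the logarithm is assumed to use the same rule with W_k in place of log(W_k). The function names and the dispersion values below are hypothetical and serve only as an illustration; they are not results from the paper.

```python
import math
from statistics import fmean, pstdev


def gap_values(w_obs, w_ref, use_log=True):
    """Return (gaps, s), both indexed by k - 1.

    w_obs[k-1] is W_k on the observed data; w_ref[k-1] is a list of the
    B values of W_k obtained on reference data sets (assumed precomputed).
    """
    transform = math.log if use_log else (lambda x: x)
    gaps, s = [], []
    for w, refs in zip(w_obs, w_ref):
        t = [transform(r) for r in refs]
        b = len(t)
        # Expectation under the reference distribution minus observed value.
        gaps.append(fmean(t) - transform(w))
        # Simulation error term: population sd scaled by sqrt(1 + 1/B).
        s.append(pstdev(t) * math.sqrt(1 + 1 / b))
    return gaps, s


def choose_k(gaps, s):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}, or None if no k qualifies."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - s[i + 1]:
            return i + 1
    return None


# Hypothetical dispersions for k = 1..4 (made up for illustration only).
w_obs = [100.0, 40.0, 35.0, 33.0]
w_ref = [
    [110.0, 105.0, 115.0],
    [80.0, 78.0, 82.0],
    [60.0, 62.0, 58.0],
    [50.0, 49.0, 51.0],
]

gaps_log, s_log = gap_values(w_obs, w_ref, use_log=True)
gaps_plain, s_plain = gap_values(w_obs, w_ref, use_log=False)
k_log = choose_k(gaps_log, s_log)
k_plain = choose_k(gaps_plain, s_plain)
print("k with log(W_k):", k_log)
print("k with W_k:", k_plain)
```

Keeping the transform as a single switch makes it easy to check, for any set of dispersions, whether a k accepted by the logarithmic criterion is also accepted by the criterion without the logarithm.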
