Nonparametric Clustering of Mixed Data Using Modified Chi-square Tests

Abstract

We propose a nonparametric method for clustering mixed data that contain both continuous and discrete random variables. The product space of the continuous and categorical sample spaces is approximated locally by analyzing neighborhoods with cluster patterns. Cluster patterns on the product space are detected with a modified chi-square test. The proposed method does not impose a global distance function, which can be difficult to specify in practice. In simulation studies, our proposed methods outperformed the benchmark method, AutoClass, across various settings.
