Subjects classification from high-dimensional and small-sample size datasets using a strategy based on Clustering Variables around Latent Components (CLV) method
Abstract
High-dimensional complex systems can be studied with multivariate methods such as Principal Component Analysis, but these methods frequently require large samples of observations. Here we examine a method for small samples, based on clustering variables around latent variables (CLV), for classifying subjects into two presumed groups. To evaluate it, a predictive model was developed to generate datasets with two groups of cases whose variables show random features (up to 30% of the variables differ between groups, and up to 7% of those are correlated with one another). The method recovered the information in the latent factors and classified the subjects with 80 to 95% agreement, with a positive relationship between classifier precision and the ratio [number of variables / number of subjects].
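The setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the group shift of 1.5, the use of standard normal noise, the two-cluster k-means-style alternation, and the median split on the strongest latent component are all assumptions made for illustration. Only the 30% and 7% proportions and the idea of clustering variables around latent components come from the text.

```python
import numpy as np

def simulate_two_groups(n_subjects=20, n_vars=200, frac_diff=0.30,
                        frac_corr=0.07, shift=1.5, seed=0):
    """Hypothetical generator: two groups, few subjects, many variables."""
    rng = np.random.default_rng(seed)
    half = n_subjects // 2
    labels = np.repeat([0, 1], [half, n_subjects - half])
    X = rng.standard_normal((n_subjects, n_vars))
    # A fraction of the variables differs between groups (assumed mean shift).
    n_diff = int(frac_diff * n_vars)
    diff_idx = rng.choice(n_vars, size=n_diff, replace=False)
    X[np.ix_(labels == 1, diff_idx)] += shift
    # A fraction of those shares an extra common factor, so they are correlated.
    n_corr = int(frac_corr * n_diff)
    shared = rng.standard_normal((n_subjects, 1))
    X[:, diff_idx[:n_corr]] += shared
    return X, labels, diff_idx

def first_component(Z):
    """Unit-norm first principal component (subject scores) of a variable block."""
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0]

def clv_sketch(X, n_clusters=2, n_iter=20, seed=0):
    """Simplified CLV-style alternation: assign variables, update components."""
    rng = np.random.default_rng(seed)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    assign = rng.integers(n_clusters, size=Z.shape[1])
    comps = np.zeros((Z.shape[0], n_clusters))
    for _ in range(n_iter):
        for k in range(n_clusters):
            members = Z[:, assign == k]
            if members.shape[1] > 0:  # keep the old component if a cluster empties
                comps[:, k] = first_component(members)
        # Each variable joins the component with the largest squared covariance.
        new_assign = ((Z.T @ comps) ** 2).argmax(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    return Z, assign, comps

def classify_subjects(X, **kwargs):
    """Split subjects at the median score of the strongest latent component."""
    Z, assign, comps = clv_sketch(X, **kwargs)
    strength = [np.sum((Z[:, assign == k].T @ comps[:, k]) ** 2)
                for k in range(comps.shape[1])]
    score = comps[:, int(np.argmax(strength))]
    return (score > np.median(score)).astype(int)

def agreement(pred, labels):
    """Agreement up to label swap, since the predicted group names are arbitrary."""
    acc = float(np.mean(pred == labels))
    return max(acc, 1.0 - acc)

X, labels, diff_idx = simulate_two_groups()
pred = classify_subjects(X)
print(f"variables/subjects ratio: {X.shape[1] / X.shape[0]:.1f}, "
      f"agreement: {agreement(pred, labels):.2f}")
```

Rerunning the sketch with different values of `n_vars` at a fixed `n_subjects` is one way to explore the relationship between agreement and the variables-to-subjects ratio; the numbers it produces depend on the assumed shift and noise and should not be read as a reproduction of the reported 80 to 95%.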