spca: An R package to Compute Least Squares Sparse Principal Components
Abstract
This paper introduces the R package spca, which provides a computational framework for least squares sparse principal component analysis (LS-SPCA). Unlike other SPCA methods, LS-SPCA generates uncorrelated sparse principal components (sPCs) that effectively maximize the explained variance while maintaining strong correlations with standard principal components (PCs). The framework also includes more computationally efficient variants that produce mildly correlated sPCs, which often have lower cardinality while explaining equal or greater variance than the LS-SPCA optimal sPCs. The spca package is built on an efficient C++ backend for matrix computations, with distinct engines for tall and fat matrices, and a flexible R frontend. The user interface offers several options for computing sPCs, such as deciding whether sparsification should stop when a threshold is reached for the cumulative variance explained or for the R² with the PCs, and choosing between simple forward selection, stepwise forward selection, or backward elimination for variable selection. In addition to the print(), summary(), and plot() methods, the package includes tools for comparing different "spca" solutions, grouping sparse loadings, and representing foreign SPCA solutions as "spca" objects. This article uses real datasets to demonstrate the package in a typical LS-SPCA workflow and briefly contrasts LS-SPCA with conventional SPCA solutions. It then compares different LS-SPCA solutions obtained from the dataset. Finally, the performance of spca on large tall and fat matrices is discussed, showing that spca offers a computationally efficient alternative for computing interpretable sPCs.