Partial Correlation Screening for Estimating Large Precision Matrices, with Applications to Classification

Abstract

We propose Partial Correlation Screening (PCS) as a new row-by-row approach to estimating a large precision matrix Ω. To estimate the i-th row of Ω, 1 ≤ i ≤ p, PCS uses a Screen step and a Clean step. In the Screen step, PCS recruits a (small) subset of indices using a stage-wise algorithm: in each stage, the algorithm updates the set of recruited indices by adding the index j that has the largest (in magnitude) empirical partial correlation with i. In the Clean step, PCS re-investigates all recruited indices and uses them to reconstruct the i-th row of Ω. PCS is computationally efficient and modest in memory use: to estimate a row of Ω, it only needs a few rows (determined sequentially) of the empirical covariance matrix. This enables PCS to estimate a large precision matrix (e.g., p = 10K) in a few minutes, and opens the door to estimating much larger precision matrices. We use PCS for classification. Higher Criticism Thresholding (HCT) is a recent classifier that enjoys optimality, but to exploit its full potential in practice, one needs a good estimate of the precision matrix Ω. Combining HCT with any approach to estimating Ω gives a new classifier: examples include HCT-PCS and HCT-glasso. We have applied HCT-PCS to two large microarray data sets (p = 8K and 10K) for classification, where it not only significantly outperforms HCT-glasso, but is also competitive with the Support Vector Machine (SVM) and Random Forest (RF). The results suggest that PCS gives more useful estimates of Ω than the glasso. We set up a general theoretical framework and show that in a broad context, PCS fully recovers the support of Ω and HCT-PCS yields optimal classification behavior. Our proofs shed interesting light on the behavior of stage-wise procedures.
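The Screen and Clean steps described above can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names and the parameters `t_screen`, `t_clean`, and `max_size` are hypothetical, and the Clean step here (keep each recruited j whose partial correlation with i, given the other recruited indices, clears `t_clean`, then refit row i by inverting the covariance sub-matrix on the kept set) is one plausible reading of "reconstruct the i-th row of Ω". Each screening stage reads only the rows of the covariance matrix indexed by i and the recruited set, plus its diagonal, which is consistent with the memory claim in the abstract.

```python
import numpy as np


def partial_corr_with(S, i, A):
    """Partial correlations between index i and every candidate j, given set A.

    Uses only the rows of S indexed by A and i, plus the diagonal of S:
      pcov(i, j | A) = S[i, j] - S[i, A] S[A, A]^{-1} S[A, j]
      pvar(j | A)    = S[j, j] - S[j, A] S[A, A]^{-1} S[A, j]
    Entries for i itself and for already-recruited indices are set to 0.
    """
    p = S.shape[0]
    diag = np.diag(S)
    if A:
        S_A = S[A, :]                                 # rows of S indexed by A
        coef = np.linalg.solve(S[np.ix_(A, A)], S_A)  # S[A, A]^{-1} S[A, :]
        pcov = S[i, :] - S[i, A] @ coef
        pvar = diag - np.einsum("kj,kj->j", S_A, coef)
    else:
        pcov, pvar = S[i, :].copy(), diag.copy()
    r = np.zeros(p)
    cand = np.ones(p, dtype=bool)
    cand[[i] + list(A)] = False
    r[cand] = pcov[cand] / np.sqrt(pvar[cand] * pvar[i])
    return r


def pcs_screen(S, i, t_screen, max_size):
    """Screen step (sketch): stage-wise, add the j with largest |partial corr| with i."""
    A = []
    while len(A) < max_size:
        r = partial_corr_with(S, i, A)
        j = int(np.argmax(np.abs(r)))
        if abs(r[j]) < t_screen:  # no remaining candidate is strong enough
            break
        A.append(j)
    return A


def pcs_clean(S, i, A, t_clean):
    """Clean step (sketch): re-check each recruited j given the others, refit row i."""
    B = [i] + list(A)
    P = np.linalg.inv(S[np.ix_(B, B)])
    # Partial corr of i and B[k] given the rest of B: -P[0,k] / sqrt(P[0,0] P[k,k]).
    r = -P[0, 1:] / np.sqrt(P[0, 0] * np.diag(P)[1:])
    kept = [j for j, rj in zip(A, r) if abs(rj) >= t_clean]
    W = [i] + kept
    row = np.zeros(S.shape[0])
    # For the population covariance, this equals Omega[i, W] whenever
    # row i of Omega is supported on W.
    row[W] = np.linalg.inv(S[np.ix_(W, W)])[0, :]
    return row


def pcs_row(S, i, t_screen=0.05, t_clean=0.05, max_size=10):
    """Estimate row i of the precision matrix from a covariance matrix S."""
    return pcs_clean(S, i, pcs_screen(S, i, t_screen, max_size), t_clean)


if __name__ == "__main__":
    p = 30
    # Toy tridiagonal precision matrix (a chain graph), used only as a sanity check.
    Omega = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
    Sigma = np.linalg.inv(Omega)  # population covariance, no sampling noise
    row = pcs_row(Sigma, 10)
    print(np.flatnonzero(row), np.allclose(row, Omega[10]))
```

With the population covariance of a chain-graph Ω, the sketch recruits the two neighbors of i and returns that row of Ω up to rounding. With an empirical covariance from n samples, the thresholds would presumably need to scale with the noise level; the abstract does not say how PCS chooses them.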
