Semi-Confirmatory Factor Analysis for High-Dimensional Data with Interconnected Community Structures

Abstract

Confirmatory factor analysis (CFA) is a statistical method for identifying and confirming the presence of latent factors among observed variables through the analysis of their covariance structure. Compared to alternative factor models, CFA offers interpretable common factors with enhanced specificity and a more adaptable approach to covariance structure modeling. However, the application of CFA has been limited by the requirement for prior knowledge about "non-zero loadings" and by the lack of computational scalability (e.g., it can be computationally intractable for hundreds of observed variables). We propose a data-driven semi-confirmatory factor analysis (SCFA) model that attempts to alleviate these limitations. SCFA automatically specifies "non-zero loadings" by learning the network structure of the large covariance matrix of observed variables, and then offers closed-form estimators for factor loadings, factor scores, covariances between common factors, and variances between errors using the likelihood method. Therefore, SCFA is applicable to high-throughput datasets (e.g., hundreds of thousands of observed variables) without requiring prior knowledge about "non-zero loadings". Through an extensive simulation analysis benchmarking against standard packages, SCFA exhibits superior performance in estimating model parameters with a much-reduced computational time. We illustrate its practical application through factor analysis on two high-dimensional RNA-seq gene expression datasets.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…