Using Importance Sampling to Estimate p-values in All-Subset Meta-Analysis, with Applications to Single-Cell eQTL Mapping
Abstract
Pooling genome-wide association studies of multiple related traits can substantially increase power for detecting genetic variants with pleiotropic effects. ASSET, which exhaustively searches all subsets of studies for association signals, has been widely used to detect modest effects and improve interpretability. Under a normality assumption, ASSET computes p-values via an analytic approximation that accounts for multiple testing. However, this approximation has been evaluated only in limited scenarios and for p-values no smaller than 10⁻³. A systematic assessment in the extreme tail is therefore needed, yet naïve Monte Carlo methods would require prohibitively many simulations. We develop a computationally efficient importance-sampling (IS) algorithm that provides accurate ASSET p-value estimates for both independent and overlapping studies, achieving substantial efficiency gains over naïve Monte Carlo, particularly for very small p-values. Using IS, we show that ASSET's analytic approximation is highly accurate across nearly the entire p-value range when normality holds. In contrast, when normality is violated (due to small sample sizes, low-frequency variants, or non-normal traits), ASSET p-values can be inflated or deflated by orders of magnitude, whereas our IS approach remains accurate. We illustrate the method through applications to single-cell eQTL mapping using peripheral blood mononuclear cells from the OneK1K cohort and lung cells from a Korean population.
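The abstract contrasts naïve Monte Carlo with importance sampling for very small p-values. The following is a minimal sketch of that general idea under simplifying assumptions that are ours, not the paper's: K independent studies with null Z-scores z ~ N(0, I), a hypothetical one-sided statistic equal to the maximum over all nonempty subsets S of the sample-size-weighted meta-analysis Z-score Z_S, and an equal-weight mixture of mean-shifted Gaussians as the proposal. It is a generic mixture IS construction, not necessarily the algorithm developed in the paper, and it ignores ASSET's two-sided form and overlapping studies. The function names `subset_weights`, `naive_mc_pvalue` and `is_pvalue` are illustrative only.

```python
import itertools
import math

import numpy as np


def subset_weights(n):
    """Unit-norm weight vector w_S for every nonempty subset S of studies.

    Assumes independent studies, so Z_S = w_S . z is the sample-size-weighted
    fixed-effect meta-analysis Z-score for subset S and Var(Z_S) = 1.
    """
    n = np.asarray(n, dtype=float)
    k = len(n)
    rows = []
    for r in range(1, k + 1):
        for subset in itertools.combinations(range(k), r):
            w = np.zeros(k)
            idx = list(subset)
            w[idx] = np.sqrt(n[idx])
            rows.append(w / np.linalg.norm(w))
    return np.array(rows)  # shape (2^k - 1, k)


def max_subset_stat(z, W):
    """Hypothetical one-sided all-subset statistic: max over S of Z_S."""
    return (z @ W.T).max(axis=1)


def naive_mc_pvalue(t, n, n_sim, rng):
    """Plain Monte Carlo estimate of P(max_S Z_S >= t) under z ~ N(0, I)."""
    W = subset_weights(n)
    z = rng.standard_normal((n_sim, W.shape[1]))
    return np.mean(max_subset_stat(z, W) >= t)


def is_pvalue(t, n, n_sim, rng):
    """Mixture importance-sampling estimate of P(max_S Z_S >= t).

    Proposal: equal-weight mixture of N(t * w_S, I) over subsets S; each
    component is centered on the nearest point of the boundary {w_S . z = t}.
    """
    W = subset_weights(n)
    m, k = W.shape
    # Pick a mixture component per draw, then shift a standard normal onto it.
    comp = rng.integers(m, size=n_sim)
    z = rng.standard_normal((n_sim, k)) + t * W[comp]
    # Likelihood ratio phi(z) / mean_S phi(z - t w_S), using ||t w_S||^2 = t^2.
    log_terms = t * (z @ W.T) - 0.5 * t * t          # shape (n_sim, m)
    log_mix = np.logaddexp.reduce(log_terms, axis=1) - math.log(m)
    weights = np.exp(-log_mix)
    hits = max_subset_stat(z, W) >= t
    return np.mean(hits * weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Example call with made-up sample sizes for four studies.
    print(is_pvalue(6.0, [500, 800, 1200, 2000], 20000, rng))
```

One reason this kind of proposal helps in the tail: on the event max_S Z_S ≥ t, at least one mixture term is at least exp(t²/2), so every nonzero weight is bounded by (2^K − 1)·exp(−t²/2). The estimator's variance therefore shrinks together with the p-value, whereas naïve Monte Carlo needs on the order of 1/p draws just to observe a single exceedance.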