Adaptive Projected Two-Sample Comparisons for Single-Cell Gene Expression Data

Abstract

We study high-dimensional two-sample mean comparison and address the curse of dimensionality through data-adaptive projections. Leveraging the low-dimensional and localized signal structures commonly seen in single-cell genomics data, our first proposed method identifies a sparse, informative low-dimensional subspace and then performs statistical inference restricted to this subspace. To address the double-dipping issue, which arises from using the same data for both projection and inference, we develop a debiased projected estimator using the semiparametric double machine learning framework. The resulting inference not only has the usual frequentist validity but, because the projection is sparse, also provides useful information on the potential location of the signal. Our second method uses a more flexible projection scheme to improve the power against the global null hypothesis and avoid the degeneracy issue commonly faced by existing methods. It is particularly useful when debiasing is practically challenging or when the informative signal is not well captured by the subspace. Experiments on synthetic data and real datasets demonstrate the theoretical promise and interpretability of the proposed methods.
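
To make the double-dipping issue concrete, the following is a minimal sketch of a projected two-sample test that avoids reusing data by plain sample splitting: one half of each sample picks a sparse direction, and the other half is tested along it. This is a simpler stand-in, not the debiased double machine learning estimator described above. The function name `split_projected_test`, the top-`k` selection rule, and the use of a Welch t-test are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def split_projected_test(X, Y, k=5, seed=0):
    """Sample-splitting projected two-sample test (illustrative only).

    X, Y: arrays of shape (n_cells, n_genes), one per group.
    k:    number of coordinates kept in the sparse direction (assumed rule).
    Returns (t_statistic, p_value, selected_coordinates).
    """
    rng = np.random.default_rng(seed)
    X = X[rng.permutation(len(X))]
    Y = Y[rng.permutation(len(Y))]
    nx, ny = len(X) // 2, len(Y) // 2

    # Step 1 (first halves only): choose a sparse direction from the
    # k coordinates with the largest absolute mean difference.
    diff = X[:nx].mean(axis=0) - Y[:ny].mean(axis=0)
    support = np.argsort(np.abs(diff))[-k:]
    direction = np.zeros(X.shape[1])
    direction[support] = diff[support]
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("selected direction is zero; cannot project")
    direction /= norm

    # Step 2 (held-out halves only): project to one dimension and run a
    # Welch t-test. The direction is independent of these observations,
    # so the p-value is not inflated by the selection step.
    res = stats.ttest_ind(X[nx:] @ direction, Y[ny:] @ direction,
                          equal_var=False)
    return res.statistic, res.pvalue, np.sort(support)
```

The returned coordinates indicate where the selected direction places its weight, which mirrors the interpretability that the abstract attributes to sparse projections. Sample splitting discards half of the data at each step, which is one motivation for a debiased estimator that uses all observations.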
