Optimal Estimation of Simultaneous Signals Using Absolute Inner Product with Applications to Integrative Genomics

Abstract

Integrating the summary statistics from genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) data provides a powerful way of identifying genes whose expression levels are potentially associated with complex diseases. We introduce a parameter, the T-score, that quantifies the genetic overlap between a gene and the disease phenotype from the summary statistics and is defined through the mean values of two Gaussian sequences. Specifically, given two independent samples $x_n \sim N(\theta, \Sigma_1)$ and $y_n \sim N(\mu, \Sigma_2)$, the T-score is defined as $\sum_{i=1}^{n} |\theta_i \mu_i|$, a non-smooth functional that characterizes the amount of shared signal between the two absolute normal mean vectors $|\theta|$ and $|\mu|$. Using approximation theory, estimators are constructed and shown to be minimax rate-optimal and adaptive over various parameter spaces. Simulation studies demonstrate the superiority of the proposed estimators over existing methods. The method is applied to an integrative analysis of heart failure genomics datasets, identifying several genes and biological pathways that are potentially causal for human heart failure.
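
To make the definition concrete, the following minimal Python sketch computes the T-score for a pair of known mean vectors and compares it with a naive plug-in that substitutes the noisy observations for the means. It is an illustration only, not the estimator proposed in the paper: the sparse mean vectors, the sample size, and the function names `t_score` and `naive_plugin` are hypothetical, and each coordinate is assumed to be independent with unit variance.

```python
import random


def t_score(theta, mu):
    """Population T-score: sum over i of |theta_i * mu_i|."""
    return sum(abs(t * m) for t, m in zip(theta, mu))


def naive_plugin(x, y):
    """Naive baseline (NOT the paper's estimator): plug the
    observations in place of the unknown means."""
    return sum(abs(a * b) for a, b in zip(x, y))


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 1000

# Hypothetical sparse means: theta is nonzero on coordinates 0..49,
# mu on coordinates 25..74, so the signals overlap on 25 coordinates.
theta = [3.0 if i < 50 else 0.0 for i in range(n)]
mu = [2.0 if 25 <= i < 75 else 0.0 for i in range(n)]

# Simplifying assumption: independent unit-variance noise per coordinate.
x = [rng.gauss(t, 1.0) for t in theta]
y = [rng.gauss(m, 1.0) for m in mu]

truth = t_score(theta, mu)      # 25 overlapping coordinates * |3 * 2|
estimate = naive_plugin(x, y)   # inflated by the many null coordinates

print(f"true T-score:   {truth:.1f}")
print(f"naive plug-in:  {estimate:.1f}")
```

Every coordinate with no signal still contributes a positive amount to the naive sum, since the expected value of $|z_1 z_2|$ for independent standard normals is $2/\pi$, so the plug-in overshoots badly when signals are sparse. This is one way to see why the non-smoothness of the absolute value makes the estimation problem delicate.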

