High-dimensional regression with outcomes of mixed-type using the multivariate spike-and-slab LASSO

Abstract

We consider a high-dimensional multi-outcome regression in which q, possibly dependent, binary and continuous outcomes are regressed onto p covariates. We model the observed outcome vector as a partially observed latent realization from a multivariate linear regression model. Our goal is to estimate simultaneously a sparse matrix (B) of latent regression coefficients (i.e., partial covariate effects) and a sparse latent residual precision matrix (Ω), which induces partial correlations between the observed outcomes. To this end, we specify continuous spike-and-slab priors on all entries of B and off-diagonal elements of Ω and introduce a Monte Carlo Expectation-Conditional Maximization algorithm to compute the maximum a posteriori estimate of the model parameters. Under a set of mild assumptions, we derive the posterior contraction rate for our model in the high-dimensional regimes where both p and q diverge with the sample size n and establish a sure screening property, which implies that, as n increases, we can recover all truly non-zero elements of B with probability tending to one. We demonstrate the excellent finite-sample properties of our proposed method, which we call mixed-mSSL, using extensive simulation studies and three applications spanning medicine to ecology.
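
The latent-variable formulation described in the abstract can be illustrated with a small data-generating sketch. This is only a simulation of the kind of data the model targets, not the mixed-mSSL estimator itself. It assumes, hypothetically, that binary outcomes arise by thresholding the latent variable at zero (a probit-style link, which the abstract does not state) and that the precision matrix is tridiagonal. The function name `simulate_mixed_outcomes` and all parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np


def simulate_mixed_outcomes(n=200, p=10, q=4, n_binary=2, seed=0):
    """Simulate mixed binary/continuous outcomes from a latent linear model.

    Assumption (not from the paper): binary outcomes are indicators that
    the corresponding latent coordinate is positive.
    """
    rng = np.random.default_rng(seed)

    # Covariates: n observations of p standard normal features.
    X = rng.standard_normal((n, p))

    # Sparse latent coefficient matrix B (roughly 20% non-zero entries).
    B = rng.standard_normal((p, q)) * (rng.random((p, q)) < 0.2)

    # Sparse tridiagonal precision matrix; strict diagonal dominance
    # (1 > 0.4 + 0.4) guarantees positive definiteness.
    Omega = np.eye(q) + 0.4 * (np.eye(q, k=1) + np.eye(q, k=-1))

    # Residuals with covariance Omega^{-1}, drawn via a Cholesky factor.
    Sigma = np.linalg.inv(Omega)
    Sigma = (Sigma + Sigma.T) / 2.0  # symmetrize against round-off
    E = rng.standard_normal((n, q)) @ np.linalg.cholesky(Sigma).T

    # Latent outcomes follow a multivariate linear regression.
    Z = X @ B + E

    # Observed outcomes: the first n_binary columns are seen only through
    # their sign (assumed link); the rest are observed directly.
    Y = Z.copy()
    Y[:, :n_binary] = (Z[:, :n_binary] > 0).astype(float)
    return X, B, Omega, Z, Y


if __name__ == "__main__":
    X, B, Omega, Z, Y = simulate_mixed_outcomes()
    print("non-zero entries in B:", int(np.count_nonzero(B)))
    print("first observed outcome row:", Y[0])
```

In this sketch the zero pattern of B encodes which covariates have partial effects on which outcomes, and the zero pattern of Ω encodes which pairs of latent residuals are conditionally independent. These are the two sparse objects the abstract says the method estimates.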
