SIGLE: a valid procedure for Selective Inference with the Generalized Linear Lasso

Abstract

This article investigates uncertainty quantification for the generalized linear lasso (GLL), a popular variable selection method in high-dimensional regression settings. In many fields of study, researchers use data-driven methods to select the subset of variables most likely to be associated with a response variable. However, such variable selection methods can introduce bias and increase the likelihood of false positives, leading to incorrect conclusions. In this paper, we propose a post-selection inference (PSI) framework that addresses these issues and allows for valid statistical inference after variable selection with the GLL. We show that our method provides accurate p-values and confidence intervals while maintaining high statistical power. In a second stage, we focus on sparse logistic regression, a popular classifier in high-dimensional statistics. We show with extensive numerical simulations that SIGLE is more powerful than state-of-the-art PSI methods. SIGLE relies on a new method for sampling states from the distribution of the observations conditional on the selection event. This method is based on a simulated annealing strategy whose energy is given by the first-order conditions of the logistic lasso.
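The sampling idea in the last two sentences can be illustrated with a minimal sketch. It is not the SIGLE algorithm itself: the abstract does not specify the energy, the proposal move, or the cooling schedule, so everything below is an assumption made for illustration. In particular, the names `kkt_energy` and `anneal`, the penalty level `lam`, the target support `M`, the single-bit-flip proposal, the logarithmic cooling, and the penalty of `lam` per dropped coordinate are all hypothetical. The energy measures how far a binary response vector `y` is from satisfying the first-order (KKT) conditions of a logistic lasso whose selected support is `M`; sign constraints are ignored.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def restricted_logistic_lasso(X_M, y, lam, n_iter=200):
    """ISTA for min_theta sum_i [log(1 + exp(x_i'theta)) - y_i x_i'theta] + lam * ||theta||_1."""
    # Step size 1/L, where L = ||X_M||_2^2 / 4 bounds the curvature of the logistic loss.
    step = 4.0 / (np.linalg.norm(X_M, 2) ** 2)
    theta = np.zeros(X_M.shape[1])
    for _ in range(n_iter):
        grad = X_M.T @ (sigmoid(X_M @ theta) - y)
        z = theta - step * grad
        theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return theta

def kkt_energy(X, y, M, lam):
    """Assumed energy: violation of the first-order conditions for support M (signs ignored)."""
    theta_M = restricted_logistic_lasso(X[:, M], y, lam)
    resid = X.T @ (y - sigmoid(X[:, M] @ theta_M))
    off = np.setdiff1d(np.arange(X.shape[1]), M)
    # Variables outside M must satisfy |X_j'(y - sigma(X theta))| <= lam.
    inactive_violation = np.maximum(np.abs(resid[off]) - lam, 0.0).sum()
    # Variables inside M must stay active; penalize each one shrunk to zero (assumed penalty).
    dropped = np.sum(theta_M == 0.0)
    return inactive_violation + lam * dropped

def anneal(X, M, lam, n_steps=200, t0=1.0, seed=0):
    """Simulated annealing over y in {0,1}^n with single-bit-flip proposals."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    y = rng.integers(0, 2, size=n).astype(float)
    e = kkt_energy(X, y, M, lam)
    best_y, best_e = y.copy(), e
    for k in range(n_steps):
        temp = t0 / np.log(k + 2)  # assumed logarithmic cooling schedule
        cand = y.copy()
        i = rng.integers(n)
        cand[i] = 1.0 - cand[i]  # flip one response
        e_cand = kkt_energy(X, cand, M, lam)
        # Metropolis rule: always accept downhill moves, sometimes accept uphill ones.
        if e_cand <= e or rng.random() < np.exp((e - e_cand) / temp):
            y, e = cand, e_cand
            if e < best_e:
                best_y, best_e = y.copy(), e
    return best_y, best_e
```

Under these assumptions, a vector `y` with zero energy is one whose restricted fit keeps every coordinate of `M` active and violates no inactive-variable constraint, which is a proxy for lying in the selection event. A call such as `anneal(X, np.array([0, 2]), lam=1.0)` on a small design matrix `X` returns the lowest-energy vector visited together with its energy.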
