Linear Regression with Unknown Truncation Beyond Gaussian Features

Constantine Caramanis

Abstract

In truncated linear regression, samples (x,y) are shown only when the outcome y falls inside a certain survival set S and the goal is to estimate the unknown d-dimensional regressor w. This problem has a long history of study in Statistics and Machine Learning going back to the works of (Galton, 1897; Tobin, 1958) and more recently in, e.g., (Daskalakis et al., 2019; 2021; Lee et al., 2023; 2024). Despite this long history, however, most prior works are limited to the special case where S is precisely known. The more practically relevant case, where S is unknown and must be learned from data, remains open: indeed, here the only available algorithms require strong assumptions on the distribution of the feature vectors (e.g., Gaussianity) and, even then, have a d^{poly(1/ε)} run time for achieving ε accuracy. In this work, we give the first algorithm for truncated linear regression with unknown survival set that runs in poly(d/ε) time, by only requiring that the feature vectors are sub-Gaussian. Our algorithm relies on a novel subroutine for efficiently learning unions of a bounded number of intervals using access to positive examples (without any negative examples) under a certain smoothness condition. This learning guarantee adds to the line of works on positive-only PAC learning and may be of independent interest.
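The data model described above can be illustrated with a short simulation. This is only a sketch of the sampling process, not the paper's algorithm: the helper names (`in_survival_set`, `sample_truncated`), the Rademacher features, the Gaussian noise, and the particular intervals are all assumptions chosen for illustration. It draws pairs (x, y), keeps only those with y in a union of intervals S, and fits ordinary least squares on the survivors so the effect of ignoring truncation can be inspected.

```python
import numpy as np

def in_survival_set(y, intervals):
    """Boolean mask: True where y lies in the union of closed intervals."""
    y = np.asarray(y)
    mask = np.zeros(y.shape, dtype=bool)
    for lo, hi in intervals:
        mask |= (y >= lo) & (y <= hi)
    return mask

def sample_truncated(n, w, intervals, noise_std=1.0, seed=0):
    """Draw n pairs (x, y) and return only those whose y falls in S.

    Assumptions for illustration: Rademacher (+/-1) features, which are
    sub-Gaussian but not Gaussian, and additive Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    d = w.shape[0]
    x = rng.choice([-1.0, 1.0], size=(n, d))
    y = x @ w + noise_std * rng.standard_normal(n)
    keep = in_survival_set(y, intervals)
    return x[keep], y[keep]

# Hypothetical regressor and survival set (a union of two intervals).
w_true = np.array([1.0, -0.5, 2.0])
S = [(-1.0, 0.5), (2.0, 4.0)]

x_obs, y_obs = sample_truncated(10_000, w_true, S)

# Naive least squares on the surviving samples, ignoring the truncation.
w_ols, *_ = np.linalg.lstsq(x_obs, y_obs, rcond=None)

print("surviving samples:", len(y_obs))
print("true w:           ", w_true)
print("naive OLS estimate:", w_ols)
```

In the unknown-truncation setting the learner sees only `x_obs` and `y_obs`; neither `S` nor the discarded samples are available, which is why the survival set must be learned from positive examples alone.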
