Evaluation and selection of models for out-of-sample prediction when the sample size is small relative to the complexity of the data-generating process
Abstract
In regression with random design, we study the problem of selecting a model that performs well for out-of-sample prediction. We do not assume that any of the candidate models under consideration are correct. Our analysis is based on explicit finite-sample results. Our main findings differ from those of other analyses that are based on traditional large-sample limit approximations because we consider a situation where the sample size is small relative to the complexity of the data-generating process, in the sense that the number of parameters in a 'good' model is of the same order as the sample size. We also allow for the case where the number of candidate models is (much) larger than the sample size.