A Mathematical Optimization Approach for Expert-Informed Bayesian Best Subset Selection
Abstract
A central challenge in statistical modeling is identifying the subset of features that belong in the true regression model. The classical best subset selection problem, recently made tractable via mixed-integer optimization (MIO), finds the globally optimal sparse solution. It does not, however, make use of any information beyond the observed data. In many applied settings, domain experts can meaningfully rank or score the relevance of candidate predictors, yet no existing framework integrates such probabilistic expert assessments directly into the best subset selection objective. This paper presents Expert-Implied Bayesian Best Subsets (EBBS), a method that incorporates domain experts' probability estimates of feature relevance into the MIO best subset selection problem through a maximum a posteriori (MAP) framework. Views from multiple experts are aggregated into a single prior probability per feature, with the aggregation rule chosen by elicitation format: the Poisson binomial distribution for marginal probability estimates, the pairwise win rate for pairwise comparisons, and the normalized mean rank for ordinal rankings. This prior probability enters the objective function as a log-odds penalty term that encourages or discourages the selection of each feature in line with the expert consensus. This paper provides analytic derivations of the MAP formulation and characterizes its theoretical properties. When experts express no views, the proposed model reduces to classical best subset selection. Empirical results on synthetic and real datasets are forthcoming.
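The mechanism described above can be illustrated with a small sketch. It assumes Gaussian noise with variance sigma², an independent Bernoulli(pi_j) prior on each feature's inclusion indicator z_j, and a flat prior on nonzero coefficients. Under those assumptions the negative log posterior, scaled by 2 sigma², is RSS(beta) − 2 sigma² Σ_j z_j log(pi_j / (1 − pi_j)) plus a constant, so a neutral prior pi_j = 1/2 contributes nothing and plain best subset selection is recovered. The concrete aggregation rules here (majority probability with ties split for the Poisson binomial case, a raw win rate, a linear rank normalization) and every function name are hypothetical illustrations rather than the paper's definitions, and brute-force enumeration of subsets stands in for the MIO solver, which is only practical for very small p.

```python
import itertools
import math

import numpy as np


def poisson_binomial_pmf(probs):
    """PMF of the number of experts who judge a feature relevant, treating
    expert i's judgement as an independent Bernoulli(probs[i]) trial."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this expert says "not relevant"
            nxt[k + 1] += mass * p  # this expert says "relevant"
        pmf = nxt
    return pmf


def prior_from_marginals(probs):
    """Hypothetical rule: P(strict majority says relevant), counting an exact
    tie as one half, so all-0.5 inputs give exactly 0.5."""
    pmf = poisson_binomial_pmf(probs)
    n = len(probs)
    majority = sum(m for k, m in enumerate(pmf) if 2 * k > n)
    tie = sum(m for k, m in enumerate(pmf) if 2 * k == n)
    return majority + 0.5 * tie


def prior_from_pairwise(feature, comparisons):
    """Win rate of `feature` over (winner, loser) pairs; 0.5 if never compared."""
    wins = sum(1 for winner, _ in comparisons if winner == feature)
    losses = sum(1 for _, loser in comparisons if loser == feature)
    return 0.5 if wins + losses == 0 else wins / (wins + losses)


def prior_from_ranks(ranks, p):
    """Normalized mean rank (rank 1 = most relevant of p >= 2 features),
    mapped linearly so rank 1 -> 1.0 and rank p -> 0.0."""
    mean_rank = sum(ranks) / len(ranks)
    return 1.0 - (mean_rank - 1.0) / (p - 1.0)


def log_odds(pi, eps=1e-6):
    """logit(pi), clipped so priors of exactly 0 or 1 stay finite."""
    pi = min(max(pi, eps), 1.0 - eps)
    return math.log(pi / (1.0 - pi))


def map_best_subset(X, y, priors, k, sigma2=1.0):
    """Brute-force stand-in for the MIO solver: minimise
    RSS(S) - 2 * sigma2 * sum_{j in S} logit(priors[j]) over |S| <= k."""
    p = X.shape[1]
    bonus = [2.0 * sigma2 * log_odds(pi) for pi in priors]
    best_obj, best_subset = float(np.sum(y ** 2)), ()  # empty model, no intercept
    for size in range(1, k + 1):
        for subset in itertools.combinations(range(p), size):
            cols = list(subset)
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = float(np.sum((y - X[:, cols] @ beta) ** 2))
            obj = rss - sum(bonus[j] for j in subset)
            if obj < best_obj:
                best_obj, best_subset = obj, subset
    return best_subset, best_obj


X = np.eye(4)
y = np.array([3.0, 0.2, 0.0, 0.0])
print(map_best_subset(X, y, [0.5] * 4, k=1)[0])  # neutral priors -> (0,)
print(map_best_subset(X, y, [0.5, 0.99, 0.5, 0.5], k=1)[0])  # experts favor x1 -> (1,)
```

With neutral priors the search picks the feature with the smallest residual sum of squares, exactly as unpenalized best subset selection would; a strong expert prior on the second feature (log-odds of about 4.6) outweighs its larger RSS and flips the selection.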