Optimal worst-risk minimization in structural equation models with random coefficients
Abstract
The insight that causal parameters are particularly suitable for out-of-sample prediction has sparked considerable development of causal-like predictors. However, the focus on strict causal targets has limited the development of predictors that have good risk-minimization properties but lack a direct causal interpretation. In this manuscript we derive the optimal out-of-sample risk-minimizing predictor of a target Y in a non-linear system (X, Y), trained on data from several within-sample environments. We consider data from an observational environment and several shifted environments. Each environment corresponds to a structural equation model (SEM) with random coefficients and with its own shift and noise vectors, both in L2. Unlike previous approaches, we also allow shifts in the target. We define a sieve of out-of-sample environments consisting of all shifts A that are at most γ times as strong as any weighted average of the observed shift vectors. For each β ∈ ℝ^p we show that the supremum over this sieve of the risk functions R_A(β) admits a worst-risk decomposition into a (positive) non-linear combination of risk functions, depending on γ. We then define B_γ as the set of minimizers of this worst risk. The main result of the paper is that, outside a set of Lebesgue measure zero in the parameter space, there is a unique minimizer (|B_γ| = 1), which can be consistently estimated by an explicit estimator. A practical obstacle for this estimator is that it requires solving a polynomial equation of general degree. We therefore prove that an approximate estimator based on the bisection method is also consistent.
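The abstract does not state the polynomial whose root defines the explicit estimator, so the following is only a generic sketch of the bisection idea it mentions, assuming the problem reduces to finding a root of a scalar polynomial on a bracket [lo, hi] where the polynomial changes sign. The function names `poly_eval` and `bisect_root` and the example coefficients are hypothetical illustrations, not the paper's estimator.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial at x with Horner's rule.

    coeffs lists coefficients from the highest degree down to the constant term
    (hypothetical representation chosen for this sketch).
    """
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def bisect_root(coeffs, lo, hi, tol=1e-10, max_iter=200):
    """Approximate a root of the polynomial on [lo, hi] by bisection.

    Assumes the polynomial takes values of opposite sign at lo and hi, so a
    root lies in the bracket. Each step halves the bracket while keeping a
    sign change inside it.
    """
    f_lo = poly_eval(coeffs, lo)
    f_hi = poly_eval(coeffs, hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("polynomial must change sign on [lo, hi]")

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = poly_eval(coeffs, mid)
        # Stop on an exact root or once the midpoint is within tol of the root.
        if f_mid == 0.0 or (hi - lo) / 2.0 < tol:
            return mid
        if f_lo * f_mid < 0.0:
            # Sign change in the left half: shrink from the right.
            hi = mid
        else:
            # Sign change in the right half: shrink from the left.
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical example: x^3 - 2 has its real root at 2 ** (1/3).
    print(bisect_root([1, 0, 0, -2], 0.0, 2.0))
```

For example, `bisect_root([1, 0, 0, -2], 0.0, 2.0)` approximates the real root 2^(1/3) ≈ 1.26 of x³ − 2. Bisection needs only a sign-change bracket and halves the bracket at every step, so after n steps the midpoint lies within (hi − lo)/2^(n+1) of a root regardless of the polynomial's degree; this degree-independence is one plausible reason to prefer it over closed-form root formulas, which do not exist for general degree.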