c-lasso -- a Python package for constrained sparse and robust regression and classification
Abstract
We introduce c-lasso, a Python package that enables sparse and robust linear regression and classification with linear equality constraints. The underlying statistical forward model is assumed to be of the following form:
\[ y = X\beta + \sigma\epsilon \quad \text{subject to} \quad C\beta = 0 \]
Here, \(X \in \mathbb{R}^{n \times d}\) is a given design matrix and \(y \in \mathbb{R}^{n}\) is a continuous or binary response vector. The matrix \(C\) is a general constraint matrix. The vector \(\beta \in \mathbb{R}^{d}\) contains the unknown coefficients and \(\sigma\) is an unknown scale. Prominent use cases are (sparse) log-contrast regression with compositional data \(X\), which requires the constraint \(1_d^T \beta = 0\) (Aitchison and Bacon-Shone 1984), and the Generalized Lasso, which is a special case of the described problem (see, e.g., James, Paulson, and Rusmevichientong (2020), Example 3). The c-lasso package provides estimators for inferring unknown coefficients and scale (i.e., perspective M-estimators (Combettes and Müller 2020a)) of the form
\[ \min_{\beta \in \mathbb{R}^{d},\, \sigma \in \mathbb{R}_{0}} f(X\beta - y, \sigma) + \lambda \|\beta\|_1 \quad \text{subject to} \quad C\beta = 0 \]
for several convex loss functions \(f(\cdot,\cdot)\). This includes the constrained Lasso, the constrained scaled Lasso, and sparse Huber M-estimators with linear equality constraints.
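To make the constrained Lasso case concrete, the following is a minimal NumPy sketch and not the c-lasso implementation or its API. It assumes a squared-error loss \(f(r) = \tfrac{1}{2}\|r\|_2^2\) with the scale \(\sigma\) held fixed, and it swaps in a generic ADMM splitting as the solver. The function names `soft_threshold` and `constrained_lasso_admm`, the parameters `rho` and `n_iter`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def constrained_lasso_admm(X, y, C, lam, rho=1.0, n_iter=500):
    """Sketch: min 0.5*||X b - y||^2 + lam*||b||_1  subject to  C b = 0.

    ADMM with the splitting b = z: the b-update is an equality-constrained
    least-squares problem (solved via its KKT system), the z-update is
    soft-thresholding. Assumes C has full row rank.
    """
    d = X.shape[1]
    k = C.shape[0]
    # KKT matrix of the b-update; it does not change across iterations.
    K = np.block([[X.T @ X + rho * np.eye(d), C.T],
                  [C, np.zeros((k, k))]])
    Xty = X.T @ y
    z = np.zeros(d)
    u = np.zeros(d)  # scaled dual variable for the constraint b = z
    for _ in range(n_iter):
        rhs = np.concatenate([Xty + rho * (z - u), np.zeros(k)])
        beta = np.linalg.solve(K, rhs)[:d]            # satisfies C beta = 0
        z = soft_threshold(beta + u, lam / rho)       # sparse copy of beta
        u = u + beta - z
    return beta, z


# Synthetic example with the zero-sum constraint 1_d^T beta = 0.
rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -1.0] + [0.0] * (d - 2))  # sums to zero
y = X @ beta_true + 0.1 * rng.normal(size=n)
C = np.ones((1, d))

beta, z = constrained_lasso_admm(X, y, C, lam=5.0)
print("constraint residual:", np.abs(C @ beta).max())
print("sparse iterate z:", np.round(z, 2))
```

In this sketch `beta` satisfies the linear constraint up to numerical precision at every iteration, while `z` carries the exact zeros produced by soft-thresholding; the two agree only in the limit, so which one to report is a design choice.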