Variable selection in functional data classification: a maxima-hunting proposal

Abstract

Variable selection is considered in the setting of supervised binary classification with functional data $\{X(t),\ t\in[0,1]\}$. By "variable selection" we mean any dimension-reduction method that replaces the whole trajectory $\{X(t),\ t\in[0,1]\}$ with a low-dimensional vector $(X(t_1),\ldots,X(t_k))$ while keeping a similar classification error. Our proposal for variable selection is based on selecting the local maxima $(t_1,\ldots,t_k)$ of the function $V_X^2(t)=V^2(X(t),Y)$, where $V$ denotes the "distance covariance" association measure for random variables due to Székely, Rizzo and Bakirov (2007). This method provides a simple, natural way to deal with the relevance vs. redundancy trade-off which typically appears in variable selection. This paper includes (a) some theoretical motivation: a consistency result for the estimation of the maxima of $V_X^2$ is shown, together with several theoretical models for the underlying process $X(t)$ under which the relevant information is concentrated in the maxima of $V_X^2$; and (b) an extensive empirical study, including about 400 simulated models and real data examples, aimed at comparing our variable selection method with other standard proposals for dimension reduction.
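To make the maxima-hunting idea concrete, here is a minimal Python sketch under several assumptions that the abstract does not specify. The trajectories are assumed to be observed on a common finite grid. The squared distance covariance is estimated with the standard double-centering sample statistic of Székely, Rizzo and Bakirov (2007). A grid point counts as a local maximum when it attains the largest value within a window of `h` neighbors on each side. The names `dcov2`, `local_maxima` and `maxima_hunting`, and the window parameter `h`, are illustrative and not taken from the paper.

```python
import numpy as np


def dcov2(x, y):
    """Squared sample distance covariance between two 1-D samples.

    Uses the double-centered pairwise distance matrices A and B and
    returns the mean of their entrywise product.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])  # pairwise distances in x
    b = np.abs(y[:, None] - y[None, :])  # pairwise distances in y
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    return float((A * B).mean())


def local_maxima(v, h=1):
    """Indices j where v[j] is the largest value within h grid points
    on each side, sorted by decreasing v[j].

    Assumption: ties are not handled specially, so a flat plateau
    yields several adjacent indices.
    """
    v = np.asarray(v, dtype=float)
    p = len(v)
    idx = [j for j in range(p)
           if v[j] == v[max(0, j - h):min(p, j + h + 1)].max()]
    return sorted(idx, key=lambda j: -v[j])


def maxima_hunting(X, y, k, h=1):
    """Select up to k grid indices at local maxima of t -> dcov2(X(t), Y).

    X : array of shape (n, p), n trajectories on a grid of p points.
    y : array of shape (n,), binary class labels.
    """
    X = np.asarray(X, dtype=float)
    v = np.array([dcov2(X[:, j], y) for j in range(X.shape[1])])
    return local_maxima(v, h)[:k]
```

Calling `maxima_hunting(X, y, k=3, h=5)` returns at most three grid indices. The columns `X[:, idx]` could then be passed to any standard multivariate classifier. Restricting the selection to local maxima, rather than taking the `k` largest values of the estimated curve, is what keeps neighboring and therefore highly redundant grid points from being selected together.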
