Robust principal components for irregularly spaced longitudinal data

Abstract

Consider longitudinal data $x_{ij}$, with $i=1,\dots,n$ and $j=1,\dots,p_i$, where $x_{ij}$ is the $j$-th observation of the random function $X_i(\cdot)$, observed at time $t_j$. The goal of this paper is to develop a parsimonious representation of the data by a linear combination of a set of $q$ smooth functions $H_k(\cdot)$ ($k=1,\dots,q$), in the sense that $x_{ij}\approx\mu_j+\sum_{k=1}^{q}\beta_{ki}H_k(t_j)$, such that the representation fulfills three goals: it is resistant to atypical $X_i$'s ('case contamination'), it is resistant to isolated gross errors at some $t_{ij}$ ('cell contamination'), and it can be applied when some of the $x_{ij}$ are missing ('irregularly spaced', or 'incomplete', data). Two approaches are proposed for this problem. One addresses all three goals stated above and is based on ideas similar to MM-estimation (Yohai 1987). The other is a simple and fast estimator that can be applied to complete data with case- and cellwise contamination; it applies a standard robust principal components estimate and then smooths the principal directions. Experiments with real and simulated data suggest that with complete data the simple estimator outperforms its competitors, while the MM-estimator is competitive for incomplete data.

Keywords: Principal components, MM-estimator, longitudinal data, B-splines, incomplete data.
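To make the representation concrete, the following is a minimal Python sketch of the general idea behind the simple estimator for complete data: estimate principal directions robustly, smooth them, and rebuild each curve from its scores. It is not the authors' estimator. The function names `spherical_pca` and `smooth_directions` are hypothetical, a simplified spherical PCA centered at the coordinate-wise median stands in for the "standard robust principal components estimate", and a cubic polynomial projection stands in for a B-spline smoother. This stand-in only guards against atypical curves (case contamination), not isolated cell errors, and the data are synthetic.

```python
import numpy as np

def spherical_pca(X, q):
    """Simplified spherical PCA (a stand-in robust estimator): center at the
    coordinate-wise median, scale each centered curve to unit norm, and take
    the q leading eigenvectors of the resulting cross-product matrix."""
    mu = np.median(X, axis=0)
    Z = X - mu
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Z = Z / np.where(norms > 0, norms, 1.0)
    _, vecs = np.linalg.eigh(Z.T @ Z)   # eigenvalues come back in ascending order
    return mu, vecs[:, ::-1][:, :q]     # reorder so leading directions come first

def smooth_directions(H, t, degree=3):
    """Smooth each direction by least-squares projection onto polynomials in t
    (an assumed stand-in for B-splines), then re-orthonormalize the columns."""
    B = np.vander(t, degree + 1)
    coef, *_ = np.linalg.lstsq(B, H, rcond=None)
    Q, _ = np.linalg.qr(B @ coef)
    return Q

# Synthetic complete data: x_ij = mu_j + beta_i * h1(t_j) + noise, with q = 1.
rng = np.random.default_rng(0)
n, p, q = 200, 30, 1
t = np.linspace(0.0, 1.0, p)
mu_true = np.sin(2 * np.pi * t)
h1 = np.cos(np.pi * t)
h1 /= np.linalg.norm(h1)
beta = rng.normal(0.0, 3.0, size=n)
X = mu_true + np.outer(beta, h1) + rng.normal(0.0, 0.1, size=(n, p))

# Case contamination: shift 10% of the curves along a rough, alternating direction.
alt = (-1.0) ** np.arange(p) / np.sqrt(p)
X[:20] += 30.0 * alt

mu_hat, H_raw = spherical_pca(X, q)
H = smooth_directions(H_raw, t)       # smooth estimates of H_k(t_j)
scores = (X - mu_hat) @ H             # estimates of beta_{ki}
X_hat = mu_hat + scores @ H.T         # fitted values mu_j + sum_k beta_{ki} H_k(t_j)

# Absolute cosine between the estimated and the true direction (sign is arbitrary).
print(abs(float(H[:, 0] @ h1)))
```

The sketch requires every curve to be observed at all $p$ time points, so it does not cover the incomplete-data setting that the MM-type approach is designed for.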
