Bernoulli amputation
Abstract
A novel, stochastic approach to amputation, the process of introducing missing values into a complete dataset, is presented. It allows one to construct a wide variety of missingness patterns by specifying only the distributions of missingness indicators, rather than specifying each missingness pattern manually. Missingness indicators are modeled in a principled way via copulas and Bernoulli margins, which allows one to incorporate dependence in missingness patterns. Besides more classical missingness mechanisms such as missing completely at random, missing at random, and missing not at random, the approach is able to model structured missingness such as block missingness and, via mixtures, monotone missingness, which are patterns of missing data frequently found in real-life datasets. Properties such as joint missingness probabilities and missingness correlation are derived mathematically. The flexibility of the approach in capturing different missingness patterns, while requiring only distributional assumptions on the missingness indicators, is demonstrated with mathematical examples and with empirical illustrations on a well-known example dataset whose sample size is small enough that each missing data point can be identified visually. Finally, an example application to multivariate financial time series is provided.
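To make the idea of copula-dependent Bernoulli missingness indicators concrete, the following is a minimal sketch, not the paper's implementation. It assumes a Gaussian copula (the abstract does not say which copulas are used), draws the indicators independently of the data (so it covers only the missing completely at random case), and uses a hypothetical function name, `bernoulli_amputate`, with hypothetical parameters `p` (marginal missingness probabilities, each strictly between 0 and 1) and `rho` (copula correlation matrix).

```python
import numpy as np
from statistics import NormalDist


def bernoulli_amputate(X, p, rho, seed=None):
    """Set entries of a complete (n, d) array to NaN (illustrative sketch).

    X    : complete data, shape (n, d)
    p    : length-d marginal missingness probabilities, each in (0, 1)
    rho  : (d, d) correlation matrix of an ASSUMED Gaussian copula
    seed : optional seed for reproducibility
    Returns the amputed copy of X and the boolean indicator matrix M.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Correlated standard normals carry the dependence of the Gaussian copula.
    Z = rng.multivariate_normal(np.zeros(d), np.asarray(rho, dtype=float), size=n)

    # Bernoulli(p_j) margins: Phi(Z_j) <= p_j is the same event as
    # Z_j <= Phi^{-1}(p_j), so the normal quantiles act as thresholds.
    thresholds = np.array([NormalDist().inv_cdf(float(pj)) for pj in p])
    M = Z <= thresholds  # True marks a value to be removed

    X_amp = X.copy()
    X_amp[M] = np.nan
    return X_amp, M


if __name__ == "__main__":
    data = np.arange(30, dtype=float).reshape(10, 3)
    rho = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.8],
           [0.0, 0.8, 1.0]]
    amputed, mask = bernoulli_amputate(data, p=[0.1, 0.3, 0.3], rho=rho, seed=1)
    print(amputed)
```

In this sketch, a strong positive entry in `rho` makes the corresponding columns tend to go missing together, while each column's long-run share of missing values still matches its entry in `p`.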