Effective Discovery of Meaningful Outlier Relationships

Abstract

We propose PODS (Predictable Outliers in Data-trendS), a method that, given a collection of temporal data sets, derives data-driven explanations for outliers by identifying meaningful relationships between them. First, we formalize the notion of meaningfulness, which so far has been informally framed in terms of explainability. Next, since outliers are rare and it is difficult to determine whether their relationships are meaningful, we develop a new criterion that does so by checking if these relationships could have been predicted from non-outliers, i.e., if we could see the outlier relationships coming. Finally, searching for meaningful outlier relationships between every pair of data sets in a large data collection is computationally infeasible. To address that, we propose an indexing strategy that prunes irrelevant comparisons across data sets, making the approach scalable. We present the results of an experimental evaluation using real data sets and different baselines, which demonstrates the effectiveness, robustness, and scalability of our approach.
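The predictability criterion can be made concrete with a toy example. The sketch below is not the PODS algorithm, whose details the abstract does not give. It assumes two aligned numeric series, flags outliers with a median-absolute-deviation rule, fits a least-squares line on the non-outlier points only, and then asks what fraction of the co-occurring outliers that line would have predicted. The function names, the MAD rule, the linear model, and the thresholds `k` and `tol` are all illustrative assumptions.

```python
from statistics import mean, median


def mad_outliers(values, k=3.0):
    """Indices whose distance from the median exceeds k * MAD (assumed rule)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return {i for i, v in enumerate(values) if abs(v - med) > k * mad}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def predictable_fraction(xs, ys, k=3.0, tol=3.0):
    """Fraction of co-occurring outliers that a model fitted on
    non-outliers predicts to within tol * (largest non-outlier residual).
    Returns None when the two series share no outlier positions."""
    out_x, out_y = mad_outliers(xs, k), mad_outliers(ys, k)
    shared = out_x & out_y
    if not shared:
        return None
    # Fit only on points that are outliers in neither series.
    inliers = [i for i in range(len(xs)) if i not in (out_x | out_y)]
    a, b = fit_line([xs[i] for i in inliers], [ys[i] for i in inliers])
    # Use the worst non-outlier residual as the error scale.
    scale = max(abs(ys[i] - (a * xs[i] + b)) for i in inliers)
    hits = [i for i in shared if abs(ys[i] - (a * xs[i] + b)) <= tol * scale]
    return len(hits) / len(shared)
```

In this toy setting, an outlier pair that continues the trend seen in the non-outliers counts as one we "could see coming", while a pair that breaks the trend does not. A real implementation would also need the pruning index mentioned in the abstract to avoid comparing every pair of data sets, which this sketch does not attempt.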
