Persistence Norms and the Datasaurus

Abstract

Topological Data Analysis (TDA) provides a toolkit for studying the shape of high-dimensional and complex data. While operating on a space of persistence diagrams is cumbersome, persistence norms provide a simple real-valued measure of multivariate data, one that is seeing growing adoption within finance. A growing literature seeks links between persistence norms and the summary statistics of the data being analysed. This short note demonstrates differences in the persistence norms of the Datasaurus datasets of Matejka and Fitzmaurice. We show that persistence norms can serve as additional measures that often discriminate between datasets sharing the same collection of summary statistics. Treating each dataset as a point cloud, we construct the L1 and L2 persistence norms in dimensions 0 and 1. We show that multivariate distributions with identical covariance and correlation matrices can have considerably different persistence norms. Through this example, we remind users of persistence norms of the importance of checking the distribution of the point clouds from which the norms are constructed.
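To make the construction concrete, the following is a minimal sketch of the dimension 0 case only, under assumptions the abstract does not confirm. It assumes a Vietoris-Rips filtration indexed by pairwise Euclidean distance, and it takes the Lp persistence norm to be the p-norm of the finite lifetimes (death minus birth). Under those assumptions every dimension 0 feature is born at scale 0 and dies at the length of a minimum spanning tree edge, so no TDA library is needed. The function names `h0_lifetimes` and `persistence_norm` are hypothetical, and the toy point cloud is not one of the Datasaurus datasets. Dimension 1 norms need a full persistent homology computation and are not covered here.

```python
import math


def h0_lifetimes(points):
    """Finite dimension-0 lifetimes of a point cloud.

    Assumption: Vietoris-Rips filtration with scale equal to pairwise
    Euclidean distance, so births are 0 and deaths are the edge lengths
    of a minimum spanning tree (found here with Prim's algorithm).
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from the tree to each point
    best[0] = 0.0
    lifetimes = []
    for step in range(n):
        # Pick the point closest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            # Attaching u merges two components; one feature dies here.
            lifetimes.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return lifetimes


def persistence_norm(lifetimes, p):
    """p-norm of the lifetimes (assumed definition of the Lp persistence norm)."""
    return sum(life ** p for life in lifetimes) ** (1.0 / p)


# Toy point cloud: the corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
lifetimes = h0_lifetimes(square)
l1_norm = persistence_norm(lifetimes, 1)
l2_norm = persistence_norm(lifetimes, 2)
print(l1_norm, l2_norm)
```

For the unit square the spanning tree has three edges of length 1, so the L1 norm is 3 and the L2 norm is the square root of 3. Running the same two functions on two point clouds with matching means, variances and correlations is one way to check whether their dimension 0 norms also match.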
