Interpolating between the Jaccard distance and an analogue of the normalized information distance
Abstract
Jim\'enez, Becerra, and Gelbukh (2013) defined a family of "symmetric Tversky ratio models" $S_{\alpha,\beta}$, $0\le\alpha\le 1$, $\beta>0$. Each function $D_{\alpha,\beta}=1-S_{\alpha,\beta}$ is a semimetric on the powerset of a given finite set. We show that $D_{\alpha,\beta}$ is a metric if and only if $0\le\alpha\le 1/2$ and $\beta\ge 1/(1-\alpha)$. This result is formally verified in the Lean proof assistant. The extreme points of this parametrized space of metrics are $V_1=D_{1/2,2}$, the Jaccard distance, and $V_\infty=D_{0,1}$, an analogue of the normalized information distance of M. Li, Chen, X. Li, Ma, and Vit\'anyi (2004). As a second interpolation, we also show more generally that $V_p$ is a metric for $1\le p\le\infty$, where $\Delta_p(A,B)=(|B\setminus A|^p+|A\setminus B|^p)^{1/p}$ and $V_p(A,B)=\dfrac{\Delta_p(A,B)}{|A\cap B|+\Delta_p(A,B)}$.
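To make the definition of $V_p$ concrete, the following Python sketch computes $\Delta_p$ and $V_p$ on finite sets and searches the powerset of a small universe for violations of the triangle inequality. It is an illustration only, not the Lean formalization mentioned above. The function names (`delta_p`, `v_p`, `powerset`, `violates_triangle`), the four-element universe, the floating-point tolerance, and the convention $V_p(\emptyset,\emptyset)=0$ are assumptions made for this example.

```python
import math
from itertools import combinations


def delta_p(A, B, p):
    """Delta_p(A, B) = (|B \\ A|^p + |A \\ B|^p)^(1/p); p = math.inf gives the max."""
    x, y = len(B - A), len(A - B)
    if math.isinf(p):
        return float(max(x, y))
    return (x ** p + y ** p) ** (1.0 / p)


def v_p(A, B, p):
    """V_p(A, B) = Delta_p / (|A ∩ B| + Delta_p).

    Assumption: the 0/0 case (A = B = empty set) is assigned distance 0.
    """
    d = delta_p(A, B, p)
    denom = len(A & B) + d
    return 0.0 if denom == 0 else d / denom


def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def violates_triangle(p, universe=range(4), tol=1e-12):
    """True if some triple (A, B, C) has V_p(A, C) > V_p(A, B) + V_p(B, C)."""
    sets = powerset(universe)
    return any(v_p(A, C, p) > v_p(A, B, p) + v_p(B, C, p) + tol
               for A in sets for B in sets for C in sets)


if __name__ == "__main__":
    A, B = frozenset({1, 2}), frozenset({2, 3})
    print(v_p(A, B, 1))          # Jaccard distance of A and B
    print(v_p(A, B, math.inf))   # max-based analogue
    for p in (1, 2, 3, math.inf):
        print(p, violates_triangle(p))
```

For $p=1$ the value agrees with the Jaccard distance $1-|A\cap B|/|A\cup B|$, and for $p=\infty$ the numerator becomes $\max(|B\setminus A|,|A\setminus B|)$. A brute-force search of this kind can only fail to find a counterexample on a small universe; it does not replace the proof.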