Classifying token frequencies using angular Minkowski $p$-distance

Chris Cornelis

Abstract

Angular Minkowski p-distance is a dissimilarity measure obtained by replacing the Euclidean distance in the definition of cosine dissimilarity with other Minkowski p-distances. Cosine dissimilarity is frequently used with datasets containing token frequencies, and angular Minkowski p-distance may be an even better choice for certain tasks. In a case study based on the 20-newsgroups dataset, we evaluate classification performance for classical weighted nearest neighbours, as well as fuzzy rough nearest neighbours. In addition, we analyse the relationship between the hyperparameter p, the dimensionality m of the dataset, the number of neighbours k, the choice of weights and the choice of classifier. We conclude that angular Minkowski p-distance, given suitable values of p, can achieve substantially higher classification performance than classical cosine dissimilarity.
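
The abstract does not spell out the formula, so the following is only a minimal sketch of one plausible reading. It assumes that the measure is the Minkowski p-distance between two vectors after each has been divided by its own p-norm. This assumption is motivated by the fact that cosine dissimilarity equals half the squared Euclidean distance between vectors scaled to unit Euclidean norm. The function names `p_norm`, `angular_minkowski` and `cosine_dissimilarity`, and the toy token counts, are hypothetical and not taken from the paper.

```python
import math


def p_norm(x, p):
    """Minkowski p-norm of a sequence of numbers (p >= 1)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def angular_minkowski(x, y, p=2.0):
    """Assumed form: p-distance between the p-norm-normalised vectors."""
    nx, ny = p_norm(x, p), p_norm(y, p)
    if nx == 0 or ny == 0:
        raise ValueError("undefined for all-zero vectors")
    return p_norm([a / nx - b / ny for a, b in zip(x, y)], p)


def cosine_dissimilarity(x, y):
    """Classical cosine dissimilarity: 1 minus the cosine of the angle."""
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / (p_norm(x, 2) * p_norm(y, 2))


# Toy token-frequency vectors over a four-word vocabulary (made up).
doc_a = [3, 0, 1, 2]
doc_b = [1, 1, 0, 4]

for p in (1, 1.5, 2):
    print(p, angular_minkowski(doc_a, doc_b, p))
```

Under this assumption, the case p = 2 gives a squared distance equal to twice the cosine dissimilarity, so both measures rank neighbours identically. The measure is also unchanged when a document's counts are multiplied by a positive constant, which is the property that makes cosine dissimilarity attractive for token frequencies.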
