Transparent Semantic Spaces: A Categorical Approach to Explainable Word Embeddings
Abstract
The paper introduces a novel framework based on category theory to enhance the explainability of artificial intelligence systems, particularly focusing on word embeddings. Key topics include the construction of categories LT and PT, providing schematic representations of the semantics of a text T , and reframing the selection of the element with maximum probability as a categorical notion. Additionally, the monoidal category PT is constructed to visualize various methods of extracting semantic information from T, offering a dimension-agnostic definition of semantic spaces reliant solely on information within the text. Furthermore, the paper defines the categories of configurations Conf and word embeddings Emb, accompanied by the concept of divergence as a decoration on Emb. It establishes a mathematically precise method for comparing word embeddings, demonstrating the equivalence between the GloVe and Word2Vec algorithms and the metric MDS algorithm, transitioning from neural network algorithms (black box) to a transparent framework. Finally, the paper presents a mathematical approach to computing biases before embedding and offers insights on mitigating biases at the semantic space level, advancing the field of explainable artificial intelligence.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.