Composition as Direction: An Active-Set Ray-Based Model for Sparse High-Dimensional Compositional Data

Abstract

[Working Draft] Compositional data are central to microbial, ecological, and environmental research, yet often have four features that are difficult to accommodate jointly: exact zeros, latent dependence among components, high dimensionality, and a unit-sum constraint that induces a non-Euclidean geometry. Conventional Dirichlet-type and logistic-normal models address these features only partially. Projected Gaussian models offer a directional representation that captures exact zeros and latent dependence; however, support correctness on the simplex requires either truncation or folding, both of which become computationally prohibitive as the dimension grows. We develop an Active-set Ray-based Compositional (ARC) framework, which retains the benefits of projected Gaussian models while remaining computationally feasible in high-dimensional settings. In this framework, we map compositions to the nonnegative orthant of the unit hypersphere and specify an active-set process that governs which components are present. Conditional on the active set, the positive subcomposition is modeled by evaluating a latent Gaussian density along positive rays of the active subspace with the radius treated as an auxiliary variable. Such a construction (i) separates the active-set process from the positive subcomposition on the active components, (ii) preserves a latent Gaussian interpretation, and (iii) accommodates arbitrary latent dependence. Thus, the framework is conducive to high-dimensional applications in which exact zeros and shared positive responses are scientifically central. Conceptually, the proposed framework reframes a composition as an observed direction of a latent abundance vector with an unobserved magnitude and an explicitly modeled active set.
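The closing idea of the abstract, a composition as the observed direction of a latent abundance vector whose magnitude is unobserved, can be illustrated with a small numerical sketch. The abstract does not state which map onto the unit hypersphere is used, so the L2 normalization below is an assumption, and the function names `to_direction`, `to_composition`, and `active_set` are hypothetical helpers chosen for illustration, not part of the ARC framework itself.

```python
import numpy as np

def to_direction(x):
    """Map a composition to a unit vector in the nonnegative orthant.

    Assumption: plain L2 normalization; the paper may use a different map.
    """
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

def to_composition(z):
    """Close a nonnegative vector so that its entries sum to one."""
    z = np.asarray(z, dtype=float)
    return z / z.sum()

def active_set(x):
    """Indices of the components that are present (strictly positive)."""
    return np.flatnonzero(np.asarray(x) > 0)

# A composition with two exact zeros.
x = np.array([0.5, 0.0, 0.3, 0.2, 0.0])

u = to_direction(x)   # observed direction on the unit sphere
A = active_set(x)     # which components are present

# Any positive radius r along the ray r * u gives back the same composition,
# so the magnitude of the latent abundance vector is not identified by x.
for r in (0.1, 1.0, 50.0):
    z = r * u
    assert np.allclose(to_composition(z), x)
```

Under these assumptions, the exact zeros of the composition carry over to the direction, so the active set can be read off either representation, and every point on the positive ray through `u` corresponds to the same observed composition.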
