Maintaining Discrete Probability Distributions in Practice
Abstract
A classical problem in random number generation is the sampling of elements from a given discrete distribution. Formally, given a set of indices S = \1, …, n\ and sequence of weights w1, …, wn ∈ R+, the task is to provide samples from S with distribution p(i) = wi / W where W = Σj wj. A commonly accepted solution is Walker's Alias Table, which allows for each sample to be drawn in constant time. However, some applications correspond to a dynamic setting, where elements are inserted or removed, or weights change over time. Here, the Alias Table is not efficient, as it needs to be re-built whenever the underlying distribution changes. In this paper, we engineer a simple data structure for maintaining discrete probability distributions in the dynamic setting. Construction of the data structure is possible in time O(n), sampling is possible in expected time O(1), and an update of size can be processed in time O( n / W). As a special case, we maintain an urn containing W marbles of n colors where with each update O(W / n) marbles can be added or removed in O(1) time per update. To evaluate the efficiency of the data structure in practice we conduct an empirical study. The results suggest that the dynamic sampling performance is competitive with the static Alias Table. Compared to existing more complex dynamic solutions we obtain a sampling speed-up of up to half an order of magnitude.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.