The Shannon Entropy of a Histogram

Abstract

The histogram is a key method for visualizing data and estimating the underlying probability distribution. Over- or under-binning leads to incorrect conclusions about the data. A new method based on the Shannon entropy of the histogram is introduced; it uses a simple formula based on the differential entropy estimated from nearest-neighbour distances. The new method is linked to other algorithms, such as Scott's formula and cost- and risk-function methods. A parameter that predicts over- and under-binning is identified; it can be estimated for any histogram. Application to real data shows that the new algorithm is robust.
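
For orientation, the two standard quantities named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's method: the abstract does not state the new formula or the over/under-binning parameter, so neither is reproduced here. The sketch assumes the usual form of Scott's normal-reference rule, h = 3.49 σ n^(-1/3), and the discrete Shannon entropy of the bin probabilities, H = -Σ p_i ln p_i. The function names (scott_bin_width, histogram_counts, shannon_entropy) and the synthetic Gaussian sample are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def scott_bin_width(data):
    """Scott's normal-reference rule: h = 3.49 * sigma * n**(-1/3)."""
    n = len(data)
    sigma = statistics.stdev(data)  # sample standard deviation
    return 3.49 * sigma * n ** (-1.0 / 3.0)


def histogram_counts(data, width):
    """Count samples in equal-width bins whose left edge starts at min(data)."""
    if width <= 0:
        raise ValueError("bin width must be positive")
    lo = min(data)
    n_bins = max(1, math.ceil((max(data) - lo) / width))
    counts = [0] * n_bins
    for x in data:
        # Clamp so that the maximum value falls in the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def shannon_entropy(counts):
    """Discrete Shannon entropy, in nats, of the bin probabilities."""
    total = sum(counts)
    # Empty bins contribute nothing, following the convention 0 * log(0) = 0.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical data: 1000 draws from a standard normal distribution.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]

h = scott_bin_width(sample)
counts = histogram_counts(sample, h)
print(f"bin width = {h:.3f}, bins = {len(counts)}, "
      f"entropy = {shannon_entropy(counts):.3f} nats")
```

The entropy computed this way depends on the bin width: narrower bins spread the samples over more cells and raise the value, up to a maximum of ln(number of bins). That dependence is what makes the histogram's entropy a plausible handle on the choice of binning.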
