Missing Mass Concentration for Markov Chains

Abstract

The problem of missing mass in statistical inference (posed by McAllester and Ortiz, NIPS'02; most recently revisited by Changa and Thangaraj, ISIT'2019) seeks to estimate the weight of symbols that have not yet been sampled from a source. So far, all approaches have focused on the IID model which, although overly simplistic, is already not straightforward to tackle. The non-trivial part lies in handling correlated events and sums of variables with very different scales, for which classical concentration inequalities do not yield good bounds. In this paper we develop the research on missing mass further by solving the problem for Markov chains. We reduce the problem to studying the tails of hitting times and finding log-additive approximations to them. More precisely, we combine the technique of majorization with certain estimates on set hitting times to show how the problem can eventually be reduced back to the IID case. Our contributions are: a) a new technique for obtaining missing mass bounds, in which we replace the traditionally used negative association with majorization, which works for a wider class of processes; b) the first (exponential) concentration bounds for missing mass in Markov chain models; c) simplifications of recent results on set hitting times; and d) a simplified derivation of missing mass estimates for memoryless sources.
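To make the quantity concrete, the following is a minimal simulation sketch and not the paper's method. It assumes the missing mass of a Markov chain is the total stationary weight of the states not visited in the first n steps. The 4-state transition matrix `P` and the helper names `stationary` and `missing_mass` are hypothetical and chosen only for illustration.

```python
import random

def stationary(P, iters=1000):
    """Approximate the stationary distribution of P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # One step of pi <- pi * P (row vector times matrix).
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def missing_mass(P, weights, n_steps, rng, start=0):
    """Run the chain for n_steps states (including the start state) and
    return the total weight of the states that were never visited."""
    state = start
    seen = {state}
    for _ in range(n_steps - 1):
        # Draw the next state from the current row of the transition matrix.
        state = rng.choices(range(len(P)), weights=P[state])[0]
        seen.add(state)
    return sum(w for s, w in enumerate(weights) if s not in seen)

# Hypothetical lazy chain on 4 states (irreducible and aperiodic).
P = [
    [0.90, 0.10, 0.00, 0.00],
    [0.05, 0.90, 0.05, 0.00],
    [0.00, 0.05, 0.90, 0.05],
    [0.00, 0.00, 0.10, 0.90],
]

pi = stationary(P)
rng = random.Random(0)  # fixed seed so that runs are reproducible

# Empirical distribution of the missing mass after 20 steps.
samples = [missing_mass(P, pi, n_steps=20, rng=rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
print(f"empirical mean missing mass after 20 steps: {mean:.4f}")
```

Because consecutive states are correlated, the indicators "state s was not visited" are not independent, which is the difficulty the abstract refers to. A histogram of `samples` gives an empirical view of the tails that the concentration bounds are meant to control.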
