Importance Sampling for Event Discovery via Guesswork

Abstract

Traditional importance sampling (IS) is designed to estimate rare-event probabilities by minimizing estimator variance. However, many applications prioritize rapid discovery: the generation of a trajectory within a rare set $A_n$. This requires a shift from ensemble-based estimation to a design principle focused on the hitting time $\tau_{A_n} := \inf\{t \ge 1 : Y_t^n \in A_n\}$. We formalize a Quality of Discovery problem as the problem of minimizing the description length (surprisal) of the discovered trajectory under the nominal model $p$. We prove that minimizing this description length is equivalent to minimizing the nominal rank exponent $J_{\mathrm{rank}}(q_n) := \lim_{n \to \infty} \frac{1}{n} \log G_n(Y^n)$, where $G_n(x^n)$ is the guesswork of sequence $x^n$. For i.i.d. models and type-defined rare sets $\Gamma$, we show that while classical IS targets the mass-dominating type $Q^*_{\mathrm{IS}} \in \arg\min_{Q \in \Gamma} D(Q\|p)$, discovery optimality is achieved by $Q^*_{\mathrm{GW}} \in \arg\min_{Q \in \Gamma} [H(Q) + D(Q\|p)]$. This framework identifies a fundamental rule: minimizing the guesswork exponent ensures the discovered sequence is the "least surprising" representative of the set relative to the nominal model's search order. We further demonstrate that under budgetary constraints, this exponent serves as a lexicographic tie-breaker when the hitting-time minimizer is not unique. This establishes $H(Q) + D(Q\|p)$ as a natural objective for discovery-based importance sampling, providing a formal bridge between randomized sampling and systematic search.
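To illustrate how the two objectives in the abstract can select different types, the following is a minimal sketch and not the paper's method. It assumes a hypothetical nominal distribution p on the alphabet {0, 1, 2} and a hypothetical rare set of types whose mean is at least 6/5, and it finds both minimizers by brute-force search over a grid on the simplex. It relies on the standard identity H(Q) + D(Q\|p) = -\sum_x Q(x) \log p(x), so the discovery objective is the per-symbol surprisal under p of any sequence of type Q.

```python
import math

# Hypothetical nominal i.i.d. model on the alphabet {0, 1, 2} (an assumption).
p = (0.5, 0.3, 0.2)
# Hypothetical rare set Gamma: types Q with mean E_Q[X] >= 6/5 (an assumption).
MEAN_NUM, MEAN_DEN = 6, 5


def entropy(q):
    """Shannon entropy H(Q) in nats."""
    return -sum(qi * math.log(qi) for qi in q if qi > 0)


def kl(q, p):
    """Relative entropy D(Q||p) in nats."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def cross_entropy(q, p):
    """-sum_x Q(x) log p(x), which equals H(Q) + D(Q||p)."""
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p) if qi > 0)


def types_in_gamma(steps=100):
    """Enumerate grid types (i, j, k)/steps on the simplex that lie in Gamma."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            # Integer form of the mean constraint (j + 2k)/steps >= 6/5.
            if (j + 2 * k) * MEAN_DEN >= MEAN_NUM * steps:
                yield (i / steps, j / steps, k / steps)


gamma = list(types_in_gamma())

# Classical IS target: the type in Gamma closest to p in relative entropy.
q_is = min(gamma, key=lambda q: kl(q, p))
# Discovery target: the type in Gamma minimizing H(Q) + D(Q||p).
q_gw = min(gamma, key=lambda q: entropy(q) + kl(q, p))

print("IS type:", q_is, "D =", round(kl(q_is, p), 4))
print("GW type:", q_gw, "H + D =", round(entropy(q_gw) + kl(q_gw, p), 4))
```

In this toy setting the relative-entropy minimizer spreads mass over all three symbols, while the H(Q) + D(Q\|p) minimizer sits at a corner of the constraint set because its objective is linear in Q. The grid resolution and the mean-threshold form of the rare set are illustrative choices only.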
