Binomial and Multinomial Proportions: Accurate Estimation and Reliable Assessment of Accuracy
Abstract
Misestimates of $\sigma_{P_o}$, the uncertainty in $P_o$ from a two-state Bayes equation used for binary classification, apparently arose from $\sigma_{p_i}$, the uncertainty in underlying pdfs estimated from experimental $b$-bin histograms. To address this, several Bayesian estimator pairs $(p_i, \sigma_{p_i})$ were compared for agreement between the nominal confidence level and the calculated coverage values ($C$). A large inconsistency between nominal confidence level and $C$ for large $b$ and $p_i \gg 1/b$ arises for all multinomial estimators, because their priors downweight low-likelihood, high-$p_i$ values. To improve confidence-to-coverage matching, the squared difference between the nominal confidence level and $C$ was minimized against $\alpha_0$ in a more general prior pdf, $\mathcal{B}[\alpha_0, (b-1)\alpha_0; x]$, to obtain a coverage-matched estimator of $p_i$. This improved matching for $b = 2$, but for $b > 2$, confidence-to-coverage matching by the coverage-matched estimator required an effective value "$b = 2$" and renormalization, which reduced the agreement between estimated and true $p_i$. Better agreement between estimated and true $p_i$ came from the original multinomial estimators, from a new discrete-domain estimator $p(n_i, N)$, or from an earlier joint estimator that co-adjusted all estimates $p_i$ for James-Stein shrinkage toward a mean vector. The best simultaneous confidence-to-coverage and estimate-to-truth matching came from de-noising initial estimates of the underlying pdfs. For $b = 100$ and $N < 12800$, de-noised estimates needed about 10× fewer observations to achieve estimate-to-truth matching equivalent to that of $p(n_i, N)$, the joint estimator, or the original multinomial $p_i$. De-noising each different type of initial estimate yielded similarly high accuracy in Monte Carlo tests.
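The prior $\mathcal{B}[\alpha_0, (b-1)\alpha_0; x]$ is the one-bin marginal of a symmetric Dirichlet prior with concentration $\alpha_0$, so one estimator pair and the calibration of $\alpha_0$ against coverage can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's method: it assumes the pair is the Beta posterior mean and standard deviation, that intervals take the normal-approximation form $p_i \pm z\,\sigma_{p_i}$, and that the squared confidence-minus-coverage error is summed over a hypothetical grid of true $p_i$ values. Coverage is computed exactly by summing binomial probabilities instead of by Monte Carlo. All function names (`posterior_pair`, `binom_pmf`, `coverage`, `calibrate_alpha0`) are illustrative.

```python
import math
from statistics import NormalDist


def posterior_pair(n_i: int, N: int, b: int, alpha0: float) -> tuple[float, float]:
    """Posterior mean and std of p_i under a Beta(alpha0, (b-1)*alpha0) prior.

    Observing n_i of N counts gives the conjugate posterior
    Beta(n_i + alpha0, N - n_i + (b-1)*alpha0).
    """
    a = n_i + alpha0
    c = N - n_i + (b - 1) * alpha0
    s = a + c  # equals N + b*alpha0
    mean = a / s
    var = a * c / (s * s * (s + 1.0))  # Beta variance ac / ((a+c)^2 (a+c+1))
    return mean, math.sqrt(var)


def binom_pmf(n: int, N: int, p: float) -> float:
    """Binomial(N, p) probability of n, computed in log space (requires 0 < p < 1)."""
    log_pmf = (math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
               + n * math.log(p) + (N - n) * math.log1p(-p))
    return math.exp(log_pmf)


def coverage(p_true: float, N: int, b: int, alpha0: float, level: float) -> float:
    """Exact coverage C of the interval mean +/- z*std for a true proportion.

    ASSUMPTION: normal-approximation interval; z is the two-sided quantile
    for the nominal confidence level.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    total = 0.0
    for n in range(N + 1):
        mean, sd = posterior_pair(n, N, b, alpha0)
        if mean - z * sd <= p_true <= mean + z * sd:
            total += binom_pmf(n, N, p_true)
    return total


def calibrate_alpha0(p_grid, N, b, level, alpha_grid):
    """Grid search for the alpha0 minimizing sum over p_grid of (level - C)^2.

    ASSUMPTION: the hypothetical p_grid of true proportions stands in for
    whatever set of true values the objective is evaluated over.
    """
    def loss(alpha0: float) -> float:
        return sum((level - coverage(p, N, b, alpha0, level)) ** 2 for p in p_grid)

    return min(alpha_grid, key=loss)


if __name__ == "__main__":
    level, N, b = 0.95, 20, 2
    alphas = [0.1 * k for k in range(1, 31)]
    best = calibrate_alpha0([0.1, 0.3, 0.5, 0.7, 0.9], N, b, level, alphas)
    print(f"calibrated alpha0 = {best:.1f}")
    for p in (0.1, 0.5):
        print(f"p = {p}: C = {coverage(p, N, b, best, level):.3f}")
```

Computing coverage exactly keeps the calibration deterministic; for large $N$ the same loop still works because the binomial terms are evaluated in log space.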