The German Tank Problem with Multiple Factories
Abstract
During the Second World War, estimates of the number of tanks deployed by Germany were critically needed. The Allies adopted a successful statistical approach to estimate this information: assume that the tanks are sequentially numbered starting from, say, 1, and ending at an unknown positive integer N. If we observe the numbers of k tanks, then the best linear unbiased estimator for N is M(1+1/k)-1 where M is the maximum observed serial number. While this approach was successful, there are many more adversarial situations where the approach for the original German Tank Problem falls short. Typically the number of "factories" is a possibly unknown l>1, and tanks produced by different factories may have serial numbers in disjoint ranges that are often separated by unknown amounts. Clark, Gonye and Miller (CGM) presented an unbiased estimator for N when the minimum serial number is unknown. So if one can identify which samples correspond to which factory, one can then estimate each factory's range using CGM's method, and sum them for an estimate of the rival's total productivity. We present a procedure to estimate the total productivity and prove that it is effective when l/k is sufficiently small. In the final section, we show that if we have a small number of samples, we can make an estimator that performs orders of magnitude better when given additional information about the size of the gaps.
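The estimators mentioned in the abstract can be illustrated with a minimal Python sketch. The function names are hypothetical and not taken from the paper. The first function implements the formula M(1+1/k)-1 quoted above. The second assumes a spread-based form, (M-m)(k+1)/(k-1)-1, which follows from E[M-m] = (k-1)(N+1)/(k+1) for sampling without replacement; the abstract does not quote CGM's formula, so treat this form as an assumption. The third simply sums per-factory estimates and assumes the samples have already been grouped by factory, which is the hard step the paper's procedure addresses and which is not reproduced here.

```python
from itertools import combinations


def estimate_known_min(serials):
    """Estimate N when serial numbers run 1..N, using M(1 + 1/k) - 1."""
    k = len(serials)
    if k < 1:
        raise ValueError("need at least one observed serial number")
    return max(serials) * (1 + 1 / k) - 1


def estimate_unknown_min(serials):
    """Estimate the count of consecutive serials when the minimum is unknown.

    Assumed spread-based form: (M - m)(k + 1)/(k - 1) - 1, where M and m
    are the largest and smallest observed serial numbers.
    """
    k = len(serials)
    if k < 2:
        raise ValueError("need at least two observed serial numbers")
    spread = max(serials) - min(serials)
    return spread * (k + 1) / (k - 1) - 1


def estimate_total(groups):
    """Sum per-factory estimates; assumes samples are already grouped by factory."""
    return sum(estimate_unknown_min(g) for g in groups)


def mean_over_all_samples(estimator, population, k):
    """Average an estimator over every size-k sample (exhaustive unbiasedness check)."""
    values = [estimator(list(s)) for s in combinations(population, k)]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(estimate_known_min([19, 40, 42, 60]))  # 60 * 1.25 - 1 = 74.0
    # Both averages should be very close to the true count, 10.
    print(mean_over_all_samples(estimate_known_min, range(1, 11), 3))
    print(mean_over_all_samples(estimate_unknown_min, range(5, 15), 3))
```

The exhaustive averages check unbiasedness on a small population rather than relying on simulation. Note that the unknown-minimum estimator needs at least two samples per factory, which is consistent with the abstract's requirement that l/k be small.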