Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity

Bojie Li


Abstract

Closed-source frontier labs do not disclose parameter counts, and the standard alternative, inference economics, carries 2× or more uncertainty from hardware, batching, and serving-stack assumptions external to the model. We exploit a tighter intrinsic bound: storing F facts requires at least F/(bits per parameter) weights, so measuring how much a model knows lower-bounds how many parameters it has. We introduce Incompressible Knowledge Probes (IKPs), a benchmark of 1,400 factual questions spanning 7 tiers of obscurity, designed to isolate knowledge that cannot be derived by reasoning or compressed by architectural improvements. We calibrate a log-linear mapping from IKP accuracy to parameter count on 89 open-weight models (135M to 1,600B parameters) spanning 19 vendors, achieving R^2 = 0.917; leave-one-out cross-validation confirms generalization (median fold error 1.59×, 68.5% within 2× and 87.6% within 3×). For Mixture-of-Experts models, total parameters predict knowledge (R^2 = 0.79) far better than active parameters (R^2 = 0.51). We evaluate 188 models from 27 vendors and estimate effective knowledge capacity for all major proprietary frontier models; for heavily safety-tuned models the estimates are lower bounds, since refusal policy can hide tens of percentage points of "refused but known" capacity. The widely reported saturation of reasoning benchmarks does not imply the end of scaling. Procedural capability compresses under the "Densing Law," but across 96 dated open-weight models the IKP time coefficient is -0.0010/month (95% CI [-0.0031, +0.0008]), which is indistinguishable from zero and rejects the Densing prediction of +0.0117/month at p < 10^-15. Factual capacity continues to scale log-linearly with parameters across generations and across vendors.
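The capacity bound and the log-linear calibration described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names (`min_params`, `fit_log_linear`, `estimate_params`, `fold_error`), the choice of ordinary least squares, and the three calibration points, which are synthetic and are not the paper's measurements. The abstract does not state a bits-per-parameter value, so it is left as an input.

```python
import math


def min_params(fact_bits, bits_per_param):
    """Lower bound on weights: F bits of facts need at least F / (bits per parameter) weights."""
    return fact_bits / bits_per_param


def fit_log_linear(param_counts, accuracies):
    """Ordinary least squares fit of accuracy = a + b * log10(params).

    Assumption: plain OLS stands in for whatever fitting procedure the paper uses.
    """
    xs = [math.log10(p) for p in param_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_params(accuracy, a, b):
    """Invert the fitted line to map an observed accuracy to a parameter count."""
    return 10 ** ((accuracy - a) / b)


def fold_error(predicted, actual):
    """Multiplicative error, symmetric in over- and under-estimation (always >= 1)."""
    return max(predicted / actual, actual / predicted)


# Synthetic calibration set (hypothetical values, not from the paper).
calib_params = [1e9, 1e10, 1e11]
calib_accuracy = [0.2, 0.4, 0.6]

a, b = fit_log_linear(calib_params, calib_accuracy)
estimate = estimate_params(0.5, a, b)
print(f"slope per decade: {b:.3f}, intercept: {a:.3f}")
print(f"estimated parameters at accuracy 0.5: {estimate:.3e}")
print(f"fold error vs. a 2e10-parameter model: {fold_error(estimate, 2e10):.2f}x")
```

Fitting on log10 of the parameter count means each fixed gain in accuracy corresponds to a fixed multiplicative increase in size, which is why error is naturally reported as a fold factor (for example "within 2×") rather than as an absolute difference.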
