Puzzle Game: Prediction and Classification of Wordle Solution Words
Abstract
We study the prediction and classification of Wordle solution words. After cleaning the public results log, we fit an ARIMA model to forecast the daily volume of reported outcomes through March 1, 2023. For each solution word, we compute three interpretable attributes: usage frequency (FREQ), word information entropy (WIE), and the number of repeated letters (NRE), and analyze their correlations with the empirical attempt distribution (1-6 attempts plus failure, coded as 7). We then train an XGBoost regressor to predict the full 1-7 outcome distribution for unseen words; a case study of "EERIE" illustrates the model's behavior. To categorize difficulty, we cluster words into three tiers (simple, moderate, difficult) via K-means and train a decision-tree classifier that maps FREQ, WIE, and NRE to these tiers, yielding interpretable rules. For each word, we also report the share of players requiring three or more attempts. Sensitivity analyses and full modeling details are provided in the appendix.
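As an illustration of the word-level attributes, the following minimal sketch computes two of them for a single word. The abstract does not give exact formulas, so both definitions here are assumptions: NRE is taken as the number of letters beyond the first occurrence of each distinct letter, and WIE is taken as the Shannon entropy of the within-word letter distribution. The function names `repeated_letter_count` and `letter_entropy` are hypothetical. FREQ is omitted because it requires an external corpus.

```python
import math
from collections import Counter


def repeated_letter_count(word: str) -> int:
    """Assumed NRE: letters beyond the first occurrence of each distinct letter."""
    word = word.upper()
    return len(word) - len(set(word))


def letter_entropy(word: str) -> float:
    """Assumed WIE: Shannon entropy (in bits) of the letter distribution within the word."""
    word = word.upper()
    counts = Counter(word)
    total = len(word)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # "EERIE" is the case-study word named in the abstract; "CRANE" has no repeats.
    for w in ("EERIE", "CRANE"):
        print(w, repeated_letter_count(w), round(letter_entropy(w), 3))
```

Under these assumed definitions, a word with many repeated letters such as "EERIE" has a higher NRE and a lower entropy than a word with five distinct letters, which is the kind of contrast the difficulty tiers are meant to capture.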