BuyTheBy: A dataset of 18,710 text-based paper mill advertisements with 51,812 timestamped prices
Abstract
The study of paper mills and similar businesses operating in the market for academic and education fraud services is frustrated by the lack of market price data on their various offerings. Here, we assemble BuyTheBy, a large, annotated dataset of timestamped, text-based paper mill advertisements from seven businesses operating out of seven different countries. The dataset consists of 18,710 individual advertisements, of which 15,839 have prices listed. Among these there are 20,598 positions listed as for sale on 5,567 unique products in 14 different product categories with 51,812 timestamped price data points. We perform elementary analysis of this dataset to demonstrate its utility for quantitative understanding of markets for academic fraud services and suggest future use cases.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.