CRED-1: An Open Multi-Signal Domain Credibility Dataset for Automated Pre-Bunking of Online Misinformation
Abstract
This article presents CRED-1, an open, reproducible domain-level credibility dataset combining two openly licensed source lists (OpenSources.co and Iffy.news) with four computed enrichment signals: domain age (WHOIS/RDAP), web popularity (Tranco Top-1M), fact-check frequency (Google Fact Check Tools API), and threat intelligence (Google Safe Browsing API). The dataset covers 2,672 domains categorized as fake, unreliable, mixed, conspiracy, or satire, each assigned a composite credibility score between 0.0 and 1.0. CRED-1 is designed for on-device deployment in privacy-preserving browser extensions to enable client-side pre-bunking of misinformation at the content delivery stage. The entire pipeline is implemented in Python using only standard library modules and is fully reproducible from publicly available sources. The dataset and pipeline code are released under CC BY 4.0 and archived on Zenodo.
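To make the idea of a composite score concrete, the following is a minimal sketch of how a category label and the four enrichment signals could be folded into a single value in [0.0, 1.0]. It is not the CRED-1 scoring formula: the abstract does not state the weights or the combination rule, so the base scores, thresholds, adjustments, and all names (`CATEGORY_BASE`, `DomainSignals`, `composite_score`) are assumptions chosen purely for illustration. Like the pipeline described above, it uses only the Python standard library.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical base score per source-list category (illustrative values,
# not taken from the paper).
CATEGORY_BASE = {
    "fake": 0.10,
    "conspiracy": 0.15,
    "unreliable": 0.25,
    "mixed": 0.45,
    "satire": 0.50,
}


@dataclass
class DomainSignals:
    """One domain's label plus the four enrichment signals (assumed layout)."""
    category: str                      # one of the CATEGORY_BASE keys
    domain_age_years: Optional[float]  # from WHOIS/RDAP; None if unknown
    tranco_rank: Optional[int]         # rank in Tranco Top-1M; None if unlisted
    fact_check_count: int              # number of fact-checks found
    safe_browsing_flagged: bool        # True if flagged as a threat


def composite_score(s: DomainSignals) -> float:
    """Combine the signals into a score clamped to [0.0, 1.0].

    All adjustments below are assumed, illustrative weights.
    """
    score = CATEGORY_BASE[s.category]
    # Older domains get a small bonus (assumed 5-year threshold).
    if s.domain_age_years is not None and s.domain_age_years >= 5:
        score += 0.05
    # Appearing in the popularity list at all gets a small bonus.
    if s.tranco_rank is not None:
        score += 0.05
    # Each fact-check lowers the score, capped at five fact-checks.
    score -= 0.02 * min(s.fact_check_count, 5)
    # A threat-intelligence flag lowers the score further.
    if s.safe_browsing_flagged:
        score -= 0.10
    return round(max(0.0, min(1.0, score)), 3)


if __name__ == "__main__":
    example = DomainSignals("mixed", 10.0, 5000, 0, False)
    print(composite_score(example))
```

An additive rule with clamping keeps the sketch easy to audit and cheap enough to evaluate on-device; a real scoring scheme could just as well be multiplicative or learned from labeled data.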