A Classification Algorithm to Recognize Fake News Websites
Abstract
'Fake news' is information, generally spread on the web, that only mimics the form of reliable news media content. The phenomenon has assumed uncontrolled proportions in recent years, raising the concern of authorities and citizens. In this paper we present a classifier able to distinguish a reliable source from a fake news website. We have prepared a dataset of 200 fake news websites and 200 reliable websites from all over the world and used as predictors information potentially available on websites, such as the presence of a 'contact us' section or a secured connection. The algorithm is based on logistic regression; further analyses were carried out using tetrachoric correlation coefficients for dichotomous variables and chi-square tests. This framework offers a concrete solution for attributing a 'reliability score' to news websites, defined as the probability that a source is reliable, on the basis of which a user can decide whether the news is worth sharing.
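To make the idea of a 'reliability score' concrete, the following is a minimal sketch of logistic regression on dichotomous website features. It is not the authors' implementation: the two predictors (`has_contact_page`, `uses_https`), the toy data, the learning rate, and the function names are all assumptions chosen for illustration, and the fit uses plain gradient descent rather than any particular statistics package.

```python
import math

# Hypothetical binary predictors per website: [has_contact_page, uses_https].
# Made-up toy data for illustration only; label 1 = reliable, 0 = fake.
X = [[1, 1], [1, 1], [1, 1], [1, 0], [0, 1], [0, 0],
     [1, 1], [1, 0], [0, 1], [0, 0], [0, 0], [0, 0]]
y = [1, 1, 1, 1, 1, 1,
     0, 0, 0, 0, 0, 0]


def sigmoid(z):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this site: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def reliability_score(features, w, b):
    """Estimated probability that a site with these features is reliable."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


w, b = fit_logistic(X, y)
score_good = reliability_score([1, 1], w, b)  # contact page and HTTPS present
score_bad = reliability_score([0, 0], w, b)   # neither present
print(f"both features present: {score_good:.2f}")
print(f"no features present:   {score_bad:.2f}")
```

On this toy data the fitted scores come out at roughly 0.75 and 0.25, matching the share of reliable sites among those with both features and with neither. A user-facing tool could compare such a score against a threshold of its own choosing before deciding whether a story is worth sharing.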