Using machine learning to build public policy agenda from social media conversations

Abstract

Issue identification and agenda setting constitute an important stage in the public policy making process. Traditional approaches to carrying out activities in this stage are time- and labor-intensive in data collection and analysis, and they are also costly to scale over large geographic areas. In this work we propose a human-augmented machine learning (ML) approach for identifying matters of public interest from social media conversations. The approach consists of five stages, namely input data cleaning and preprocessing, keyword extraction and issue identification, narrative creation, narrative validation, and agenda validation. We conducted experiments to validate the output of our method on a Twitter dataset, using Latent Dirichlet Allocation (LDA) and Top2Vec for topic modeling. Natural Language Generation (NLG) was performed with GPT-2, while narrative and agenda validation were based on similarity analysis and human evaluation. We achieved "very good" and "good" inter-rater agreement (IRA) on the readability and coherence of the agenda narratives generated by our GPT-2 model. IRA for the generated agenda items was "good". We also achieved above-average cosine similarity scores on at least three of the five reference text (narrative) themes. These results demonstrate that the ML approach is a promising methodology for identifying issues of public interest from social media conversations.
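
The narrative-validation stage compares generated narratives with reference theme texts using cosine similarity. The abstract does not say how the texts are vectorized, so the sketch below assumes plain term-frequency (bag-of-words) vectors; the paper may instead use TF-IDF weights or embeddings. The function names (`tokenize`, `cosine_similarity`, `score_against_themes`) and the example themes and narrative are hypothetical, invented only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the raw term-frequency vectors of two texts.

    Assumption: bag-of-words counts, not the paper's (unspecified) representation.
    Returns a value in [0, 1], or 0.0 if either text has no tokens.
    """
    vec_a = Counter(tokenize(text_a))
    vec_b = Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_against_themes(narrative: str, themes: dict[str, str]) -> dict[str, float]:
    """Score one generated narrative against each hypothetical reference theme."""
    return {name: cosine_similarity(narrative, ref) for name, ref in themes.items()}


# Hypothetical reference themes and generated narrative, for illustration only.
themes = {
    "transport": "public transport fares and bus service reliability",
    "health": "hospital waiting times and access to clinics",
}
generated = "residents complain about bus service reliability and rising fares"
scores = score_against_themes(generated, themes)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Counting only shared terms keeps the dot product cheap, and the zero-norm guard avoids dividing by zero when a narrative or reference text is empty after tokenization.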
