Online Density-Based Clustering for Real-Time Narrative Evolution Monitorin

Abstract

Automated narrative intelligence systems for social media monitoring face significant scalability challenges when relying on batch clustering methods to process continuous data streams. We investigate replacing offline HDBSCAN with online density-based clustering algorithms in a production narrative report generation pipeline that processes large volumes of multilingual social media data. While HDBSCAN effectively discovers hierarchical clusters and handles noise, its batch-only nature requires full retraining for each time window, limiting scalability and real-time adaptability. We evaluate online clustering methods with respect to cluster quality, computational efficiency, memory footprint, and integration with downstream narrative extraction. Our evaluation combines standard clustering metrics, narrative-specific measures, and human validation of cluster correctness to assess both structural quality and semantic interpretability. Experiments using sliding-window simulations on historical data from the Ukrainian information space reveal trade-offs between temporal stability and narrative coherence, with DenStream achieving the strongest overall performance. These findings bridge the gap between batch-oriented clustering approaches and the streaming requirements of large-scale narrative monitoring systems.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…