CityPulse: Real-Time Traffic Data Analytics and Congestion Prediction
Abstract
CityPulse is a proof-of-concept big data pipeline designed to enable real-time urban mobility analytics using scalable, containerized components -- without reliance on physical sensor infrastructure. The system simulates the ingestion of 11 million traffic-related records representing urban phenomena such as vehicle congestion, GPS coordinates, and weather conditions. Data is ingested through a Dockerized Apache Kafka cluster, coordinated by ZooKeeper, and processed in real time using Apache Spark Structured Streaming. To ensure robustness under load, the architecture introduces a temporary data storage layer that buffers Spark output before committing it to a centralized data warehouse. This design improves write efficiency, fault tolerance, and enables batch processing of intermediate results. The refined data feeds into a lightweight machine learning module and is served through a Flask backend with a React-based frontend for visualization and interaction. Stress testing shows that the system maintains over 300,000 records per minute throughput with only a 10\% increase in latency under full load conditions. With its modular Docker-based deployment, CityPulse offers a cost-effective and reproducible analytics solution for traffic congestion monitoring in resource-constrained environments, particularly in developing regions like Cameroon.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.