A Multimodal Machine Learning Framework for Enterprise Database Workload-Aware Root Cause Analysis
Abstract
Root cause analysis for enterprise database incidents is often a manual and time-consuming process that requires operators to inspect logs, performance metrics, and workload behavior. Existing approaches commonly focus on a single source of evidence, which limits their ability to capture the broader operational context behind incidents such as CPU saturation, I/O bottlenecks, lock contention, deadlocks, and slow query execution. This paper presents a multimodal machine learning framework for workload-aware root cause analysis in enterprise database environments. The proposed approach combines workload characteristics, system telemetry, and operational signals from compute-, storage-, and accelerator-oriented datasets. Engineered workload-aware features are used to classify workload behavior and to support downstream diagnosis of likely incident causes. The framework evaluates Random Forest, LightGBM, and feedforward neural network models for workload classification and root cause analysis support. Experimental results show that workload-aware feature engineering improves workload separability, with LightGBM providing the strongest balance of predictive performance and interpretability. The results suggest that combining multimodal telemetry with workload context can provide a practical foundation for automated and explainable root cause analysis systems.
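To make the idea of workload-aware feature engineering concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not give the feature schema, so every field name (`cpu_util`, `read_iops`, `lock_waits`, and so on), the derived ratios, and the toy training values are assumptions. A simple nearest-centroid rule stands in for the Random Forest, LightGBM, and neural network models named above, purely to keep the example dependency-free.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass
class TelemetrySample:
    # Hypothetical raw signals; the paper's actual schema is not stated in the abstract.
    cpu_util: float      # fraction of CPU in use, 0..1
    read_iops: float     # storage reads per second
    write_iops: float    # storage writes per second
    lock_waits: float    # lock waits per second
    transactions: float  # committed transactions per second


def engineer_features(s: TelemetrySample) -> list[float]:
    """Derive workload-aware ratios from raw telemetry (illustrative choices)."""
    total_iops = s.read_iops + s.write_iops
    read_ratio = s.read_iops / total_iops if total_iops else 0.0
    io_per_txn = total_iops / s.transactions if s.transactions else 0.0
    lock_waits_per_txn = s.lock_waits / s.transactions if s.transactions else 0.0
    return [s.cpu_util, read_ratio, io_per_txn, lock_waits_per_txn]


def fit_centroids(
    samples: list[TelemetrySample], labels: list[str]
) -> dict[str, list[float]]:
    """Average the engineered features per workload label."""
    grouped: dict[str, list[list[float]]] = {}
    for sample, label in zip(samples, labels):
        grouped.setdefault(label, []).append(engineer_features(sample))
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def classify(sample: TelemetrySample, centroids: dict[str, list[float]]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    x = engineer_features(sample)
    return min(
        centroids,
        key=lambda label: sqrt(
            sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
        ),
    )


# Toy training data (made up for illustration only).
train = [
    TelemetrySample(0.4, 800, 700, 50, 500),
    TelemetrySample(0.5, 900, 900, 60, 600),
    TelemetrySample(0.9, 5000, 100, 1, 10),
    TelemetrySample(0.8, 4000, 200, 2, 8),
]
train_labels = ["oltp", "oltp", "analytics", "analytics"]
centroids = fit_centroids(train, train_labels)
```

Calling `classify(TelemetrySample(0.85, 4500, 150, 1, 9), centroids)` assigns the sample to whichever toy class has the nearest centroid. The features here are unscaled, so the I/O-per-transaction ratio dominates the distance; a real pipeline would normalize features and use one of the models the paper evaluates.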