MongoDB Injection Query Classification Model using MongoDB Log files as Training Data
Abstract
NoSQL Injection attacks are a class of cybersecurity attacks where an attacker sends a specifically engineered query to a NoSQL database which then performs an unauthorized operation. To defend against such attacks, rule based systems were initially developed but then were found to be ineffective to innovative injection attacks hence a model based approach was developed. Most model based detection systems, during testing gave exponentially positive results but were trained only on the query statement sent to the server. However due to the scarcity of data and class imbalances these model based systems were found to be not effective against all attacks in the real world. This paper explores classifying NoSQL injection attacks sent to a MongoDB server based on Log Data, and other extracted features excluding raw query statements. The log data was collected from a simulated attack on an empty MongoDB server which was then processed and explored. A discriminant analysis was carried out to determine statistically significant features to discriminate between injection and benign queries resulting in a dataset of significant features. Several Machine learning based classification models using an AutoML library, "FLAML", as well as 6 manually programmed models were trained on this dataset , which were then trained on 50 randomized samples of data, cross validated and evaluated. The study found that the best model was the "FLAML" library's "XGBoost limited depth" model with an accuracy of 71%.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.