Scalable mRMR feature selection to handle high dimensional datasets: Vertical partitioning based Iterative MapReduce framework
Abstract
Feature selection (FS) is an essential preprocessing step in building machine learning models, used to handle uncertainty and vagueness in the data. Recently, the minimum Redundancy and Maximum Relevance (mRMR) approach has proven effective in obtaining a non-redundant feature subset. Owing to the generation of voluminous datasets, it is essential to design scalable solutions using distributed/parallel paradigms. MapReduce has proven to be one of the best approaches for designing fault-tolerant and scalable solutions. This work analyses the existing MapReduce approaches for mRMR feature selection and identifies their limitations. We propose VMRmRMR, an efficient vertical partitioning-based approach that uses memorization, thereby overcoming the limitations of the extant approaches. Experimental analysis shows that VMRmRMR significantly outperforms the extant approaches and achieves a better computational gain (C.G). In addition, we conduct a comparative analysis with the horizontal partitioning approach HMRmRMR [1] to assess the strengths and limitations of the proposed approach.
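To make the mRMR criterion concrete, the following is a minimal single-machine sketch in Python, not the VMRmRMR MapReduce implementation. It assumes discrete features, the common "difference" form of the criterion (relevance minus mean redundancy), and that the memorization mentioned in the abstract amounts to caching previously computed redundancy values; none of these details are stated in the abstract. The names `mutual_information`, `mrmr`, and `redundancy_sum`, as well as the toy data, are hypothetical. Storing the data as one column per feature loosely mirrors vertical partitioning, where each feature can be scored independently.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr(columns, target, k):
    """Greedy mRMR over a dict {feature_name: column}; returns k feature names.

    Assumption: score = relevance - mean redundancy with already selected features.
    """
    # Relevance of each feature to the target is computed once.
    relevance = {f: mutual_information(col, target) for f, col in columns.items()}
    # Cache: running sum of redundancy between each candidate and the selected set.
    redundancy_sum = {f: 0.0 for f in columns}
    selected = []
    remaining = set(columns)

    while remaining and len(selected) < k:
        def score(f):
            mean_red = redundancy_sum[f] / len(selected) if selected else 0.0
            return relevance[f] - mean_red

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        # Only the newly selected feature adds to the cached sums, so each
        # (candidate, selected) pair is evaluated exactly once.
        for f in remaining:
            redundancy_sum[f] += mutual_information(columns[f], columns[best])
    return selected


# Toy data: "b" duplicates "a"; "c" is independent of "a"; target = 2*a + c.
columns = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
target = [0, 1, 0, 1, 2, 3, 2, 3]
selected = mrmr(columns, target, 2)
print(selected)
```

In this toy example all three features are equally relevant to the target, but once `a` is chosen its duplicate `b` is penalised by its redundancy, so the second pick is the complementary feature `c`. The cached sums mean that each iteration costs one pass over the remaining features rather than a recomputation against the whole selected set.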