Bayes Imbalance Impact Index: A Measure of Class Imbalanced Dataset for Classification Problem

Abstract

Recent studies have shown that the imbalance ratio is not the only cause of performance loss when classifying imbalanced data. Other data factors, such as small disjuncts, noise, and class overlap, act in tandem with the imbalance ratio and make the problem harder. So far, empirical studies have only demonstrated the relationship between the imbalance ratio and these other data factors. To the best of our knowledge, no measure exists of the extent to which class imbalance itself influences classification performance on imbalanced data. Furthermore, for a given dataset it is unknown which data factor is the main barrier to classification. In this paper, we focus on the Bayes optimal classifier and study the influence of class imbalance from a theoretical perspective. Accordingly, we propose an instance-level measure called the Individual Bayes Imbalance Impact Index (IBI3) and a dataset-level measure called the Bayes Imbalance Impact Index (BI3). IBI3 and BI3 reflect the extent of influence attributable purely to imbalance, for each minority class sample and for the whole dataset, respectively. IBI3 can therefore be used as an instance complexity measure of imbalance, and BI3 as a criterion for how much imbalance degrades classification. Consequently, BI3 can be used to judge whether it is worth applying imbalance recovery methods, such as sampling or cost-sensitive methods, to recover the performance loss of a classifier. Experiments on both synthetic and real benchmark datasets show that IBI3 is highly consistent with the increase in prediction score produced by imbalance recovery methods, and BI3 is highly consistent with the improvement in F1 score produced by those methods.
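
To make the role of the class prior in a Bayes optimal classifier concrete, the following is a minimal sketch under stated assumptions. It is not the paper's IBI3 or BI3 definition, which the abstract does not give. It assumes a hypothetical one-dimensional problem with two unit-variance Gaussian classes and compares the minority-class posterior at a point under a balanced prior with the posterior under an imbalanced prior. The function names, class means, and the `posterior_drop` quantity are illustrative assumptions only.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def minority_posterior(x, prior_minority, mean_min=1.0, mean_maj=-1.0, std=1.0):
    """Bayes posterior P(minority | x) for two Gaussian classes.

    The class means and shared std are hypothetical toy values.
    """
    like_min = gaussian_pdf(x, mean_min, std)
    like_maj = gaussian_pdf(x, mean_maj, std)
    weighted_min = prior_minority * like_min
    return weighted_min / (weighted_min + (1.0 - prior_minority) * like_maj)


def posterior_drop(x, prior_minority):
    """Illustrative quantity (an assumption, not IBI3): how much the
    minority posterior at x falls when the prior moves from balanced
    (0.5) to the given imbalanced value."""
    return minority_posterior(x, 0.5) - minority_posterior(x, prior_minority)


if __name__ == "__main__":
    # A point at the minority class mean, with a 10% minority prior.
    x = 1.0
    print("balanced posterior:  ", minority_posterior(x, 0.5))
    print("imbalanced posterior:", minority_posterior(x, 0.1))
    print("drop due to prior:   ", posterior_drop(x, 0.1))
```

In this toy setting, a point located exactly at the minority class mean is assigned to the minority class under a balanced prior but falls below the 0.5 decision threshold when the minority prior is 0.1, even though the class-conditional densities are unchanged. This is the kind of effect, attributable to imbalance alone, that a per-sample measure could aim to isolate.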
