Paper details
Number 2 - June 2024
Volume 34 - 2024
A Gaussian-based WGAN-GP oversampling approach for solving the class imbalance problem
Qian Zhou, Bo Sun
Abstract
In practical applications of machine learning, the class distribution of the collected training set is usually imbalanced, i.e.,
the sizes of the classes differ greatly. This class imbalance often severely limits the generalization performance that most
classifier learning algorithms can achieve. To improve learning performance,
several effective approaches have been proposed in the literature, among which the recently presented GAN-based oversampling
methods are representative. However, the minority class examples they generate risk being highly similar or
duplicated. To further improve the quality of the generated minority class examples, i.e., to make the generated
examples effectively expand the minority class region, a novel oversampling approach named the GWGAN-GP is proposed,
which is based on the Gaussian distribution label within the framework of a Wasserstein generative adversarial network with
gradient penalty (WGAN-GP). Our GWGAN-GP approach uses a Gaussian distribution as an input label, thereby
making the generated examples more diverse and dispersed. The generated examples are then combined with the original dataset
to form a balanced dataset, which is subsequently utilized to evaluate the classification performance of three selected
classification algorithms. Experimental results on 16 imbalanced datasets demonstrate that the GWGAN-GP not only
generates examples that better conform to the distribution of the original dataset, but also achieves superior classification
performance. Specifically, when combined with the k-nearest neighbors (KNN) classifier, the GWGAN-GP significantly outperforms the other
oversampling approaches considered in the study.
Keywords
machine learning, class imbalance, generative adversarial networks, oversampling, data duplication
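
The balancing step described in the abstract (generate minority class examples, merge them with the original dataset, then train a classifier such as KNN) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `balance_with_generator`, `jitter_generator`, and `knn_predict` are hypothetical, the toy data is made up for illustration, and `jitter_generator` is only a placeholder that adds Gaussian noise to real minority examples. In the paper's setting, a trained GWGAN-GP generator would take its place.

```python
import random
from collections import Counter


def balance_with_generator(X, y, generate, seed=0):
    """Append synthetic examples until every class is as large as the largest one.

    `generate(pool, rng)` must return one new example for the class whose
    real examples are given in `pool`.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_bal, y_bal = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_bal.append(generate(pool, rng))
            y_bal.append(label)
    return X_bal, y_bal


def jitter_generator(pool, rng, scale=0.1):
    """Placeholder generator (an assumption, NOT GWGAN-GP):
    pick a real example and perturb each feature with Gaussian noise."""
    base = rng.choice(pool)
    return [v + rng.gauss(0.0, scale) for v in base]


def knn_predict(X_train, y_train, x, k=3):
    """Plain k-nearest-neighbors majority vote with squared Euclidean distance."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xt, x)), lab)
        for xt, lab in zip(X_train, y_train)
    )
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]


# Toy imbalanced dataset: six majority (0) and two minority (1) examples.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3],
     [2.0, 2.0], [2.1, 1.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = balance_with_generator(X, y, jitter_generator)
class_counts = Counter(y_bal)
prediction = knn_predict(X_bal, y_bal, [2.0, 2.1], k=3)
```

Passing the generator in as a callable keeps the balancing logic independent of how the synthetic examples are produced, so different oversampling methods can be compared on the same classifier.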