Imbalanced data is a common and serious problem in many biomedical classification tasks. It biases the training of classifiers and lowers prediction accuracy on minority classes. This problem has attracted considerable research interest over the past decade. Unfortunately, most research efforts concentrate only on two-class problems.
In this paper, we study a new method of formulating a multiclass Support Vector Machine (SVM) problem for imbalanced biomedical data to improve classification performance. The proposed method applies a cost-sensitive approach and a ramp loss function to the Crammer and Singer multiclass SVM formulation. Experimental results on multiple biomedical datasets show that the proposed solution effectively addresses the problem when the datasets are noisy and highly imbalanced.
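To make the combination of ideas concrete, the following is a minimal sketch of a cost-sensitive ramp loss built on the Crammer and Singer multiclass hinge loss. It is an illustration under stated assumptions, not the exact formulation of the paper: the inverse-frequency class costs, the cap value of 2.0, and all function names are hypothetical choices made for this example.

```python
from collections import Counter


def inverse_frequency_costs(labels, n_classes):
    """Assumed cost scheme: classes with fewer samples get a larger cost."""
    counts = Counter(labels)
    n = len(labels)
    # max(..., 1) avoids division by zero for classes absent from the sample.
    return [n / (n_classes * max(counts[k], 1)) for k in range(n_classes)]


def crammer_singer_hinge(scores, label):
    """Crammer-Singer multiclass hinge: max(0, 1 + best rival score - true score)."""
    true_score = scores[label]
    best_rival = max(s for k, s in enumerate(scores) if k != label)
    return max(0.0, 1.0 + best_rival - true_score)


def cost_sensitive_ramp_loss(scores, label, class_costs, cap=2.0):
    """Hinge loss truncated at `cap` (the ramp), scaled by the true class's cost.

    The cap value is an assumption for illustration only.
    """
    ramp = min(crammer_singer_hinge(scores, label), cap)
    return class_costs[label] * ramp


# Toy example: class 0 is the majority, class 1 is absent, class 2 is a minority.
costs = inverse_frequency_costs([0, 0, 2], n_classes=3)
losses = [
    cost_sensitive_ramp_loss([2.0, 0.0, 0.0], 0, costs),  # confidently correct
    cost_sensitive_ramp_loss([0.0, 5.0, 0.0], 0, costs),  # badly misclassified
    cost_sensitive_ramp_loss([0.0, 0.5, 0.0], 2, costs),  # minority-class error
]
```

In this sketch the badly misclassified sample has a hinge loss of 6 that the ramp truncates to 2, so a single noisy point cannot dominate the objective, while the class costs scale up errors on rarer classes.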
Authors: Piyaphol Phoungphol | Yanqing Zhang | Yichuan Zhao