Performance Comparison of Support Vector Machine-Based Methods for Handling Imbalanced Multi-Class Classification

Qorry Meidianingsih; Devi Eka Wardani; Ellis Salsabila; Lina Nafisah; Afifah Nur Mutia

doi:10.29313/statistika.v23i1.1660

Authors

Qorry Meidianingsih, Universitas Negeri Jakarta
Devi Eka Wardani, Universitas Negeri Jakarta
Ellis Salsabila, Universitas Negeri Jakarta
Lina Nafisah, Statistics Study Program, Faculty of Mathematics and Natural Sciences, Universitas Negeri Jakarta
Afifah Nur Mutia, Statistics Study Program, Faculty of Mathematics and Natural Sciences, Universitas Negeri Jakarta

Keywords:

support vector machine, imbalanced multi-class, SMOTE, granular support vector machines–repetitive undersampling, confusion matrix

Abstract

Imbalanced multi-class data have begun to attract attention from the research community in recent years. Classification in the imbalanced multi-class setting is more complicated because most multi-class classification techniques assume balanced classes, whereas data encountered in practice are more often imbalanced. This study compares the performance of three support vector machine-based classification methods, namely standard SVM, SVM-SMOTE, and granular support vector machines–repetitive undersampling (GSVM-RU), each applied with the one-versus-one (OVO) decomposition method. Three types of data were generated in R, designed according to the possible combinations of the number of majority and minority classes. The results show that all three classification models achieve their highest accuracy on the simulated data set with the highest ratio of majority-class to minority-class observations. In terms of sensitivity and specificity, the standard SVM and SVM-SMOTE models perform equally well on the majority class, while the GSVM-RU model performs well at detecting the minority class.
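The evaluation criteria named in the abstract can be illustrated with a minimal sketch. It assumes per-class sensitivity and specificity are read off a multi-class confusion matrix in a one-vs-rest fashion, which is a common convention but is not confirmed by the abstract as the authors' exact procedure. The function name and the toy 3-class matrix are hypothetical and are not taken from the study's data.

```python
def per_class_sensitivity_specificity(cm):
    """Return a list of (sensitivity, specificity) pairs, one per class.

    cm[i][j] is the number of observations whose true class is i and
    whose predicted class is j. Each class is treated one-vs-rest.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]                                # class k correctly predicted
        fn = sum(cm[k]) - tp                         # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp    # other classes predicted as k
        tn = total - tp - fn - fp                    # everything else
        sens = tp / (tp + fn) if (tp + fn) else float("nan")
        spec = tn / (tn + fp) if (tn + fp) else float("nan")
        results.append((sens, spec))
    return results


# Hypothetical confusion matrix: class 0 is the majority (100 observations),
# class 2 is the smallest minority (10 observations).
toy_cm = [
    [90, 8, 2],
    [10, 35, 5],
    [6, 2, 2],
]

metrics = per_class_sensitivity_specificity(toy_cm)
accuracy = sum(toy_cm[k][k] for k in range(3)) / sum(sum(r) for r in toy_cm)

for k, (sens, spec) in enumerate(metrics):
    print(f"class {k}: sensitivity={sens:.3f}, specificity={spec:.3f}")
print(f"overall accuracy={accuracy:.3f}")
```

In this toy matrix the overall accuracy stays fairly high even though the smallest class is rarely detected, which is one reason per-class sensitivity and specificity are reported alongside accuracy for imbalanced data.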
