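Comparison of Machine Learning Algorithms Using Chi-square Feature Selection in Heart Disease Classification (Perbandingan Algoritma Machine Learning Menggunakan Pemilihan Fitur Chi-square dalam Pengklasifikasian Penyakit Jantung)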

Authors

  • Hirmayanti Hirmayanti, Magister Teknik Informatika, Universitas Amikom Yogyakarta
  • Ema Utami, Magister Teknik Informatika, Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.33020/saintekom.v15i1.815

Keywords:

heart disease, cardiovascular, feature selection, chi-square, hyperparameter

Abstract

Heart disease is one of the deadliest diseases worldwide. Its symptoms often do not cause severe effects right away, which makes early detection crucial. To reduce deaths from heart disease and other cardiovascular disorders, a system is needed that identifies the main contributing factors so that they can be minimized. This study therefore applies the Chi-square feature selection method to determine which features most influence the accuracy of Machine Learning models, and compares five algorithms: K-Nearest Neighbor (KNN), Naïve Bayes, Logistic Regression, Support Vector Machine, and Random Forest. The comparison aims to find the most accurate algorithm, since higher accuracy yields a more precise heart disease classification system. The results show that the eight features selected with the Chi-square method give the highest accuracy, 93.51%, with the KNN algorithm. Using only relevant features thus improves classification accuracy and system efficiency compared with using all available features. The study contributes a Chi-square-based approach to selecting essential features for Machine Learning algorithms, supporting a more effective heart disease classification system.
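
As an illustration of the feature selection step described in the abstract, the following is a minimal sketch of Chi-square ranking, assuming categorical features and a binary class label. It computes the classical Pearson Chi-square statistic of independence for each feature and keeps the top k. The data, the feature names (`chest_pain`, `sex`, `fasting_sugar`), and the helper functions `chi_square_score` and `select_top_k` are hypothetical and are not taken from the study. The abstract does not state which implementation the authors used, and library implementations may compute the score differently.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and class labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))  # contingency table cells
    score = 0.0
    for f_val, f_n in feature_counts.items():
        for l_val, l_n in label_counts.items():
            # Expected cell count if feature and label were independent
            expected = f_n * l_n / n
            score += (observed[(f_val, l_val)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Rank features by chi-square score (highest first) and keep the top k."""
    scores = {name: chi_square_score(values, labels)
              for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data, invented for illustration only (1 = disease, 0 = no disease)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "chest_pain":    [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly associated with label
    "sex":           [1, 0, 1, 0, 1, 0, 1, 0],  # independent of label
    "fasting_sugar": [1, 1, 1, 0, 1, 0, 0, 0],  # partially associated
}

selected, scores = select_top_k(columns, labels, k=2)
print(selected)
print(scores)
```

In this toy example the feature that is independent of the label scores zero and is dropped. The study reports keeping eight features by the same kind of ranking before training the classifiers.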

Published

31-03-2025

How to Cite

Hirmayanti, Hirmayanti, and Ema Utami. 2025. “Perbandingan Algoritma Machine Learning Menggunakan Pemilihan Fitur Chi-Square Dalam Pengklasifikasian Penyakit Jantung”. Jurnal Saintekom : Sains, Teknologi, Komputer Dan Manajemen 15 (1):15-29. https://doi.org/10.33020/saintekom.v15i1.815.
