A COMPARATIVE STUDY OF DIABETES DETECTION USING THE PIMA INDIAN DIABETES DATABASE

  • ABDULAZEEZ MOUSA Dept. of Computer, College of Science, Nawroz University, Kurdistan Region–Iraq
  • WARAZ MUSTAFA Dept. of Computer, College of Science, University of Duhok, Kurdistan Region–Iraq
  • RIDWAN BOYA MARQAS Dept. of Computer, College of Science, Nawroz University, Kurdistan Region–Iraq
  • SHIVAN H. M. MOHAMMED Dept. of Computer, College of Science, University of Duhok, Kurdistan Region–Iraq
Keywords: Diabetes detection, deep learning, Long Short-Term Memory (LSTM), Random Forest (RF), Convolutional Neural Network (CNN), Pima Indians Diabetes Database

Abstract

Accurate detection of diabetes plays a critical role in early intervention and effective management of the disease. In recent years, deep learning models have shown great potential in medical diagnosis tasks, including diabetes detection. This paper presents a comparative study of three popular models, Long Short-Term Memory (LSTM), Random Forest (RF), and Convolutional Neural Network (CNN), for diabetes detection on the widely used Pima Indians Diabetes Database (PIDD). The study evaluates the models with common metrics: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). The dataset is preprocessed to handle missing values and normalize features, and is then split into training and testing sets. Each model is trained on the training set and evaluated on the testing set. The results show that the LSTM model achieves the highest performance across all metrics, reaching an accuracy of 85%. It demonstrates the ability to capture the temporal nature of the dataset and extract meaningful patterns for accurate diabetes detection. The RF and CNN models also perform well but score slightly lower than LSTM. The comparative analysis discusses the strengths and weaknesses of each model: LSTM, as a recurrent neural network, excels at capturing temporal dependencies; RF offers simplicity and interpretability; and CNN, although originally designed for image analysis, shows potential when adapted to tabular data. These findings have implications for healthcare practitioners and researchers working on diabetes detection. The study has limitations, notably the relatively small size of the dataset and its potential class imbalance. Future research can address these limitations and further investigate the application of deep learning models in diabetes detection.
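
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how they can be computed for a binary classifier. It is not the authors' code: the function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are unrelated to the results reported in the study.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Compute accuracy, precision, recall, F1 and AUC-ROC for binary labels.

    y_true  : list of 0/1 ground-truth labels (1 = diabetic in this sketch)
    y_score : list of predicted probabilities for class 1
    threshold : assumed cut-off used to turn scores into 0/1 predictions
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # AUC-ROC as the probability that a randomly chosen positive case
    # receives a higher score than a randomly chosen negative case
    # (ties count as one half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auc_roc": auc}


# Toy data, invented for illustration only (not taken from PIDD).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]
metrics = binary_metrics(y_true, y_score)
print(metrics)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC-ROC is computed from the ranking of the scores alone, which is one reason it is often reported alongside the others on datasets with class imbalance.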

Published
2023-10-12
How to Cite
MOUSA, A., MUSTAFA, W., MARQAS, R. B., & MOHAMMED, S. H. M. (2023). A COMPARATIVE STUDY OF DIABETES DETECTION USING THE PIMA INDIAN DIABETES DATABASE. Journal of Duhok University, 26(2), 277-288. https://doi.org/10.26682/sjuod.2023.26.2.24
Section
Pure and Engineering Sciences