A Statistical Analysis of Outliers in Diabetes Prediction
Author(s):
Gurleen Kaur, Chandigarh University; Harsh Pratap Jain, Chandigarh University; Vashnavi Walia, Chandigarh University; Asmita Singh, Chandigarh University
Keywords:
Diabetes Prediction Model, Outliers, Machine Learning, Naive Bayes, Random Forest, Decision Trees, KNN, Logistic Regression, F1-score, Confusion Matrix
Abstract
In response to the growing incidence of diabetes, this study builds a machine learning pipeline for diabetes prediction using the Pima Indians Diabetes database. The paper analyzes outliers and their effect on the performance of the prediction models. Five classifiers are compared: Naive Bayes, Random Forest, Decision Trees, KNN, and Logistic Regression. The analysis covers selection of the best-performing predictive method, identification of significant features, and the effects of class imbalance. The study is based on clinical factors such as blood pressure, insulin, glucose, and BMI. The pipeline includes a preprocessing step that handles outliers and missing values to support unbiased training. Performance is measured with accuracy, precision, recall, and F1-score, with an emphasis on F1-score because of the class imbalance. We analyze the results on the basis of these metrics and conclude that Random Forest attains the highest accuracy and F1-score.
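As a rough illustration of two ideas named in the abstract, the sketch below shows one common way to handle outliers and how the F1-score is derived from prediction counts. The abstract does not say which outlier rule the authors applied, so the interquartile-range (Tukey fence) clipping used here, the multiplier `k=1.5`, and the helper names `iqr_bounds`, `clip_outliers`, and `f1_score` are all assumptions made for illustration, not the paper's method.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) Tukey fences for a list of numbers.

    Assumption: the IQR rule with k=1.5 is used only as an example;
    the paper's abstract does not state its outlier criterion.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Clip every value to the Tukey fences instead of dropping rows."""
    lo, hi = iqr_bounds(values, k)
    return [min(max(v, lo), hi) for v in values]


def f1_score(y_true, y_pred, positive=1):
    """Compute F1 for the positive class from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

Because F1 ignores true negatives, it is less flattering than accuracy when the negative class dominates, which is why it is a reasonable headline metric for an imbalanced dataset.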
Other Details
Paper ID: IJSRDV13I10004
Published in: Volume 13, Issue 1
Publication Date: 01/04/2025
Page(s): 9-14
