A Statistical Analysis of Outliers in Diabetes Prediction

Author(s):

Gurleen Kaur, Chandigarh University; Harsh Pratap Jain, Chandigarh University; Vashnavi Walia, Chandigarh University; Asmita Singh, Chandigarh University

Keywords:

Diabetes Prediction Model, Outliers, Machine Learning, Naive Bayes, Random Forest, Decision Trees, KNN, Logistic Regression, F1-score, Confusion Matrix

Abstract

In response to the growing incidence of diabetes, this study builds a machine learning pipeline for diabetes prediction using the Pima Indians Diabetes database. The paper analyzes outliers and their effect on the performance of the prediction models. Five classifiers are compared: Naive Bayes, Random Forest, Decision Trees, KNN, and Logistic Regression. The analysis covers the selection of the best-performing predictive method, the identification of significant features, and the effects of class imbalance. Predictions are based on clinical factors such as blood pressure, insulin, glucose, and BMI. The pipeline includes a preprocessing stage that handles outliers and missing values so that the models are trained on unbiased data. Performance is measured with accuracy, precision, recall, and F1-score, with an emphasis on F1-score because of the class imbalance. Based on these metrics, we conclude that Random Forest attains the highest accuracy and F1-score.
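The abstract does not say which outlier rule or tooling the authors used, so the following is only a minimal sketch under stated assumptions: it flags outliers with the common 1.5 × IQR rule (an assumption, not the paper's confirmed method) and computes precision, recall, and F1-score from true and predicted labels. The function names and the sample values are hypothetical and are not taken from the Pima Indians Diabetes database.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences using the IQR rule (assumed, not from the paper)."""
    # Inclusive quartiles treat the sample minimum and maximum as the 0th and 100th percentiles.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return one boolean per value: True if it lies outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v < low or v > high for v in values]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical glucose-like readings; the last value is an obvious extreme.
    readings = [80, 85, 90, 95, 100, 105, 110, 400]
    print(flag_outliers(readings))

    # Hypothetical labels for an imbalanced problem (3 positives, 5 negatives).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(precision_recall_f1(y_true, y_pred))
```

F1-score is the harmonic mean of precision and recall, so it stays low when a model simply favors the majority class. This makes it more informative than accuracy on an imbalanced dataset, which is consistent with the emphasis the abstract places on it.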

Other Details

Paper ID: IJSRDV13I10004
Published in: Volume: 13, Issue: 1
Publication Date: 01/04/2025
Page(s): 9-14
