Abstract:
Classifying patients as diabetic or non-diabetic is difficult because the relevant features vary widely and many of them are needed to diagnose the disease. This research proposes classifying patients as diabetic or non-diabetic using machine learning, with feature selection to identify the most important factors. The study used data on 536 people obtained from Kaggle (https://www.kaggle.com), covering 8 features associated with diabetes: number of pregnancies, blood glucose, blood pressure, skin thickness, blood insulin, body mass index, diabetes pedigree function, and age. The data were split into training and test sets at ratios of 90:10, 80:20, 70:30, 60:40, and 50:50, and the models were also evaluated with 10-fold cross-validation. The results showed that the best-performing method, Gradient Boosted Trees, achieved an efficiency of 87.14% with a standard deviation of 0.80. The most effective feature selection was the filter-based method with Decision Tree, which retained only 4 factors: blood glucose, age, number of pregnancies, and blood insulin. Classification based on these factors could support earlier diagnosis and treatment of diabetes, helping patients recover faster and live longer.
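To make the evaluation protocol concrete, the following is a minimal sketch of how the five train:test splits and the 10-fold partition could be set up for 536 records. It uses only the Python standard library and is not the study's implementation. The names `holdout_sizes` and `k_fold_indices`, the rounding rule, and the random seed are assumptions introduced for illustration.

```python
import random

N_RECORDS = 536                      # sample size stated in the abstract
TRAIN_PERCENTS = [90, 80, 70, 60, 50]  # training share of each train:test ratio


def holdout_sizes(n, train_percent):
    """Return (train_size, test_size) for one hold-out split.

    Rounding to the nearest record is an assumption; the abstract
    does not say how fractional sizes were handled.
    """
    train_size = round(n * train_percent / 100)
    return train_size, n - train_size


def k_fold_indices(n, k=10, seed=0):
    """Shuffle record indices and partition them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # fixed seed is illustrative only
    # Fold i takes every k-th index starting at i, so fold sizes
    # differ by at most one record.
    return [indices[i::k] for i in range(k)]


split_sizes = {p: holdout_sizes(N_RECORDS, p) for p in TRAIN_PERCENTS}
folds = k_fold_indices(N_RECORDS, k=10)

for percent, (train_size, test_size) in split_sizes.items():
    print(f"{percent}:{100 - percent} -> train={train_size}, test={test_size}")
print("fold sizes:", [len(fold) for fold in folds])
```

In each cross-validation round, one fold would serve as the test set and the remaining nine as the training set, so every record is tested exactly once. The reported mean and standard deviation would then be computed over the ten per-fold scores.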