Abstract:
Polycystic ovary syndrome (PCOS) is a disease that can occur in women of childbearing age. This study therefore aims to develop a self-assessment predictive model for PCOS that does not require laboratory data, by examining the symptoms of the disease and selecting those that indicate a high risk of developing it. The Kaggle PCOS dataset [9] contains both physical and laboratory data. The 11 most important self-observable features were selected using the random forest method. The Random Forest model achieved the same area under the curve (AUC), 0.97, before and after this feature selection. The selected features were then used to build six other models: Decision Tree, Naive Bayes, Logistic Regression, Support Vector Classifier (SVC), K-Nearest Neighbor (KNN), and CatBoost Classifier. Models were considered efficient when their test score was higher than their training score; three models met this criterion: Random Forest, Logistic Regression, and SVC.
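The two core steps summarized above, keeping only the highest-ranked self-observable features and comparing models by AUC, can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the feature names (`symptom_a`, `lab_x`, and so on), and all numeric values are hypothetical placeholders, and the importance scores are assumed to come from an already fitted random forest.

```python
from typing import Dict, List, Sequence, Set


def top_k_self_observable(importances: Dict[str, float],
                          self_observable: Set[str],
                          k: int) -> List[str]:
    """Return the k most important features that need no laboratory test."""
    # Drop laboratory features, then rank the rest by importance (descending).
    candidates = [(name, score) for name, score in importances.items()
                  if name in self_observable]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in candidates[:k]]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical importance scores (placeholders, not values from the dataset).
importances = {"symptom_a": 0.30, "lab_x": 0.25, "symptom_b": 0.10, "symptom_c": 0.20}
observable = {"symptom_a", "symptom_b", "symptom_c"}

selected = top_k_self_observable(importances, observable, k=2)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In the study's setting `k` would be 11, and the AUC would be computed once with all features and once with only the selected ones to check that predictive performance is preserved.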