Nattaphon Talmongkol. Performance Comparisons of Machine Learning Approaches for Numerically Structured Breast Cancer Data Classifications. Master's Degree (Engineering Technology). Thai-Nichi Institute of Technology. Center of Academic Resource. : Thai-Nichi Institute of Technology, .
Performance Comparisons of Machine Learning Approaches for Numerically Structured Breast Cancer Data Classifications
Abstract:
This thesis compares the performance of machine learning approaches for classifying numerically structured breast cancer data. Data classification is a data analytics method commonly used to support decision-making in business, medicine, healthcare, and other domains that affect human life. The work is motivated by big data analytics technologies, which can help advance technology in Thailand in line with current trends. The breast cancer dataset was selected from the UCI data repository, a large collection of datasets that supports researchers in data analytics work. The dataset contains 30 features, in three groups, describing breast cell nuclei, across 569 instances of historical diagnoses. Four machine learning algorithms were selected: Decision Tree (DT), Naive Bayes (NB), Artificial Neural Network (ANN), and Support Vector Machine (SVM). Cross-validation was used to evaluate each technique, with parameter settings varied at random to find the highest performance, which was measured in three categories: accuracy rate, error rate, and classification lead time. The research objective is to find the classifier that provides the highest performance using RapidMiner Studio 7.4. The results show that SVM performs best for breast cancer data classification, with an accuracy of 96.84%, an F-measure (M) of 95.70%, an F-measure (B) of 97.50%, an RMSE of 0.194, and a classification lead time of 0.52 seconds, followed by Decision Tree, Artificial Neural Network, and Naive Bayes, in that order.
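The measures reported in the abstract (accuracy, error rate, per-class F-measure for malignant "M" and benign "B", and RMSE) can be illustrated with a minimal Python sketch. This is not the thesis's RapidMiner workflow: the function names and the toy labels below are hypothetical, and the RMSE here is assumed to be computed between a 0/1 encoding of the true class and the predicted probability of the malignant class, which may differ from how RapidMiner derives it.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of instances whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Harmonic mean of precision and recall, treating `positive` as the target class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(y_true: Sequence[str], p_positive: Sequence[float], positive: str = "M") -> float:
    """Root mean squared error between the 0/1 true class and P(positive).

    Assumption: this encoding is illustrative, not RapidMiner's documented formula.
    """
    squared = [((1.0 if t == positive else 0.0) - p) ** 2
               for t, p in zip(y_true, p_positive)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy labels, not taken from the thesis data.
y_true = ["M", "M", "M", "B", "B", "B", "B", "B"]
y_pred = ["M", "M", "B", "B", "B", "B", "B", "M"]
# Hard predictions expressed as probabilities of the malignant class.
p_malignant = [1.0 if p == "M" else 0.0 for p in y_pred]

acc = accuracy(y_true, y_pred)
error_rate = 1.0 - acc
f_m = f_measure(y_true, y_pred, positive="M")
f_b = f_measure(y_true, y_pred, positive="B")
err = rmse(y_true, p_malignant)
print(acc, error_rate, f_m, f_b, err)
```

In a cross-validation setting, these functions would be applied to the held-out fold of each split and the results averaged across folds.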
Thai-Nichi Institute of Technology. Center of Academic Resource