Abstract:
Organizations increasingly rely on business data as a source of knowledge
for decision support. Knowledge bases have become one of the main issues that every organization has
to consider. However, extracting knowledge from huge data sets is not an easy task. The objective
of this master's project was to develop a system for knowledge extraction from data using
the K-Means and Fuzzy C-Means techniques. A second objective was to compare the models
produced by the two techniques on the Iris, Wine Recognition, and Thyroid gland data sets.
The system was developed in Microsoft Visual Basic 6.0 running on Microsoft Windows
XP. In the experimental simulations, models built from the Iris data with
K-Means and Fuzzy C-Means achieved prediction accuracies of 88.49% and 91.11%, respectively.
Models built from the Wine Recognition data achieved 69.11% and
71.91%, respectively, and models built from the Thyroid gland data achieved
85.80% and 83.89%, respectively. In addition, the two
techniques were compared using a t-test. With a significance value of 0.03, the t-values were 4.13,
24.39, and -5.22 for the Iris, Wine Recognition, and Thyroid gland data, respectively. The results
showed that the K-Means and Fuzzy C-Means techniques perform differently. However, it cannot be
concluded which one is better: one technique may perform better than the other on one
data set, while the other may perform better on other data sets.
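
As an illustration of the two techniques compared above, the following is a minimal Python sketch, not the Visual Basic 6.0 system described in this abstract. It assumes Euclidean distance, a fuzzifier m = 2, a fixed number of iterations, and a paired form of the t statistic. The abstract does not specify any of these choices. All function names are hypothetical, and the data are synthetic rather than the Iris, Wine Recognition, or Thyroid gland data.

```python
import math
import random
from statistics import mean, stdev


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(points, weights):
    """Weighted centroid of the points."""
    total = sum(weights)
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(len(points[0]))]


def nearest(p, centers):
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))


def k_means(points, init_centers, iters=50):
    """Hard clustering: every point belongs to exactly one cluster."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = weighted_mean(members, [1.0] * len(members))
    return centers, [nearest(p, centers) for p in points]


def memberships(points, centers, m):
    """Fuzzy memberships; each row sums to 1 across clusters."""
    power = 1.0 / (m - 1.0)  # exponent applied to squared distances
    rows = []
    for p in points:
        inv = [(1.0 / max(sqdist(p, c), 1e-12)) ** power for c in centers]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means(points, init_centers, m=2.0, iters=50):
    """Soft clustering: every point has a membership degree in each cluster."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = [weighted_mean(points, [row[j] ** m for row in u])
                   for j in range(len(centers))]
    return centers, memberships(points, centers, m)


def paired_t(a, b):
    """Paired t statistic for two equal-length lists of accuracy scores."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Synthetic two-cluster data (assumption: stands in for a real data set).
rng = random.Random(0)
points = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(20)]
points += [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(20)]
init = [points[0], points[-1]]

km_centers, km_labels = k_means(points, init)
fcm_centers, fcm_u = fuzzy_c_means(points, init)
fcm_labels = [row.index(max(row)) for row in fcm_u]  # pick the largest membership
print(km_labels == fcm_labels)
```

In this sketch, K-Means assigns each point to a single cluster, while Fuzzy C-Means returns a degree of membership per cluster that is reduced to a label by taking the largest value. Given accuracy scores from repeated runs of each technique, paired_t(fcm_scores, km_scores) would yield a t statistic of the kind reported above, under the assumption that the runs are paired.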