Mullika Pattanodom. Handling missing values by cluster ensemble approach. Master's Degree (Information Technology). Mae Fah Luang University. The Learning Resources and Education Media Center: Mae Fah Luang University, 2016.
Handling missing values by cluster ensemble approach
Abstract:
Missing values are one of the major difficulties in data mining and its downstream applications, because most analytical techniques in this field were developed for complete data sets. Imputing, or filling in, missing values is generally regarded as a data preprocessing task, for which several methods have been introduced. These include statistical alternatives such as average and zero imputation, as well as learning-based models such as nearest neighbors and regression. In cluster analysis, most clustering algorithms, including the well-known k-means, are not designed to handle missing values. The same is true of cluster ensembles, where an improved decision is derived from multiple clusterings of complete data. This thesis presents a new cluster ensemble framework that allows incomplete data to be clustered without the usual preprocessing step. Intuitively, different versions of the original data can be created by filling in the unknown values with arbitrary ones. Two methods are proposed within this framework. The first is random imputation, which fills each missing value with a randomly selected value; this selection is simple and efficient. The second is a hybrid method that aims for better imputation by combining several imputation methods, thereby promoting diversity within the ensemble. In particular, the hybrid method is used to summarize the ensemble information, from which k-means derives the final clustering. The proposed methods are evaluated against a number of benchmark imputation methods on several datasets from the UCI repository, using cluster accuracy (CA) as the evaluation metric.
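The random imputation idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (random_impute, kmeans, ensemble_cluster), drawing each fill-in value from the observed values of the same column, and summarizing the ensemble as a binary cluster-association matrix are all assumptions made here for illustration. The sketch also assumes every column has at least one observed value.

```python
import random

def random_impute(data, rng):
    """Copy data, replacing each None with a random observed value
    from the same column (an assumed way to pick 'arbitrary' values)."""
    n_cols = len(data[0])
    observed = [[row[j] for row in data if row[j] is not None]
                for j in range(n_cols)]
    return [[v if v is not None else rng.choice(observed[j])
             for j, v in enumerate(row)]
            for row in data]

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centre (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centers[c])))
                  for p in points]
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centre if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def ensemble_cluster(data, k, n_members=10, seed=0):
    """Cluster incomplete data via an ensemble of randomly imputed versions."""
    rng = random.Random(seed)
    # Assumed ensemble summary: a binary cluster-association matrix with
    # one block of k one-hot columns per ensemble member.
    assoc = [[] for _ in data]
    for _ in range(n_members):
        labels = kmeans(random_impute(data, rng), k, rng)
        for i, l in enumerate(labels):
            assoc[i].extend(1.0 if l == c else 0.0 for c in range(k))
    # Final clustering: k-means on the summarized ensemble information.
    return kmeans(assoc, k, rng)

# Toy data with two missing entries (None); values are illustrative only.
data = [[1.0, 1.1], [0.9, None], [1.2, 0.8],
        [8.0, 8.2], [None, 7.9], [8.1, 8.0]]
print(ensemble_cluster(data, k=2))
```

Each ensemble member sees a differently imputed version of the data, so no single arbitrary fill-in value determines the result; the final k-means step groups points that were repeatedly clustered together across members.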