Patcharaporn Panwong. Noise-induced ensemble generation for data clustering framework and application. Doctoral Degree (Computer Engineering). Mae Fah Luang University. Learning Resources and Educational Media Center: Mae Fah Luang University, 2021.
Noise-induced ensemble generation for data clustering framework and application
Abstract:
Noise is usually regarded as harmful to data analysis, so it is commonly eliminated from the data collection under investigation during the cleansing process. Alternatively, it can be handled with a noise-tolerant method, which is designed to minimize the impact of erroneous readings and preserve the integrity of the discovered knowledge. On the other hand, a few recent studies have found ways to make good use of noise, for example in privacy-preserving data analytics and in single and ensemble clustering. Specific to consensus clustering, this research introduces a study of employing noise in the process of cluster ensemble generation, in the form of an attribute-wise noise-injection framework called noise-induced ensemble generation. The proposed framework proves effective at increasing diversity within an ensemble, based on data perturbation with noise levels of 1-10% and eight investigated noise-injection cases. It is designed to improve on the existing generation strategy, i.e., the coupling of k-means and random-k. In experiments with fifteen UCI benchmarks and different validity indices, the proposed framework often outperforms its baseline counterpart on both groups of datasets, those with partly overlapped clusters and those with highly overlapped clusters. The work also presents a context-based application of the noise-injection cases, in which other variables such as noise ratios and perturbation parameters are analyzed and discussed. This provides a useful guideline for the practical use of the new ensemble clustering technique. In addition to the benchmark data collections, the framework is applied to light curve data that categorizes a range of astronomical objects. A more accurate clustering is achieved with this novel generation strategy, especially when combined with a more complex consensus function such as graphSpec.
Several questions remain to be explored further; these are summarized as future work at the end of this thesis.
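The generation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the eight noise-injection cases, so the sketch assumes one simple case in which a fraction of the values in each attribute is replaced by a uniform random value drawn from that attribute's observed range. It also assumes that random-k draws the number of clusters from 2 up to the square root of the number of samples. The function names `inject_noise`, `kmeans`, and `generate_ensemble` are hypothetical, and the consensus function (for example graphSpec) is not shown.

```python
import random

def inject_noise(data, ratio, rng):
    """Attribute-wise noise injection (assumed case): in every attribute,
    replace a fraction `ratio` of the values with a uniform random value
    taken from that attribute's observed range."""
    n, d = len(data), len(data[0])
    noisy = [row[:] for row in data]          # leave the original data untouched
    n_noisy = max(1, round(ratio * n))        # perturb at least one value per attribute
    for j in range(d):
        column = [row[j] for row in data]
        lo, hi = min(column), max(column)
        for i in rng.sample(range(n), n_noisy):
            noisy[i][j] = rng.uniform(lo, hi)
    return noisy

def kmeans(data, k, rng, iters=20):
    """Plain k-means with randomly chosen initial centers; returns labels."""
    centers = [row[:] for row in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centers[c])))
            for row in data
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def generate_ensemble(data, n_members, ratio, seed=0):
    """Build an ensemble of base clusterings, each obtained by running
    k-means with a random k on a freshly perturbed copy of the data."""
    rng = random.Random(seed)
    k_max = max(2, int(len(data) ** 0.5))     # assumed random-k upper bound
    ensemble = []
    for _ in range(n_members):
        k = rng.randint(2, k_max)
        ensemble.append(kmeans(inject_noise(data, ratio, rng), k, rng))
    return ensemble
```

Each ensemble member is a list of cluster labels, one per data point. Because every member sees a different perturbed copy of the data and a different k, the members tend to disagree more than they would with random-k alone, which is the diversity effect the abstract attributes to noise injection.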