Damrongrit Setsirichok. Handling problems in human genetics by means of attribute selection. Doctoral Degree (Electrical Engineering). King Mongkut's University of Technology North Bangkok. Central Library. King Mongkut's University of Technology North Bangkok, 2013.
Handling problems in human genetics by means of attribute selection
Abstract:
This thesis presents the application of attribute selection to three problems in human genetics. The attribute selection methods of interest cover all three main categories: filter, wrapper and embedded approaches. The first problem concerns thalassaemia classification. The aim is to identify informative attributes among those provided by a complete blood count (CBC) and haemoglobin typing. Wrappers built around a C4.5 decision tree, a naïve Bayes classifier and a multilayer perceptron show that the haemoglobin concentration, which is a CBC attribute, is redundant and can thus be removed from the classification. The second problem concerns the identification of ancestry informative markers (AIMs) from single nucleotide polymorphisms (SNPs) for classifying continental populations. Four populations, namely the ASN, CEU, MEX and YRI populations from the HapMap Phase III data, are covered. A two-stage attribute selection procedure, consisting of a round robin symmetrical uncertainty ranking technique (a filter technique) followed by a wrapper built around a naïve Bayes classifier, is applied to the data. The resulting AIM panels are sufficiently small for classification and can subsequently be used to provide insight into the degree of population admixture in the ASW and CHD populations. The last problem involves the detection of pure epistasis and genetic heterogeneity in genome-wide association studies. Representatives of the filter, wrapper and embedded approaches, namely an omnibus permutation test on ensembles of two-locus analyses (2LOmb), a multifactor dimensionality reduction (MDR) technique and a random forest (RF), are benchmarked in simulations covering two independent epistatic interactions among SNPs. The simulation results indicate that 2LOmb outperforms the MDR and RF techniques, reporting fewer output SNPs while correctly identifying more causative SNPs.
Subsequently, 2LOmb detects pure epistasis and genetic heterogeneity in type 1 diabetes mellitus (T1D) data from the Wellcome Trust Case Control Consortium (WTCCC). Overall, the thesis reveals how a number of problems in human genetics can be successfully tackled by means of attribute selection.
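The abstract names symmetrical uncertainty as the filter measure used to rank SNPs but does not define it. As a minimal sketch, assuming the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), the score for one attribute against the class label could be computed as below. The function names and the toy genotype data are hypothetical, and the round robin ranking scheme used in the thesis is not reproduced here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard symmetrical uncertainty between two discrete sequences.

    Returns a value in [0, 1]; 0 means x and y are independent in the
    sample, 1 means each fully determines the other.
    """
    h_x = entropy(x)
    h_y = entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both sequences are constant
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy   # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


# Hypothetical toy data: genotypes coded 0/1/2 for eight individuals
# drawn from three made-up populations (not HapMap data).
populations = ["A", "A", "A", "B", "B", "B", "C", "C"]
snp_informative = [0, 0, 0, 2, 2, 2, 1, 1]  # tracks the population label
snp_noise = [0, 1, 2, 0, 1, 2, 0, 1]        # nearly unrelated to the label

su_informative = symmetrical_uncertainty(snp_informative, populations)
su_noise = symmetrical_uncertainty(snp_noise, populations)
```

In a filter step of this kind, attributes would be sorted by such a score and only the top-ranked ones passed on to the wrapper stage; how the thesis combines pairwise rankings across populations is not specified in the abstract.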