Abstract:
Missing data are common in all fields of research. When the missingness depends on the parameters of interest, it can lead to serious problems; this type of missingness is called nonignorable. One remedy is to estimate or approximate the missing values. The purpose of this research is to study and compare estimation methods in multiple linear regression settings with nonignorable missing data on the dependent variable. The methods compared are the EM algorithm (EM), K-Nearest Neighbor imputation (KNN), and Predictive Mean Matching imputation (PMM). The simulations consider three missing-data proportions (10%, 20%, and 30%) and three levels of nonignorable missingness (none, medium, and high). Based on the average mean square error (AMSE), the findings are as follows: i) all estimation methods perform better as the sample size increases; ii) all estimation methods perform worse as the standard deviation of the errors, the missing proportion, or the level of nonignorable missingness increases; iii) overall, the EM method performs best when the standard deviation of the errors is not high (10-30); and iv) the KNN method performs best when the standard deviation is high (90).
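To make the simulation design concrete, the following is a minimal sketch of one way nonignorable missingness on the dependent variable could be generated and scored. It is not the authors' code, and everything specific in it is an assumption made for illustration: the single predictor, the coefficients 1 and 2, the logistic missingness model with strength `gamma`, and the definition of AMSE as the average over replications of the mean squared error between imputed and true values on the missing cases. Plain regression imputation from complete cases stands in for EM, KNN, and PMM, none of which is implemented here.

```python
import math
import random


def simulate_once(rng, n, sigma, gamma, base_rate):
    """One replication: MSE of imputed vs. true y over the missing cases."""
    # Hypothetical data-generating model: y = 1 + 2x + error.
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, sigma) for xi in x]

    # Standardize y so gamma is comparable across values of sigma.
    y_mean = sum(y) / n
    y_sd = math.sqrt(sum((yi - y_mean) ** 2 for yi in y) / (n - 1))

    # Assumed logistic missingness model. With gamma = 0 the chance of
    # being missing ignores y; with gamma > 0 larger y values are more
    # likely to be missing (nonignorable). The realized missing rate
    # equals base_rate only approximately when gamma != 0.
    intercept = math.log(base_rate / (1.0 - base_rate))
    missing = []
    for yi in y:
        z = (yi - y_mean) / y_sd
        p_miss = 1.0 / (1.0 + math.exp(-(intercept + gamma * z)))
        missing.append(rng.random() < p_miss)

    # Stand-in imputation: least-squares fit on the complete cases.
    obs = [(xi, yi) for xi, yi, m in zip(x, y, missing) if not m]
    x_bar = sum(xi for xi, _ in obs) / len(obs)
    y_bar = sum(yi for _, yi in obs) / len(obs)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in obs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in obs)
    slope = sxy / sxx
    icpt = y_bar - slope * x_bar

    # Squared error between fitted (imputed) and true y where y is missing.
    sq_err = [(yi - (icpt + slope * xi)) ** 2
              for xi, yi, m in zip(x, y, missing) if m]
    return sum(sq_err) / len(sq_err)


def amse(gamma, n=500, sigma=10.0, base_rate=0.2, reps=20, seed=0):
    """Average of the per-replication MSE over `reps` replications."""
    rng = random.Random(seed)
    total = sum(simulate_once(rng, n, sigma, gamma, base_rate)
                for _ in range(reps))
    return total / reps
```

Comparing `amse(gamma=0.0)` with `amse(gamma=2.0)` shows how the error on the missing cases grows once missingness depends on the unobserved response, which is the qualitative pattern reported in finding ii).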