COMPARISON OF THE IMPUTATION METHODS IN THE MULTIPLE LINEAR REGRESSION MODEL WHEN PERCENTAGES OF MISSING DEPENDENT AND INDEPENDENT VARIABLES ARE DIFFERENT UNDER NONIGNORABLE MISSINGNESS
Abstract:
The objective of this research is to compare imputation methods for the multiple linear regression model when both the dependent and independent variables are subject to nonignorable missingness. The imputation methods are the EM algorithm, K-Nearest Neighbor imputation (KNN), and Predictive Mean Matching imputation (PMM). The data are simulated under three levels of missing proportion (10%, 15%, and 20%), three levels of nonignorable missingness (none, medium, and high), and five levels of the ratio of the missing proportion of the dependent variable to that of the independent variables (1:1, 1:1.5, 1.5:1, 1:2, and 2:1). The methods are compared by the size of the average mean squared error (AMSE). The findings are as follows: i) all of the imputation methods perform better when the missing proportion of the dependent variable is larger than that of the independent variables; ii) the EM algorithm performs best when the missingness occurs in independent variables with low standard deviation; iii) KNN performs best when the standard deviation of the errors is high (90) and the level of nonignorable missingness is medium or high; iv) PMM performs best when the standard deviation of the errors is not high (10, 30) and the level of nonignorable missingness is none.
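To make the simulation design concrete, the following minimal Python sketch enumerates the three crossed factors named in the abstract and shows one possible way to generate nonignorable missingness, in which the chance that a value is missing depends on the value itself. The function `nonignorable_mask`, the exponential weighting, and the numeric strengths assigned to "none", "medium", and "high" are hypothetical choices made for illustration only; the abstract does not specify the mechanism actually used in the study.

```python
import itertools
import numpy as np

# Factor levels taken from the abstract.
MISSING_PROPORTIONS = [0.10, 0.15, 0.20]
RATIOS_Y_TO_X = [(1, 1), (1, 1.5), (1.5, 1), (1, 2), (2, 1)]
# Assumption: numeric strengths are placeholders, not values from the study.
NONIGNORABLE_LEVELS = {"none": 0.0, "medium": 1.0, "high": 2.0}

# Full factorial crossing of the three factors.
design = list(itertools.product(MISSING_PROPORTIONS,
                                NONIGNORABLE_LEVELS,
                                RATIOS_Y_TO_X))


def nonignorable_mask(values, proportion, strength, rng):
    """Return a boolean mask marking which entries are set to missing.

    Hypothetical mechanism: larger values are more likely to be missing,
    with selection weight exp(strength * z). A strength of 0 gives
    missingness that does not depend on the value at all.
    """
    values = np.asarray(values, dtype=float)
    n_missing = int(round(proportion * values.size))
    z = (values - values.mean()) / values.std()
    weights = np.exp(strength * z)
    probs = weights / weights.sum()
    idx = rng.choice(values.size, size=n_missing, replace=False, p=probs)
    mask = np.zeros(values.size, dtype=bool)
    mask[idx] = True
    return mask


rng = np.random.default_rng(0)
y = rng.normal(loc=100.0, scale=30.0, size=1000)  # illustrative data only
mask = nonignorable_mask(y, 0.20, NONIGNORABLE_LEVELS["high"], rng)
```

Under this sketch, the crossing yields 45 design conditions, and at the "high" setting the values marked as missing are on average larger than the observed ones, which is the defining feature of nonignorable missingness. Each imputation method would then be applied to the masked data and scored by AMSE.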