Abstract:
This research had four objectives: (1) to compare the bias, consistency, and relative efficiency of six estimators: the effect size derived from Classical Test Theory ($d_{\mathrm{CTT}}$), the effect size derived from Item Response Theory with estimation models that fit the data ($d_{\mathrm{IRT1}}$), the effect size derived from Item Response Theory with estimation models that do not fit the data ($d_{\mathrm{IRT2}}$), the correlation coefficient derived from Classical Test Theory ($r_{\mathrm{CTT}}$), the correlation coefficient derived from Item Response Theory with estimation models that fit the data ($r_{\mathrm{IRT1}}$), and the correlation coefficient derived from Item Response Theory with estimation models that do not fit the data ($r_{\mathrm{IRT2}}$); (2) to compare the means of $d_{\mathrm{CTT}}$, $d_{\mathrm{IRT1}}$, and $d_{\mathrm{IRT2}}$, and likewise the means of $r_{\mathrm{CTT}}$, $r_{\mathrm{IRT1}}$, and $r_{\mathrm{IRT2}}$; (3) to study the relationship between $d_{\mathrm{CTT}}$ and $d_{\mathrm{IRT1}}$ and to derive the regression equation of $d_{\mathrm{IRT1}}$ on $d_{\mathrm{CTT}}$; and (4) to study the relationship between $r_{\mathrm{CTT}}$ and $r_{\mathrm{IRT1}}$ and to derive the regression equation of $r_{\mathrm{IRT1}}$ on $r_{\mathrm{CTT}}$. The 540 examination conditions (5 × 4 × 3 × 3 × 3) were generated by crossing the true effect magnitudes (.2, .5, .8, 1.2, 2.6), sample sizes (20, 50, 500, 2,000), test lengths (10, 50, 90), base models (one-, two-, and three-parameter logistic models), and estimation models (the classical test model, item response models that fit the data, and item response models that do not fit the data).
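The simulation design and the estimator properties named in objective (1) can be sketched in Python. This is a minimal sketch under stated assumptions: it assumes a fully crossed factorial design (which the count 5 × 4 × 3 × 3 × 3 = 540 suggests) and common textbook definitions, namely bias as the mean estimate minus the true value and relative efficiency as a ratio of mean squared errors. The abstract does not define these quantities, so the labels and helper names below (`design_grid`, `bias`, `mse`, `relative_efficiency`) are hypothetical illustrations, not the study's procedure.

```python
from itertools import product
from statistics import fmean

# Factor levels as listed in the abstract.
EFFECT_SIZES = (0.2, 0.5, 0.8, 1.2, 2.6)
SAMPLE_SIZES = (20, 50, 500, 2000)
TEST_LENGTHS = (10, 50, 90)
BASE_MODELS = ("1PL", "2PL", "3PL")
# Hypothetical labels for the three estimation approaches.
ESTIMATION_MODELS = ("CTT", "IRT_fit", "IRT_unfit")


def design_grid():
    """Return every condition of an (assumed) fully crossed design as a tuple."""
    return list(product(EFFECT_SIZES, SAMPLE_SIZES, TEST_LENGTHS,
                        BASE_MODELS, ESTIMATION_MODELS))


def bias(estimates, true_value):
    """Mean of the replicated estimates minus the true parameter value."""
    return fmean(estimates) - true_value


def mse(estimates, true_value):
    """Mean squared error of the replicated estimates around the true value."""
    return fmean((e - true_value) ** 2 for e in estimates)


def relative_efficiency(est_a, est_b, true_value):
    """Assumed definition: MSE(B) / MSE(A); values above 1 favour estimator A."""
    return mse(est_b, true_value) / mse(est_a, true_value)


grid = design_grid()
print(len(grid))  # 540
```

Consistency could be examined in the same framework by checking whether bias and MSE shrink as the sample size grows from 20 to 2,000 within each combination of the other factors.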
The main findings were as follows: (1) overall, $r_{\mathrm{IRT1}}$ had the lowest bias, $r_{\mathrm{CTT}}$ had the highest consistency, and $r_{\mathrm{IRT1}}$ had the highest relative efficiency; taken across all properties, $r_{\mathrm{IRT1}}$ was the most appropriate estimator; (2) the means of $d_{\mathrm{CTT}}$, $d_{\mathrm{IRT1}}$, and $d_{\mathrm{IRT2}}$ differed significantly at the .05 level, with $d_{\mathrm{CTT}}$ having the highest mean; likewise, the means of $r_{\mathrm{CTT}}$, $r_{\mathrm{IRT1}}$, and $r_{\mathrm{IRT2}}$ differed significantly at the .05 level, with $r_{\mathrm{CTT}}$ having the highest mean; (3) the correlation between $d_{\mathrm{CTT}}$ and $d_{\mathrm{IRT1}}$ was .626, significant at the .05 level, and the regression of $d_{\mathrm{IRT1}}$ on $d_{\mathrm{CTT}}$ was $d_{\mathrm{IRT1}} = .004 + .065\,d_{\mathrm{CTT}}$ for raw scores and $Z_{d_{\mathrm{IRT1}}} = .626\,Z_{d_{\mathrm{CTT}}}$ for standardized scores; (4) the correlation between $r_{\mathrm{CTT}}$ and $r_{\mathrm{IRT1}}$ was .570, significant at the .05 level, and the regression of $r_{\mathrm{IRT1}}$ on $r_{\mathrm{CTT}}$ was $r_{\mathrm{IRT1}} = .003 + .079\,r_{\mathrm{CTT}}$ for raw scores and $Z_{r_{\mathrm{IRT1}}} = .570\,Z_{r_{\mathrm{CTT}}}$ for standardized scores.
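In findings (3) and (4) the standardized slope equals the reported correlation. This follows from simple linear regression. The raw slope is $b = r\,s_y/s_x$, so once both variables are converted to z-scores the slope equals $r$ and the intercept is zero. The sketch below checks this numerically. It assumes ordinary least squares and uses a handful of made-up values that are not the study's data. The helper names (`pearson_r`, `ols_fit`, `standardize`) are hypothetical.

```python
from math import sqrt
from statistics import fmean, pstdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_fit(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def standardize(v):
    """Convert values to z-scores using the population standard deviation."""
    m, s = fmean(v), pstdev(v)
    return [(e - m) / s for e in v]


# Made-up illustrative values, NOT results from the study.
x_ctt = [0.10, 0.40, 0.50, 0.90, 1.30]
y_irt1 = [0.02, 0.03, 0.06, 0.05, 0.10]

r = pearson_r(x_ctt, y_irt1)
a_raw, b_raw = ols_fit(x_ctt, y_irt1)
a_std, b_std = ols_fit(standardize(x_ctt), standardize(y_irt1))

print(abs(b_std - r) < 1e-12)                                   # True
print(abs(b_raw - r * pstdev(y_irt1) / pstdev(x_ctt)) < 1e-12)  # True
```

The raw slope also depends on the ratio of standard deviations. A small raw slope (.065 or .079) can therefore go with a moderate correlation (.626 or .570) when the IRT-based estimates vary much less than their CTT counterparts.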