Abstract:
The objectives of this research were to 1) examine rater effects in the evaluation of the alignment between science standards and indicators and classroom test items in junior secondary school education, 2) compare the degree of alignment between standards and indicators and test items before and after controlling for severity/leniency effects, 3) investigate and compare, using Porter's alignment index, the degree of alignment between national and classroom science test items among schools with different levels of science achievement, and 4) estimate and compare the generalizability coefficients of the alignment evaluation results as the number of raters and the evaluation design vary. The research subjects were 1,089 science classroom test items used in junior secondary schools under the Office of the Basic Education Commission in Bangkok and 20 expert panelists who evaluated the alignment. The research instrument was a rating scale for evaluating the alignment between science standards and indicators and test items. Data were analyzed using the many-facet Rasch model (MFRM), paired-samples t-tests, alignment index analysis, and generalizability theory (G-theory). The results were as follows:
1. Severity/leniency effects were present in the evaluation of the alignment between science standards and indicators and classroom test items. Raters tended to be severe rather than lenient (rater measures ranged from -3.24 to 1.83 logits). Most raters (16, or 80.00%) fit the profile of accurate raters. No rater exhibited a central tendency, restriction-of-range, or randomness effect. The remaining 4 raters (20.00%) exhibited other profiles.
2. The alignment evaluation results differed significantly (p < .01) before and after controlling for severity/leniency effects (t = 17.044). After controlling for these effects, the alignment evaluation results changed for 21 test items (1.93%), and 901 test items (82.74%) were aligned with the standards and indicators and fit the model (Fair-M averages ranged from 3.06 to 3.97; infit and outfit MNSQ values were between 0.50 and 1.50).
3. Schools with different levels of science achievement had similar alignment indices between national and classroom science test items, ranging from 0.436 to 0.588.
4. The generalizability coefficient for absolute decisions (Φ) on the alignment evaluation scores increased with the number of raters in all evaluation designs. With a 5-point rating scale, at least 2 raters for the cognitive demand evaluation and at least 3 raters for the item-indicator alignment evaluation were needed to yield an acceptably high generalizability coefficient.
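The severity/leniency adjustment reported in the results rests on the MFRM "fair average": the rating an item would be expected to receive from a rater of average severity. The following is a minimal sketch, assuming an Andrich rating-scale formulation in which an item's alignment measure, a criterion difficulty and a rater severity combine additively in logits, and assuming the rater facet is centred so that an average rater has severity 0. The function names and threshold values are hypothetical illustrations, not estimates from this study; in practice these parameters come from MFRM software such as Facets.

```python
import math
from typing import List, Sequence


def category_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Andrich rating-scale probabilities for categories 0..m.

    theta is the combined logit (item measure - criterion difficulty -
    rater severity); thresholds are the Rasch-Andrich thresholds F_1..F_m.
    """
    # Category 0 has log-numerator 0; category k adds (theta - F_k).
    log_numerators = [0.0]
    for f_k in thresholds:
        log_numerators.append(log_numerators[-1] + theta - f_k)
    peak = max(log_numerators)  # subtract the max for numerical stability
    numerators = [math.exp(v - peak) for v in log_numerators]
    total = sum(numerators)
    return [n / total for n in numerators]


def expected_score(item_measure: float, criterion_difficulty: float,
                   rater_severity: float, thresholds: Sequence[float],
                   lowest_category: int = 1) -> float:
    """Model-expected rating of one item by one rater on one criterion."""
    theta = item_measure - criterion_difficulty - rater_severity
    probs = category_probabilities(theta, thresholds)
    return sum((lowest_category + k) * p for k, p in enumerate(probs))


def fair_average(item_measure: float, criterion_difficulty: float,
                 thresholds: Sequence[float], lowest_category: int = 1) -> float:
    """Expected rating from an average rater (severity 0, assuming a centred rater facet)."""
    return expected_score(item_measure, criterion_difficulty, 0.0,
                          thresholds, lowest_category)


if __name__ == "__main__":
    # Hypothetical thresholds for a 5-point scale (categories 1..5).
    thresholds = [-2.0, -0.5, 0.5, 2.0]
    item, criterion = 1.0, 0.0
    print("severe rater (1.83):  ", round(expected_score(item, criterion, 1.83, thresholds), 2))
    print("lenient rater (-3.24):", round(expected_score(item, criterion, -3.24, thresholds), 2))
    print("fair average:         ", round(fair_average(item, criterion, thresholds), 2))
```

Because the expected score falls as rater severity rises, the same item scores lower from the severe rater at 1.83 logits than from the lenient rater at -3.24 logits, and the fair average lies between them.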
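The before/after comparison in the results is a paired-samples t-test. A minimal sketch, assuming each pair is one test item's alignment score before adjustment (raw average) and after adjustment (fair average); the abstract does not state the unit of pairing, so that is an assumption, and the data below are hypothetical. It returns only the t statistic and degrees of freedom; a p-value needs the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(first: Sequence[float], second: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom for d = first - second."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    # t = mean difference divided by its standard error
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    raw = [3.0, 4.0, 5.0, 4.0]    # hypothetical scores before adjustment
    fair = [2.0, 3.0, 4.0, 4.0]   # hypothetical scores after adjustment
    print(paired_t(raw, fair))    # -> (3.0, 3)
```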
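Porter's alignment index compares two content matrices (for example, topics by cognitive demand levels) after converting each cell to a proportion of its matrix total: AI = 1 - Σ|x_i - y_i| / 2, which is 1 for identical distributions and 0 for completely disjoint ones. A minimal sketch follows; the matrix layout and the item counts in the example are hypothetical, not data from this study.

```python
from typing import Sequence


def porter_alignment_index(x: Sequence[Sequence[float]],
                           y: Sequence[Sequence[float]]) -> float:
    """Porter's alignment index: 1 - sum_i |x_i - y_i| / 2 over cell proportions."""
    if len(x) != len(y) or any(len(rx) != len(ry) for rx, ry in zip(x, y)):
        raise ValueError("matrices must have the same shape")
    flat_x = [v for row in x for v in row]
    flat_y = [v for row in y for v in row]
    total_x, total_y = sum(flat_x), sum(flat_y)
    if total_x <= 0 or total_y <= 0:
        raise ValueError("each matrix needs a positive total")
    # Convert counts to proportions so that tests of different lengths compare fairly.
    px = [v / total_x for v in flat_x]
    py = [v / total_y for v in flat_y]
    return 1.0 - sum(abs(a - b) for a, b in zip(px, py)) / 2.0


if __name__ == "__main__":
    # Hypothetical item counts: rows = topics, columns = cognitive demand levels.
    national = [[2, 2], [0, 0]]
    classroom = [[1, 1], [1, 1]]
    print(porter_alignment_index(national, classroom))  # -> 0.5
```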
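The finding that Φ rises with the number of raters follows from a G-theory decision (D) study, where averaging over more raters shrinks the absolute error variance. A minimal sketch, assuming the simplest crossed design of test items × raters (p × r), in which Φ = σ²_p / (σ²_p + (σ²_r + σ²_pr,e) / n_r). The study's actual evaluation designs may include further facets; the variance components and the .80 target below are illustrative assumptions only.

```python
from typing import Optional


def phi_coefficient(var_p: float, var_r: float, var_pr_e: float,
                    n_raters: int) -> float:
    """Dependability (absolute-decision) coefficient for a p x r design."""
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    # Absolute error includes the rater main effect and the residual.
    abs_error = (var_r + var_pr_e) / n_raters
    return var_p / (var_p + abs_error)


def min_raters_for(var_p: float, var_r: float, var_pr_e: float,
                   target: float = 0.80, max_raters: int = 50) -> Optional[int]:
    """Smallest number of raters whose Phi reaches the target, or None."""
    for n in range(1, max_raters + 1):
        if phi_coefficient(var_p, var_r, var_pr_e, n) >= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical variance components from a G study.
    comps = dict(var_p=0.50, var_r=0.05, var_pr_e=0.25)
    for n in range(1, 5):
        print(n, round(phi_coefficient(n_raters=n, **comps), 3))
    print("raters needed for Phi >= .80:", min_raters_for(**comps))
```

With these made-up components Φ is about .625 with one rater, .769 with two and .833 with three, so three raters would be the minimum for a .80 target.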