Sensitivity of test items to teaching quality
ARTICLE
Alexander Naumann, DIPF, Germany; Svenja Rieser, University of Wuppertal (BUWI), Germany; Stephanie Musow, Jan Hochweber, University of Teacher Education St. Gallen (PHSG), Switzerland; Johannes Hartig, DIPF, Germany
Learning and Instruction, Volume 60, Number 1, ISSN 0959-4752. Publisher: Elsevier Ltd
Abstract
Instructional sensitivity is the psychometric capacity of tests or single items to capture effects of classroom instruction. Yet how current item sensitivity measures relate to (a) actual instruction and (b) overall test sensitivity remains unclear. The present study aims to close these gaps by investigating test and item sensitivity to teaching quality, reanalyzing data from a quasi-experimental intervention study in primary school science education (1026 students, 53 classes, M_age = 8.79 years, SD_age = 0.49, 50% female). We examine (a) the correlation between item sensitivity measures and the potential for cognitive activation in class and (b) the consequences for test score interpretation when tests are assembled from items that vary in their sensitivity to cognitive activation. Our study (a) provides validity evidence that item sensitivity measures may be related to actual classroom instruction and (b) shows that inferences about teaching drawn from test scores may vary depending on test composition.
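To make the abstract's two questions concrete, the following is a minimal, purely illustrative sketch and not the authors' analysis. It simulates dichotomous item responses in 53 classes (the study's class count; all other numbers, the class-level "cognitive activation" ratings, and the simple between-class-spread index are assumptions introduced here). It shows, in principle, how an item-level sensitivity index could be related to a class-level teaching quality measure.

```python
# Hypothetical illustration only: a crude item "sensitivity" index based on
# between-class variation in item performance, and its correlation with a
# simulated class-level cognitive activation rating.
import numpy as np

rng = np.random.default_rng(0)
n_classes, students_per_class, n_items = 53, 20, 12

# Assumed (simulated) standardized cognitive activation rating per class.
activation = rng.normal(size=n_classes)

# Assumed item sensitivities: how strongly each item responds to activation.
true_sensitivity = np.linspace(0.0, 1.0, n_items)

# Simulate 0/1 responses: student ability + a class effect that is larger
# for more sensitive items.
records = []
for c in range(n_classes):
    ability = rng.normal(size=students_per_class)
    for i in range(n_items):
        logits = ability - 0.2 + true_sensitivity[i] * activation[c]
        p = 1.0 / (1.0 + np.exp(-logits))
        for y in rng.binomial(1, p):          # one response per student
            records.append((c, i, int(y)))

classes, items, scores = map(np.array, zip(*records))

# Per item: class-level mean scores, then (a) their spread across classes as
# a crude sensitivity index and (b) their correlation with the activation rating.
for i in range(n_items):
    class_means = np.array([scores[(items == i) & (classes == c)].mean()
                            for c in range(n_classes)])
    spread = class_means.std(ddof=1)
    r = np.corrcoef(class_means, activation)[0, 1]
    print(f"item {i:2d}: between-class SD = {spread:.3f}, "
          f"correlation with activation = {r:.2f}")
```

In this toy setup, items constructed to be more sensitive show both larger between-class spread and stronger correlations with the activation rating; the study itself addresses these questions with psychometric sensitivity measures rather than this simplified index.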
Citation
Naumann, A., Rieser, S., Musow, S., Hochweber, J. & Hartig, J. (2019). Sensitivity of test items to teaching quality. Learning and Instruction, 60(1), 41-53. Elsevier Ltd. Retrieved August 13, 2024 from https://www.learntechlib.org/p/199872/.
This record was imported from Learning and Instruction on March 15, 2019. Learning and Instruction is a publication of Elsevier.