Document Type

Conference Proceeding

Publication Date


Publication Title

2021 ASEE Annual Conference


This research paper describes the development of an assessment instrument for use with middle school students that provides insight into students’ interpretive understanding by looking at early indicators of developing expertise in students’ responses to solution generation, reflection, and concept demonstration tasks.

We begin by detailing a synthetic assessment model that served as the theoretical basis for assessing specific thinking skills. We then describe our process of developing test items by working with a Teacher Design Team (TDT) of instructors in our partner school system to set guidelines that would better orient the assessment in that context and working within the framework of standards and disciplinary core ideas enumerated in the Next Generation Science Standards (NGSS). We next specify our process of refining the assessment from 17 items across three separate item pools to a final total of three open-response items. We then provide evidence for the validity and reliability of the assessment instrument from the standards of (1) content, (2) meaningfulness, (3) generalizability, and (4) instructional sensitivity.

As part of the discussion from the standards of generalizability and instructional sensitivity, we detail a study carried out in our partner school system in the fall of 2019. The instrument was administered to students in treatment (n= 201) and non-treatment (n = 246) groups, wherein the former participated in a two-to-three-week, NGSS-aligned experimental instructional unit introducing the principles of engineering design that focused on engaging students using the Imaginative Education teaching approach. The latter group were taught using the district’s existing engineering design curriculum.

Results from statistical analysis of student responses showed that the interrater reliability of the scoring procedures were good-to-excellent, with intra-class correlation coefficients ranging between .72 and .95. To gauge the instructional sensitivity of the assessment instrument, a series of non-parametric comparative analyses (independent two-group Mann-Whitney tests) were carried out. These found statistically significant differences between treatment and non-treatment student responses related to the outcomes of fluency and elaboration, but not reflection.




© American Society for Engineering Education, 2021


Archived as published. Open access paper.

Included in

Engineering Commons



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.