Valid and Reliable Science Content Assessments for Science Teachers

Mar. 01, 2013

Source: Journal of Science Teacher Education, Vol. 24, Issue 2, March 2013, pp. 269-295.
This article presents validity and reliability evidence for the teacher science content assessments developed as part of the Diagnostic Teacher Assessments of Mathematics and Science (DTAMS) project.

Three separate assessments have been developed, focusing on physical science, life science, and earth/space science.

The article is organized around two major phases in developing these assessments and establishing their validity and reliability: (1) the assessment development process, which included strategies designed to strengthen the validity and reliability of score interpretations, and (2) the analysis of empirical results from approximately 4,400 teachers.


The discussion summarizes the validity and reliability arguments and outlines potential uses of these assessments.

Validity Evidence from Assessment Development Process
During the assessment development process, validity was strengthened through a systematic synthesis of relevant documents, extensive use of external reviewers, and field tests with 900 teachers.
The assessments were designed to sample strategically along two dimensions: depth of knowledge and breadth of knowledge.

The subsequent results from approximately 4,400 teachers, analyzed with Rasch item response theory (IRT) modeling techniques, provide evidence of construct and concurrent validity.
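For readers unfamiliar with Rasch modeling, the minimal Python sketch below illustrates the core idea of the dichotomous Rasch model: the probability of a correct response depends only on the difference between a person's ability and an item's difficulty, both expressed on a common logit scale. It assumes items scored simply right/wrong; the function names and item difficulties are hypothetical and are not taken from the DTAMS analysis, which the source does not describe at this level of detail.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model: probability that a person with ability
    theta answers an item of difficulty b correctly (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_raw_score(theta: float, difficulties: list[float]) -> float:
    """Expected number-correct score for a person across a set of items."""
    return sum(rasch_probability(theta, b) for b in difficulties)


# Hypothetical item difficulties (logits), for illustration only.
items = [-1.0, 0.0, 1.0]

# When ability equals item difficulty, the chance of success is 50%.
print(rasch_probability(0.0, 0.0))
# A person of average ability (theta = 0) facing these symmetric items.
print(expected_raw_score(0.0, items))
```

Because persons and items share one scale, a calibrated item set lets researchers compare teachers' estimated abilities directly, which is part of what makes Rasch analysis useful for construct validity arguments.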

Potential Uses of DTAMS Science Assessments
Multiple sources of evidence support both the validity of potential score interpretations and the reliability of the scores produced by these science assessments.
Valid and reliable assessments of teacher science content knowledge allow direct measurement of a crucial variable of interest to educational researchers, professional development providers, and science teacher educators.
These assessments could be used to determine the impact of workshops, courses, or other experiences on teachers’ knowledge.

These assessments are designed around the content knowledge needed to teach middle school science, which extends beyond the depth of knowledge appropriate for middle school students. They are therefore inappropriate for use with middle school students, since they are unlikely to yield useful information about that population.

These assessments are best suited to measuring the impact of programs that aim to broadly improve middle school teachers' science knowledge in one of the three content domains.
They are therefore best matched to programs that involve significant, sustained efforts.
The assessments may also be used for comparative purposes.

