Measurement in Education in the United States
Gregory Cizek, Charlotte Agger
  • LAST REVIEWED: 12 October 2015
  • LAST MODIFIED: 25 February 2016
  • DOI: 10.1093/obo/9780199756810-0060


Educational measurement is the science and practice of obtaining information about characteristics of students, such as their knowledge, skills, abilities, and interests. It is a specialty within the broader discipline of psychometrics. Measurement in education includes the development of instruments or protocols for obtaining information, procedures for analyzing and evaluating the quality of the information those instruments or protocols yield, and strategies for communicating the resulting information to diverse audiences such as educators, policymakers, parents, and students. All measurement in education shares the common aims of (1) arriving at defensible conclusions about students’ standing with respect to a specified educational outcome, (2) documenting student ability, achievement, or interests, (3) gauging student progress toward specified educational goals, and (4) improving teaching and learning.
Educational measurement is closely related to the concepts of testing, assessment, and evaluation. Measurement can be defined as the process of assigning numbers to events according to an established set of rules. In educational measurement, the “events” under consideration are students’ test performances. In the simplest case, the numbers assigned are whole numbers, such as the count of a student’s correct responses. An example of a set of “rules” in this situation might be that one point is earned for each correct response to a multiple-choice test item, zero points are earned for an incorrect response, and the sum of these values is the student’s total test score (sometimes called a raw score). Percentage correct (the raw score divided by the maximum possible score, multiplied by 100) is another commonly used metric, although a variety of other transformations and scales may be used. Testing uses measurement to support inferences about students’ knowledge, skills, or abilities. A test, broadly conceived, is any systematic sample of behavior obtained under controlled conditions.
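The scoring rules described above (one point per correct multiple-choice response, zero otherwise, summed into a raw score, then optionally expressed as percentage correct) can be sketched in a few lines. This is a minimal illustration, not a procedure taken from the source: the function names, the letter-coded answer key, and the five-item test are hypothetical, and an omitted response is simply treated as incorrect.

```python
from typing import Optional, Sequence


def raw_score(responses: Sequence[Optional[str]], key: Sequence[str]) -> int:
    """Sum of item scores: 1 point for each response matching the key, 0 otherwise.

    Omitted responses (None) never match the key, so they earn 0 points.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 if given == correct else 0 for given, correct in zip(responses, key))


def percent_correct(responses: Sequence[Optional[str]], key: Sequence[str]) -> float:
    """Raw score divided by the maximum possible score (one point per item), times 100."""
    if not key:
        raise ValueError("a test must contain at least one item")
    return 100.0 * raw_score(responses, key) / len(key)


# Hypothetical five-item multiple-choice test.
answer_key = ["B", "D", "A", "C", "B"]
student = ["B", "D", "C", "C", "A"]

print(raw_score(student, answer_key))        # 3
print(percent_correct(student, answer_key))  # 60.0
```

Percentage correct is only one transformation of the raw score; other scales would replace the final function while leaving the item-level rule unchanged.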
In educational measurement, the behaviors of interest encompass a wide range of outcomes, including, for example, solving mathematics problems, expressing degrees of interest in various occupations, reading for information, giving a speech, building a birdhouse, and writing an essay. Because it is not possible to observe everything a student knows or can do, educational measurement relies on samples of these behaviors. Consequently, inference is always required when a test is administered: what is of interest is typically not the student’s performance on the specific mathematics problems or the specific essay, because those behaviors are regarded merely as representations of the student’s underlying problem-solving skill or writing ability. An inference is an informed conclusion, drawn from the sample of behavior obtained during testing, about the student’s more fundamental level of knowledge or skill in mathematics or writing. In educational testing, the conditions of testing are controlled so that differences in students’ performances can be confidently attributed to differences in the underlying characteristics being measured rather than to variation in testing conditions.
Evaluation is ascribing value, merit, or worth to the information collected via measurement or testing. In educational contexts, the most common form of evaluation is grading, in which a value label (such as “Pass,” “Fail,” or “A”) is used to convey the merit of a student’s performance.
Finally, assessment refers to the process of gathering and synthesizing information from multiple sources (some or all of which may be tests) for the purposes of discovering and documenting students’ strengths and weaknesses, planning and enhancing instruction, or evaluating and making decisions about students.
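Grading, the most common form of evaluation described above, attaches a value label to a score. The sketch below assumes the score is already expressed as percentage correct and uses hypothetical cut points (60, 70, 80, 90 for letter grades; 70 for Pass/Fail). These thresholds are illustrative only; where the cut points fall is a judgment made by educators or policymakers, not something the measurement itself determines.

```python
from bisect import bisect_right

# Hypothetical cut points (percentage correct) separating the letter grades.
# A score equal to a cut point earns the higher label.
LETTER_CUTS = [60.0, 70.0, 80.0, 90.0]
LETTER_LABELS = ["F", "D", "C", "B", "A"]


def letter_grade(percent: float) -> str:
    """Map a percent-correct score to a letter label using the hypothetical cuts."""
    # bisect_right counts how many cut points are <= percent.
    return LETTER_LABELS[bisect_right(LETTER_CUTS, percent)]


def pass_fail(percent: float, cut: float = 70.0) -> str:
    """Map a percent-correct score to "Pass" or "Fail" at a hypothetical cut."""
    return "Pass" if percent >= cut else "Fail"


print(letter_grade(60.0))  # D
print(letter_grade(85.0))  # B
print(pass_fail(60.0))     # Fail
```

Keeping the cut points in a separate list makes the evaluative judgment explicit and easy to change without touching the underlying score.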

General Overviews

The works in this section provide broad, introductory treatments of measurement in education. These introductory works can be grouped into three general categories. The first category includes textbooks that provide applied overviews of educational measurement. The second category comprises introductions to the theoretical, psychometric, and statistical models and methods that underlie educational and psychological measurement. The third category includes critiques of testing in education.

