Reliability in Educational Assessments
- LAST REVIEWED: 30 October 2019
- LAST MODIFIED: 30 October 2019
- DOI: 10.1093/obo/9780199756810-0228
Introduction
Professional standards identify three foundations for examining the psychometric quality of assessments: validity, reliability, and fairness. Reliability is therefore a primary concern for all assessments. Reliability is defined as the consistency of scores across replications of a testing procedure. In education, the bases for replication, and thus the sources of measurement error, include items, forms, raters, and occasions. The source of measurement error under study determines the type of reliability being estimated and, ultimately, the generalizations that can be made about the measurement. Inconsistency in scores may therefore arise from multiple sources of random error, and the same definition applies to whichever type of replication (items, forms, raters, or occasions) matches the intended generalization. There are also multiple indices for reporting reliability, including reliability coefficients, generalizability coefficients, standard errors of measurement, and information functions, and these indices are defined differently under different test theories. For example, classical test theory emphasizes reliability coefficients and standard errors of measurement; item response theory emphasizes information functions; generalizability theory emphasizes generalizability coefficients, dependability indices, and relative and absolute standard errors; and classification consistency emphasizes proportion agreement, either unadjusted or adjusted for chance agreement. The importance of reliability depends on how the assessment is used, and it increases as the stakes of test use rise. Reliability is therefore expected to be demonstrated more rigorously when tests are used to make high-stakes decisions about individuals, such as employment, certification, or clinical placement decisions.
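A small worked example can make the classical and generalizability-theory indices named above concrete. The Python sketch below uses only the standard library. It computes coefficient alpha and the classical standard error of measurement (SEM = SD × sqrt(1 − reliability)). It then estimates variance components for a one-facet, fully crossed persons × items design and derives the generalizability coefficient Eρ², the dependability index Φ, and the relative and absolute standard errors. The score matrix is invented for illustration and the function names are hypothetical. The sketch assumes complete data and treats persons and items as random.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(scores):
    """Coefficient alpha for a persons-by-items score matrix (list of rows)."""
    k = len(scores[0])
    item_vars = [variance(col) for col in zip(*scores)]  # sample variance of each item
    total_var = variance([sum(row) for row in scores])   # sample variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def g_study_pxi(scores, n_items=None):
    """Variance components and indices for a crossed persons x items (p x i)
    design with one score per cell, persons and items treated as random.
    n_items is the number of items assumed for the decision (D) study;
    it defaults to the number of items actually observed."""
    n_p, n_i = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    item_means = [mean(col) for col in zip(*scores)]

    # Two-way ANOVA sums of squares without replication.
    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the expected mean squares; negative estimates are set to zero,
    # a common (but not the only) convention.
    var_res = ms_res                         # p x i interaction confounded with error
    var_p = max((ms_p - ms_res) / n_i, 0.0)  # universe-score (person) variance
    var_i = max((ms_i - ms_res) / n_p, 0.0)  # item (difficulty) variance

    n = n_items or n_i
    rel_err = var_res / n            # relative error variance
    abs_err = (var_i + var_res) / n  # absolute error variance
    return {
        "var_p": var_p,
        "var_i": var_i,
        "var_res": var_res,
        "g_coef": var_p / (var_p + rel_err),  # generalizability coefficient
        "phi": var_p / (var_p + abs_err),     # dependability index
        "rel_se": sqrt(rel_err),              # on the mean-score scale
        "abs_se": sqrt(abs_err),              # on the mean-score scale
    }


# Hypothetical data: 6 examinees (rows) x 4 items (columns), each scored 0-4.
scores = [
    [4, 4, 3, 2],
    [4, 3, 2, 1],
    [3, 2, 1, 0],
    [4, 4, 4, 3],
    [2, 1, 1, 0],
    [3, 3, 2, 1],
]

alpha = cronbach_alpha(scores)
sd_total = sqrt(variance([sum(row) for row in scores]))
sem = sd_total * sqrt(1 - alpha)  # classical SEM, on the total-score scale
g = g_study_pxi(scores)

print(f"alpha = {alpha:.3f}, SEM (total score) = {sem:.3f}")
print(f"E rho^2 = {g['g_coef']:.3f}, Phi = {g['phi']:.3f}")
print(f"relative SE = {g['rel_se']:.3f}, absolute SE = {g['abs_se']:.3f} (mean-score scale)")
```

For this single-facet design, Eρ² coincides with coefficient alpha, a known ANOVA equivalence. The relative standard error, rescaled to total scores, matches the classical SEM. Φ falls below Eρ² whenever items differ in difficulty, because absolute decisions also count item variance as error.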
While validity, which concerns the interpretations and uses of test scores, is considered the most important characteristic of a test, reliability provides a strong foundation for validity and is a necessary condition for most test uses and interpretations. When scores are not consistent within a testing procedure, they are considered to reflect random errors of measurement rather than the attribute being measured. Such scores will not relate strongly to other variables, will not show a strong internal structure, and will not support the score uses and interpretations on which validity depends. Consequently, reliability is often considered necessary for the valid use and interpretation of scores. On the other hand, a test could be highly reliable and still not be valid for a particular use or interpretation, because validity depends both on measuring consistently and on measuring the right construct.
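Classical test theory gives a standard form to the point that unreliable scores cannot relate strongly to other variables. The observed correlation between two measures cannot exceed the square root of the product of their reliabilities. Spearman's correction for attenuation estimates the correlation between the corresponding true scores. The Python sketch below assumes the classical model with uncorrelated errors. The reliability values are hypothetical.

```python
from math import sqrt


def max_observed_correlation(rel_x, rel_y):
    """Classical upper bound on the observed correlation between measures X and Y,
    given their reliabilities: |r_xy| <= sqrt(rel_x * rel_y)."""
    return sqrt(rel_x * rel_y)


def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: estimated true-score correlation."""
    return r_xy / sqrt(rel_x * rel_y)


# Hypothetical test reliabilities, each paired with a perfectly reliable criterion.
for rel_test in (0.9, 0.7, 0.5):
    bound = max_observed_correlation(rel_test, 1.0)
    print(f"test reliability {rel_test:.1f}: observed correlation at most {bound:.2f}")
```

With a reliability of 0.5, a test cannot correlate above about 0.71 with any criterion, however relevant the construct. The reliabilities used in the correction are themselves estimates, so disattenuated correlations can exceed 1 in samples. They are best interpreted alongside the observed values.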
General Overviews of Reliability
The literature on psychometrics, reliability, and testing contains many summaries of reliability and related issues. Broad overviews can be found in numerous writings, including refereed journals and book chapters. One of the most important broad overviews in test theory is the professional standards for educational and psychological testing, developed under the joint leadership of three professional organizations (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education). The fourth edition of the professional standards is American Educational Research Association, et al. 2014. The standards address multiple technical properties of assessments but single out validity, reliability, and fairness in the three foundational chapters. Historical changes in assessment, and in the methods and theories for assessing reliability, can be traced through the three prior editions of the professional standards (1974, 1985, 1999). Several overviews offer technical, detailed treatments of reliability. Brennan 2001 provides an overview that emphasizes the historical development of reliability and discusses multiple ways of defining it, ranging from an emphasis on errors of measurement to an emphasis on replication of the assessment. Detailed and comprehensive overviews of reliability can also be found in Feldt and Brennan 1989 and Haertel 2006. In contrast to these technical overviews, a simple overview of reliability and validity for classroom teachers, who are among the primary users of assessments, is presented in Miller, et al. 2013.
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. 2014. Standards for educational and psychological testing. 4th ed. Washington, DC: American Educational Research Association.
Three professional organizations in education and psychology jointly provide this 4th edition of the professional standards on testing. The reliability chapter includes definitions and standards for using reliability in assessment practice, representing a consensus view of testing professionals. Its eight clusters of standards address specifications for replications of the testing procedure, evaluating reliability/precision, reliability/generalizability coefficients, factors affecting reliability/precision, standard errors of measurement, decision consistency, reliability/precision of group means, and documenting reliability/precision.
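Decision consistency, one of the clusters of standards listed above, can be illustrated with two replications of a pass/fail decision. The sketch below computes the unadjusted proportion agreement p0. It also computes Cohen's kappa, which adjusts p0 for the agreement expected by chance given each replication's pass rate. It assumes two parallel forms given to the same examinees with a common cut score. The scores, the cut score, and the function name are invented for illustration. In operational programs, decision consistency is often estimated from a single administration with model-based methods, which are not shown here.

```python
def decision_consistency(form_a, form_b, cut):
    """Proportion agreement and Cohen's kappa for pass/fail decisions
    made from two replications (e.g., parallel forms) on the same examinees."""
    n = len(form_a)
    pass_a = [score >= cut for score in form_a]
    pass_b = [score >= cut for score in form_b]

    # Unadjusted agreement: share of examinees classified the same way twice.
    p0 = sum(a == b for a, b in zip(pass_a, pass_b)) / n

    # Chance agreement implied by the marginal pass rates of each replication.
    rate_a = sum(pass_a) / n
    rate_b = sum(pass_b) / n
    pc = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)

    kappa = (p0 - pc) / (1 - pc) if pc < 1 else float("nan")
    return p0, kappa


# Hypothetical scores for 10 examinees on two forms, with a cut score of 20.
form_a = [25, 18, 22, 30, 15, 19, 21, 27, 12, 20]
form_b = [23, 21, 19, 28, 14, 17, 24, 26, 13, 22]
p0, kappa = decision_consistency(form_a, form_b, cut=20)
print(f"p0 = {p0:.2f}, kappa = {kappa:.2f}")  # p0 = 0.80, kappa = 0.58
```

Here the two forms agree on 8 of 10 decisions. Both forms pass 60% of examinees, so chance alone would produce 52% agreement. Kappa expresses how far the observed agreement exceeds that baseline.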
Brennan, R. L. 2001. An essay on the history and future of reliability from the perspective of replications. Journal of Educational Measurement 38:295–317.
DOI: 10.1111/j.1745-3984.2001.tb01129.x
Brennan provides a historical review of reliability and identifies future areas of development and use. He examines multiple definitions of and emphases in reliability. He contrasts the emphasis on replication (consistency) used in the current standards with an emphasis on errors of measurement (inconsistency).
Feldt, L. S., and R. L. Brennan. 1989. Reliability. In Educational measurement. 3d ed. Edited by R. L. Linn, 105–146. New York: American Council on Education.
Feldt and Brennan provide a detailed overview of reliability and of the statistical assumptions needed to develop reliability coefficients. The chapter emphasizes how errors that depend on the testing context lead to different estimates of reliability and different interpretations. It does not cover item response theory, which is treated in another chapter of the book.
Haertel, E. H. 2006. Reliability. In Educational measurement. 4th ed. Edited by R. L. Brennan, 65–110. Westport, CT: Praeger.
Haertel provides a comprehensive and detailed review of reliability based on multiple test theories. The chapter offers an advanced statistical treatment of the test-theory approaches, data collection designs, and statistical indices used to estimate reliability under classical test theory, generalizability theory, and classification consistency. The chapter also suggests future directions for reliability. It does not cover item response theory, which is treated in another chapter of the book.
Miller, M. D., R. L. Linn, and N. E. Gronlund. 2013. Measurement and assessment in teaching. 11th ed. Boston: Pearson.
Chapter 5 provides a simple overview of reliability for lay readers and beginning undergraduates who will use tests, particularly pre-service teachers. Both reliability and validity (chapter 4) are presented in the context of educators who use and develop tests for classroom assessment. These readers do not need to be familiar with the complexities of the statistical methods.