• Introduction
• General Overviews
• The Importance of Reliability
• Reliability Point Estimation
• Reliability Interval Estimation
• Traditional Reliability Estimation Procedures with Limited General Applicability
• Coefficient Alpha
• The Relationship of Coefficient Alpha to Reliability
• Strong Convergence of the Reliability and Coefficient Alpha Estimators Correspondingly to the Population Alpha and Scale Reliability Values
• Scale Revision for Enhancing Reliability
• Reliability and Population Heterogeneity
• Standardized Reliability

# Reliability–Contemporary Psychometric Conceptions

by Tenko Raykov

LAST MODIFIED: 26 May 2023

DOI: 10.1093/obo/9780199828340-0314

## Introduction

Reliability is a major index of quality of behavioral measurement. Informally, reliability of a behavior measuring device—whether an item, question, scale, inventory, self-report, or test—reflects the repeatability of the results obtainable with it. The reliability index—often referred to as an index of reliability and defined as the (positive) square root of the reliability coefficient—is substantially less often used than the reliability coefficient in the behavioral sciences. This index represents the extent to which an observation, considered as a random variable, correlates with the associated true value it aims to evaluate. The reliability coefficient, which is at present nearly always employed in theoretical and empirical measurement-related discussions and treatments, can be thought of as the degree to which observed individual differences (of the units of analysis) are the result of true underlying individual differences.
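In symbols, the link between the two quantities described above can be sketched as follows. The notation is an assumption made here for illustration only: $X$ denotes the observation, $T$ the associated true value, and $\rho_X$ the reliability coefficient.

```latex
$$
\text{reliability index} \;=\; \operatorname{Corr}(X, T) \;=\; +\sqrt{\rho_X},
\qquad 0 \le \rho_X \le 1 .
$$
```

For instance, a reliability coefficient of 0.81 corresponds to a reliability index of 0.90.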

## General Overviews

Based on the classical test theory (CTT) decomposition X = T + E for an observed score X, associated true score T, and pertinent error score E, the reliability coefficient is defined as the ratio Var(T)/Var(X), where Var(.) denotes variance (in a relevant population). That is, the reliability coefficient is defined whenever there are observed individual differences, i.e., whenever Var(X) > 0 holds. This can be assumed to be the case in most if not all contemporary empirical behavioral studies (see McDonald 1999 and Zimmerman 1975). Reliability is not defined when Var(X) = 0, which is unlikely in the overwhelming majority of contemporary behavioral studies unless the measuring instruments used are highly insensitive to individual differences (see Raykov and Marcoulides 2011). An equivalent definition is that the reliability coefficient is the squared correlation of true with observed score, that is, Corr²(T,X), where Corr(.,.) denotes correlation, whenever observed and true variance are positive (see Crocker and Algina 2006 and Lord and Novick 1968). If either of these latter two variances vanishes, the reliability index is not defined, but the reliability coefficient is defined as 0 if Var(T) = 0 and Var(X) > 0 hold (see Allen and Yen 1979). Because the reliability index is used only to a limited extent in the behavioral and social science literature, the remainder of this discussion uses the term “reliability” as synonymous with the reliability coefficient unless otherwise indicated. The term will typically be used for multi-component measuring instruments (psychometric scales), which are mostly of relevance in the sequel. Further, all measuring instruments are assumed to consist of components that are pre-fixed rather than selected, randomly or otherwise, from pre-existing larger pools of such measures to which inferences are sought.
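The equivalence of the two definitions above follows from the CTT property that true and error scores are uncorrelated: Cov(T,X) = Cov(T,T + E) = Var(T), so the squared correlation of T and X reduces to Var(T)/Var(X). A minimal simulation sketch can illustrate this numerically. Everything in it is an assumption made for illustration and is not taken from the cited sources: normally distributed scores, a true-score standard deviation of 8, an error standard deviation of 6 (so that the population reliability is 64/100 = 0.64), and the hypothetical function names `simulate_scores`, `pearson`, and `reliability_estimates`.

```python
import random
import statistics


def simulate_scores(n, true_sd, error_sd, seed=0):
    """Draw n (true, observed) score pairs under X = T + E, with E independent of T."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return true_scores, observed


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / (ss_a * ss_b) ** 0.5


def reliability_estimates(true_scores, observed):
    """Return (variance ratio, squared correlation, reliability index) for simulated data."""
    ratio = statistics.pvariance(true_scores) / statistics.pvariance(observed)
    index = pearson(true_scores, observed)  # Corr(T, X)
    return ratio, index ** 2, index


if __name__ == "__main__":
    t, x = simulate_scores(20000, true_sd=8.0, error_sd=6.0, seed=1)
    ratio, squared_corr, index = reliability_estimates(t, x)
    # Both estimates should be close to 0.64, and the index close to 0.8.
    print(f"Var(T)/Var(X) = {ratio:.3f}")
    print(f"Corr(T,X)^2   = {squared_corr:.3f}")
    print(f"Corr(T,X)     = {index:.3f}")
```

The two estimates agree only approximately in a finite sample, because the sample covariance of true and error scores is not exactly zero. Note also that true scores are available here only because the data are simulated; in empirical studies T is unobserved, which is why reliability has to be estimated by other means.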

• Allen, M. J., and W. M. Yen. 1979. Introduction to measurement theory. Long Grove, IL: Waveland Press.

Provides a comprehensive discussion of the basics of behavioral measurement and its applications, including detailed coverage of reliability at a relatively introductory level.

• Crocker, L., and J. Algina. 2006. Introduction to classical and modern test theory. Fort Worth, TX: Harcourt College Publishers.

Offers a rigorous discussion of modern and classical approaches to behavioral measurement and attends in detail to reliability-related matters.

• Lord, F. M., and M. R. Novick. 1968. Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

A thorough treatment of the theoretical framework of mental testing and pertinent statistical theories, including the foundations of CTT, within which reliability is readily definable as the ratio of true to observed variance.

• McDonald, R. P. 1999. Test theory: A unified treatment. Mahwah, NJ: Erlbaum.

Discusses reliability within the context of a unified methodology for behavioral measurement, which connects item response theory modeling-based approaches with those grounded in factor analysis.

• Raykov, T., and G. A. Marcoulides. 2011. Introduction to psychometric theory. New York: Taylor & Francis.

A largely introductory treatment of psychometric theory, which includes interpretations of the reliability coefficient and discussions of its relationships to conceptual regression models of true and observed scores.

• Zimmerman, D. W. 1975. Probability spaces, Hilbert spaces, and the axioms of test theory. Psychometrika 40:395–412.

Provides arguably the most rigorous treatment of CTT, based on the concepts of Hilbert space, projection, and conditional expectation, and offers a very general definition of the reliability coefficient within that formal approach.