Linguistics Spoken Word Recognition
Cynthia S. Q. Siew
  • LAST MODIFIED: 11 January 2024
  • DOI: 10.1093/obo/9780199772810-0317


The core question that spoken word recognition research attempts to address is: How does a phonological word-form activate the corresponding lexical representation that is stored in the mental lexicon? While speech perception research (see the separate Oxford Bibliographies in Linguistics article “Speech Perception”) focuses on the mapping of the highly variable acoustic signal onto more abstract phonological units, spoken word recognition focuses on the mapping of phonological information onto lexical and semantic representations, the repository of linguistic knowledge stored in a “mental dictionary,” or mental lexicon (see the separate Oxford Bibliographies in Linguistics article “Mental Lexicon”). Earlier theoretical work considers the following three stages to be fundamental to spoken word recognition. First, there is activation of multiple word forms that share some phonological similarity with the auditory input. Second, there is a selection stage in which activated word forms compete with each other for recognition. Finally, when a single lexical candidate remains, its meaning is accessed and then integrated with higher levels of processing (e.g., with sentential or discourse information). Although these stages are presented as parts of a serial process, it is important to note that current theoretical and empirical work in the field emphasizes the highly parallel, incremental, and continuous nature of spoken word recognition, even though theories of spoken word recognition continue to differ greatly in their description and conceptualization of these “stages,” and in their computational implementation of competition and lexical selection mechanisms.
The temporal, fleeting nature of acoustic input creates unique theoretical and empirical challenges for the field (for instance, the challenge of segmenting words in continuous speech and of handling embedded words). As a result, research in this area has traditionally progressed at a more gradual pace than research in visual word recognition (see the separate Oxford Bibliographies in Linguistics article “Visual Word Recognition”). Nevertheless, in the almost sixty years of its history, spoken word recognition research has led to the discovery of a number of lexical-semantic and contextual factors that influence the speed and accuracy of spoken word recognition. Lexical-semantic factors refer to the lexical and semantic properties of individual words, for instance, a word’s frequency of occurrence in the language or the extent of its phonological similarity to other words in the language. Contextual factors refer to how characteristics of the talker and listener, as well as environmental features or noise, can create suboptimal conditions for spoken word recognition. In addition, the robust top-down influences of lexical knowledge on sublexical representations highlight how the integration of top-down information and bottom-up perceptual input forms a crucial feature of models of spoken word recognition. These empirical findings provide important constraints on the development of models and theories that attempt to explain the cognitive mechanisms that support the retrieval of spoken words from the lexicon.
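
The activation, competition, and selection stages described above can be made concrete with a deliberately simplified sketch. It is not an implementation of any model cited in this article: the toy lexicon, the phoneme labels, the frequency values, and the function names are all hypothetical, and the strict onset-matching filter is an assumption chosen for brevity, whereas current models treat activation as graded and continuous. The sketch only illustrates how a set of candidates can narrow as input unfolds over time, and how word frequency can bias the competition among them.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> (phoneme sequence, relative frequency).
# The entries and frequency values are invented for illustration only.
TOY_LEXICON: Dict[str, Tuple[List[str], float]] = {
    "cat": (["k", "ae", "t"], 0.50),
    "cap": (["k", "ae", "p"], 0.30),
    "captain": (["k", "ae", "p", "t", "ih", "n"], 0.15),
    "dog": (["d", "ao", "g"], 0.05),
}


def candidate_scores(
    heard: List[str],
    lexicon: Dict[str, Tuple[List[str], float]] = TOY_LEXICON,
) -> Dict[str, float]:
    """Activation and competition for the input heard so far.

    Activation: keep every word whose onset matches the input (assumed rule).
    Competition: normalize frequencies so candidates share a fixed total.
    """
    active = {
        word: freq
        for word, (phonemes, freq) in lexicon.items()
        if phonemes[: len(heard)] == heard
    }
    total = sum(active.values())
    if total == 0:
        return {}
    return {word: freq / total for word, freq in active.items()}


def recognize(phonemes: List[str]) -> List[Dict[str, float]]:
    """Process the input one phoneme at a time and record the candidates."""
    return [candidate_scores(phonemes[:i]) for i in range(1, len(phonemes) + 1)]


if __name__ == "__main__":
    for step, scores in enumerate(recognize(["k", "ae", "t"]), start=1):
        print(step, sorted(scores))
```

With the input "k ae t", the three words beginning with "k" remain active until the final phoneme, at which point only "cat" survives, corresponding to the selection stage. With the input "k ae p", both "cap" and "captain" remain active at the end of the input, which is a small instance of the embedded-word problem mentioned above.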

General Overviews

This section highlights overview articles that provide diverse and useful perspectives on spoken word recognition research. Dahan and Magnuson 2006 provides an in-depth discussion of important theoretical issues in spoken word recognition research. Vitevitch, et al. 2018 provides a historical overview of theoretical approaches to spoken word recognition. Magnuson and Crinnion 2022 discusses theoretical and computational challenges that still exist for spoken word recognition research today. Other reviews focus on a number of complementary issues in spoken word recognition. Newman, et al. 2012 emphasizes neurological aspects and related issues in spoken word recognition. Luce and McLennan 2005 focuses on the issue of variation of the input in spoken word recognition. Finally, McQueen 2012 is a popular overview article that provides an accessible introduction to the key research questions that still prevail in spoken word recognition research today.

  • Dahan, Delphine, and James S. Magnuson. 2006. Spoken word recognition. In Handbook of psycholinguistics. Edited by Matthew J. Traxler and Morton A. Gernsbacher, 249–283. Elsevier.

    DOI: 10.1016/B978-012369374-7/50009-2

    Provides an in-depth explanation and discussion of core concepts of activation, competition, and integration that represent central functions in theories of spoken word recognition. Discusses important empirical results which are integrated with a detailed discussion of theoretical and computational models of spoken word recognition.

  • Luce, Paul A., and Conor T. McLennan. 2005. Spoken word recognition: The challenge of variation. In The handbook of speech perception. Edited by David B. Pisoni and Robert E. Remez, 590–609. Oxford: Blackwell.

    DOI: 10.1002/9780470757024.ch24

    Although this overview is situated in a broader edited volume on speech perception, the authors discuss the challenges that models of spoken word recognition face in relation to two sources of variation in the input: indexical and allophonic.

  • Magnuson, J. S., and A. M. Crinnion. 2022. Spoken word recognition. In The Oxford handbook of the mental lexicon. Edited by A. Papafragou, J. C. Trueswell, and L. R. Gleitman, 461–490. Oxford: Oxford Univ. Press.

    DOI: 10.1093/oxfordhb/9780198845003.013.23

    Discusses theoretical and computational challenges that remain despite the simplifying assumptions made in spoken word recognition research (i.e., the assumption of an abstract input representation that stands in for the speech signal). The authors argue that spoken word recognition research needs to integrate constraints from speech perception and from higher levels of language processing.

  • McQueen, James M. 2012. Eight questions about spoken word recognition. In The Oxford handbook of psycholinguistics. Edited by M. Gareth Gaskell, 37–54. Oxford: Oxford Univ. Press.

    DOI: 10.1093/oxfordhb/9780198568971.013.0003

    Provides an overview of current empirical evidence on spoken word recognition organized around the listener’s goal of extracting lexical information from a spoken utterance, framed in a series of questions that provides an engaging introduction to the field.

  • Newman, Randy L., Kelly Forbes, and John F. Connolly. 2012. Event-related potentials and magnetic fields associated with spoken word recognition. In The Cambridge handbook of psycholinguistics. Edited by Michael Spivey, Ken McRae, and Marc Joanisse, 127–156. New York: Cambridge Univ. Press.

    DOI: 10.1017/CBO9781139029377.008

    Although the primary focus of this bibliography is on the theoretical and empirical issues that dominate spoken word recognition, readers interested in its neurological aspects will find this review of event-related potential research a useful starting point.

  • Vitevitch, Michael S., Cynthia S. Q. Siew, and Nichol Castro. 2018. Spoken word recognition. In The Oxford handbook of psycholinguistics. Edited by Shirley-Ann Rueschemeyer and M. Gareth Gaskell, 30–47. Oxford: Oxford Univ. Press.

    DOI: 10.1093/oxfordhb/9780198786825.013.2

    Provides a historical overview of theoretical approaches to spoken word recognition and highlights recent innovative approaches such as Bayesian modeling, Network Science, and Discriminative Learning that are currently being pursued in spoken word recognition research.

