Cross-language speech perception and production (CLSP) is the study of how speakers produce and perceive sounds, sequences, prosody, and tone that are not found in their native language. The topic has been examined from several perspectives, notably phonetic, theoretical/phonological, and psycholinguistic. CLSP encompasses many studies of adult second language (L2) learners; however, what sets it apart from L2 acquisition research in the strict sense is that it also includes studies of how speakers confront non-native sounds and structures even when they are not actively learning the language under study. Such research has a number of purposes, including getting a glimpse into the initial stages of L2 acquisition, gaining a better understanding of the structure of the native phonology by examining how speakers and listeners modify structures that their native phonology does not allow, and probing factors that affect loanword adaptation (for more on this, see the separate Oxford Bibliographies article in Linguistics “Loanwords”). Because the broader issue of second language acquisition of phonetics and phonology from early stages to full attainment is not the focus of this article, there are several related topics that will not be represented here. First, studies that examine only simultaneous or early bilingual speakers are not included, since these speakers have been exposed to multiple languages before the critical period (though some of the studies below do use these bilinguals as a comparison to adult L2 learners). However, studies examining data from beginning adult L2 acquisition are included, since such learners still exhibit substantial interference from their native languages in the perception and production of the non-native stimuli. Second, the focus of this article is on experimental studies, primarily those from a phonetic and psycholinguistic perspective.
While there is a rich tradition of research investigating predictions from theoretical phonology in L2 acquisition, the most prominent theories of CLSP crucially integrate predictions from acoustic and articulatory phonetics into their tenets, and they have informed most subsequent studies of non-native speech production and perception. Nevertheless, some research that aims to assess both the contributions of higher-level phonological structure and the production or perception of phonetic detail in CLSP is included. In this article, research on segmental units such as vowels and consonants, phonotactics, prosody, and tone is presented, as well as studies that represent particularly notable methodological paradigms, such as perceptual illusions and high variability training.

Theories and Models of Cross-Language Speech Perception and Production

The development of models of cross-language speech perception and production has been shaped by the disparate goals of the authors proposing them. Flege 1995 and Flege 2003 intend the Speech Learning Model (SLM) to be a theory of second language acquisition after the critical period, and the author’s research often focuses on how factors such as age of acquisition or length of residence affect speakers’ ability to perceive and produce sounds in their second language. Despite the focus on L2 acquisition, many subsequent researchers have applied the SLM to cross-language research more generally. Moreover, Flege’s body of work notably addresses both production and perception, whereas most of the other works in this section focus primarily on perception. In contrast to the SLM, the Perceptual Assimilation Model (PAM) proposed in Best 1995 is born out of the tenets of direct realism, which postulates that the basic perceptual primitives are distal articulatory gestures. The PAM more explicitly addresses the issue of cross-language perception, and not longer-term L2 acquisition, though it was later extended to account for some aspects of L2 perception in Best and Tyler 2007. The Automatic Selective Perception (ASP) model proposed by Strange 2011 also focuses on perception but places greater emphasis on the role of attention and memory than other cross-language speech perception models. Another unique aspect of Strange’s model is that it incorporates the influence of particular task demands to account for listener behavior. The Native Language Magnet (NLM) model and its expanded version (NLM-e), developed by Kuhl and Iverson 1995 and Kuhl, et al. 2008, respectively, differ from other perception models by incorporating insights from both L1 and L2 acquisition research. The NLM-e, in particular, focuses on the shift that occurs around one year of age, when the ability to fully perceive all possible sounds gradually recedes, attributing this change to advancing neural commitment to the native language.
Finally, Van Leussen and Escudero 2015 presents a revised version of the Second Language Linguistic Perception (L2LP) model, which assigns a critical role to lexical learning in helping listeners acquire new sounds that would otherwise often be difficult to distinguish from existing L1 sounds. At this point in the development of theories of cross-language speech perception (or processing more generally), much of the focus is on individual consonants and vowels, or on the contrast between sounds in an inventory, but existing models will ultimately need to provide a more complete account of phonotactics, prosody, and tone as well.

  • Best, Catherine. 1995. A direct-realist view of cross-language perception. In Speech perception and linguistic experience: Issues in cross-language research. Edited by Winifred Strange, 171–204. Baltimore: York.

    The first half of this chapter on the Perceptual Assimilation Model (PAM) explains how cross-language speech perception can be derived from direct realism. The second half motivates the patterns of assimilation for non-native segments, such as whether they are assimilated to a native category, heard as uncategorizable speech sounds, or not perceived as speech at all.

  • Best, Catherine, and Michael Tyler. 2007. Nonnative and second-language speech perception: Commonalities and complementarities. In Second language speech learning: The role of language experience in speech perception and production. Edited by Murray Munro and Ocke-Schwen Bohn, 13–34. Amsterdam: John Benjamins.

    This chapter extends the PAM of Best 1995 to account for how the perception of non-native sounds may change as learners gain experience with the L2. The expanded version, called PAM-L2, addresses how the formation of categories for L2 sounds interacts with existing L1 categories.

  • Flege, James. 1995. Second-language speech learning: Theory, findings, and problems. In Speech perception and linguistic experience: Issues in cross-language research. Edited by Winifred Strange, 229–273. Timonium, MD: York.

    This chapter develops detailed hypotheses for Flege’s Speech Learning Model (SLM), which aims to account for age-related effects on language learners’ ability to produce L2 vowels and consonants. The SLM hypothesizes that L2 sounds that are more similar to native sounds will be harder to learn than dissimilar sounds, since the similar sounds will assimilate to native categories.

  • Flege, James. 2003. Assessing constraints on second-language segmental production and perception. In Phonetics and phonology in language comprehension and production: Differences and similarities. Edited by Antje Meyer and Niels Schiller, 319–355. Berlin: Mouton de Gruyter.

    Updates Flege 1995 and evaluates some predictions from the earlier paper in light of many studies on both perception and production, especially in vowel acquisition, that appeared between 1995 and 2003. Flege concludes that the claims of the SLM are mostly upheld, but he points out where there are limitations.

  • Kuhl, Patricia, Barbara T. Conboy, Sharon Coffey-Corina, Denise Padden, Maritza Rivera-Gaxiola, and Tobey Nelson. 2008. Phonetic learning as a pathway to language: New data and native language magnet theory expanded (NLM-e). Philosophical Transactions of the Royal Society B 363: 979–1000.

    DOI: 10.1098/rstb.2007.2154

    While primarily a model of development and behavior in infants, the NLM-e pertains to cross-language perception because it holds that early phonetic learning leads to a decline in neural flexibility. It is hypothesized that when neural stability is reached, it becomes difficult to shift the distribution of phonetic categories in response to new input.

  • Kuhl, Patricia, and Paul Iverson. 1995. Linguistic experience and the perceptual magnet effect. In Speech perception and linguistic experience: Issues in cross-language research. Edited by Winifred Strange, 121–154. Baltimore: York.

    This chapter presents the Native Language Magnet (NLM) model, focusing on how the perception of L2 sounds is related to the process of phoneme category formation in L1 acquisition. The NLM holds that L1 acquisition leads to the development of perceptual magnets, which cause close non-native sounds to be perceived as part of the already formed categories.

  • Strange, Winifred. 2011. Automatic selective perception (ASP) of first and second language speech: A working model. Journal of Phonetics 39.4: 456–466.

    This approach to cross-language speech perception focuses on “selective perception routines,” which are highly practiced behaviors used to perceive native sounds. L2 perception does not benefit from these routines, so listeners must apply greater attentional resources to process non-native sounds. Another focus is on what task demands reveal about phonetic and phonological modes of perception.

  • Van Leussen, Jan-Willem, and Paola Escudero. 2015. Learning to perceive and recognize a second language: The L2LP model revised. Frontiers in Psychology 6.

    This model starts with the premise that L2 learning takes an optimized L1 system as its starting point and adds the idea that lexical acquisition is crucial for enabling L2 phoneme perception. Hypothesizes that an interaction between the speech signal and the lexicon is likely an important component of L2 phonetic learning.

