
Linguistics Connectionism
by
Ping Li, Xiaowei Zhao

Introduction

Connectionism, also known as parallel distributed processing (PDP) or artificial neural networks, has been an important theoretical framework as well as a computational tool for the study of mind and behavior. It adopts the perspective that human cognition is an emergent property arising from the interaction of a large number of interconnected processing units (neurons) that operate simultaneously in a network (thus “parallel”). In addition, connectionism advocates that learning, representation, and processing of information are dynamic and distributed across the network. Language as a hallmark of human behavior has received in-depth treatment since the beginning of connectionist research. The acquisition of morphosyntax, the recognition of speech, and the processing of sentences were among the topics of the earliest connectionist models. The application of connectionism to second language acquisition has also gathered momentum in the late 20th and early 21st centuries. Learning a language entails complex cognitive and linguistic constraints and interactions, and connectionist models provide insights into how these constraints and interactions may be realized in the natural learning context.

Introductory Works

Many books that introduce the principles of connectionism have appeared since the mid-1980s. Introductions to fundamental algorithms of neural networks can be found in Haykin 1999. There have also been several contributions to readers in linguistics or psychology. The original parallel distributed processing (PDP) volume Rumelhart, et al. 1986 (cited under Reference Works) provides a comprehensive overview of the early models, many of which deal with language and cognition. Spitzer 1999 is an excellent text that provides a clear description of basic theories of connectionism and many of its applications in psychology, linguistics, and neuroscience. This book has very little technical detail and is written for the nonspecialist. Ellis and Humphreys 1999 contains more technical details and in-depth discussions along with selected readings of original articles that emphasize learning, language, and memory. Levine 2000 focuses on general organizing principles underlying neural and cognitive modeling, including competition, association, and categorization. For readers interested in learning how to implement a connectionist network step-by-step, there are a few books and tools at varying levels of difficulty that provide good exercises and examples of phenomena from linguistics, psychology, and neuroscience (see Software). O’Reilly and Munakata 2000 provides a comprehensive discussion of computational cognitive neuroscience that, unlike other texts mentioned in this section, focuses on modeling the neuronal system rather than the cognitive system at an abstract level. This book is written for more advanced readers, perhaps at the advanced graduate student level. There are also a few books that are more focused on applying connectionist approaches to specific topics or domains of study—for example, connectionism and development or connectionist language processing. These include Elman, et al. 1996, which highlights the connectionist framework of learning, interaction, and emergence and provides good illustrations of how connectionism tackles issues in cognitive and language development. Another book of this type is Shultz 2003, focusing more on stages of development and mechanisms of transition. Christiansen and Chater 2001 has useful reviews of connectionism in language studies.

  • Christiansen, Morten, and Nick Chater, eds. 2001. Connectionist psycholinguistics. Westport, CT: Ablex.


    A collection of papers by connectionist scholars that reviews (up to 2001) connectionist models of language, including spoken word recognition, morphosyntax, language production, reading and dyslexia, and language acquisition.


  • Ellis, Rob, and Glyn W. Humphreys. 1999. Connectionist psychology: A text with readings. New York: Psychology Press.


    A generally readable textbook with a focus on applying connectionist models to the study of cognitive phenomena, including memory, language, learning, and cognitive disorders.


  • Elman, Jeffrey, Elizabeth A. Bates, Mark H. Johnson, Annette Karmiloff-Smith, Domenico Parisi, and Kim Plunkett. 1996. Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press.


    This book has been called the “second bible” of connectionism; it focuses on cognitive and language development based on the connectionist framework. It argues for the need to clearly define innateness at different levels and to separate innateness from modularity, domain specificity, and localization.


  • Haykin, Simon. 1999. Neural networks: A comprehensive foundation. 2d ed. Upper Saddle River, NJ: Prentice Hall.


    A comprehensive textbook introducing many neural network algorithms, though not written specifically for readers in psychology or linguistics.


  • Levine, Daniel. 2000. Introduction to neural and cognitive modeling. 2d ed. Mahwah, NJ: Lawrence Erlbaum.


    This book focuses on general organizing principles underlying neural and cognitive modeling, including competition, association, and categorization. It contains many more technical and mathematical details than do other books discussed here.


  • O’Reilly, Randall C., and Yuko Munakata. 2000. Computational explorations in cognitive neuroscience: Understanding the mind by simulating the brain. Cambridge, MA: MIT Press.


    A comprehensive book covering cognitive and neural modeling with the aim of uniting computation, brain, and cognition for a field called computational cognitive neuroscience. The book is more suitable for readers at the graduate student level or higher. Readers who are interested in this text should explore the wiki site provided by the authors.


  • Shultz, Thomas. 2003. Computational developmental psychology. Cambridge, MA: MIT Press.


    An excellent discussion of using neural networks to study cognitive development, especially stages of development and mechanisms of transition. The book provides a good neural network primer along with mathematical basics of connectionist principles. It also discusses the cascade-correlation model that the author uses, a neural network that can dynamically recruit new hidden units in response to task demands.


  • Spitzer, Manfred. 1999. The mind within the net: Models of learning, thinking, and acting. Cambridge, MA: MIT Press.


    An easy-to-read text that provides a discussion of neural networks and cognition in layperson’s terms.


Reference Works

There are good web-based resources about connectionism and its related concepts. Garson 2010 and Waskan 2010 both provide concise introductions to connectionism from a cognitive science or philosophy perspective without giving too many technical details (e.g., mathematical equations). A comprehensive reference resource is Arbib 2003. The two volumes on parallel distributed processing (PDP), Rumelhart, et al. 1986 and McClelland, et al. 1986, though somewhat outdated on specific topics, have been referred to as the “bible” of the field and outline many important principles underlying connectionism in its original form. However, beginning students may find it difficult to understand or apply many of the technical details in these two volumes and should instead read McClelland 2011, which provides an excellent introduction to, and step-by-step exercises for, various connectionist networks, including models that were not originally covered in the PDP volumes, such as self-organizing maps (SOM).

Bibliographies

Selfridge, et al. 1988 selects and annotates some important works in connectionism ranging from the 1940s to the 1980s. A good and partially annotated bibliography that focuses on the philosophy of connectionism is Chalmers and Bourget 2007–2009. McClelland 2011 is useful for researchers interested in the application of connectionist models in linguistics and psychology; this short bibliography contains only the classic reference items for connectionism. A large, though not annotated, collection of bibliographies on neural networks can be found in Bibliographies on Neural Networks.

Software

As a computational tool for the study of the human mind, connectionism has attracted researchers from many other domains of study, including computer science and physics. Proficient modelers typically develop their own programs to model the phenomena of interest to them, and in some cases they share these programs with the public. There are also software packages available to researchers who are interested in modeling but who do not have much computational expertise. For example, Plunkett and Elman 1997 includes simulation software called Tlearn along with simple exercises for researchers. Tlearn allows researchers to build multilayered feedforward networks, including the simple recurrent network. Tlearn is now outdated and has been replaced by OXlearn (see Ruh and Westermann 2009), which runs on MATLAB. A more comprehensive and research-oriented neural network simulator is Emergent, a successor to the software PDP++, which accompanies O’Reilly and Munakata 2000. PDPTool is a MATLAB-based neural network simulator that serves as the accompanying teaching tool for the new version of the parallel distributed processing (PDP) handbook by James L. McClelland (McClelland 2011, cited under Reference Works). For readers who are interested in using the self-organizing map algorithm, the SOM Toolbox for MATLAB 5 (Vesanto, et al. 2000) would be a good start. In addition, jTRACE provides a Java-based implementation of the TRACE model of spoken word recognition. Although not a neural network simulator, the Child Language Data Exchange System (CHILDES) (MacWhinney 2000) is a very useful platform for large-scale connectionist modeling of child language development. CHILDES contains a large corpus of child-child and child-adult speech interactions that many empirical and computational researchers have relied on (see MacWhinney 2010, cited under Connectionist Language Models).

Journals

Because of the interdisciplinary nature of connectionism, there are many journals accepting submissions related to it. Although many of them are interested in neural network theories and applications, they tend to focus on particular perspectives, from engineering to biology. Two journals, Connection Science and Neural Networks, are devoted entirely to connectionism and frequently have articles discussing linguistic, psychological, and cognitive aspects of connectionism. Cognitive Science is a journal of the Cognitive Science Society that covers a wide range of topics in the scientific study of mind, and it publishes many computational articles, including connectionist models of language and cognition. Trends in Cognitive Sciences has also been connectionism-friendly in the past, offering perspectives and discussions related to neural networks.

History

The basic idea of connectionism can be traced back to McCulloch and Pitts 1943, which introduced a highly simplified mathematical model of biological neurons (units), in which the authors suggested that several such units working in concert are able to compute simple logical functions such as AND or OR by turning on and off units conjunctively or disjunctively. Rosenblatt 1958 further implemented the idea by introducing the “perceptron” model, epitomizing the first wave of connectionism. Frank Rosenblatt’s models were criticized later in Minsky and Papert 1969, which (incorrectly) concluded that connectionism was moving to a dead end since it could not solve nonlinear logical functions, such as the “exclusive or” (XOR) problem. This argument seriously dampened researchers’ interests, leading to a dearth of work in connectionism in the 1970s. Only a few researchers continued connectionist work; for example, Stephen Grossberg (Grossberg 1976) piloted the unsupervised learning neural network model called adaptive resonance theory (ART). A sudden resurgence of connectionism was seen in the early 1980s, when breakthroughs were made in various connectionist architectures. For instance, John J. Hopfield (Hopfield 1982) infused concepts from statistical physics into the design of his model, starting a new trend in analyzing neural networks as dynamic systems. Rumelhart, et al. 1986 reformulates the algorithm of “back propagation,” which allows multilayered feedforward neural networks to effectively solve nonlinear problems. In terms of connectionist models of language, the Rumelhart and McClelland 1986 model of learning the English past tense spurred intense debates as well as tremendous interest in the utility of connectionist models to language, and the McClelland and Elman 1986 TRACE model (cited under Models of Language Processing) implemented the first large-scale interactive activation model of speech perception. In the early 21st century, connectionism has become a powerful tool as well as a conceptual framework for understanding many important issues in linguistics. A good tour of the history of connectionism is Medler 1998.
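
To make the technical point concrete, here is a minimal Python sketch of a McCulloch-Pitts-style threshold unit computing AND and OR; the weights and thresholds are illustrative choices of our own, not code from any of the works cited here. No single unit of this kind can compute XOR, which is the limitation Minsky and Papert emphasized.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """A McCulloch-Pitts-style unit: output 1 (fire) if the weighted sum of its
    binary inputs reaches the threshold, otherwise output 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    and_out = mcculloch_pitts(pattern, weights=(1, 1), threshold=2)  # logical AND
    or_out = mcculloch_pitts(pattern, weights=(1, 1), threshold=1)   # logical OR
    print(pattern, "AND:", and_out, "OR:", or_out)

# No single unit of this kind can compute XOR: the four XOR input patterns are
# not linearly separable, the limitation highlighted by Minsky and Papert 1969.
```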

  • Grossberg, Stephen. 1976. Adaptive pattern classification and universal recoding: I, Parallel development and coding of neural feature detectors; II: Feedback, expectation, olfaction, and illusions. Biological Cybernetics 23.3: 121–134.

    DOI: 10.1007/BF00344744

    Continued in “Adaptive Pattern Classification and Universal Recoding: II: Feedback, Expectation, Olfaction, and Illusions,” Biological Cybernetics 23.4: 187–202. As one of a handful of scientists who continued working on connectionism in the 1970s, Grossberg introduced a series of unsupervised learning models called adaptive resonance theory (ART).


  • Hopfield, John J. 1982. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences 79:2554–2558.

    DOI: 10.1073/pnas.79.8.2554

    This paper is a good illustration of the interdisciplinary nature of connectionism. A physicist, Hopfield introduced the concept of “energy function” (which was borrowed from statistical physics) into the design of a network to solve classic problems, such as the traveling salesman problem (TSP).


  • McCulloch, Warren S., and Walter H. Pitts. 1943. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5:115–133.

    DOI: 10.1007/BF02478259

    This early paper introduced a highly simplified mathematical model of biological neurons, now called “McCulloch-Pitts units,” which combine binary inputs with a simple threshold to compute logical functions conjunctively (AND) or disjunctively (OR).


  • Medler, David A. 1998. A brief history of connectionism. Neural Computing Surveys 1:18–72.


    An introduction to the development of connectionism until the late 1990s.


  • Minsky, Marvin, and Seymour Papert. 1969. Perceptrons: An introduction to computational geometry. Cambridge, MA: MIT Press.


    Ironically, this book with “perceptron” in its title destroyed the reputation of perceptrons, given its argument that perceptrons are unable to solve nonlinearly separable problems. This work negatively affected the development of connectionism in the 1970s.


  • Rosenblatt, Frank. 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65:386–408.

    DOI: 10.1037/h0042519

    Rosenblatt’s work on “perceptrons” triggered the first wave of interest in connectionism.


  • Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning internal representations by error propagation. In Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1, Foundations. Edited by David E. Rumelhart, James L. McClelland, and the PDP Research Group, 318–362. Cambridge, MA: MIT Press.


    The backpropagation algorithm was reformulated as the generalized delta rule, and its power in solving nonlinear problems and forming internal representations in multilayered feedforward networks was demonstrated.


  • Rumelhart, David, and James L. McClelland. 1986. On learning the past tenses of English verbs. In Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2, Psychological and biological models. Edited by James L. McClelland, David E. Rumelhart, and the PDP Research Group, 216–271. Cambridge, MA: MIT Press.


    A feedforward pattern associator network in this classic piece shows that a single mechanism in connectionist models can produce the U-shaped developmental pattern in the acquisition of the English past tense.


Debates

Voices against connectionism have been raised since its resurgence in the 1980s. Fodor and Pylyshyn 1988 and Pinker and Prince 1988 presented arguments against connectionism as a model of human learning and cognition, pointing out the limitations of early parallel distributed processing (PDP) models. Crick 1989 argues that the famous “backpropagation” learning algorithm is implausible as a biological principle (that is, in biological systems there are no feedback loops for error correction each time a mismatch is detected). Many of these limitations were subsequently addressed by more sophisticated connectionist models. Marcus 2001 argues against what the author calls “eliminative connectionism” by demonstrating the failure of a few neural networks that he himself built. In 2002 a “past tense debate” was organized by Trends in Cognitive Sciences, which published important summary and position papers, including Pinker and Ullman 2002 and McClelland and Patterson 2002. More recently, Behavioral and Brain Sciences published open peer commentaries on connectionist approaches to conceptual and semantic representation and processing, featuring Rogers and McClelland 2008, a précis of the authors’ book Semantic Cognition (Rogers and McClelland 2004, cited under Learning Semantic Structure).

Neural Basis

Discussion of the neural basis of connectionism can be found in many introductory books (see Introductory Works). The development of connectionism was largely motivated by the idea that the human brain does not operate like a digital computer. The human brain consists of a huge network of nerve cells, with billions of interconnected neurons and trillions of connections between these neurons. At the individual neuronal level, the electrochemical processes for information transmission are relatively simple, going from cell bodies to axons and passing through a synaptic cleft to reach the dendrites of another neuron, involving action potential propagation and neurotransmitter trafficking along the chain. It is at the neuronal network level that information processing becomes more interesting and also more complicated, given that a single neuron is usually connected to thousands of other neurons. The strengths of their synaptic connections (the effectiveness of signal transmission across synapses) are not fixed, and it is the changes that occur in synaptic strength that determine neuronal teamwork. The ability of the human brain to derive the “optimal” strengths for a neuronal network in solving any given problem is the basis of neural information processing, which has inspired connectionist theories of learning, memory, and language. Each individual neuron is not very powerful, but a simultaneously activated network of neurons makes human cognition possible. A very influential, neurally inspired hypothesis about how the brain supports associative learning and memory is Hebbian learning, according to which neurons that fire together wire together, as Hebb 1949 suggests. Hebbian learning is a neurally plausible mechanism related to long-term potentiation (LTP) in biological systems, and it can be mathematically expressed as Δw_kl = α_k · α_l, where Δw_kl refers to the change in the weight of the connection from neuron k to neuron l, and α_k and α_l are the associated activations; that is, the connection strength between neurons k and l increases as a function of their concurrent activity. A good overview of Hebbian learning and its use in connectionist models is Munakata and Pfaffly 2004.
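
As an illustration of the rule just described, the following Python sketch applies the Hebbian update to a small weight matrix; the activation patterns are toy values, and a small learning-rate scaling factor is added for stability (an assumption of this sketch, not part of the formula above).

```python
import numpy as np

def hebbian_update(weights, pre, post, lr=0.1):
    """One Hebbian step: strengthen weights[k, l] in proportion to the concurrent
    activations of sending unit k and receiving unit l (delta_w = lr * a_k * a_l)."""
    return weights + lr * np.outer(pre, post)

# Toy activation patterns that repeatedly co-occur (values are illustrative only)
pre = np.array([1.0, 0.0, 1.0])    # activations of sending units
post = np.array([0.0, 1.0, 1.0])   # activations of receiving units

w = np.zeros((3, 3))
for _ in range(10):
    w = hebbian_update(w, pre, post)

print(w)   # connections between co-active units grow; all others remain zero
```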

Network Structure and Learning Algorithms

Connectionist models are built on two fundamental components: simple processing elements (units, nodes, or artificial neurons) and the connections among these processing elements following certain structural constraints (hence the term “connectionism”). Like real neurons, a node receives input from other nodes. The input signals are accumulated and further transformed via a mathematical function (e.g., a sigmoid function) to determine the activation value of the node. A given connectionist network can have varying numbers of nodes, with activations spreading from node to node via the corresponding connections. Like real synapses, the connections can have different levels of strength (weights), which can often be adjusted according to learning algorithms, thereby modulating the amount of influence that the source node can have on the target node. In this way, the network can develop unique combinations of weights and activation patterns of nodes in representing (and thus memorizing) different input patterns from the learning environment. The weights and activation patterns in connectionist networks are often adapted continuously during learning, and it is this adaptive process that makes connectionist networks interesting models of human behavior.
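
A minimal Python sketch of these two components, assuming a logistic (sigmoid) activation function; the input activations and connection weights are arbitrary toy values for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def node_activation(inputs, weights, bias=0.0):
    """A single unit: accumulate the weighted input signals, then transform the
    sum with a sigmoid function to obtain the node's activation value."""
    net_input = np.dot(weights, inputs) + bias
    return sigmoid(net_input)

inputs = np.array([0.9, 0.1, 0.5])       # activations arriving from sending nodes
weights = np.array([0.4, -0.7, 0.2])     # connection strengths (toy values)
print(node_activation(inputs, weights))  # activation passed on to other nodes
```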

Feedforward Network and Backpropagation

To build connectionist models, one needs to select the architecture of the network and determine what learning algorithms to use. A popular connectionist architecture used in linguistic research is a network with information feeding forward through multiple layers of nodes (usually three layers corresponding to input, hidden, and output layers). Feedforward networks often use a learning algorithm called “backpropagation,” as discussed in Rumelhart, et al. 1986. According to backpropagation, each time the network learns an input-to-output mapping, the discrepancy (or error, δ) between the actual output (produced by the network based on the current connection weights) and the desired output (provided by the researcher) is calculated and is propagated back to the network so that the relevant connection weights can be changed relative to the amount of error (using the generalized delta rule, i.e., Δw_ij = η · δ_j · α_i, where Δw_ij indicates the change in the weight from node i to node j, η the learning rate, δ_j the error signal at the receiving node j, and α_i the activation of the sending node i). Continuous weight adjustments in this way lead the network to fine-tune its connection weights in response to regularities in the input-output relationships. At the end of learning, the network derives a set of weight values that allows it to take on any pattern in the input and produce the desired pattern in the output. A good description of the general principles of backpropagation is Arbib 2003. Some practical techniques for improving the algorithm can also be found in chapter 6 of Duda, et al. 2001. “Backpropagation” belongs to a category of algorithms called “supervised learning”; a good practical book on supervised learning in neural networks is Reed and Marks 1999.
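
The following Python sketch illustrates this procedure on the exclusive-or (XOR) problem using a three-layer feedforward network; the network size, learning rate, and number of training epochs are arbitrary illustrative choices, not values taken from the works cited here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR: a classic mapping that no single-layer network can learn
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # input patterns
T = np.array([[0], [1], [1], [0]], dtype=float)               # desired outputs

# Three-layer feedforward network: 2 input, 4 hidden, 1 output units
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)
eta = 0.5   # learning rate

for _ in range(20000):
    # Forward pass: activations spread from input to hidden to output
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Backward pass: error signals (delta) at the output and hidden layers
    delta_out = (T - y) * y * (1 - y)
    delta_hid = (delta_out @ W2.T) * h * (1 - h)
    # Generalized delta rule: weight change = eta * sending activation * delta
    W2 += eta * h.T @ delta_out
    b2 += eta * delta_out.sum(axis=0)
    W1 += eta * X.T @ delta_hid
    b1 += eta * delta_hid.sum(axis=0)

print(np.round(y, 2))   # outputs should approach [0, 1, 1, 0]
```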

Self-Organizing Map and Unsupervised Learning

In contrast to supervised learning, there is another group of algorithms for training neural networks called “unsupervised learning.” Specifically, unsupervised learning uses no explicit error signal to adjust the weights. A good introduction to basic unsupervised learning algorithms is in Duda, et al. 2001 (see specifically chapter 10). An edited volume on unsupervised learning in neural networks is Hinton and Sejnowski 1999. One popular class of unsupervised learning is the self-organizing map (or SOM, first introduced in Kohonen 1982), which is a network consisting of a topographic map (usually two-dimensional) for the organization of (usually high-dimensional) input representations; each node on the map self-adjusts its weight vector to resemble the input patterns it responds to, so that the map as a whole comes to maximally represent the structure of the input as learning progresses. Readers interested in learning more about the algorithm and applications of SOM can consult Kohonen 2001.
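
A minimal Python sketch of the SOM training step, assuming a small two-dimensional map, random toy input patterns, and a fixed Gaussian neighborhood (a full implementation of the kind described in Kohonen 2001 would also shrink the learning rate and neighborhood radius over time).

```python
import numpy as np

rng = np.random.default_rng(1)

# A small 2-D map (5 x 5 nodes); each node holds a weight vector in the input space
rows, cols, dim = 5, 5, 3
weights = rng.random((rows, cols, dim))
grid = np.moveaxis(np.indices((rows, cols)), 0, -1).astype(float)  # map coordinates

def som_step(x, weights, lr=0.3, radius=1.5):
    """One self-organizing step: find the best-matching node for input x, then
    pull that node and its map neighbors toward x."""
    dists = np.linalg.norm(weights - x, axis=2)               # distance of x to every node
    winner = np.unravel_index(np.argmin(dists), dists.shape)  # best-matching unit
    grid_dist = np.linalg.norm(grid - grid[winner], axis=2)   # distance on the map itself
    neighborhood = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
    return weights + lr * neighborhood[..., None] * (x - weights)

# Train on random toy input patterns; nearby nodes come to represent similar inputs
for x in rng.random((500, dim)):
    weights = som_step(x, weights)
```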

Other Network Structures

There are many other types of neural networks, often developed to solve different problems. Elman 1990 proposes the simple recurrent network (SRN), which has been widely applied in connectionist language studies. In the SRN a context layer is added to the typical three-layered feedforward network; the context layer holds a copy of the hidden-layer activations from the previous time step and feeds it back to the hidden layer, providing a recurrent link that functions as a memory buffer for sequential input (e.g., temporal sequences of words in a sentence). Some connectionist models have adjustable network structures or involve dynamic unit growth, such as the cascade-correlation network in Shultz 2003 (cited under Introductory Works) or the constructivist neural network of Ruh and Westermann 2009. Several networks with adjustable structures are also based on unsupervised learning, such as the growing neural gas model discussed in Fritzke 1995 and the growing hierarchical self-organizing map (SOM) discussed in Rauber, et al. 2002.
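
A minimal Python sketch of the SRN’s forward pass, with random placeholder weights and a toy one-hot “sentence”; training of the weights (typically with backpropagation, using the copied-back context) is omitted here.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid, n_out = 4, 6, 4
W_in = rng.normal(0, 0.5, (n_in, n_hid))     # input -> hidden connections
W_ctx = rng.normal(0, 0.5, (n_hid, n_hid))   # context -> hidden (the recurrent link)
W_out = rng.normal(0, 0.5, (n_hid, n_out))   # hidden -> output connections

def srn_forward(sequence):
    """Process a sequence one item at a time; the context layer holds a copy of the
    previous hidden state, giving the network a memory for what came before."""
    context = np.zeros(n_hid)
    outputs = []
    for x in sequence:
        hidden = sigmoid(x @ W_in + context @ W_ctx)
        outputs.append(sigmoid(hidden @ W_out))
        context = hidden.copy()   # copy-back: current hidden state becomes next context
    return np.array(outputs)

# A toy "sentence": a sequence of one-hot word vectors (word identities are arbitrary)
sentence = [np.eye(n_in)[i] for i in [0, 2, 1, 3]]
print(np.round(srn_forward(sentence), 2))
```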

  • Elman, Jeffrey. 1990. Finding structure in time. Cognitive Science 14:179–211.

    DOI: 10.1207/s15516709cog1402_1

    One of the most cited papers ever published in Cognitive Science, it discusses how connectionist models such as SRN can be used to capture linguistic structures (e.g., lexical categories like nouns, verbs, and adjectives) as language unfolds in time.


  • Fritzke, Bernd. 1995. A growing neural gas network learns topologies. In Advances in neural information processing systems 7: Proceedings of the 1994 conference. Edited by Gerald Tesauro, David S. Touretzky, and Todd K. Leen, 625–632. Cambridge, MA: MIT Press.


    A growing neural gas network can gradually increase the number of its units during learning. A new unit is inserted into a region near the unit that has the highest error.


  • Rauber, Andreas, Dieter Merkl, and M. Dittenbach. 2002. The growing hierarchical self-organizing map: Exploratory analysis of high-dimensional data. IEEE Transactions on Neural Networks 13:1331–1341.

    DOI: 10.1109/TNN.2002.804221

    This model has a hierarchical structure that initially consists of several small SOMs that can then grow new maps deeper into the lower levels of the hierarchy as learning progresses.


  • Ruh, Nicolas, and Gert Westermann. 2009. Simulating German verb inflection with a constructivist neural network. In Connectionist models of behavior and cognition. Vol. 2. Edited by Julien Mayor, Nicolas Ruh, and Kim Plunkett, 313–324. London: World Scientific.


    The network’s internal representations (i.e., the hidden units) are allowed to dynamically increase or decrease as a result of learning experience or task difficulty. This procedure is similar to the cascade-correlation method, according to which the number of hidden units grows in response to task demands.


Major Properties

Connectionism stands in stark contrast to the classic symbolic view of the human mind, which assumes that human cognitive operations are fundamentally similar to those of digital computers, relying on discrete (symbol-based), modular (domain-specific), and serial (one-step-at-a-time) processes. A short and clear description of the symbolic approach to the human mind is Simon 1993. Bates and Elman 1993 provides a good review of the fundamental differences between these two approaches. First, multiple nodes in connectionist models can process information in “parallel,” giving rise to very powerful computational capacities in problem solving. Second, in terms of knowledge representation, connectionism argues for “distributed representation.” Hinton, et al. 1986 gives a clear description of this idea. Third, connectionist theories argue for the “emergence” of human cognition based on a high degree of interactivity within and between various levels of information processing, levels that stand as encapsulated modular subsystems in classic cognitive theories. An excellent introduction to emergentism is Holland 1998. MacWhinney 1999 contains articles on emergentist approaches to language and language acquisition, and MacWhinney 1998 provides a good summary of this view. Connectionist models can be considered close allies of the broader class of dynamic systems capable of capturing nonlinear patterns often observed in children and adults during learning and development. Spencer, et al. 2009 discusses developments of this view. A position paper, McClelland, et al. 2010, discusses the importance of connectionist and dynamic systems approaches in accounting for cognitive and computational mechanisms that give rise to cognition.

Connectionist Language Models

Since the resurgence of connectionism in the 1980s, many models have been developed to account for a wide range of empirical phenomena in language acquisition, comprehension, and production. General reviews of connectionist models of language can be found in several papers. For example, a short introduction to the connectionist approach to language is Smolensky 1999. Dell, et al. 1999 provides an excellent review of connectionist models of language production. Elman 2001 reviews some models of language acquisition. Westermann, et al. 2009 provides an up-to-date review of several connectionist learning models in language acquisition. The most recent collection is a special issue of the Journal of Child Language (MacWhinney 2010).

Representing Linguistic Features in Models

Connectionist language researchers have been concerned with how to actually represent various linguistic aspects in their models. A crude way to represent lexical items of language is to use the so-called localist representation, in which each concept or word corresponds to a single unitary processing unit in the network and the value of the unit can be determined by some arbitrary process. In contrast, connectionist models generally embrace distributed representations (see the discussion under Major Properties). With respect to phonological representation, most connectionist models have been based on representations of articulatory features of phonemes in a word—for example, the Wickelfeatures representation used in the pioneering work of Rumelhart and McClelland 1986 (cited under History) on the past-tense model. Developments in the field favor the approach that codes a word’s pronunciation on a slot-based representation. Li and MacWhinney 2002 reviews different types of representation and introduces a phonological pattern generator (PatPho) based on syllabic templates that has been successfully extended to other languages, such as Mandarin (Zhao and Li 2009). Methods to derive distributed semantic representations of words can be roughly classified into two groups. One is the feature-based representation, in which empirical data are often used to help generate the features describing the meaning of words, as in McRae, et al. 2005. The other is the corpus-based representation, which derives the meanings of words through co-occurrence statistics in large-scale linguistic corpora. Hyperspace analogue to language (HAL; see Burgess and Lund 1997) and latent semantic analysis (LSA; see Landauer and Dumais 1997) are two widely used corpus-based semantic representation methods. Zhao, et al. 2011 develops the Contextual Self-Organizing Map Package, software that can derive corpus-based semantic representations based on word co-occurrences in multiple languages. Connectionist models of reading often need representations of the orthographic information of words. In alphabetic orthographies, such as that of English, orthographic information can be represented in a way similar to phonological representations, as in the triangle model of English word reading in Plaut, et al. 1996. But in nonalphabetic orthographies, such as that of Chinese, faithful representation of orthographic information has been a big challenge. Xing, et al. 2004 introduces a structure- and stroke-based system for representing the orthographic system of most Chinese characters.
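
As a rough illustration of the corpus-based approach, the following Python sketch builds HAL-style co-occurrence vectors from a tiny made-up corpus and compares them with cosine similarity; the corpus, window size, and raw-count weighting are simplifications for illustration, not the exact procedures of Burgess and Lund 1997 or Landauer and Dumais 1997.

```python
import numpy as np

# A tiny made-up corpus; real HAL/LSA models use corpora with millions of words
corpus = "the cat chased the dog . the dog chased the cat . the boy fed the dog".split()
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

def cooccurrence_vectors(tokens, window=2):
    """Count how often each word occurs within `window` positions of every other
    word; each row then serves as a crude distributed semantic representation."""
    counts = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[index[w], index[tokens[j]]] += 1
    return counts

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

vectors = cooccurrence_vectors(corpus)
# Words used in similar contexts ("cat", "dog") end up with more similar vectors
print(cosine(vectors[index["cat"]], vectors[index["dog"]]))
print(cosine(vectors[index["cat"]], vectors[index["fed"]]))
```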

  • Burgess, Curt, and Kevin Lund. 1997. Modelling parsing constraints with high-dimensional context space. Language and Cognitive Processes 12:1–34.


    Hyperspace analogue to language (HAL) is a corpus-based semantic representation based on word-word co-occurrences in large text corpora.


  • Landauer, Thomas K., and Susan T. Dumais. 1997. A solution to Plato’s problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104:211–240.

    DOI: 10.1037/0033-295X.104.2.211

    Latent semantic analysis (LSA) is a widely used corpus-based semantic representation based on word-passage co-occurrences in large text corpora. A passage, for example, could be an entry in an encyclopedia providing the context of use for a word.


  • Li, Ping, and Brian MacWhinney. 2002. PatPho: A phonological pattern generator for neural networks. Behavior Research Methods, Instruments, and Computers 34:408–415.

    DOI: 10.3758/BF03195469

    This paper reviews different types of phonological representations and also presents the software PatPho, which can generate phonological representations based on syllabic templates.


  • McRae, Ken, George S. Cree, Mark S. Seidenberg, and Chris McNorgan. 2005. Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods 37:547–559.

    DOI: 10.3758/BF03192726

    The authors present normed data from human participants who were instructed to generate features associated with 541 concrete English nouns. More than 2,500 dimensions based on the generated features were constructed in the dataset.


  • Plaut, David C., James L. McClelland, Mark S. Seidenberg, and Karalyn Patterson. 1996. Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review 103:56–115.

    DOI: 10.1037/0033-295X.103.1.56

    This paper demonstrates the importance of using well-structured orthographic and phonological representations in simulating normal and impaired word-reading processes.


  • Xing, Hongbing, Hua Shu, and Ping Li. 2004. The acquisition of Chinese characters: Corpus analyses and connectionist simulations. Journal of Cognitive Science 5:1–49.


    This paper aims at simulating the acquisition of Chinese reading by elementary school children, and the authors provide a comprehensive system for representing the orthographic information of Chinese characters.


  • Zhao, Xiaowei, and Ping Li. 2009. An online database of phonological representations for Mandarin Chinese. Behavior Research Methods 41:575–583.

    DOI: 10.3758/BRM.41.2.575

    Describes an extension of PatPho to Mandarin Chinese. The database provides the phonological representations of all Mandarin phonemes in Chinese characters. The link for this database is PatPho for Chinese.


  • Zhao, Xiaowei, Ping Li, and Teuvo Kohonen. 2011. Contextual self-organizing map: Software for constructing semantic representations. Behavior Research Methods 43:77–88.

    DOI: 10.3758/s13428-010-0042-z

    The authors describe a computer program for automatically deriving corpus-based semantic representations, which can be used for multiple languages with minor modifications.


Models of Language Processing

Many connectionist models have been applied to account for language processing. McClelland and Rumelhart 1981 is an interactive activation (IA) model of visual word perception that accounts for several empirical findings, such as the word superiority effect. Extending features of the IA model to speech, McClelland and Elman 1986 develops the TRACE model of speech perception. Another early model, NETtalk, described in Sejnowski and Rosenberg 1987, was the first attempt to train a network to read words in sentences, and it showed impressive performance. Since these early models, a large number of connectionist models have emerged to simulate both normal and impaired reading of English words. For example, Seidenberg and McClelland 1989 introduces the triangle model of word reading. Plaut, et al. 1996 demonstrates the utility of the connectionist model in simulating both normal and impaired reading processes. Harm and Seidenberg 2004 also demonstrates the importance of both visual and phonological processing in reading and comprehending printed words. Connectionist models of reading have been extended to other languages, including Chinese, as in Yang, et al. 2009 and Xing, et al. 2004 (cited under Representing Linguistic Features in Models). Miikkulainen 1997 uses the DISLEX model to simulate dyslexic and category-specific aphasic processes. There are many other connectionist models of language processing, and a good review is Rohde and Plaut 2003.

  • Harm, Michael W., and Mark S. Seidenberg. 2004. Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes. Psychological Review 111:662–720.

    DOI: 10.1037/0033-295X.111.3.662

    The authors’ simulations demonstrate that processing of printed words involves both visual and phonological processes.


  • McClelland, James L., and Jeffrey L. Elman. 1986. The TRACE model of speech perception. Cognitive Psychology 18:1–86.

    DOI: 10.1016/0010-0285(86)90015-0

    TRACE was the first large-scale interactive activation model of speech perception that provided a framework for investigating the dynamic interactions across levels of acoustic features, phonemes, and words as auditory signals unfold in the linguistic context.


  • McClelland, James L., and David E. Rumelhart. 1981. An interactive activation model of context effects in letter perception: Part 1, An account of basic findings. Psychological Review 88:375–407.

    DOI: 10.1037/0033-295X.88.5.375

    A now classic model, the IA model predated but contained many crucial features of the parallel distributed processing (PDP) connectionist models. The model simulated the bottom-up and top-down interactions across levels of visual features, letters, and words. But this early model lacked connectionist learning mechanisms.


  • Miikkulainen, Risto. 1997. Dyslexic and category-specific aphasic impairments in a self-organizing feature map model of the lexicon. Brain and Language 59:334–366.

    DOI: 10.1006/brln.1997.1820

    The author introduced DISLEX, part of the DISCERN system, in this paper. DISLEX was successfully used to simulate certain impaired language processes, such as dyslexia and aphasia.


  • Plaut, David C., James L. McClelland, Mark S. Seidenberg, and Karalyn Patterson. 1996. Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review 103:56–115.

    DOI: 10.1037/0033-295X.103.1.56

    This work demonstrates that some limitations of the original triangle model can be overcome by using better-structured orthographic and phonological representations, suggesting that connectionist models provide powerful tools for understanding normal and impaired reading processes.


  • Rohde, Douglas L. T., and David C. Plaut. 2003. Connectionist models of language processing. Cognitive Studies 10:10–28.


    A quick review of connectionist research in language processing.


  • Seidenberg, Mark S., and James L. McClelland. 1989. A distributed, developmental model of word recognition and naming. Psychological Review 96:523–568.

    DOI: 10.1037/0033-295X.96.4.523

    The triangle model was introduced in this paper, which later served as the basis of many connectionist reading models. In the triangle model, the orthography of words is mapped to both the phonology and the meaning of words (hence the triangle) via layers of hidden units.


  • Sejnowski, Terrence J., and Charles R. Rosenberg. 1987. Parallel networks that learn to pronounce English text. Complex Systems 1:145–168.


    The first attempt to train a connectionist model to pronounce English words in text, which greatly enhanced the reputation and application of connectionism in the early years.


  • Yang, Jianfeng, Bruce D. McCandliss, Hua Shu, and Jason D. Zevin. 2009. Simulating language-specific and language-general effects in a statistical learning model of Chinese reading. Journal of Memory and Language 61:238–257.

    DOI: 10.1016/j.jml.2009.05.001

    A connectionist model based on the triangle model to simulate reading processes in Chinese.


Models of First Language Learning

The Rumelhart and McClelland 1986 (cited under History) past-tense model inspired a great deal of interest in connectionist models of first language acquisition, and Elman, et al. 1996 (cited under Introductory Works) further spurred research in the understanding of complex developmental patterns through the connectionist perspective. Many studies followed in the footsteps of these pioneering works in using neural networks to study children (see Learning Sounds, Learning Morphology, Learning Syntactic Structure, and Learning Semantic Structure).

Learning Sounds

There have been some connectionist models of the acquisition of sounds, but this is a relatively weak spot compared with other acquisition topics. Several of these studies have been based on self-organizing maps. For example, Guenther and Gjaja 1996 introduced a model to simulate the “perceptual magnet effect” in infants’ phonetic learning. McCandliss, et al. 2002 demonstrates that, as training progresses, a network may gradually lose its sensitivity to discriminate similar phonemes novel to the network—a phenomenon related to the critical or sensitive period hypothesis in language acquisition. Gauthier, et al. 2009 uses a self-organizing map to explore whether and how infants might learn prosodic focus directly from continuous speech input in a tonal language like Mandarin. Plaut and Kello 1999 proposes a framework that can account for children’s phonological development. In addition, the Elman 1990 (cited under Other Network Structures) simple recurrent network (SRN) model can also be trained to find word boundaries based on sequences of continuous phonemic inputs in an artificial corpus, which was later applied in Christiansen, et al. 1998 to a real corpus of child-directed speech in a speech segmentation task.

  • Christiansen, Morten H., Joseph Allen, and Mark S. Seidenberg. 1998. Learning to segment speech using multiple cues: A connectionist model. Language and Cognitive Processes 13:221–268.

    DOI: 10.1080/016909698386528

    This work extends the Elman 1990 (cited under Other Network Structures) original study using a real corpus based on child-directed speech rather than an artificially generated corpus.


  • Gauthier, Bruno, Rushen Shi, and Yi Xu. 2009. Learning prosodic focus from continuous speech input: A neural network exploration. Language Learning and Development 5:94–114.

    DOI: 10.1080/15475440802698524

    A self-organizing maps (SOM) model was used to explore whether and how infants might learn prosodic focus directly from continuous speech input of Mandarin Chinese.


  • Guenther, Frank H., and Marin N. Gjaja. 1996. The perceptual magnet effect as an emergent property of neural map formation. Journal of the Acoustical Society of America 100:1111–1121.

    DOI: 10.1121/1.416296

    This work introduces a model based on SOM to simulate the “perceptual magnet effect” in infants’ phonetic learning.


  • McCandliss, Bruce D., Julie A. Fiez, A. Protopapas, M. Conway, and James L. McClelland. 2002. Success and failure in teaching the [l]-[r] contrast to Japanese adults: Tests of a Hebbian model of plasticity and stabilization in spoken language perception. Cognitive, Affective, and Behavioral Neuroscience 2:89–108.

    DOI: 10.3758/CABN.2.2.89

    This work demonstrates that as training progresses a neural network may gradually lose its sensitivity to discriminate similar phonemes novel to the network, accounting for mechanisms of plasticity or nonplasticity in language learning.


  • Plaut, David C., and Christopher T. Kello. 1999. The emergence of phonology from the interplay of speech comprehension and production: A distributed connectionist approach. In The emergence of language. Edited by Brian MacWhinney, 381–415. Mahwah, NJ: Lawrence Erlbaum.


    A connectionist model designed to account for children’s phonological development with a mechanism of forward mapping from articulation to acoustics to simulate infant babbling.


Learning Morphology

“U-shaped” learning seen in the acquisition of the English past tense and other grammatical domains has been one of the most studied topics in language acquisition. Rumelhart and McClelland 1986 (cited under History) uses a simple two-layer pattern associator network (without a hidden layer) trained with an error-correcting, perceptron-style learning rule. Plunkett and Marchman 1991 argues that the use of hidden layers is important to U-shaped learning. Marchman 1993 further tests a lesioned model of English past-tense learning, and Hare and Elman 1995 extends the past-tense model to link child language acquisition with historical language change. Several other connectionist models have been developed to capture the learning of other types of grammatical morphology—for example, plural formation of nouns (Plunkett and Juola 1999), grammatical and lexical aspect (Zhao and Li 2009), and reversive prefixes of verbs (Li and MacWhinney 1996). These studies all demonstrate that a single mechanism, such as connectionist learning, can account for the acquisition of complex grammatical structures without stipulating the existence of separate mechanisms of linguistic rules (e.g., regular past-tense formation) versus associative exceptions (e.g., irregular mappings), as Steven Pinker and colleagues have suggested.

  • Hare, Mary, and Jeffrey L. Elman. 1995. Learning and morphological change. Cognition 56:61–98.

    DOI: 10.1016/0010-0277(94)00655-5

    This work attempts to use connectionist principles to account for historical language change. Structured populations of connectionist networks were modeled to simulate different generations in the production of Old English verbs.


  • Li, Ping, and Brian MacWhinney. 1996. Cryptotype, overgeneralization, and competition: A connectionist model of the learning of English reversive prefixes. Connection Science 8:3–30.

    DOI: 10.1080/095400996116938

    This study used a multilayered feedforward network to learn the reversive prefixes of English verbs (e.g., un-, dis-). Underlying the use of reversive prefixes is the notion of cryptotype, which was defined as an “elusive” semantic category by Benjamin Whorf. Connectionist modeling can capture such elusiveness through weighted network connections.


  • Marchman, Virginia A. 1993. Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience 5:215–234.

    DOI: 10.1162/jocn.1993.5.2.215

    The connectionist network underwent lesions at different stages of training, allowing the investigator to simulate past-tense learning with a developmental perspective (e.g., study of sensitive periods in learning).


  • Plunkett, Kim, and Patrick Juola. 1999. A connectionist model of English past tense and plural morphology. Cognitive Science 23:463–490.

    DOI: 10.1207/s15516709cog2304_4

    The simulations were based on a realistic corpus of English (the Brown corpus), and the model captured U-shaped learning of both the English past tense of verbs and the plural formation of nouns.


  • Plunkett, Kim, and Virginia A. Marchman. 1991. U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition 38:43–102.

    DOI: 10.1016/0010-0277(91)90022-V

    This multilayered network represents a significant improvement over the Rumelhart and McClelland 1986 (cited under History) autoassociative model. In addition, the model was able to capture the importance of type versus token frequencies of verbs in the learning of English past tense, which made the model more realistic for child learning.


  • Zhao, Xiaowei, and Ping Li. 2009. The acquisition of lexical and grammatical aspect in a developmental lexicon model. Linguistics 47:1075–1112.


    A model based on self-organizing maps (SOM) called DevLex-II was used to learn English grammatical and lexical aspect. The model showed that semantic properties of verbs, in addition to phonological and other formal properties, are important in the acquisition of grammatical morphology.


Learning Syntactic Structure

Syntax holds a special status in linguistics, and it has been challenging for connectionist researchers to demonstrate that connectionist models can learn syntactic structures. McClelland, et al. 1989 introduced a PDP model as a cue-based approach to replace the traditional rule-based approach to sentence comprehension. A basic framework for this approach was also laid out in volume 2 of the PDP books (McClelland and Kawamoto 1986). Elman 1993 successfully applies the simple recurrent network (SRN) model (see Other Network Structures) to learn the hierarchical recursive structure of sentences. This success relies on the “starting small” principle, in which the network is exposed to incrementally more complex sentences over time, simulating a developmental course. Christiansen and Chater 1999 conducts a simulation similar to Jeffrey Elman’s work with more complex recursive structures in three artificial languages and obtains results qualitatively similar to human data.

  • Christiansen, Morten H., and Nick Chater. 1999. Toward a connectionist model of recursion in human linguistic performance. Cognitive Science 23:157–205.

    DOI: 10.1207/s15516709cog2302_2

    This paper reports a simulation with complex recursive structures in three artificial languages learned by an SRN network, and the simulation results are comparable to human performance.


  • Elman, Jeffrey. 1993. Learning and development in neural networks: The importance of starting small. Cognition 48:71–99.

    DOI: 10.1016/0010-0277(93)90058-4

    Elman introduced the “starting small” idea in connectionist language modeling. The SRN was first exposed to simple sentences without relative clauses and then was trained on incrementally more complex sentences with hierarchical recursive structures seen in relative clauses. The results accord with the “less is more” hypothesis proposed by Newport and colleagues.


  • McClelland, James L., and Alan H. Kawamoto. 1986. Mechanisms of sentence processing: Assigning roles to constituents of sentences. In Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2, Psychological and biological models. Edited by James L. McClelland, David E. Rumelhart, and the PDP Research Group, 272–325. Cambridge, MA: MIT Press.


    A feedforward network (similar to the Rumelhart and McClelland 1986 pattern associator, cited under History) was used to assign the correct thematic case roles to sentence constituents. This model is one of the first attempts to represent syntactic information in a connectionist model. However, the model was largely “static” given that the thematic case roles were encoded in the input and the target patterns, rather than “dynamically” captured from the temporal sequences of words in sentences.


  • McClelland, James L., Mark F. St. John, and Roman Taraban. 1989. Sentence comprehension: A parallel distributed processing approach. Language and Cognitive Processes 4:287–335.

    DOI: 10.1080/01690968908406371

    The “Sentence Gestalt” (SG) model was introduced in this paper. The model was trained based on a small-scale artificially generated corpus and was able to assign thematic roles to sentence constituents. The authors emphasized the importance of emerging internal distributed representations rather than the preexistence of structure-sensitive rules, which was the focus of traditional approaches of syntax learning.


Learning Semantic Structure

The learning of semantic categories or structures in the mental lexicon is another important research topic in connectionism. Elman 1990 (cited under Other Network Structures) has provided a simple but powerful mechanism in accounting for the emergence of semantic categories in mental representation. Ritter and Kohonen 1989 shows how self-organizing maps (SOM) can extract topographically structured semantic categories from linguistic input. Rogers and McClelland 2004 argues that connectionist architecture can account for the use and processes of human semantic knowledge in language and memory. Elman 2009 provides a new perspective on how semantic representation is context driven and often constructed on the fly. Many other studies mentioned elsewhere in this article (e.g., in Representing Linguistic Features in Models and Learning Morphology) also show that connectionist principles of statistical learning provide useful frameworks and tools for understanding semantic structures and semantic knowledge (see Burgess and Lund 1997 and Landauer and Dumais 1997, both cited under Representing Linguistic Features in Models, and Li and MacWhinney 1996, cited under Learning Morphology).

  • Elman, Jeffrey L. 2009. On the meaning of words and dinosaur bones: Lexical knowledge without a lexicon. Cognitive Science 33:547–582.

    DOI: 10.1111/j.1551-6709.2009.01023.x

    This paper argues for a new view of the mental lexicon based on the connectionist perspective, in which meanings of words are highly contingent upon linguistic and nonlinguistic contexts in which the words are used. Words only serve as cues during processing for what should or should not be construed as part of the semantics of the lexicon.

  • Ritter, Helge, and Teuvo Kohonen. 1989. Self-organizing semantic maps. Biological Cybernetics 61:241–254.

    DOI: 10.1007/BF00203171

    This classic work demonstrates that SOM networks can form topographically organized representations of semantic categories; such categories are implicit in the linguistic input and can be extracted by the SOM. A minimal SOM sketch follows this entry.

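Below is a minimal self-organizing map sketch in this spirit; the toy word feature vectors, map size, and training schedule are illustrative assumptions, not the parameters of Ritter and Kohonen’s simulations. Words with similar features come to be mapped onto neighboring units.

    # Minimal SOM sketch: toy word feature vectors and map size are illustrative.
    import numpy as np

    rng = np.random.default_rng(1)
    rows, cols, dim = 6, 6, 4
    weights = rng.random((rows, cols, dim))        # one weight vector per map unit

    # Hypothetical features: [is_animal, is_verb, can_fly, is_artifact]
    words = {
        "dog":   [1, 0, 0, 0],
        "cat":   [1, 0, 0, 0],
        "bird":  [1, 0, 1, 0],
        "runs":  [0, 1, 0, 0],
        "flies": [0, 1, 1, 0],
        "table": [0, 0, 0, 1],
    }

    def best_matching_unit(x):
        d = np.linalg.norm(weights - x, axis=2)    # distance from x to every unit
        return np.unravel_index(d.argmin(), d.shape)

    epochs = 200
    for epoch in range(epochs):
        lr = 0.5 * (1 - epoch / epochs)                # decaying learning rate
        radius = max(1.0, 3.0 * (1 - epoch / epochs))  # shrinking neighborhood
        for x in map(np.array, words.values()):
            bi, bj = best_matching_unit(x)
            for i in range(rows):
                for j in range(cols):
                    dist = np.hypot(i - bi, j - bj)
                    if dist <= radius:
                        h = np.exp(-dist ** 2 / (2 * radius ** 2))
                        weights[i, j] += lr * h * (x - weights[i, j])

    # After training, words with similar features land on neighboring units.
    for w, v in words.items():
        print(w, best_matching_unit(np.array(v)))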

  • Rogers, Timothy T., and James L. McClelland. 2004. Semantic cognition: A parallel distributed processing approach. Cambridge, MA: MIT Press.

    The book provides a general framework for how connectionist theory can capture human semantic knowledge, including the emergence of semantic structures in children, semantic knowledge in linguistic behavior, and loss of semantic knowledge in patients.

Simulating Lexical Development

Several connectionist models have been developed to simulate children’s early lexical development, including the vocabulary spurt, the sudden acceleration of word learning. Plunkett, et al. 1992 presents a multilayered autoassociative network that shows a vocabulary spurt in both comprehension and production. Regier 2005 provides a connectionist model that accounts for the occurrence of rapid vocabulary learning by emphasizing children’s increased ability for selective attention. A series of developmental lexicon (DevLex) models based on self-organizing maps (SOM) provides a developmental framework for language acquisition that accounts for patterns of early lexical development. The original DevLex model was introduced in Li, et al. 2004, and its extension, DevLex-II, is presented in Li, et al. 2007.

  • Li, Ping, Igor Farkas, and Brian MacWhinney. 2004. Early lexical acquisition in a self-organizing neural network. Neural Networks 17:1345–1362.

    DOI: 10.1016/j.neunet.2004.07.004

    The original DevLex model was introduced in this paper. The model consists of two growing self-organizing maps connected by links trained through Hebbian learning, and it captures the emergence of semantic categories and age of acquisition effects in incremental lexical learning. A schematic sketch of this map-plus-Hebbian-links architecture follows this entry.

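The following schematic sketch illustrates the general idea of two maps joined by Hebbian links; map training itself is omitted, and the vectors, map sizes, and learning rate are simplifying assumptions rather than the published DevLex parameters. Co-activation of form and meaning units strengthens their associative links, which can then route a word form to a meaning.

    # Schematic DevLex-style sketch: two maps joined by Hebbian links.
    # Map training is omitted; vectors, sizes, and the learning rate are
    # simplifying assumptions, not the published DevLex parameters.
    import numpy as np

    rng = np.random.default_rng(2)
    n_form, n_sem = 25, 25                    # units per map (e.g., 5 x 5 each)
    form_map = rng.random((n_form, 6))        # weight vectors of the form map
    sem_map = rng.random((n_sem, 8))          # weight vectors of the semantic map
    assoc = np.zeros((n_form, n_sem))         # Hebbian form-to-meaning links

    def activation(map_weights, x):
        d = np.linalg.norm(map_weights - x, axis=1)
        return np.exp(-d)                     # graded response of each map unit

    def hebbian_update(form_vec, sem_vec, lr=0.1):
        """Strengthen links between co-active form and meaning units (Hebb's rule)."""
        a_form = activation(form_map, form_vec)
        a_sem = activation(sem_map, sem_vec)
        global assoc
        assoc = assoc + lr * np.outer(a_form, a_sem)

    def comprehend(form_vec):
        """Route a word form through the associative links to the semantic map."""
        a_form = activation(form_map, form_vec)
        return int(np.argmax(assoc.T @ a_form))   # most activated meaning unit

    # Hypothetical (form, meaning) vector pairs standing in for words.
    lexicon = [(rng.random(6), rng.random(8)) for _ in range(10)]
    for _ in range(50):
        for form_vec, sem_vec in lexicon:
            hebbian_update(form_vec, sem_vec)

    print(comprehend(lexicon[0][0]))              # meaning unit evoked by word 0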

  • Li, Ping, Xiaowei Zhao, and Brian MacWhinney. 2007. Dynamic self-organization and children’s word learning. Cognitive Science 31:581–612.

    The DevLex-II model was introduced in this paper. Trained with input derived from a real linguistic environment (the Child Language Data Exchange System [CHILDES] parental corpus), the model accounts for a wide variety of patterns in early lexical development, including the vocabulary spurt. DevLex-II also simulates early effects of word length and frequency in modulating the vocabulary spurt.

  • Plunkett, Kim, Chris G. Sinha, Martin F. Møller, and Ole Strandsby. 1992. Symbol grounding or the emergence of symbols? Vocabulary growth in children and a connectionist net. Connection Science 4:293–312.

    DOI: 10.1080/09540099208946620

    A multilayered autoassociative network that showed a vocabulary spurt in both comprehension and production, although the inputs to the model were artificially generated matrix patterns. The authors argue for conceptual systematization as the cause of the vocabulary spurt.

  • Regier, Terry. 2005. The emergence of words: Attentional learning in form and meaning. Cognitive Science 29:819–865.

    DOI: 10.1207/s15516709cog0000_31

    Regier’s LEX model accounted for increased vocabulary learning capacity by emphasizing children’s increased ability for selective attention. Increased selective attention, according to the LEX model, would allow the learner to focus on the relevant features for particular words while suppressing irrelevant features, thus enhancing the accuracy and speed of learning.

Models of Second Language

In contrast to the flourishing research in connectionist modeling of first language acquisition and processing, there have been only a handful of neural network models designed specifically to account for second language processing and learning. Dijkstra and van Heuven 1998 introduces the bilingual interactive activation (BIA) model, based on McClelland and Rumelhart 1981 (cited under Models of Language Processing), the original interactive activation (IA) model for monolingual visual word recognition. The BIA model suggests that bilingual word recognition is a nonselective, language-independent process, in which words from both languages are activated in parallel. Dijkstra and van Heuven 2002 further adds phonological and semantic representations to the model and extends BIA to BIA+. However, these models lack learning mechanisms and are suited to accounting for bilingual processing rather than second language acquisition. French and Jacquet 2004 presents a connectionist model of bilingual language learning based on the simple recurrent network (SRN) model and shows that distinct lexical representations can develop from learning mixed bilingual input at the sentence level. Similar patterns of representation have also been reported in Li and Farkas 2002 in a bilingual learning model based on the self-organizing map (SOM) architecture. A good review of computational models of second language is Thomas and van Heuven 2005. Hernandez, et al. 2005 and Hernandez and Li 2007 both present overviews of connectionist perspectives on bilingualism with regard to competition, entrenchment, and plasticity. Zhao and Li 2010 extends the DevLex-II model to study the development of bilingual lexical representation. More recently, Zhao and Li 2013 also applied the DevLex-II model to simulating cross-language priming effects.

  • Dijkstra, Ton, and Walter J. B. van Heuven. 1998. The BIA model and bilingual word recognition. In Localist connectionist approaches to human cognition. Edited by Jonathan Grainger and Arthur M. Jacobs, 189–225. Mahwah, NJ: Lawrence Erlbaum.

    The BIA model is based on the classic IA model of McClelland and Rumelhart 1981 (cited under Models of Language Processing) and has been able to account for many empirical results in support of a language-independent, nonselective process of bilingual visual word recognition. A toy sketch of interactive-activation dynamics follows this entry.

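Below is a toy interactive-activation sketch in the spirit of BIA, with a hypothetical four-word lexicon, made-up parameters, and a deliberately simplified update rule (the published model also includes letter-level units and top-down feedback from language nodes, omitted here). Word units of both languages receive bottom-up support from the input letters and compete through lateral inhibition, while language nodes pool the activation of their words.

    # Toy interactive-activation sketch in the spirit of BIA (hypothetical
    # lexicon and parameters; letter-level units and language-node feedback
    # of the published model are omitted).
    lexicon = {"word": "EN", "work": "EN", "wort": "DE", "welt": "DE"}
    words = list(lexicon)
    act = {w: 0.0 for w in words}             # word-unit activations
    lang_act = {"EN": 0.0, "DE": 0.0}         # language-node activations

    def letter_support(candidate, stimulus):
        """Bottom-up evidence: proportion of position-matched letters."""
        return sum(a == b for a, b in zip(candidate, stimulus)) / len(stimulus)

    def step(stimulus, excite=0.4, inhibit=0.2, decay=0.1):
        new_act = {}
        for w in words:
            bottom_up = excite * letter_support(w, stimulus)
            lateral = inhibit * sum(act[v] for v in words if v != w)  # competition
            new_act[w] = max(0.0, act[w] * (1 - decay) + bottom_up - lateral)
        act.update(new_act)
        for lang in lang_act:                 # language nodes pool their words
            lang_act[lang] = sum(act[w] for w in words if lexicon[w] == lang)

    for _ in range(10):
        step("word")                          # present the stimulus "word"
    print(act)                                # neighbors of both languages are active
    print(lang_act)

Note that after presenting “word,” the cross-language neighbor “wort” remains partially active alongside the within-language neighbor “work,” which is the kind of parallel, nonselective activation the BIA account emphasizes.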

  • Dijkstra, Ton, and Walter J. B. van Heuven. 2002. The architecture of the bilingual word recognition system: From identification to decision. Bilingualism: Language and Cognition 5:175–197.

    The BIA+ model is introduced as an extension of the original BIA model. Phonological and semantic representations are added to the model on top of the orthographic representation to account for additional empirical effects of bilingual word processing.

  • French, Robert M., and Maud Jacquet. 2004. Understanding bilingual memory. Trends in Cognitive Science 8:87–93.

    DOI: 10.1016/j.tics.2003.12.011

    Presents a bilingual SRN model that learns mixed input of two languages at the sentence level. The model demonstrates that a single connectionist learning mechanism can develop two distinct representation systems, one for each language.

  • Hernandez, Arturo, and Ping Li. 2007. Age of acquisition: Its neural and computational mechanisms. Psychological Bulletin 133:638–650.

    DOI: 10.1037/0033-2909.133.4.638

    A comprehensive review of both neuroimaging and computational evidence on age of acquisition effects. The authors proposed a sensorimotor integration hypothesis to account for age of acquisition effects in both monolingual and bilingual processing and in both linguistic and nonlinguistic domains.

  • Hernandez, Arturo, Ping Li, and Brian MacWhinney. 2005. The emergence of competing modules in bilingualism. Trends in Cognitive Sciences 9:220–225.

    DOI: 10.1016/j.tics.2005.03.003

    The authors embraced the motto of Elizabeth Bates that “modules are made, not born” (Bates, et al., From First Words to Grammar: Individual Differences and Dissociable Mechanisms, New York: Cambridge Univ. Press, 1988, p. 284) and illustrated that the competitive interaction between a bilingual’s two languages during a developmental course of learning is responsible for the entrenchment and plasticity effects seen in second language learning.

  • Li, Ping, and Igor Farkas. 2002. A self-organizing connectionist model of bilingual processing. In Bilingual sentence processing. Edited by Roberto Heredia and Jeanette Altarriba, 59–85. Amsterdam: Elsevier Science.

    This work demonstrates that distinct lexical representations of two languages can evolve without explicit language separation or marking, and the model was among the first to simulate both within-language and cross-language priming effects within a connectionist architecture.

  • Thomas, Michael S. C., and Walter J. B. van Heuven. 2005. Computational models of bilingual comprehension. In Handbook of bilingualism: Psycholinguistic approaches. Edited by Judith F. Kroll and Annette M. B. de Groot, 202–225. New York: Oxford Univ. Press.

    This review provides an excellent overview of connectionist models of bilingualism and second language processing up to 2005.

  • Zhao, Xiaowei, and Ping Li. 2010. Bilingual lexical interactions in an unsupervised neural network model. International Journal of Bilingual Education and Bilingualism 13:505–524.

    DOI: 10.1080/13670050.2010.488284

    As an application of the DevLex-II model to bilingual lexical development, this work provides a computational account of the age of acquisition effect in second language learning by reference to the dynamic interaction and competition between the two languages.

  • Zhao, Xiaowei, and Ping Li. 2013. Simulating cross-language priming with a dynamic computational model of the lexicon. Bilingualism: Language and Cognition 16:288–303.

    DOI: 10.1017/S1366728912000624

    Based on the DevLex-II framework, this work added a spreading activation mechanism to simulate several important empirical findings from previous studies on cross-language translation and semantic priming, including the “priming asymmetry” effect. A minimal spreading-activation sketch follows this entry.

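A minimal spreading-activation sketch is given below; the network, link weights, and decay parameter are hypothetical and far simpler than the DevLex-II implementation. It illustrates the basic logic of cross-language priming: activation spreading from a prime pre-activates translation equivalents and semantic associates, so a closely related prime leaves the target more activated than a more distant one.

    # Minimal spreading-activation sketch (hypothetical network and parameters).
    links = {
        ("dog_EN", "perro_ES"): 0.8,    # translation equivalents: strong link
        ("dog_EN", "cat_EN"): 0.5,      # within-language semantic associate
        ("perro_ES", "gato_ES"): 0.5,
        ("cat_EN", "gato_ES"): 0.8,
    }

    def neighbors(node):
        for (a, b), w in links.items():
            if a == node:
                yield b, w
            elif b == node:
                yield a, w

    def spread(prime, steps=2, decay=0.6):
        """Propagate activation outward from the prime for a few steps."""
        activation = {prime: 1.0}
        frontier = {prime}
        for _ in range(steps):
            nxt = set()
            for node in frontier:
                for other, w in neighbors(node):
                    gain = activation[node] * w * decay
                    if gain > activation.get(other, 0.0):
                        activation[other] = gain
                        nxt.add(other)
            frontier = nxt
        return activation

    # The more pre-activation the target receives from the prime, the faster
    # its simulated recognition: compare two primes for the target "gato_ES".
    print(spread("cat_EN").get("gato_ES", 0.0))   # direct associate of the target
    print(spread("dog_EN").get("gato_ES", 0.0))   # related only via two links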

Theoretical Linguistics Inspired by Connectionism

Connectionism has had a profound impact on the study of linguistic behavior. It has become increasingly clear that language can no longer be studied as a pure, innate symbolic system, as has been advocated in classic works in linguistics. For example, some researchers have worked on developing hybrid linguistic frameworks with a connectionism-based microstructure and a macrostructure characterized by symbolic features. Smolensky and Legendre 2006 is an excellent two-volume collection that discusses theories and implications of these frameworks. Optimality theory in phonology also has roots in connectionism; indeed, the computation of the optimal representation with maximal harmony is analogous to the search for minimum error in connectionist models (a toy illustration of this parallel follows this paragraph; see a review of optimality theory and language acquisition in Fikkert and de Hoop 2009). In addition, Bybee 2001 argues that phonological patterns can be understood as emergent consequences of language usage (e.g., the frequency of a word and its co-occurrence with other words) and argues against domain-specific modules in the brain for language processing. These usage- or experience-based approaches to language are further discussed in Bybee and McClelland 2005, and they are highly consistent with connectionist perspectives.
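
A toy illustration of this parallel, under made-up constraints and weights, is sketched below: defining harmony as the negative of the weighted sum of constraint violations makes the candidate with maximal harmony exactly the candidate with minimal “error” or energy.

    # Toy illustration (made-up constraints and weights): the candidate with
    # maximal harmony is also the candidate with minimal "error" or energy.
    candidates = {
        # candidate output: violations of each hypothetical constraint
        "ta":  {"NoCoda": 0, "Max": 1},   # avoids the coda by deleting a segment
        "tat": {"NoCoda": 1, "Max": 0},   # keeps the coda, deletes nothing
    }
    weights = {"NoCoda": 2.0, "Max": 3.0}  # illustrative constraint weights

    def harmony(violations):
        """Harmony = negative weighted sum of constraint violations."""
        return -sum(weights[c] * v for c, v in violations.items())

    def energy(violations):
        """The same quantity with the sign flipped, read as a network error."""
        return -harmony(violations)

    best_by_harmony = max(candidates, key=lambda c: harmony(candidates[c]))
    best_by_energy = min(candidates, key=lambda c: energy(candidates[c]))
    assert best_by_harmony == best_by_energy      # the two searches coincide
    print(best_by_harmony)                        # "tat" under these weights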

  • Bybee, Joan. 2001. Phonology and language use. Cambridge, UK: Cambridge Univ. Press.

    DOI: 10.1017/CBO9780511612886

    An influential work on phonological representation. The basic idea is that the mental representation of the phonology of words must include information about their usage.

  • Bybee, Joan, and James L. McClelland. 2005. Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review 22:381–410.

    This article provides examples from phonology and morphology to support the idea that linguistic structures can emerge from the use of language, as has been demonstrated by connectionist models in many domains of language acquisition and processing.

  • Fikkert, Paula, and Helen de Hoop. 2009. Language acquisition in optimality theory. Linguistics 47:311–357.

    DOI: 10.1515/LING.2009.012

    This article provides a review of the state of affairs in language acquisition studies in the optimality theory framework with critical questions on future developments in this direction.

  • Smolensky, Paul, and Géraldine Legendre, eds. 2006. The harmonic mind: From neural computation to optimality-theoretic grammar. 2 vols. Cambridge, MA: MIT Press.

    Volume 1 (Cognitive Architecture) introduces the basic theories and cognitive frameworks. Volume 2 (Linguistic and Philosophical Implications) is designed for readers in linguistics with an emphasis on the implications of the hybrid framework for different aspects of language (e.g., phonology and syntax).

LAST MODIFIED: 10/29/2013

DOI: 10.1093/OBO/9780199772810-0010
