Finite State Languages
- LAST REVIEWED: 27 March 2014
- LAST MODIFIED: 27 March 2014
- DOI: 10.1093/obo/9780199772810-0181
Introduction
A finite-state language—equivalently “regular language,” “type 3 language,” or “regular set”—belongs to the class of formal languages whose sentences can be generated or characterized by a number of different abstract devices—devices that are all ultimately equivalent in their generative capacity. These devices include type 3 generative grammars (regular grammars), regular expressions, finite automata, and read-only Turing machines. Essentially, almost any general model of computation that is restricted to possessing only a finite memory of predefined size will fall into this class. The origins of finite-state machines lie in early abstract neuron models and theories of computation, and they were later found to be equivalent to type 3 generative grammars. Today, interest in finite-state models is vast and encompasses research in formal language theory, mathematics, linguistics, logic, engineering, and theoretical computer science. Finite-state languages have been investigated and argued for and against as a potential model for capturing linguistic structure since the 1950s, particularly in the subdomains of syntax, morphology, and phonology. While finite-state models are often assumed to be too weak to capture syntactic structure—at least elegantly—they are now a mainstay of practical models of phonology and morphology in computational linguistics. Research into finite-state models of natural language continues because these models offer fruitful ways of approaching such matters as computational concerns, efficiency, learnability properties, and cognitive plausibility. Finite-state transducers—translation devices based on finite automata—are often categorized as “finite-state models” as well and are extensively used as generic devices for devising representations of various linguistic translations, such as phonological alternation patterns. 
In more recent developments, finite-state models enhanced with probabilistic information have been used to manipulate statistical models of language, and these are now widely employed for practical tasks in written language and speech processing. The literature on the topic traditionally employs different notation and expository style depending on the venue, with linguistics, mathematics (including formal language theory), and computer science publications using slightly varying conventions.
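As a small illustration of the claim that different finite-state devices characterize the same class of languages, the following sketch describes one language in two ways: as a deterministic finite automaton and as a regular expression. The language chosen (strings over a and b containing an even number of a's), the state names, and the function names are assumptions made for this example and do not come from the works cited here. The automaton needs only two states, which reflects the idea of a finite memory of predefined size.

```python
import re
from itertools import product

# Illustrative language (an assumption for this sketch): all strings over
# {a, b} that contain an even number of a's.

# Device 1: a deterministic finite automaton given as a transition table.
# The state names "even" and "odd" are hypothetical labels.
TRANSITIONS = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}
START = "even"
ACCEPTING = {"even"}


def dfa_accepts(word):
    """Run the automaton over the word; its only memory is the current state."""
    state = START
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING


# Device 2: a regular expression intended to describe the same language.
# Each repetition of the group consumes exactly two a's.
EVEN_AS = re.compile(r"(b*ab*a)*b*")


def regex_accepts(word):
    """Check whether the whole word matches the regular expression."""
    return EVEN_AS.fullmatch(word) is not None


def agree_up_to(max_len):
    """Compare both devices on every string over {a, b} up to max_len symbols."""
    for length in range(max_len + 1):
        for letters in product("ab", repeat=length):
            word = "".join(letters)
            if dfa_accepts(word) != regex_accepts(word):
                return False
    return True
```

Calling `agree_up_to(8)` compares the two descriptions on all short strings. This is an exhaustive check on a finite sample, not a proof of equivalence.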
Foundational Works
The first formalization of what would later be called finite automata is found in McCulloch and Pitts 1943, which was essentially a neural network model. This model was investigated intensively in subsequent years, with Kleene 1956 providing a more modern interpretation and showing the equivalence of regular expressions and finite automata. Many properties of finite-state languages were discovered during the 1950s, including the result that different types of automata, regular expressions, and certain grammars are equivalent in expressive power. Other discoveries from the same period include the notion that finite-state machines have canonical minimal representations (Moore 1956, Myhill 1957). Rabin and Scott 1959 introduces the influential concept of nondeterminism, while Chomsky 1959 places finite-state languages, or “type 3 grammars,” in what is now called the Chomsky hierarchy. The early analysis in Chomsky 1956, with its judgment that finite-state languages were unsuitable for describing natural language syntax, was very influential in linguistics. Thompson 1968 marks the beginning of extensive use of finite-state techniques in computational text search, a circumstance that would later influence the development of finite-state methods in computational linguistics.
Chomsky, Noam. 1956. Three models for the description of language. IRE Transactions on Information Theory 2.3: 113–124.
Presents one of the earliest arguments against the adequacy of finite-state models in capturing syntactic phenomena in language. The argument is essentially repeated in the seminal Syntactic Structures, published in 1957 (The Hague: Mouton).
Chomsky, Noam. 1959. On certain formal properties of grammars. Information and Control 2.2: 137–167.
DOI: 10.1016/S0019-9958(59)90362-6
An early analysis of the generative capacity of grammars in the Chomsky hierarchy.
Kleene, S. C. 1956. Representation of events in nerve nets and finite automata. In Automata studies. Edited by Claude E. Shannon and John McCarthy, 3–42. Annals of Mathematics Studies 34. Princeton, NJ: Princeton Univ. Press.
An early central work that makes the leap from McCulloch-Pitts neuron abstractions to regular expressions and finite automata and analyzes the resulting algebraic properties (note that “regular event,” a term often used in the earlier literature, is synonymous with “regular language”). The equivalence of regular expressions and finite automata established here is now known as “Kleene’s theorem.”
McCulloch, Warren S., and Walter Pitts. 1943. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5.4: 115–133.
DOI: 10.1007/BF02478259
An influential article that presents a precursor of what are now called finite automata, in the form of a network of abstract “neuron” elements.
Moore, Edward F. 1956. Gedanken-experiments on sequential machines. In Automata studies. Edited by Claude E. Shannon and John McCarthy, 129–153. Annals of Mathematics Studies 34. Princeton, NJ: Princeton Univ. Press.
An early paper that introduces the idea that every regular language is representable by a unique minimal automaton.
Myhill, John. 1957. Finite automata and the representation of events. Technical Report WADD TR-57-624. Dayton, OH: Wright Patterson Air Force Base.
The first proof of the Myhill-Nerode theorem—a tight, formal characterization of regular languages.
Rabin, Michael O., and Dana Scott. 1959. Finite automata and their decision problems. IBM Journal of Research and Development 3.2: 114–125.
DOI: 10.1147/rd.32.0114
The landmark paper that, among other things, introduces the idea of nondeterministic machines and shows the equivalence of nondeterministic and deterministic finite automata, as well as the equivalence of two-way and one-way automata, showing that a large range of seemingly different descriptive devices all correspond to the same class of regular languages.
Thompson, Ken. 1968. Programming techniques: Regular expression search algorithm. Communications of the ACM 11.6: 419–422.
An influential paper that introduced regular expression text search.
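The equivalence of nondeterministic and deterministic finite automata described in the Rabin and Scott 1959 entry is commonly demonstrated with the subset construction, in which each state of the deterministic machine stands for a set of states of the nondeterministic one. The following is a minimal sketch of that idea, not a reproduction of the original presentation. The example automaton (strings over a and b ending in "ab"), its state numbering, and all function names are assumptions made for illustration.

```python
from collections import deque
from itertools import product

# Hypothetical nondeterministic automaton over {a, b} that accepts the
# strings ending in "ab". From state 0, reading "a" may stay in 0 or move to 1.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
NFA_START = 0
NFA_ACCEPT = {2}
ALPHABET = "ab"


def nfa_accepts(word):
    """Simulate the NFA by tracking the set of states it could be in."""
    current = {NFA_START}
    for symbol in word:
        current = {t for s in current for t in NFA.get((s, symbol), set())}
    return bool(current & NFA_ACCEPT)


def determinize():
    """Subset construction: build a DFA whose states are sets of NFA states."""
    start = frozenset({NFA_START})
    table = {}
    seen = {start}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        for symbol in ALPHABET:
            target = frozenset(
                t for s in subset for t in NFA.get((s, symbol), set())
            )
            table[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A subset is accepting if it contains at least one accepting NFA state.
    accepting = {subset for subset in seen if subset & NFA_ACCEPT}
    return start, table, accepting, seen


def subset_dfa_accepts(word):
    """Run the constructed deterministic automaton over the word."""
    start, table, accepting, _ = determinize()
    state = start
    for symbol in word:
        state = table[(state, symbol)]
    return state in accepting


def machines_agree(max_len):
    """Compare both machines on every string over {a, b} up to max_len symbols."""
    for length in range(max_len + 1):
        for letters in product(ALPHABET, repeat=length):
            word = "".join(letters)
            if nfa_accepts(word) != subset_dfa_accepts(word):
                return False
    return True
```

Only subsets reachable from the start state are built here, so the deterministic machine for this example stays small. In general the construction can produce a number of states exponential in the size of the nondeterministic automaton.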