Machine Translation
- LAST REVIEWED: 30 August 2022
- LAST MODIFIED: 24 April 2023
- DOI: 10.1093/obo/9780199772810-0170
Introduction
Machine translation (MT) is an interdisciplinary scientific field that brings together linguists, lexicologists, computer scientists, and translation practitioners in the pursuit of a common goal: to design and develop electronic resources and computer software capable of automatically translating a document in a source language (SL) into an equivalent text in a target language (TL). By extension, machine translation technologies also include tools aimed at helping human translators perform their work more efficiently using computer-assisted translation (CAT) technology. Machine translation started in the late 1950s with attempts to automatically translate Russian into English. Realization of the extreme difficulty of the task led the MT community to concentrate its efforts on more focused and realistic problems, starting the field of natural language processing (NLP) studies. MT was thus broken down into three main sub-issues: analyzing the SL text into a more abstract representation, transferring this representation into an equivalent target representation, and, finally, generating a proper surface realization in the TL. Capitalizing on the progress in applied NLP and artificial intelligence, MT made slow progress over the next thirty years, using mostly symbolic models of language processing to accomplish the analysis, transfer, and generation processes. Despite several remarkable achievements, these models were challenged in the 1980s by corpus-based methodologies, which rely on the analysis of large bodies of manually translated bitexts to generate translations of new documents. In particular, the statistical approaches to machine translation introduced in the early 1990s, and subsequently improved during the next decade, rapidly gained momentum. Since 2014, statistical approaches have been superseded by more powerful machine learning techniques based on artificial neural networks.
Relying on the systematic exploitation of huge corpora of monolingual texts and multilingual bitexts available on the Internet, “neural machine translation” appears to be the most effective approach today for a wide variety of uses. Neural approaches can handle almost any language pair, provided that sufficient parallel corpora are available. A remarkable recent development is the emergence of multilingual translation models that can handle multiple language directions in a single system.
Textbooks
Very few textbooks deal solely with machine translation. The most up-to-date textbook is Koehn 2020, which contains a full exposition of neural MT. Koehn 2010 focuses solely on statistical approaches, whereas Hutchins and Somers 1992 and Arnold, et al. 1994 are classical references documenting early rule-based approaches in MT. General-purpose natural language processing (NLP) textbooks also include concise presentations of MT, notably Eisenstein 2019 and the latest edition of Jurafsky and Martin 2009. Taken together, these volumes contain an in-depth exposition of the entire conceptual background necessary to understand the vast literature on MT.
Arnold, Douglas J., Lorna Balkan, Siety Meijer, R. Lee Humphreys, and Louisa Sadler. 1994. Machine translation: An introductory guide. Manchester, UK: Blackwells NCC.
Similar in scope to Hutchins and Somers 1992 but less technical in perspective, this book is a good choice for a more general audience.
Eisenstein, Jacob. 2019. An introduction to natural language processing. Cambridge, MA: MIT Press.
A modern introduction to the field of language processing targeting computer scientists, with a detailed presentation of machine learning techniques and models. A dedicated chapter (18) discusses machine translation models.
Hutchins, W. John, and Harold L. Somers. 1992. An introduction to machine translation. London: Academic Press.
A basic course book covering all topics related to the design and development of MT systems, from linguistic problems to detailed analysis of some prototypical rule-based engines. Somewhat outdated, as it does not cover corpus-based methodologies.
Jurafsky, Daniel, and James H. Martin. 2009. Speech and language processing: An introduction to natural language processing, speech recognition, and computational linguistics. 3d ed. Upper Saddle River, NJ: Pearson Prentice Hall.
This general-purpose NLP textbook covers a very large spectrum of topics. The third revision includes a major rewrite of the machine translation chapter, which integrates recent advances in the field (chapter 10). It also includes an introduction to translation problems from a computational linguistics perspective with many valuable references.
Koehn, Philipp. 2010. Statistical machine translation. Cambridge, UK: Cambridge Univ. Press.
The most comprehensive reference textbook on statistical machine translation, including many extensions of the statistical framework.
Koehn, Philipp. 2020. Neural machine translation. Cambridge, UK: Cambridge Univ. Press.
This recent textbook documents the latest technical evolutions of machine translation and provides readers with a complete overview of the neural machine translation paradigm.