Machine Translation
- LAST REVIEWED: 06 August 2019
- LAST MODIFIED: 13 January 2014
- DOI: 10.1093/obo/9780199772810-0170
Introduction
Machine translation (MT) is an interdisciplinary scientific field that brings together linguists, lexicologists, computer scientists, and translation practitioners in the pursuit of a common goal: to design and develop electronic resources and computer software capable of automatically translating a document in a source language (SL) into an equivalent text in a target language (TL). By extension, machine translation technologies also include tools aimed at helping human translators perform their work more efficiently using computer-assisted translation (CAT) technology. Machine translation started in the late 1950s with attempts to automatically translate Russian into English. Realization of the extreme difficulty of the task led the MT community to concentrate its efforts on more focused and realistic problems, starting the field of natural language processing (NLP) studies. MT was thus broken down into three main sub-issues: analyzing the SL into a more abstract representation, transferring this representation into an equivalent target representation, and, finally, generating a proper surface realization in the TL. Capitalizing on the progress in applied NLP and artificial intelligence, MT made slow progress over the next thirty years, using mostly symbolic models of language processing to accomplish the analysis, transfer, and generation processes. In spite of several remarkable achievements, these models were challenged in the 1980s by corpus-based methodologies, which rely on the analysis of large bodies of manually translated bitexts to generate translations of new documents. In particular, the statistical approaches to machine translation introduced in the early 1990s, and subsequently improved during the next decade, have rapidly gained momentum. Relying on the systematic exploitation of huge corpora of monolingual texts and multilingual bitexts available on the Internet, these approaches appear to be the most effective today for a wide variety of uses.
Statistical approaches can handle almost any language pair, provided that sufficient parallel corpora are available. Most studies, nonetheless, focus on machine translation into English.
Textbooks
Very few textbooks are available that deal solely with machine translation. Even though Jurafsky and Martin 2009 contains only a concise introduction to the issue, the volume overall provides an in-depth exposition of the entire conceptual background necessary to understand the vast literature on MT. Koehn 2010 focuses exclusively on statistical approaches, whereas Hutchins and Somers 1992 and Arnold, et al. 1994 are classical references documenting early rule-based approaches in MT.
Arnold, Douglas J., Lorna Balkan, Siety Meijer, R. Lee Humphreys, and Louisa Sadler. 1994. Machine translation: An introductory guide. Manchester, UK: Blackwells NCC.
Similar in scope to Hutchins and Somers 1992 with a less technical perspective, this book makes a good choice for a more general audience. Also available online.
Hutchins, W. John, and Harold L. Somers. 1992. An introduction to machine translation. London: Academic Press.
A basic course book covering all topics related to the design and development of MT systems, from the linguistic problems to the detailed analysis of some prototypical rule-based engines. Does not cover corpus-based methodology.
Jurafsky, Daniel, and James H. Martin. 2009. Speech and language processing: An introduction to natural language processing, speech recognition, and computational linguistics. 2d ed. Upper Saddle River, NJ: Pearson Prentice Hall.
This general-purpose NLP textbook covers a very large spectrum of topics. It notably includes (pp. 799–830) a rather broad, nontechnical introduction to MT and to translation problems from a computational linguistics perspective, with many valuable references.
Koehn, Philipp. 2010. Statistical machine translation. Cambridge, UK: Cambridge Univ. Press.
The most comprehensive reference textbook on statistical machine translation, including many recent extensions of the statistical framework.