The study of natural language, as done by computer programs. Typically, this means a program trying to understand English (or French, or any other natural language) typed or spoken by a person. This field is also known as Natural Language Processing (NLP), a term that emphasizes the engineering aspect of computational linguistics.
Examples of NLP systems include computer programs that automatically:
- translate text from Japanese to English (machine translation)
- answer a plain English question with succinct information extracted from large collections of documents
- summarize a newspaper article
There are several problems in getting programs to understand natural language. Many of these are due to the ambiguity in the language:
- word boundary detection: in spoken language, there are usually no gaps between words; where to place a word boundary often depends on which choice makes the most sense grammatically and in context.
- word sense disambiguation: the same word can have several different meanings; we have to select the meaning that makes the most sense in context.
- syntactic ambiguity: the grammar of a natural language is ambiguous, i.e. there are often multiple possible parse trees for a given sentence. Choosing the correct one requires semantic information.
- speech acts and plans: sentences often do not mean what they literally say; for instance, the correct response to "Can you pass the salt?" is to pass the salt, not to say "yes". Similarly, if a class was not offered last year, the correct answer to the question "How many students failed the class last year?" is "The class was not offered last year", not "None".
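To make the syntactic ambiguity point concrete, here is a minimal sketch that counts the parse trees of a sentence with a CYK-style chart. The toy grammar, the lexicon, and the function name `count_parses` are assumptions made up for this illustration; they are not taken from any particular NLP system, and a real grammar would be far larger.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the prepositional phrase modifies the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the prepositional phrase modifies the noun phrase
    ("PP", "P", "NP"),
]

# Hypothetical lexicon mapping each word to its possible categories.
LEXICON = {
    "i": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(sentence, start="S"):
    """Return the number of distinct parse trees rooted at `start`."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(i, j)][A] = number of trees rooted at A that cover words[i:j]
    chart = defaultdict(lambda: defaultdict(int))

    # Spans of width 1 come straight from the lexicon.
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1)][category] += 1

    # Wider spans are built by combining two adjacent, narrower spans.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for parent, b, c in BINARY_RULES:
                    if b in left and c in right:
                        chart[(i, j)][parent] += left[b] * right[c]

    return chart[(0, n)][start]


print(count_parses("I saw the man"))
print(count_parses("I saw the man with the telescope"))
```

Under this toy grammar the second sentence receives two trees: one in which "with the telescope" attaches to the verb phrase (the seeing was done with a telescope) and one in which it attaches to "the man" (the man has the telescope). The grammar alone cannot choose between them, which is why semantic information is needed.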
The Association for Computational Linguistics definition:
- Computational linguistics is the scientific study of language from a computational perspective. Computational linguists are interested in providing computational models of various kinds of linguistic phenomena.