The study of natural language by means of computer programs. Typically, this means a program that tries to understand English (or French, or any other natural language) typed or spoken by a person. The field is also known as Natural Language Processing (NLP), a term that emphasizes the engineering side of computational linguistics.
Examples of NLP systems include computer programs that automatically:
- translate text from Japanese to English (machine translation)
- answer a plain English question with succinct information extracted from large collections of documents
- summarize a newspaper article
There are several problems in getting programs to understand natural language. Many of these are due to the ambiguity in the language:
- word boundary detection: in spoken language, there are no gaps between words; where to place a word boundary often depends on which choice makes the most sense grammatically and in context.
- word sense disambiguation: the same word can have several different meanings; we have to select the meaning that makes the most sense in context.
- syntactic ambiguity: the grammar of a natural language is ambiguous, i.e. there are often multiple possible parse trees for a given sentence. Choosing the correct one requires semantic information.
- speech acts and plans: sentences often don't mean what they literally say; for instance, the correct response to "can you pass the salt?" is to pass the salt, not to say "yes". Likewise, if a class was not offered last year, the correct answer to the question "how many students failed the class last year?" is "the class was not offered last year", not "none".
Computational linguistics is the scientific study of language from a computational perspective. Computational linguists are interested in providing computational models of various kinds of linguistic phenomena.
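Two of the ambiguity problems listed above, word boundary detection and syntactic ambiguity, can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the lexicon, the grammar rules, and the function names `segmentations` and `count_parses` are hypothetical and were chosen for this example. The first function lists every way an unspaced string can be split into known words; the second counts parse trees with a CKY-style chart over a toy grammar in Chomsky normal form.

```python
from collections import defaultdict

# Hypothetical toy lexicon, chosen only to show segmentation ambiguity.
LEXICON = {"no", "now", "here", "where", "nowhere"}

def segmentations(text, lexicon=LEXICON):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the prepositional phrase modifies the verb phrase
    ("NP", "NP", "PP"),   # the prepositional phrase modifies the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICAL = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(words, start="S"):
    """Count the parse trees rooted in `start` that span all of `words`."""
    n = len(words)
    # chart[i, j, symbol] = number of trees for `symbol` over words[i:j]
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for symbol in LEXICAL.get(word, []):
            chart[i, i + 1, symbol] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[i, j, parent] += chart[i, k, left] * chart[k, j, right]
    return chart[0, n, start]

print(segmentations("nowhere"))
print(count_parses("i saw the man with the telescope".split()))
```

With this toy lexicon, "nowhere" has three segmentations ("no where", "now here", "nowhere"), and with this toy grammar the sentence has two parse trees, depending on whether "with the telescope" attaches to the seeing or to the man. Neither function decides which reading is right; as the list above notes, that choice needs context and semantic information.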