Character encoding

A character encoding is a code that pairs a set of natural language characters (such as an alphabet or syllabary) with something else, such as numbers or electrical pulses. Common examples include Morse code, which encodes letters of the Roman alphabet as sequences of long and short depressions of a telegraph key, and ASCII, which encodes letters, numerals, and other symbols as integers.
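As a quick illustration of the ASCII mapping mentioned above (this example is not part of the original article), Python's built-in `ord` and `chr` functions expose the character-to-integer pairing directly:

```python
# ASCII assigns each character a small integer; ord() returns the
# integer for a character, and chr() reverses the mapping.
for ch in "Hi!":
    print(ch, ord(ch))   # H 72 / i 105 / ! 33

# The pairing is reversible: encoding then decoding recovers the
# original character.
assert chr(ord("A")) == "A"
assert ord("A") == 65
```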

In some contexts (especially computer storage and communication) it makes sense to distinguish a character set or character repertoire, which is the full set of characters that a system supports, from a character encoding, which specifies how characters from that set are represented as sequences of codes. For example, the full repertoire of Unicode encompasses over a million possible characters, each assigned a unique number called a code point. But since most applications use only a small subset, there are more efficient ways to represent Unicode characters in computer storage or communications, such as the variable-length encodings UTF-8 and UTF-16, which represent the most common characters with shorter code sequences.
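The trade-off between encodings can be seen directly in Python (a sketch added for illustration, not part of the original article): the same code point occupies a different number of bytes depending on the encoding chosen.

```python
# The same Unicode code points can be serialized under different
# encodings. ASCII-range characters take 1 byte in UTF-8 but 2 in
# UTF-16, while some characters are shorter in UTF-16 than UTF-8.
for text in ("A", "é", "€"):
    print(text, text.encode("utf-8"), text.encode("utf-16-be"))

assert len("A".encode("utf-8")) == 1   # 'A' (U+0041): 1 byte in UTF-8
assert len("A".encode("utf-16-be")) == 2  # but 2 bytes in UTF-16
assert len("€".encode("utf-8")) == 3   # '€' (U+20AC): 3 bytes in UTF-8
assert len("€".encode("utf-16-be")) == 2  # but only 2 in UTF-16
```

Whichever encoding is chosen, decoding with the same encoding recovers the original characters, which is exactly the repertoire/encoding separation the paragraph above describes.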

Popular character encodings: