Control character

Characters generally represent the graphemes, or written symbols, of a language in computer storage or electronic communications. It is often useful, however, to send additional control information along with those characters to help process them. For example, a printer connected to a computer receives instructions to print characters on paper, but it must also receive instructions to do things such as control where the characters are placed on a page, eject a page, or signal the beginning or end of a transmission. It is convenient to send these instructions along the same communication path as the ordinary characters, and control characters serve this purpose. A character encoding therefore typically encodes both printable characters and control characters. ASCII, for example, reserves codes 0 through 31 and code 127 as control characters (see Table 1).

    0   Null                      17   Device control 1
    1   Start of heading          18   Device control 2
    2   Start of text             19   Device control 3
    3   End of text               20   Device control 4
    4   End of transmission       21   Negative acknowledge
    5   Enquiry                   22   Synchronous idle
    6   Acknowledge               23   End of transmission block
    7   Bell                      24   Cancel
    8   Backspace                 25   End of medium
    9   Horizontal tab            26   Substitute
   10   Line feed                 27   Escape
   11   Vertical tab              28   File separator
   12   Form feed                 29   Group separator
   13   Carriage return           30   Record separator
   14   Shift out                 31   Unit separator
   15   Shift in
   16   Data link escape         127   Delete
Table 1: ASCII control characters
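The ranges in Table 1 are easy to check in code. The following Python sketch transcribes the table and reports each control character found in a string; the names `ASCII_CONTROL_NAMES`, `is_ascii_control` and `describe` are illustrative only and do not come from any standard library.

```python
# Names of ASCII codes 0-31, transcribed from Table 1 (index = code).
_NAMES_0_TO_31 = [
    "Null", "Start of heading", "Start of text", "End of text",
    "End of transmission", "Enquiry", "Acknowledge", "Bell",
    "Backspace", "Horizontal tab", "Line feed", "Vertical tab",
    "Form feed", "Carriage return", "Shift out", "Shift in",
    "Data link escape", "Device control 1", "Device control 2", "Device control 3",
    "Device control 4", "Negative acknowledge", "Synchronous idle", "End of transmission block",
    "Cancel", "End of medium", "Substitute", "Escape",
    "File separator", "Group separator", "Record separator", "Unit separator",
]

ASCII_CONTROL_NAMES = dict(enumerate(_NAMES_0_TO_31))
ASCII_CONTROL_NAMES[127] = "Delete"  # the one control code outside 0-31


def is_ascii_control(code: int) -> bool:
    """True for the ASCII control codes: 0 through 31, and 127."""
    return 0 <= code <= 31 or code == 127


def describe(text: str) -> list[str]:
    """List the position, code and Table 1 name of every control character in text."""
    found = []
    for i, ch in enumerate(text):
        code = ord(ch)
        if is_ascii_control(code):
            found.append(f"position {i}: code {code} ({ASCII_CONTROL_NAMES[code]})")
    return found
```

For example, `describe("Hi\tthere\a\n")` reports three control characters: code 9 (Horizontal tab) at position 2, code 7 (Bell) at position 8 and code 10 (Line feed) at position 9.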

Many of the ASCII control characters were designed for devices of the time that are no longer in use. For example, code 22, "Synchronous idle", was originally sent by synchronous modems (which must transmit data constantly) when there was no actual data to send. Code 127 is a special case. Its code is all bits on in binary, which made it easy to erase a section of paper tape, a common storage medium of the day, by punching all the holes. Paper tape has long since become obsolete, so this use has largely disappeared. Because code 127 sits at the end of the printable range rather than among the other control codes (0 through 31), many computers used it as an additional printable character (often an all-black "box" character useful for erasing text by overprinting).

The codes still in common use include 7 (Bell, which may cause the receiving device to emit a warning of some kind), 8 (Backspace, used either to erase the last character printed or to overprint it), 9 (Horizontal tab), 10 (Line feed, used to end lines in most Unix variants), 12 (Form feed, which causes a printer to eject a page), 13 (Carriage return, used to end lines of text on the classic Mac OS; MS-DOS and its derivatives instead end lines with the two-character sequence carriage return, line feed), and 27 (Escape). Occasionally one encounters modern uses of other codes, such as code 4 (End of transmission), used to end a Unix shell session or a PostScript printer transmission.
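Because the three line-ending conventions differ, programs that read text produced on other systems often convert them to a single form. A minimal Python sketch, assuming the input is already decoded text; the function name `normalize_newlines` is hypothetical.

```python
import re

# Try CR LF first, so an MS-DOS pair becomes one line break rather than two.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def normalize_newlines(text: str, newline: str = "\n") -> str:
    """Rewrite CR LF (MS-DOS), lone CR (classic Mac OS) and LF (Unix) as `newline`."""
    # A function replacement avoids any backslash processing of `newline`.
    return _NEWLINE.sub(lambda _match: newline, text)
```

With the default argument, `normalize_newlines("dos\r\nmac\runix\n")` returns `"dos\nmac\nunix\n"`, and `normalize_newlines("a\nb", "\r\n")` converts Unix text to the MS-DOS convention. Python's own text-mode `open()` performs the same CR LF / CR / LF translation on reading by default.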

Code 27 (Escape) deserves elaboration. Even though many of these control characters are rarely used, the concept of sending device-control information intermixed with printable characters is so useful that device makers found a way to send hundreds of device instructions: a series of multiple characters called a control sequence or escape sequence. Typically code 27 was sent first to alert the device that the following characters were to be interpreted as a control sequence rather than as plain characters; one or more characters then followed specifying some detailed action, after which the device went back to interpreting characters normally. For example, code 27 followed by the printable characters "[2;10H" causes a DEC VT-102 terminal to move its cursor to the 10th cell of the 2nd line of the screen. Some standards exist for these sequences, notably ANSI X3.64 (1979). But the number of non-standard variations in use is large, especially among printers, whose capabilities have changed faster than the standards have.
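The escape-sequence mechanism can be sketched in Python. This sketch assumes the "control sequence introducer" layout used by ANSI X3.64 (the same syntax is specified in ECMA-48): ESC, "[", parameter characters, optional intermediate characters, then one final character. Other escape-sequence forms are ignored, and the helper names are hypothetical.

```python
import re

ESC = "\x1b"  # code 27


def cursor_position(row: int, col: int) -> str:
    """Build the cursor-move sequence ESC [ row ; col H (rows and columns count from 1)."""
    return f"{ESC}[{row};{col}H"


# ESC [, then parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F,
# and a single final byte 0x40-0x7E.
_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def split_stream(stream: str) -> list[tuple[str, str]]:
    """Separate plain text from control sequences, as a terminal must."""
    parts = []
    pos = 0
    for m in _CSI.finditer(stream):
        if m.start() > pos:
            parts.append(("text", stream[pos:m.start()]))
        parts.append(("csi", m.group(0)[2:]))  # drop the leading ESC [
        pos = m.end()
    if pos < len(stream):
        parts.append(("text", stream[pos:]))
    return parts
```

For example, `cursor_position(2, 10)` yields the sequence described above, and `split_stream(cursor_position(2, 10) + "Hello")` returns `[("csi", "2;10H"), ("text", "Hello")]`: the device acts on the sequence and prints only the plain text.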

ASCII-based keyboards have a key labelled "Control" or "Ctrl", which is used much like a Shift key: it is held down in combination with a letter or symbol key to make the keyboard generate one of the 32 control codes. The keyboard produces the code 64 below the code of the uppercase letter pressed. Pressing "Control" and the letter "G" (code 71), for example, produces code 7 (Bell). Keyboards also have single keys that produce codes in this range. For example, the key labelled "Backspace" typically produces code 8, "Tab" code 9, and "Enter" or "Return" code 13 (though some keyboards might produce code 10 for "Enter").
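The "64 below" rule can be written as a one-line calculation. In this Python sketch the function name `ctrl` is hypothetical, and folding lowercase letters to uppercase first is an assumption (the text above speaks only of uppercase letters). For codes 64 through 95 ("@", "A" to "Z", and "[", "\", "]", "^", "_"), subtracting 64 is the same as clearing the bit with value 64.

```python
def ctrl(key: str) -> int:
    """Code generated by Control + key, using the 'subtract 64' rule."""
    if len(key) != 1:
        raise ValueError("expected a single character")
    code = ord(key)
    if ord("a") <= code <= ord("z"):
        code -= 32  # assumption: treat lowercase letters as their uppercase forms
    if not 64 <= code <= 95:
        raise ValueError(f"Control+{key!r} has no ASCII control code under this rule")
    return code - 64  # equivalently: code & 0x1F for this range
```

`ctrl("G")` gives 7 (Bell), `ctrl("D")` gives 4 (End of transmission), and `ctrl("[")` gives 27, which is why Control+[ acts as an Escape key on many terminals.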

Modern keyboards have many keys that do not correspond to ASCII characters or control characters, for example cursor control arrows and word processing functions. These keyboards communicate these keys to the attached computer by one of three methods: appropriating some otherwise unused control character for the new use, using some encoding other than ASCII, or using multi-character control sequences. Keyboards attached to stand-alone personal computers typically use one (or both) of the first two methods. "Dumb" computer terminals typically use control sequences.
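The control-sequence method can be sketched as a small decoder. It assumes a VT100-style terminal in its normal cursor-key mode, where the arrow keys send ESC "[" followed by "A", "B", "C" or "D"; the names `ARROW_KEYS` and `decode_keys` are hypothetical. A real decoder must also decide whether a lone ESC is the Escape key or the start of a sequence, usually by waiting briefly for more input, which this sketch does not attempt.

```python
ESC = "\x1b"

# Assumption: VT100-style arrow keys in normal cursor-key mode.
ARROW_KEYS = {ESC + "[A": "Up", ESC + "[B": "Down", ESC + "[C": "Right", ESC + "[D": "Left"}


def decode_keys(stream: str) -> list[str]:
    """Turn a terminal input stream into a list of key names."""
    keys = []
    i = 0
    while i < len(stream):
        seq = stream[i:i + 3]
        if seq in ARROW_KEYS:  # multi-character control sequence
            keys.append(ARROW_KEYS[seq])
            i += 3
            continue
        code = ord(stream[i])
        if code < 32:  # single control character, shown as its Control+key name
            keys.append("Ctrl-" + chr(code + 64))
        else:
            keys.append(stream[i])
        i += 1
    return keys
```

For instance, `decode_keys("a\x1b[Ab\x1b[D")` returns `["a", "Up", "b", "Left"]`, while `decode_keys("\x1b[B\x03")` returns `["Down", "Ctrl-C"]`, mixing a control sequence with a single control character.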