Integral data type


The integral data types (so called because they are most frequently used to represent integers) of computing generally consist of some number of bits (usually a power of two) treated as a unit of storage or manipulation. Bit is derived from the term binary digit, and represents the fundamental unit of computer storage--0 or 1, on or off. Everything else is just a bunch of bits.

The table below lists data types recognized by common processors. Additional data types, such as bit-fields and extended-precision integers, found in high level programming languages are not discussed here. Following the table are additional usage notes, then details on number representation.

See also: real data type


bits   name                 comments
1      bit                  status, Boolean flag
4      nibble, nybble       humorously derived "half a byte"; usually a single BCD digit
8      byte, octet          small integers, characters
16     word                 larger integers, pointers
32     longword             usually shortened to long; larger integers, pointers
64     quadword, long long  larger integers, pointers
80     tenbyte              Intel-specific; used for extended-precision floating point
128    octword              VMS internal date/time format

In addition to their interpretation as sizes of numerical values, three terms (bit, byte, and word) have other common usages. Word is ambiguous: it often indicates the "most efficient size" of data for a processor--typically the size of its internal registers. Thus various families of processors, or different models within a family, have had different word sizes--8-, 12-, 16-, 32-, 36-, 60- and 64-bit words have all been used. Byte sometimes means a quantity of bits other than 8; 36-bit word architectures commonly had 9-bit bytes. The term octet always refers to eight bits and can be used for clarity. The other terms in the table are typically used only when the content is to be interpreted numerically.

Telecommunications or network traffic volume is usually described in terms of bits per second. For example, a 56Kb modem is capable of transferring data at 56 kilobits/second; Ethernet transfers data at speeds ranging from 10 megabits/second to 1000 megabits/second.
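
As a quick sanity check of these rates, here is a short Python sketch (the 1 MB file size assumes decimal megabytes, i.e. 8,000,000 bits; the rate names are just labels):

```python
# Seconds needed to move an 8,000,000-bit (decimal 1 MB) file
# at the rates mentioned above.
rates = {"56 kb/s modem": 56_000, "10 Mb/s Ethernet": 10_000_000}
seconds = {name: 8_000_000 / bps for name, bps in rates.items()}
print(seconds)  # the modem takes roughly 143 s; Ethernet well under a second
```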

A byte, usually called an octet in a networking context, is used to specify the size or amount of computer memory or storage, regardless of the type of data represented. For example, a 50-byte text string, a 100 KB (kilobyte) file, 128 MB (megabytes) of RAM, or 30 GB (gigabytes) of disk storage.

Pointer is a generic term used to indicate an integral value (or a structure thereof) that is used to specify ("point to") a location (address) in memory.

Representing integers

complement, one's-complement, two's-complement, and so on.

Complementing a binary number simply means changing all the 0s to 1s and all the 1s to 0s, nothing more.
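
A minimal Python sketch of this operation (the function name is ours, not from any standard library):

```python
def complement(value, bits=8):
    """Flip every bit of a bits-wide value."""
    mask = (1 << bits) - 1   # e.g. 0b11111111 for 8 bits
    return value ^ mask      # XOR against all-ones flips each bit

print(format(complement(0b00101011), "08b"))  # → 11010100
```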

A byte, holding 8 bits, can represent the values 00000000 (0) to 11111111 (decimal 255), if all bits are used to represent the magnitude of the number. This is called an unsigned integer.
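
The unsigned ranges for the other sizes in the table follow the same rule; a short illustrative check in Python:

```python
# An n-bit unsigned integer ranges from 0 to 2**n - 1.
for bits in (1, 4, 8, 16, 32, 64):
    print(f"{bits:>2} bits: 0 to {2**bits - 1}")
```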

To represent both positive and negative (signed) integers, the convention is that the most significant bit (MSB) of the binary representation of the number will be used to indicate the sign of the number, rather than contributing to its magnitude; three formats have been used for representing the magnitude: sign-and-magnitude, one's complement and two's complement, which is by far the most common nowadays.

Sign-and-magnitude is the simplest and the closest to ordinary written notation. The MSB is set to 0 for a positive number and 1 for a negative number; the remaining bits indicate the (positive) magnitude. Hence in a byte, with seven bits apart from the sign bit, the magnitude can range from 0000000 (0) to 1111111 (127), so you can represent numbers from -127 to +127. Encoded in a byte this way, -43 is 10101011.
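
A sketch of this encoding in Python (the helper name and range check are ours):

```python
def sign_magnitude(n, bits=8):
    """Encode n in sign-and-magnitude: MSB = sign, remaining bits = |n|."""
    limit = (1 << (bits - 1)) - 1        # +127 for a byte
    if not -limit <= n <= limit:
        raise ValueError("out of range")
    sign = 1 if n < 0 else 0
    return (sign << (bits - 1)) | abs(n)

print(format(sign_magnitude(-43), "08b"))  # → 10101011
```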

The one's-complement representation of a negative number is created by taking the complement of its positive counterpart. For example, negated 00101011 (43) becomes 11010100 (-43). (Notice how this differs from the sign-and-magnitude convention, where the same bit pattern would be -84.) The PDP-1 used one's-complement arithmetic. The range of signed numbers using one's complement in a byte is -127 to +127.
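
Decoding works the same way in reverse; a Python sketch (illustrative helper name):

```python
def from_ones_complement(pattern, bits=8):
    """Interpret a one's-complement bit pattern as a signed integer."""
    if pattern >> (bits - 1):                   # MSB set: value is negative
        return -(pattern ^ ((1 << bits) - 1))  # magnitude = bitwise complement
    return pattern

print(from_ones_complement(0b11010100))  # → -43
```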

Both one's-complement and sign-and-magnitude have two ways to represent zero: 00000000 (+0) in both, with -0 being 11111111 in one's-complement and 10000000 in sign-and-magnitude. This is sometimes problematic: hardware for adding and subtracting may be more complicated, as may testing for 0.

To avoid this, and also to make integer addition simpler, the two's-complement representation is the one generally used. The two's-complement representation is created by first complementing the positive number, then adding 1 to it. Thus 00101011 (43) becomes 11010101 (-43).
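
The complement-then-add-1 rule as a Python sketch (helper name is ours; the mask keeps the result within the byte):

```python
def twos_complement_negate(pattern, bits=8):
    """Complement, then add 1, keeping only the low `bits` bits."""
    mask = (1 << bits) - 1
    return ((pattern ^ mask) + 1) & mask

print(format(twos_complement_negate(0b00101011), "08b"))  # → 11010101
```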

In two's-complement, there is only one zero (00000000). Negating a negative number involves the same operation: complementing, then adding 1. The pattern 11111111 now represents -1 and 10000000 represents -128; that is, the range of two's-complement integers in a byte is -128 to +127.
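
A decoding sketch in Python, which confirms these boundary patterns (the function name is illustrative):

```python
def from_twos_complement(pattern, bits=8):
    """Interpret a two's-complement bit pattern as a signed integer."""
    if pattern >> (bits - 1):          # MSB set: value is negative
        return pattern - (1 << bits)   # subtract 2**bits to recover the value
    return pattern

print(from_twos_complement(0b11111111))  # → -1
print(from_twos_complement(0b10000000))  # → -128
```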

To add two two's-complement integers, treat them as unsigned numbers, add them, and ignore any potential carry out (this is essentially the great advantage that two's-complement has over the other conventions). The result will be the correct two's-complement number, unless both summands were positive and the result is negative, or both summands were negative and the result is non-negative. The latter cases are referred to as "overflow" or "wrap around"; the addition cannot be carried out in 8-bit two's-complement in these cases. For example:

     00101011 (+43)     11010101 (-43)     00101011 (+43)     10011010 (-102)
   + 11010101 (-43)   + 11100011 (-29)   + 11100011 (-29)   + 10110001 ( -79)
   - - - - - - - -    - - - - - - - -    - - - - - - - -    - - - - - - - - -
     00000000 (  0)     10111000 (-72)     00001110 (+14)     01001011 (overflow)
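
This rule can be sketched in Python; overflow is detected exactly as described above, by comparing the signs of the summands and the result (the helper name is ours):

```python
def add8(a, b):
    """Add two 8-bit two's-complement patterns; report overflow.

    Overflow occurs when both summands have the same sign but the
    result's sign differs.
    """
    result = (a + b) & 0xFF            # unsigned add, discard the carry out
    sa, sb, sr = a >> 7, b >> 7, result >> 7
    overflow = (sa == sb) and (sa != sr)
    return result, overflow

print(add8(0b00101011, 0b11010101))  # +43 + -43 = 0, no overflow
print(add8(0b10011010, 0b10110001))  # -102 + -79 = -181, below -128: overflow
```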

endian, big-endian, little-endian, network byte order

When an integer is represented with multiple bytes, the actual ordering of those bytes in memory, or the sequence in which they are transmitted over some medium, is subject to convention. This is similar to the situation in written languages, where some are written left-to-right, while others are written right-to-left.

Using a 4-byte integer, written as "ABCD", where A is the most significant byte and D is least significant byte, big-endian convention would store the number in successive memory locations as A (lowest address), then B, then C, finally D, while little-endian convention would store the bytes in D-C-B-A order.
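
The two orderings can be seen directly with Python's standard struct module, using a value whose bytes are the ASCII codes for A, B, C, D:

```python
import struct

value = 0x41424344                 # the bytes A, B, C, D in ASCII

print(struct.pack(">I", value))    # big-endian:    b'ABCD'
print(struct.pack("<I", value))    # little-endian: b'DCBA'
```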

Network byte order is, by convention, sending the bytes in the order A, then B, etc., onto the medium. It is the responsibility of the transmitting and receiving systems to convert, if necessary, to and from their internal endian format.
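
This conversion is what the traditional htonl/ntohl functions do; in Python they live in the standard socket module:

```python
import socket

n = 0x41424344
net = socket.htonl(n)          # host-to-network order (a no-op on big-endian hosts)
print(socket.ntohl(net) == n)  # the round trip always restores the host value
```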

Processor families that use big-endian storage: Motorola, IBM 370

Processor families that use little-endian format: Intel 386, VAX

Processor families that use either (determined by software): MIPS, DEC Alpha, PowerPC

The PDP-11 stored 32-bit values in the unusual middle-endian order B-A-D-C: little-endian within each 16-bit word, but with the more significant word stored first (that is, byte-swapped within words).
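
A hypothetical helper illustrating that ordering (the function name is ours, for demonstration only):

```python
def pdp_endian(value):
    """Lay out a 32-bit value in PDP-11 middle-endian B-A-D-C byte order."""
    b = value.to_bytes(4, "big")            # A, B, C, D
    return bytes([b[1], b[0], b[3], b[2]])  # swap bytes within each 16-bit word

print(pdp_endian(0x41424344))  # → b'BADC'
```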

The term big-endian is derived from the Big-Endians of Jonathan Swift's Gulliver's Travels.

See also: Kilobyte, Megabyte, Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte, Yottabyte