Code Yarns ‍👨‍💻

Integer representations

📅 2020-Apr-05 ⬩ ✍️ Ashwin Nanjappa ⬩ 📚 Archive

[Figure: bit interpretations of a 4-bit integer as unsigned, two's complement, ones' complement, and sign magnitude]

The unsigned integer data type represents non-negative integers. The signed integer data type represents all integers, both positive and negative.

C/C++ has both signed and unsigned integer types. Other languages like Java have only signed integer types.

Unsigned integers in almost all languages and computers have a single possible bit representation, as shown in the above figure. Signed integers have had 3 implementations during the evolution of computers: two's complement, ones' complement and sign magnitude.

Two's complement is what is universally used in computers today. Ones' complement was used in computers until the 1960s, the CDC-160 for example. Sign magnitude was also used in computers until the 1960s, and survives today only in the IEEE 754 floating point format.

My mental model of how bits are interpreted in these 3 implementations is depicted in the above figure, using a 4-bit integer type as an example. Notice how the sign bit carries different values and operations in the 3 representations.

Two's complement is a bit unintuitive for humans. But it won out over the other 2 representations because addition, subtraction and multiplication are the simplest to implement in it and are the same for both unsigned and signed integers. For example, addition of unsigned integers and signed integers (in two's complement form) is the same bit-wise process: add the bits from LSB to MSB and ignore the final carry bit. So simple!

Two's complement, by its nature, has an uneven range. For example, an 8-bit signed integer has the range -128 to +127. Notice how negating -128 will not give you +128, because that value cannot be represented in a two's complement 8-bit signed integer. Computers using two's complement choose to return -128 as the negated value of -128.

Ones' complement, by its nature, has an even range. For example, an 8-bit signed integer has the range -127 to +127. But it will have two zeroes: +0 (when all bits are zero) and -0 (when all bits are one).

Sign magnitude, by its nature, also has an even range. For example, an 8-bit signed integer has the range -127 to +127. But it too will have two zeroes. You can see this in the IEEE 754 floating point format, which has two zeroes.

Both the C and C++ standards allow signed integers to be represented in any of these 3 formats. This is an attempt to support old computers which might use either ones' complement or sign magnitude. Some of the famous computers which used these decrepit formats include the CDC computers (CDC-160) and the PDP-1. However, these computers were out of use by the time the C language was taking shape in 1972. Apparently, there are companies that sell mainframes built on Intel CPUs that emulate these old computers, so there might still be some use to supporting these formats.

In any case, the C/C++ standard support of these old formats is why the minimum ranges of signed integers in C/C++ standards have an even range. For example, the minimum range of 8-bit signed integers in C/C++ is -127 (SCHAR_MIN) to +127 (SCHAR_MAX). But implementations are allowed to have signed integer types whose range is larger, which is what all computers do today: their 8-bit signed integers have range -128 to +127 because they use two's complement.

// C99 standard defines these minimum ranges for signed integer types:
#define SCHAR_MAX +127
#define SCHAR_MIN -127   // <<<<
#define SHRT_MAX +32767
#define SHRT_MIN -32767
#define INT_MAX +32767
#define INT_MIN -32767
#define LONG_MAX +2147483647
#define LONG_MIN -2147483647

// Implementation-defined ranges found in limits.h on my Intel x64:
#define SCHAR_MAX +127
#define SCHAR_MIN -128  // <<<<
#define SHRT_MAX +32767
#define SHRT_MIN -32768
#define INT_MAX +2147483647
#define INT_MIN -2147483648

References: