In most of the prior material we have assumed that values are stored in memory using a particular representation and read back using that same representation. This is almost always the case, but it is possible to store a value using one representation and read it using another. After all, there is nothing to distinguish the bit pattern of an integer from the bit pattern of a floating point value - they are simply patterns of 1s and 0s stored in memory.
Although we sometimes "cross represent" values intentionally, it is usually the result of an error on the programmer's part. No matter how diligent we might be, these types of errors are more common than we would prefer. People who understand number representation and how it is implemented can identify and correct these errors expeditiously, whereas people who are ignorant of these topics can literally spend hours or even days baffled by a simple mistake caused by a single wrong character in a printf() statement.
Probably the most common example of cross representation in C involves the input and output routines. For instance, the programmer wants to get a four-byte signed integer from the user but mistakenly tells the input routine to store the entered value as a single precision floating point value. On the next line, the programmer echoes the value back by invoking the output routine and correctly tells it that the value is supposed to be a four-byte signed integer. Almost certainly the most common specific example of cross representation is using one of the scanf() family of functions to read a floating point value into a variable of type double while mistakenly identifying the variable as being of type float in the format string.
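As a concrete illustration, here is a minimal sketch of that last mistake (the variable name and prompt text are ours, not from any particular program):

    #include <stdio.h>

    int main(void)
    {
        double x;

        printf("Enter a value: ");

        /* BUG (the mistake described above): x is a double, so the
           conversion specifier should be "%lf".  As written, scanf()
           stores a 4-byte float into the first half of the 8-byte
           double.                                                    */
        scanf("%f", &x);

        /* This printf() call is correct; the problem is that x was
           never actually stored as a double.                         */
        printf("You entered: %f\n", x);

        return 0;
    }

The program compiles (though most modern compilers will at least warn about the format mismatch), but the echoed value bears no obvious relationship to the value the user entered.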
Example:
Problem: The programmer declares that a two-byte signed integer is to be stored at address 0xC300. The machine uses the Little Endian format. After prompting the user for such an integer, the user enters the value "123456". The programmer then echoes the value back out to the user. However, the programmer has mistakenly told the input routine to store the entered value as an IEEE-754 single precision floating point value. What value is reported back to the user?
Solution: The user is going to enter the value 123456 and it is going to be stored as a 4-byte floating point value with an eight-bit exponent and a twenty-three-bit mantissa. Because only the very smallest (denormalized) values have an implied leading '0', this representation will have an implied leading '1'.
Converting this value to binary we get:
123456 (decimal) = 0x1E240 = 1 1110 0010 0100 0000 b
Normalizing this value we get:
1 1110 0010 0100 0000 b = 1.1110 0010 01 b x 2^16
Building up our floating point representation:
sign bit = 0
exponent = 16 + 127 = 143 = 0x8F = 1000 1111 b
mantissa = 111 0001 0010 0000 0000 0000 b
pattern = (0)(1000 1111)(111 0001 0010 0000 0000 0000)
pattern = 0100 0111 1111 0001 0010 0000 0000 0000 = 0x47F12000
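If you want to verify this pattern on a real machine, one quick sketch (assuming a C implementation with 32-bit floats and the <stdint.h> header, which is nearly universal) is to copy the float's raw bytes into an unsigned integer and print that integer in hex:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        float f = 123456.0f;    /* the value the user entered */
        uint32_t bits;

        /* Copy the float's raw bytes into a 32-bit integer so the
           bit pattern can be printed; memcpy() avoids the aliasing
           problems of a pointer cast.                               */
        memcpy(&bits, &f, sizeof bits);

        printf("0x%08lX\n", (unsigned long) bits);  /* prints 0x47F12000 */

        return 0;
    }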
The base address is 0xC300, so the four bytes are stored at locations 0xC300 through 0xC303.
The Little Endian format means that the LSB is stored at the lowest address, so the pattern is stored as follows:
ADDRESS  | C2FE | C2FF | C300 | C301 | C302 | C303 | C304 | C305 | C306 | C307 |
CONTENTS |  ??  |  ??  |  00  |  20  |  F1  |  47  |  ??  |  ??  |  ??  |  ??  |
Note: All values are in hex.
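You can observe this byte ordering directly by walking through the bytes of the float. The following sketch is portable C; the output shown in the comment assumes a Little Endian machine:

    #include <stdio.h>

    int main(void)
    {
        float f = 123456.0f;
        const unsigned char *p = (const unsigned char *) &f;

        /* Print each byte of the float in order of increasing
           address.  On a Little Endian machine this prints:
           00 20 F1 47                                          */
        for (size_t i = 0; i < sizeof f; i++)
            printf("%02X ", p[i]);
        printf("\n");

        return 0;
    }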
When these memory locations are read for output, they will be interpreted as a two-byte signed integer starting at 0xC300 and stored in Little Endian order. The resulting integer that is displayed will therefore be:
value = 0x2000 = 8192 (decimal)
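The whole scenario can be reproduced in a few lines of C. This sketch (again assuming a Little Endian machine with 32-bit floats) uses memcpy() to mimic an output routine that reads back only the first two bytes as a two-byte signed integer:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        float   f = 123456.0f;  /* the value as (mis)stored by the input routine */
        int16_t n;

        /* Read back only the first two bytes, exactly as an output
           routine expecting a two-byte signed integer would.        */
        memcpy(&n, &f, sizeof n);

        printf("%d\n", n);      /* prints 8192 on a Little Endian machine */

        return 0;
    }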
So how do you detect this kind of mistake? Certainly, it can be difficult. In this example the reported value turned out to be considerably smaller than the entered value, but in practice there is virtually no predictable pattern to the observed behavior - at least not predictable until you have identified the exact error involved. A common trait exhibited by cross-represented values, however, is that changes to the program or data that should produce only small changes in the output instead produce wildly erratic changes in the results. This makes sense when you consider that a small change in an integer might result in a large change in a floating point value because the bit that changed was one of the bits in the exponent.
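This erratic behavior is easy to demonstrate using the scenario from the example above. In the following sketch (the sample inputs are ours; the results assume a Little Endian machine with 32-bit floats), each input is stored as a float and then read back as a two-byte signed integer:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* Each input is stored as a float (the mistake) and then
           read back as a two-byte signed integer (the echo).      */
        long inputs[] = { 65535, 65536, 65537, 123456, 123457 };

        for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; i++) {
            float   f = (float) inputs[i];
            int16_t n;

            memcpy(&n, &f, sizeof n);
            printf("entered %6ld   reported %6d\n", inputs[i], n);
        }

        return 0;
    }

On such a machine the five inputs are reported as -256, 0, 128, 8192, and 8320. Increasing the entered value by just one, from 65535 to 65536, changes the reported value from -256 to 0, and a further increase of one yields 128: small, regular changes in the input produce jumps and even sign changes in the output.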