(Last Mod: 15 January 2012 13:49:31 )
A “memory dump” is simply a listing of the contents of a section of a computer’s
memory (or possibly the entire contents). Memory dumps are used primarily for
debugging, since they let you see exactly what is stored at
each memory location in the dump. There are a few ways to format a memory
dump. For our purposes, we will use the following format:
ADDR:ADDR 0011223344556677 8899AABBCCDDEEFF
In this format, each row displays 16 bytes of memory starting at the address on the
left. To aid in identification, the bytes are presented as two groups of eight.
The address is presented as a 32-bit value written in
hex (hexadecimal) in two-byte chunks separated by a colon. The colon is
present only as a visual aid; the address value is all 32 bits combined. If a
particular dump can be represented using only 16-bit addresses, then only
the two least-significant bytes may be shown (and thus, no colon). Note that
address values are always represented as unsigned integers with the
most-significant byte on the left. The notion of “endianness” does not apply
because addresses are not, themselves, values stored in the memory being
dumped.
Since each row contains sixteen bytes of data, each line begins on an address
boundary that is divisible by sixteen. Thus, the least-significant hex
digit of each row address is always zero. The values going across the top of the listing (the
header row) are a visual aid to help locate individual bytes. Since each
byte requires two hexadecimal digits to represent it, the least significant
digit of the address, called the offset, appears as two repeated characters
directly above the byte values stored at that offset. The address of a
particular byte is the sum of the address at the left of the row and the
offset at the top of the column the byte is in.
Bytes themselves are always displayed as if they were
straight, unsigned binary values. Again, endianness does not apply because
it makes no sense to talk about “byte order” when considering an individual
byte.
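As a rough illustration of the layout described above, the following Python sketch renders a block of bytes in this format. The function name `format_dump` and its parameters are hypothetical, chosen only for this example, and the sketch assumes the starting address is a multiple of sixteen.

```python
def format_dump(data, base):
    """Render data in the ADDR:ADDR layout (hypothetical helper).

    Assumes base is the address of data[0] and is divisible by 16.
    """
    if base % 16 != 0:
        raise ValueError("base address must be divisible by 16")
    lines = ["ADDR:ADDR 0011223344556677 8899AABBCCDDEEFF"]
    for start in range(0, len(data), 16):
        row = data[start:start + 16]
        addr = base + start
        # The colon only splits the 32-bit address into two 16-bit halves.
        high, low = addr >> 16, addr & 0xFFFF
        left = row[:8].hex().upper()
        right = row[8:].hex().upper()
        lines.append(f"{high:04X}:{low:04X} {left} {right}".rstrip())
    return "\n".join(lines)


sample = bytes.fromhex("C6765A2BE85F6A0F45A8423B1C4A3EF5")
print(format_dump(sample, 0x13B734E0))
```

Each byte is always printed as two unsigned hex digits, so no endianness decision is made at this stage.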
Assume that the value 0x42 is stored at memory address 0x13B734EA (recall that a prefix of
‘0x’ is one of the standard ways of denoting a hexadecimal number, with
perhaps the other most common way being a suffix of ‘H’ or ‘h’). This
information would be found in the memory dump by first looking down the
address list until row 0x13B734E0 (set the last hex digit to zero) is found
and then looking across the row and choosing the byte (two hex digits) below
the ‘AA’ in the header row. In the dump below, that byte is the 42 in row 13B7:34E0.
ADDR:ADDR 0011223344556677 8899AABBCCDDEEFF
...
13B7:34D0 59415B3E4A878F6A 10D0E3AF1D1A9A8F
13B7:34E0 C6765A2BE85F6A0F 45A8423B1C4A3EF5
13B7:34F0 387A5D3E26F12C3B 358D5E5F66F9A745
13B7:3500 0B14D687C9A87E56 54F6D5631024B540
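The lookup just described amounts to splitting the address into a row part and a column offset. A minimal Python sketch follows; the names `BASE`, `MEMORY`, `split_address`, and `read_byte` are hypothetical and exist only for this illustration, with `MEMORY` holding the four rows of the dump shown in this section.

```python
BASE = 0x13B734D0  # address of the first byte held in MEMORY (assumed)
MEMORY = bytes.fromhex(
    "59415B3E4A878F6A10D0E3AF1D1A9A8F"
    "C6765A2BE85F6A0F45A8423B1C4A3EF5"
    "387A5D3E26F12C3B358D5E5F66F9A745"
    "0B14D687C9A87E5654F6D5631024B540"
)


def split_address(addr):
    """Return (row address, column offset) for a byte address."""
    row = addr & ~0xF    # set the last hex digit to zero
    offset = addr & 0xF  # the last hex digit selects the column
    return row, offset


def read_byte(memory, base, addr):
    """Return the byte stored at addr, given the address of memory[0]."""
    return memory[addr - base]


row, offset = split_address(0x13B734EA)
value = read_byte(MEMORY, BASE, 0x13B734EA)
print(f"row {row:08X}, offset {offset:X}, byte {value:02X}")
# prints: row 13B734E0, offset A, byte 42
```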
Now assume that there is a 2-byte value stored at memory location 0x13B734F2. By nearly universal convention, the address of a multibyte value is the lowest memory address of any of the bytes involved, frequently known as the "base address". The rest of the bytes are then stored at consecutive higher memory addresses. Thus the bytes associated with this 2-byte value sit at offsets 2 and 3 of row 13B7:34F0 in the dump below.
ADDR:ADDR 0011223344556677 8899AABBCCDDEEFF
...
13B7:34D0 59415B3E4A878F6A 10D0E3AF1D1A9A8F
13B7:34E0 C6765A2BE85F6A0F 45A8423B1C4A3EF5
13B7:34F0 387A5D3E26F12C3B 358D5E5F66F9A745
13B7:3500 0B14D687C9A87E56 54F6D5631024B540
So the two bytes that make up this value are 0x5D and 0x3E.
Arrays are nothing more than collections of objects, all of which are of the same data type. They are stored in order, beginning at the base address of the first element. Locating a particular element in an array therefore requires knowing the base address of the first element and how many bytes are occupied by each element. As an example, consider that the value in the prior example is actually the first element in an array of two-byte values. What bytes make up the seventh element?
We can either simply count up from the base element, going two bytes at a time, or we can calculate the base address of the seventh element. In calculating the address, however, we must take care to avoid making a "fence post" error (also known as an "off-by-one" error). The seventh element is the sixth element after the first. So we take the address of the first element and add to it six times the number of bytes for each element (two, in this case): 0x13B734F2 + 0xC = 0x13B734FE. The result, using either method, is that we want the two-byte value stored at location 0x13B734FE, shown below.
ADDR:ADDR 0011223344556677 8899AABBCCDDEEFF
...
13B7:34D0 59415B3E4A878F6A 10D0E3AF1D1A9A8F
13B7:34E0 C6765A2BE85F6A0F 45A8423B1C4A3EF5
13B7:34F0 387A5D3E26F12C3B 358D5E5F66F9A745
13B7:3500 0B14D687C9A87E56 54F6D5631024B540
So the two bytes that make up this value are 0xA7 and 0x45.
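The address calculation for an array element can be sketched in Python as follows. The helper `element_address` is a hypothetical name used only here, and it counts elements starting from 1 to match the wording of the text; `BASE` and `MEMORY` are assumed stand-ins for the dump.

```python
BASE = 0x13B734D0  # address of the first byte held in MEMORY (assumed)
MEMORY = bytes.fromhex(
    "59415B3E4A878F6A10D0E3AF1D1A9A8F"
    "C6765A2BE85F6A0F45A8423B1C4A3EF5"
    "387A5D3E26F12C3B358D5E5F66F9A745"
    "0B14D687C9A87E5654F6D5631024B540"
)


def element_address(array_base, n, size):
    """Base address of the n-th element (counting from 1).

    The n-th element lies (n - 1) elements past the first, which is
    where the fence-post error is avoided.
    """
    if n < 1:
        raise ValueError("elements are counted from 1 in this sketch")
    return array_base + (n - 1) * size


addr = element_address(0x13B734F2, 7, 2)
start = addr - BASE
raw = MEMORY[start:start + 2]
print(f"{addr:08X}: {raw.hex().upper()}")
# prints: 13B734FE: A745
```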
Notice that, up to this point, nothing has been said regarding how the bytes at the various addresses in a multibyte value are combined to determine the value represented. The first step in accomplishing that task is to consider the endianness with which they were stored. The two dominant ways of storing multibyte values in memory are known as "big endian" and "little endian". In a big endian format, the most-significant byte (MSB) is stored at the base address. This can be thought of as putting the "big end" of the number at the base address. Conversely, if the value is stored in little endian format, the little end of the value -- the least-significant byte (LSB) -- is stored at the base address. In either case, the rest of the bytes are stored in order in successively higher memory locations, eventually ending with the "other end" of the value.
Thus, if the value from two examples prior (the two-byte value stored at 0x13B734F2) were stored big endian, it would be 0x5D3E, while if it were stored little endian it would be 0x3E5D. Similarly, if the value from the previous example (the seventh element of the array of two-byte values starting at that same address) were stored big endian, it would be 0xA745 and, in little endian, 0x45A7.
As another example, consider the 4-byte value stored at location 0x13B734E4. The bytes involved sit at offsets 4 through 7 of row 13B7:34E0 in the dump below.
ADDR:ADDR 0011223344556677 8899AABBCCDDEEFF
...
13B7:34D0 59415B3E4A878F6A 10D0E3AF1D1A9A8F
13B7:34E0 C6765A2BE85F6A0F 45A8423B1C4A3EF5
13B7:34F0 387A5D3E26F12C3B 358D5E5F66F9A745
13B7:3500 0B14D687C9A87E56 54F6D5631024B540
If this value is stored in big endian, then the value is represented by 0xE85F6A0F. But if it is stored in little endian, it is 0x0F6A5FE8.
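To make the byte ordering concrete, here is a small Python sketch that assembles bytes into a bit pattern under either convention. The function `assemble` is a hypothetical helper written for this illustration; Python's built-in `int.from_bytes` does the same job and is used as a cross-check.

```python
def assemble(raw, byteorder):
    """Combine bytes (listed in address order) into one unsigned pattern.

    byteorder is "big" (MSB at the base address) or "little" (LSB at
    the base address). Only whole bytes are reordered, never bits.
    """
    if byteorder not in ("big", "little"):
        raise ValueError("byteorder must be 'big' or 'little'")
    # Walk from the most-significant byte to the least-significant byte.
    ordered = raw if byteorder == "big" else raw[::-1]
    value = 0
    for byte in ordered:
        value = (value << 8) | byte
    return value


raw = bytes.fromhex("E85F6A0F")  # the bytes at 0x13B734E4..0x13B734E7
print(f"big endian:    0x{assemble(raw, 'big'):08X}")
print(f"little endian: 0x{assemble(raw, 'little'):08X}")
assert assemble(raw, "big") == int.from_bytes(raw, "big")
assert assemble(raw, "little") == int.from_bytes(raw, "little")
```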
Take particular note of the fact that it is the bytes that are ordered one way or the other, not the bits and not the nibbles (recall that a nibble is 4 bits and, therefore, that each hex digit represents a nibble). Furthermore, it is the order of bytes within each multibyte value that is affected, not the order of the elements of an array. Each element of an array is a distinct and separate value.
There is no widespread agreement as to which flavor of endianness is better, despite the fact that ardent proponents of each have for decades been engaged in spirited and endless debates approaching the intensity of a religious war. Each has advantages and disadvantages relative to the other. Tasks that can be accomplished easily by starting at the MSB, such as magnitude comparisons, are more efficiently performed using big endian, while tasks that are more readily performed starting at the LSB, such as many arithmetic operations, are more efficiently performed using little endian. Since computers are called on to perform both types of operations, neither format has a clear advantage.
Up to this point we have been careful not to determine the actual numeric value (such as two hundred thirty-five) of any of the values we have been identifying. This is because everything discussed so far deals only with identifying which bytes in memory are part of a value and putting them in the right order, and is independent of how those bytes are interpreted. Once we have them assembled in the correct order, we can begin treating them according to the number representation used for each one, such as unsigned binary, two's complement, or floating point.
For the next two examples, let's consider some simple questions that can be answered without converting values from their binary representation to decimal first; instead, we will use our understanding of the representations and how values are stored in memory to answer them directly.
Consider two values, X and Y, which are both 4-byte values stored at locations 0x13B734D8 and 0x13B734E4, respectively. In the dump below, X occupies offsets 8 through B of row 13B7:34D0 and Y occupies offsets 4 through 7 of row 13B7:34E0.
ADDR:ADDR 0011223344556677 8899AABBCCDDEEFF
...
13B7:34D0 59415B3E4A878F6A 10D0E3AF1D1A9A8F
13B7:34E0 C6765A2BE85F6A0F 45A8423B1C4A3EF5
13B7:34F0 387A5D3E26F12C3B 358D5E5F66F9A745
13B7:3500 0B14D687C9A87E56 54F6D5631024B540
Which value is the larger of the two?
We can't answer that question without knowing both the endianness and the representation used. But what we can say is that what will matter is the MSB of each value (and, if the MSBs are identical, then the next most-significant byte, and so on until we find one in which the two values differ). If the values are stored in big endian, then the MSBs for X and Y are 0x10 and 0xE8, respectively. If they are stored in little endian, then the MSBs are 0xAF and 0x0F, respectively.
If big endian is used, then the bytes that matter have values 0x10 (X) and 0xE8 (Y). If these are represented as unsigned integers, then Y is the larger since 0x10 < 0xE8. If they are represented in two's complement, then X is larger because X is positive and Y is negative. Remember, in two's complement a value is negative if the most-significant bit is a 1, which means that MSB values of 0x80 and higher represent negative numbers.
If little endian is used, then the bytes that matter have values 0xAF (X) and 0x0F (Y). If these are represented as unsigned integers, then X is the larger since 0xAF > 0x0F. If they are represented in two's complement, then Y is larger because Y is positive while X is negative.
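The four combinations of endianness and representation can be checked mechanically. The Python sketch below uses the built-in `int.from_bytes`, whose `signed=True` option interprets the bytes as two's complement; the names `X`, `Y`, and `larger` are just labels for this illustration, and the byte strings are copied from the dump at 0x13B734D8 and 0x13B734E4.

```python
X = bytes.fromhex("10D0E3AF")  # bytes at 0x13B734D8..0x13B734DB
Y = bytes.fromhex("E85F6A0F")  # bytes at 0x13B734E4..0x13B734E7


def larger(byteorder, signed):
    """Return "X" or "Y", whichever is larger under the given assumptions.

    signed=False treats the bytes as unsigned binary; signed=True treats
    them as two's complement. Equality is not handled because X and Y
    differ in every case considered here.
    """
    x = int.from_bytes(X, byteorder, signed=signed)
    y = int.from_bytes(Y, byteorder, signed=signed)
    return "X" if x > y else "Y"


for byteorder in ("big", "little"):
    for signed in (False, True):
        kind = "two's complement" if signed else "unsigned"
        print(f"{byteorder:6} {kind:16} -> {larger(byteorder, signed)} is larger")
```

Running this conversion is not needed to answer the question, since inspecting the MSBs suffices, but it is a convenient way to confirm the reasoning.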
Let's ask another question about these same values, namely, are they odd or even?
In this case, it is the least-significant byte (LSB) that counts. More specifically, if the value is represented using straight binary or two's complement, then the value is even if the LSB is even, regardless of whether it is positive or negative. Thus, if the values are stored big endian, the LSBs are 0xAF (X) and 0x0F (Y), so both values are odd. On the other hand, if stored little endian, the LSBs are 0x10 (X) and 0xE8 (Y), so both are even.
Be careful not to draw the conclusion that the value in any representation is even if the LSB is even. As two counterexamples: in one's complement, negative values that are even have LSBs that are odd; and a fixed-point (or floating-point) value may not even be an integer, in which case the entire question of even and odd is, at the very least, more complex, if not entirely meaningless.
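The parity test and the one's complement caveat can both be sketched in a few lines of Python. The helper names (`lsb`, `is_even_binary`, `ones_complement_pattern`) are hypothetical, and the one's complement helper assumes an 8-bit width purely to keep the example small.

```python
X = bytes.fromhex("10D0E3AF")  # bytes at 0x13B734D8..0x13B734DB
Y = bytes.fromhex("E85F6A0F")  # bytes at 0x13B734E4..0x13B734E7


def lsb(raw, byteorder):
    """Least-significant byte: last address if big endian, base if little."""
    return raw[-1] if byteorder == "big" else raw[0]


def is_even_binary(raw, byteorder):
    """Parity test valid for unsigned binary and two's complement only."""
    return (lsb(raw, byteorder) & 1) == 0


def ones_complement_pattern(value, bits=8):
    """Bit pattern of value in one's complement (assumed 8-bit width)."""
    mask = (1 << bits) - 1
    return value if value >= 0 else ~(-value) & mask


for byteorder in ("big", "little"):
    print(byteorder, "X even:", is_even_binary(X, byteorder),
          "Y even:", is_even_binary(Y, byteorder))

# -2 is even, yet its one's complement pattern ends in a 1 bit.
pattern = ones_complement_pattern(-2)
print(f"-2 in one's complement: 0x{pattern:02X}, low bit {pattern & 1}")
```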
Let's store the value -1,000,000 in the 4-byte signed integer stored at location 0x13B734D4. Assume that signed integers are stored in two's complement and that multibyte values are stored in memory in little endian.
First, let's represent 1,000,000 as an unsigned integer in hex.
Using the method of repeated division, we have:
16 | 1000000
62500 r 0
3906 r 4
244 r 2
15 r 4
0 r 15
Therefore the value is 0xF4240. Being a bit more explicit and taking into account that it is a 4-byte representation, we have 0x000F4240.
The next step is to negate this value by taking the one's complement and then adding one. In hex, taking the one's complement is the same as subtracting the value from all F's. Hence
FFFFFFFF
-000F4240
----------
FFF0BDBF
+ 1
----------
FFF0BDC0
Our final step is to place this in memory (overwriting whatever is there presently), starting with the LSB at the base address. The new bytes, C0 BD F0 FF, occupy offsets 4 through 7 of row 13B7:34D0 below.
ADDR:ADDR 0011223344556677 8899AABBCCDDEEFF
...
13B7:34D0 59415B3EC0BDF0FF 10D0E3AF1D1A9A8F
13B7:34E0 C6765A2BE85F6A0F 45A8423B1C4A3EF5
13B7:34F0 387A5D3E26F12C3B 358D5E5F66F9A745
13B7:3500 0B14D687C9A87E56 54F6D5631024B540
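The three steps of this example (convert to hex by repeated division, negate in two's complement, store little endian) can be reproduced with a short Python sketch. All function names here are hypothetical helpers for illustration, and `memory` holds only the single row starting at 0x13B734D0.

```python
BASE = 0x13B734D0  # address of memory[0] (assumed)
memory = bytearray.fromhex("59415B3E4A878F6A10D0E3AF1D1A9A8F")


def to_hex_digits(value):
    """Convert a non-negative integer to hex by repeated division by 16."""
    digits = []
    while value > 0:
        value, remainder = divmod(value, 16)
        digits.append("0123456789ABCDEF"[remainder])
    # Remainders come out least-significant digit first.
    return "".join(reversed(digits)) or "0"


def twos_complement_negate(pattern, width_bytes):
    """Negate by subtracting from all F's (one's complement), then adding one."""
    all_fs = (1 << (8 * width_bytes)) - 1
    return ((all_fs - pattern) + 1) & all_fs


def store_little_endian(mem, base, addr, pattern, width_bytes):
    """Write pattern into mem with the LSB at the base address."""
    start = addr - base
    for i in range(width_bytes):
        mem[start + i] = (pattern >> (8 * i)) & 0xFF


magnitude = to_hex_digits(1_000_000)
negated = twos_complement_negate(int(magnitude, 16), 4)
store_little_endian(memory, BASE, 0x13B734D4, negated, 4)

print(magnitude, f"{negated:08X}", memory.hex().upper())
# Cross-check: reading the four bytes back should give -1,000,000.
assert int.from_bytes(memory[4:8], "little", signed=True) == -1_000_000
```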
When working with values stored in memory, do not try to do everything all at once. Take things one step at a time. When figuring out what value is stored in memory:
1. Make sure you know how many bytes of data are involved.
2. Determine the address of the relevant data.
3. Look up the values of the bytes.
4. Take the endianness of the storage into account to assemble the bytes into a representation.
5. Use the rules of the representation to extract the value.
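The steps for reading a value can be strung together in a Python sketch that works directly from the text of a dump. The names `DUMP`, `parse_dump`, and `read_value` are hypothetical, and the sketch handles only integer representations (unsigned binary and two's complement) through the built-in `int.from_bytes`.

```python
DUMP = """\
ADDR:ADDR 0011223344556677 8899AABBCCDDEEFF
...
13B7:34D0 59415B3E4A878F6A 10D0E3AF1D1A9A8F
13B7:34E0 C6765A2BE85F6A0F 45A8423B1C4A3EF5
13B7:34F0 387A5D3E26F12C3B 358D5E5F66F9A745
13B7:3500 0B14D687C9A87E56 54F6D5631024B540
"""


def parse_dump(text):
    """Map each address in the dump text to the byte stored there."""
    memory = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] == "ADDR:ADDR":
            continue  # skip the header row and the "..." filler
        row_addr = int(parts[0].replace(":", ""), 16)
        row_bytes = bytes.fromhex(parts[1] + parts[2])
        for offset, byte in enumerate(row_bytes):
            memory[row_addr + offset] = byte
    return memory


def read_value(memory, addr, size, byteorder, signed):
    """Follow the steps: size, address, bytes, endianness, representation."""
    raw = bytes(memory[addr + i] for i in range(size))  # steps 1 to 3
    return int.from_bytes(raw, byteorder, signed=signed)  # steps 4 and 5


mem = parse_dump(DUMP)
print(hex(read_value(mem, 0x13B734F2, 2, "big", signed=False)))
# prints: 0x5d3e
```

Keeping the byte lookup separate from the interpretation mirrors the advice to take things one step at a time.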
Storing a value in memory is mostly the reverse of this: