(Last Mod: 27 November 2010 21:38:44 )
Being able to represent individual characters in specified memory locations is very useful, but it is not very convenient for the way we normally want to work with text information. Typically, when we work with text, we work with "strings" of characters. The obvious way to represent a string in memory is as a sequence of ASCII codes - and this is exactly what is done.
But there is a subtlety that we must deal with - how do we know where the string ends? We have a few options: we could specify a fixed length for all strings and pad any unused portion at the end with a code that tells us that it is not used; we could keep track of both the address where the string starts and how long the string is; or we could keep track of where the string starts and embed a "delimiter" in the string data itself to mark the end of the string. In some languages, such as Pascal, the first byte of a string is not interpreted as an ASCII code; instead, it is interpreted as the number of characters in the string. In other languages, including C, a particular code is placed after the last actual character in the string. In C, this character is the NUL character (ASCII code 0x00) and, therefore, we refer to C as working with "null terminated strings". Note that not all languages that use a terminating character use the NUL character.
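As a rough sketch of the length-prefixed and null-terminated conventions just described, the fragment below stores the same three characters both ways. The names pascal_style, c_style, pascal_length and c_length are hypothetical and exist only for this illustration; real Pascal implementations differ in their details. The pointer parameters used here are explained later in this module.

```c
#include <stddef.h>

/* Hypothetical illustration: "Hi!" stored with a leading length byte
   (Pascal-style) and with a trailing NUL code (C-style). */
const unsigned char pascal_style[] = { 3, 'H', 'i', '!' };
const char c_style[] = { 'H', 'i', '!', '\0' };

/* Length-prefixed: the length is read directly from the first byte. */
size_t pascal_length(const unsigned char *s)
{
    return s[0];
}

/* Null-terminated: the length is found by scanning for the NUL code. */
size_t c_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Both layouts occupy four bytes here. The difference is where the extra byte sits and how much work it takes to find the length.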
Just as the ASCII code for an individual character is expressed in C by surrounding the character by single quotes, a string of ASCII codes, including the terminator, is expressed in C by surrounding the string of characters by double quotes. The NUL terminator does not need to be expressly included in the string - the compiler will add it automatically.
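The automatically added terminator can be observed with the sizeof operator. The following is a minimal sketch; the names greeting, greeting_bytes and one_char_string_bytes are made up for this example.

```c
#include <stddef.h>

/* The compiler sizes the array: 5 visible characters plus the NUL. */
const char greeting[] = "Hello";

/* Number of bytes the array occupies, terminator included. */
size_t greeting_bytes(void)
{
    return sizeof greeting;
}

/* 'H' is a single character code, but "H" is a two-byte string:
   the code for 'H' followed by the NUL terminator. */
size_t one_char_string_bytes(void)
{
    return sizeof "H";
}
```

Here greeting_bytes() returns 6 and one_char_string_bytes() returns 2, even though no terminator was typed in either literal.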
Example: The string "ECE-1021 Module #1" is stored at location 0xDC08. Draw the relevant memory map.
        0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0xDC00  ??   ??   ??   ??   ??   ??   ??   ??   'E'  'C'  'E'  '-'  '1'  '0'  '2'  '1'
0xDC10  SP   'M'  'o'  'd'  'u'  'l'  'e'  SP   '#'  '1'  NUL  ??   ??   ??   ??   ??

The above uses the characters to represent the memory, and this is generally the quickest and most useful way of representing the codes. For completeness, the actual values (in hex) are shown below.

In hex:

        0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0xDC00  ??  ??  ??  ??  ??  ??  ??  ??  45  43  45  2D  31  30  32  31
0xDC10  20  4D  6F  64  75  6C  65  20  23  31  00  ??  ??  ??  ??  ??
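The hex values in the memory map can be checked with a short routine. This is only a sketch: hex_dump_string is a hypothetical helper, and it assumes the compiler uses ASCII for character codes.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes the codes of s, including the terminating NUL, into out as
   two-digit hex values separated by single spaces.
   Returns the number of bytes dumped.
   The caller must provide room for 3 * (length of s + 1) + 1 bytes. */
size_t hex_dump_string(const char *s, char *out)
{
    size_t i = 0;
    for (;;) {
        unsigned char code = (unsigned char)s[i];
        sprintf(out + 3 * i, "%02X ", (unsigned)code);
        i++;
        if (code == 0)          /* the NUL has been dumped; stop */
            break;
    }
    out[3 * i - 1] = '\0';      /* overwrite the trailing space */
    return i;
}
```

Called with "ECE-1021 Module #1" and a 64-byte buffer, it reports 19 bytes: the 18 visible characters plus the NUL.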
It's one thing to say that a particular string is stored at a particular memory location - it is quite another to then actually be able to do something with that string.
Let's say that we wanted to print out the first and last characters of the string in the previous example. In words, what we want to accomplish is very simple:
Problem: A null-terminated string is stored at memory location 0xDC08. Print out the first and the last character of that string.
Solution:
Putting this into our more structured pseudocode:
Implementing this in C code, we would have:
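char *ptr;
char c;

ptr = "First String";
c = *ptr;                  /* fetch the first character */
if ( '\0' != c )           /* do nothing if the string is empty */
{
    PutC(c);               /* print the first character */
    ptr++;
    c = *ptr;
    while ( '\0' != c )    /* advance until ptr points at the NUL */
    {
        ptr++;
        c = *ptr;
    }
    c = *(ptr - 1);        /* the last character precedes the NUL */
    PutC(c);
}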
The above code can be condensed quite a bit, but before doing so, let's make sure we fully understand the new elements.
First, there's the variable declaration:
char *ptr;
The variable ptr is known as a "pointer" variable because the value stored in it is intended to be a memory address where some other piece of information is stored; hence it "points" to that other piece of data. When we declare a pointer variable, we indicate that it is a pointer by preceding its name with an asterisk.
But it is not sufficient to just know the address where a piece of data is stored - we must also know how many bytes the data item consists of, how it is ordered in memory, and how to interpret the pattern of bits stored in those bytes in order to recover the information stored there. If the compiler knows the data type of the object stored at the location being pointed to, then it has all three of these pieces of information. Hence, when we declare a pointer variable, we must also indicate the data type that it points to.
If asked what the data type of ptr is, the most correct response would be that it is a "pointer to an object of type char". This is frequently stated in shorter terms by simply calling it a "pointer to char" or a "char pointer".
Second, there's the line:
ptr = "First String";
The string literal is stored someplace in memory - very possibly as part of the program code itself. A string literal used in an expression evaluates to the address at which that string is stored. It is very important to recognize that, while we can read a string literal in the same way as any other string, we must not write to it or attempt to modify it in any way. We may get unlucky and find that the modification appears to work, but we have invoked undefined behavior, and the next time we compile and run the program it may crash.
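When a string does need to be changed, one common approach is to work on a copy held in an array rather than on the literal itself. The sketch below illustrates this; the function name modify_copy_not_literal is hypothetical.

```c
/* Returns 1 if the array copy was changed while the literal was left
   untouched, 0 otherwise. */
int modify_copy_not_literal(void)
{
    const char *literal = "first";  /* points at the read-only literal */
    char copy[] = "first";          /* a separate, writable array     */

    copy[0] = 'F';                  /* fine: modifies only the array  */

    /* Writing through a pointer to the literal, e.g.
       ((char *)literal)[0] = 'F';, would invoke undefined behavior.  */

    return copy[0] == 'F' && literal[0] == 'f';
}
```

Declaring the pointer as const char * lets the compiler reject accidental writes to the literal.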
Next, there's the line:
c = *ptr;
When an asterisk is used as a unary operator - i.e., an operator with only one operand - it can't be the multiplication operator, since that requires two operands. In this situation, it is interpreted as the "dereference" operator, also known as the "indirection" operator. The value stored in ptr is known as a "reference" to the object being pointed to - hence the term "dereference". Similarly, the variable ptr only indirectly tells us the information we are looking for - hence the term "indirection".
One useful way to read the dereference operator is to think of it as saying, "the value stored at". Combined with thinking of a variable name as saying, "the value stored in the variable", this makes statements such as the above quite understandable. It reads:
(The value stored in the variable c) (is set equal to) (the value stored at) (the value stored in the variable ptr)
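The same reading works when the dereference operator appears on the left of an assignment. A minimal sketch, using a hypothetical helper named swap_in that must only be given a pointer into writable memory:

```c
/* Replaces the character that ptr points to and returns the old one. */
char swap_in(char *ptr, char replacement)
{
    char old;

    old = *ptr;           /* read: the value stored at the address in ptr */
    *ptr = replacement;   /* write: store a new value at that same address */
    return old;
}
```

If word is a char array holding "cat", then swap_in(word, 'b') returns 'c' and leaves "bat" in the array.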
Fourth, there's the line:
ptr++;
This is the same increment operator that we've used extensively already. But there is a subtlety when performing arithmetic involving pointers that is easy to overlook. Since the compiler knows what type of data the pointer is pointing to, and since pointers are expected to point to objects, the result of adding an integer to a pointer value should yield a pointer value that points to an object.
For instance, let's say that ptr is equal to 82. If we simply added one to this value, as we would probably be tempted to do, we would get a value of 83. This is fine if the object being pointed to is only one byte wide, because then we could have a different object of that same data type at location 83. But what if the data type being pointed to is four bytes wide? Now memory location 83 does not point to an object, since an object's address is the lowest numbered memory location of the block of memory containing the object. The next higher value that could possibly serve as a pointer to an object is 86. Hence the expression ptr+1 would yield a value of 86, and not 83. This is not too counterintuitive as long as you think of the integer in the expression as being the number of objects to be added to the pointer and not the number of bytes.
Although there is nothing that guarantees that the next object in memory is of the same type as the one originally pointed to by ptr, when we use pointer arithmetic the compiler makes two assumptions and leaves it up to us to write our code so that the assumptions prove to be valid. The compiler assumes that the object pointed to by the new value is of the same type as the object pointed to by the original value, and it further assumes that the memory between the old value and the new value consists entirely of closely-packed objects of that same data type.
Hence, with ptr now equal to 86 after the increment, the value ptr+20 would yield a pointer value of 86+20*4 or 166, while the value ptr-10 would yield 86-10*4 or 46.
At this point it will hopefully not come as a surprise that, if ptr1 and ptr2 point to values of a data type that is eight bytes wide, and ptr1 is equal to 2064 while ptr2 is equal to 2016, then the expression ptr1-ptr2 is equal to 6 and not 48. Remember that an integer added to a pointer value represents the number of objects of that data type to be skipped. The same is true for subtracting two pointers - the integer that results does not represent the number of bytes between the two addresses, but the number of objects of that data type between the two addresses.
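A short sketch of the same idea in code, using double as the pointed-to type. A double is eight bytes wide on most current platforms, but C does not guarantee this, so the byte count is expressed with sizeof. The helper names are hypothetical, and both pointers must point into the same array.

```c
#include <stddef.h>

/* Distance between two pointers into the same array, in objects. */
ptrdiff_t objects_between(const double *hi, const double *lo)
{
    return hi - lo;
}

/* The same distance, measured in bytes instead. */
ptrdiff_t bytes_between(const double *hi, const double *lo)
{
    return (const char *)hi - (const char *)lo;
}
```

For an array v of doubles, objects_between(v + 6, v) is 6, while bytes_between(v + 6, v) is 6 * sizeof(double), which is 48 when a double occupies eight bytes.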
While it may not be obvious at first glance, the addition of two pointer values is not defined in C, and a compiler will reject it. In terms of objects of a particular data type, the sum of two addresses is completely devoid of meaning.
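One place where it is tempting to add two pointers is when looking for the midpoint between them. A sketch of how this can be written using only a pointer difference and a pointer-plus-integer sum follows; middle is a hypothetical helper, and both arguments must point into the same array.

```c
/* Returns a pointer to the element halfway between first and last. */
const char *middle(const char *first, const char *last)
{
    /* (first + last) / 2 would not compile: pointers cannot be added.
       The difference of two pointers is an integer, though, and an
       integer can be added to a pointer. */
    return first + (last - first) / 2;
}
```

With s pointing to "abcdefg", middle(s, s + 7) points at the 'd'.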
Finally, let's consider the line:
c = *(ptr - 1);
Everything needed to interpret this line has already been discussed, but this is a good time to point out one of the common mistakes that programmers make when performing pointer arithmetic: forgetting that the dereference operator has higher precedence than the arithmetic operators. Suppose we had written:
c = *ptr-1;
The result would have been perfectly legal code that would have compiled and executed and done something we hadn't wanted it to do - it would have gone to the address pointed to by ptr, retrieved a value of type char from there, subtracted one from that value, and stored the result in the variable c. What we wanted it to do was to calculate the address of the object of type char immediately preceding the one presently pointed to by ptr, retrieve the value from that location, and store the result in the variable c - hence the parentheses are essential to getting the code to behave as intended.
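The two expressions can be compared side by side. In this sketch the helper names previous_char and current_minus_one are hypothetical, and ptr must not point at the first character of its string when previous_char is called.

```c
/* The character stored just before the one ptr points to. */
char previous_char(const char *ptr)
{
    return *(ptr - 1);    /* adjust the address first, then fetch */
}

/* The code of the character ptr points to, minus one. */
int current_minus_one(const char *ptr)
{
    return *ptr - 1;      /* fetch first, then subtract from the value */
}
```

With ptr pointing at the 'C' in "AC", previous_char(ptr) yields 'A', whereas current_minus_one(ptr) yields the code for 'C' minus one, which is 'B' in ASCII.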
Now let's write the above code in the more condensed format alluded to previously:
char *ptr;

ptr = "First String";
if ( '\0' != *ptr )
{
    PutC(*ptr++);
    while ( '\0' != *ptr )
        ptr++;
    PutC(*(ptr-1));
}
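For readers who want to try the condensed version outside of this course's environment, the sketch below wraps it in a function. It assumes that PutC simply writes one character to the output, so the standard putchar is used as a stand-in; the function name print_first_and_last is hypothetical.

```c
#include <stdio.h>

/* Prints the first and last characters of a null-terminated string.
   Returns the number of characters printed (0 for an empty string). */
int print_first_and_last(const char *ptr)
{
    if ('\0' == *ptr)
        return 0;

    putchar(*ptr++);          /* print the first character, then advance */
    while ('\0' != *ptr)      /* walk forward until ptr is at the NUL    */
        ptr++;
    putchar(*(ptr - 1));      /* the character just before the NUL       */
    return 2;
}
```

Calling print_first_and_last("First String") prints Fg. For a string of a single character, that character serves as both first and last and is printed twice, just as in the code above.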