(Last Mod: 27 November 2010 21:38:39 )
The pre-processor statement #define creates a "macro". Macros range from the very simple to the very complex, but fall into one of two categories - object-like and function-like. The formats are, respectively:
#define macro_name macro_body
#define macro_name(identifier-list) macro_body
A macro is nothing more than an instruction that performs a global search and replace operation on the remaining text in your source code file before the code is actually compiled. The file is searched for every occurrence of the macro name and, wherever found, is replaced with the contents of the macro body. This process is called "macro expansion".
There are places that macros are not expanded. If the instance of the macro occurs within a comment, a character constant, or a string literal it is ignored. Also, a macro that is used within the body of the macro that defines it will not be expanded as this would result in an infinite series of expansions.
The macro name must follow the same rules as identifiers and function names - specifically, it can consist of alphanumeric characters plus the underscore and the first character cannot be a decimal digit.
The macro body consists of all characters starting with the first non-whitespace character after the macro-name (and identifier list, if present) and ends with the last non-whitespace character prior to the newline character.
Comments can appear on the same line as macros and are not included as part of the macro - recall that all comments are logically removed from the file before any further processing is performed.
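As a small illustration of these rules, the sketch below uses two made-up macros, WIDTH and GREETING, and three helper functions that are not part of these notes. It assumes nothing beyond the standard header string.h. The comments on the #define lines are not part of the macro bodies, and the name WIDTH inside a string literal is left alone.

```c
#include <string.h>

#define WIDTH    (80)       /* this comment is not part of the macro body */
#define GREETING "hello"    /* the body is the string literal only */

/* WIDTH inside the string literal is NOT expanded; the text stays "WIDTH". */
const char *width_label(void)
{
    return "WIDTH";
}

/* Outside a literal the name is expanded, so this returns 80. */
int width_value(void)
{
    return WIDTH;
}

/* GREETING expands to "hello", which has five characters. */
size_t greeting_length(void)
{
    return strlen(GREETING);
}
```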
Object-like macros, also known as "non-parameterized macros", are used extensively to create "symbolic constants". These constants are used primarily for one of two reasons - to make code more readable and understandable, or to collect in one place the values that might change from one compilation to the next.
Examples of the first type of usage might be:
#define PI (3.141592653589793238)
#define FALSE (0)
#define TRUE (!FALSE)
#define BLANKLINE putc('\n', stdout)
#define MYHEADER_DOT_H
The first definition associates the text "PI" with the decimal representation for π out to 19 significant figures - enough to match the level of accuracy of a long double type variable in TurboC v4.5. With this macro defined, we can now use the letters PI any place we need the value of π. When the macro expands at the beginning of the compilation phase, those two letters will simply be replaced with the string of characters that make up the macro body. This not only makes the code much more readable, but it protects us from the inevitable typing mistakes we would occasionally make were we to type this value in each time we needed it. Imagine being the person having to track down why a program is producing subtly wrong results because the programmer, in one line somewhere in the code, mistyped the fifth digit in the value of π. Another advantage of defining symbolic constants this way is that we don't have to know the actual value in order to write the code - we can start out defining PI to be 3.14 and then, later, we can look up the value to the level of accuracy we need and only have to update that one, easy to find line of code.
The second definition allows us to use the word "FALSE" in places where we are setting a logical value to False. In C, a logical value is False if it compares equal to zero and is True under all other conditions. By using the term "FALSE" in those situations, neither we nor future readers of our code need to concern themselves with how a logical False is represented (once convinced that it is correct on this one line).
The third definition serves the same purpose as the previous one except for the case of a logical True. We could have defined it to be (1) but instead opted to define it in terms of a previous macro definition. This is possible because, after each expansion, the preprocessor rescans the replacement text for further macro names and expands those as well - this allows macros to contain macros.
Defining "TRUE" in this fashion ensures that the particular non-zero value associated with our term "TRUE" is going to match the values obtained when any logical expressions are evaluated, but this is really not necessary since the standard specifically states that such expressions yield a value of 1 if True. More than anything, the terminology used in the #defines for TRUE and FALSE is a constant reminder that a False must be exactly equal to zero while a True is defined as anything that is Not False. The same point could be made, perhaps even more forcefully, by defining TRUE to be (!0).
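A minimal sketch of how these two macros behave, assuming the definitions given above; the helper function names are invented for illustration. Because any non-zero value counts as True, comparing a value against TRUE with == can give the wrong answer, whereas comparing against FALSE cannot.

```c
#define FALSE (0)
#define TRUE  (!FALSE)

/* A relational expression yields exactly 1 or 0, so it matches TRUE or FALSE. */
int is_positive(int n)
{
    return (n > 0) == TRUE;
}

/* Risky: 5 is logically True, but 5 == TRUE is False because TRUE is 1. */
int equals_true(int flag)
{
    return flag == TRUE;
}

/* Safer: anything that is not FALSE is treated as True. */
int is_not_false(int flag)
{
    return flag != FALSE;
}
```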
The fourth definition highlights the point that an object-like macro does not have to be merely a symbolic constant involving just numeric values. Because they generally do involve expressions we perceive as being numbers, it is easy for us to forget that macros are simply a text replacement tool. In this case, anytime we want to advance to the next line on the display we can use the word BLANKLINE in our code. Notice that the ending semi-colon was not included as part of the macro definition. As a result, the semi-colon will have to be included after the macro name every time it is invoked. This is merely a convention that forces a more consistent coding appearance.
The fifth definition shows an example of a macro name that has no macro body. This is perfectly legal. If this macro were to be used elsewhere in the code, it would still be expanded and replaced - it would simply be replaced by nothing. Macros without bodies, however, are generally not defined with this in mind - they are defined purely so that the name itself has a definition. The reason is so that other directives can modify the code based on whether or not that particular name has been defined.
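The following sketch shows both behaviors of a macro with no body. It assumes the standard #ifdef, #else and #endif directives, which these notes have not introduced at this point; HEADER_SEEN and the two helper functions are hypothetical names used only for this example.

```c
#define MYHEADER_DOT_H

/* The name has no body, but it is "defined", and #ifdef can test for that. */
#ifdef MYHEADER_DOT_H
#define HEADER_SEEN (1)
#else
#define HEADER_SEEN (0)
#endif

int header_was_seen(void)
{
    return HEADER_SEEN;
}

/* Used in ordinary code, the name expands to nothing: this is "return 7;". */
int expands_to_nothing(void)
{
    return MYHEADER_DOT_H 7;
}
```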
Examples of the second type of usage might be:
#define FREQUENCY (300.0)
#define AMPLITUDE (1.0)
#define SAMPLES (1000)
A part of our program might generate a certain number of sample points from a sinusoidal waveform. We may want to change the frequency and amplitude of that waveform from one compilation to the next and, at least during the initial part of our code development, might key those values to a pair of symbolic constants. Later we can then easily key them to values entered by the User if that is our goal.
Keying the number of samples to a symbolic constant has even greater utility and value - the amount of memory we need to allocate to hold those values might depend on how many values there are as might the number of times we execute a loop or how much we change a variable by at each sample point. If all of those things are not consistent we can get anything from subtle and hard to find errors in the output to catastrophic system failure. By deriving all parameters that depend on the number of samples from a single symbolic constant, we drastically reduce the likelihood of any of these things happening should we later decide to make a change in that number.
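One way this might look in practice is sketched below. The names PERIOD, TIME_STEP, sample_times and fill_sample_times() are assumptions made for the example, and only the time of each sample point is computed (the waveform values themselves are left out to keep the sketch short). The point is that the array size, the loop bound and the step size all trace back to SAMPLES.

```c
#define FREQUENCY (300.0)
#define SAMPLES   (1000)

/* Everything that depends on the sample count is derived from SAMPLES. */
#define PERIOD    (1.0 / FREQUENCY)      /* seconds per cycle */
#define TIME_STEP (PERIOD / SAMPLES)     /* seconds between samples */

static double sample_times[SAMPLES];     /* storage sized by SAMPLES */

/* Fill the array with the time of each sample point across one cycle
   and return the number of points written. */
int fill_sample_times(void)
{
    int i;

    for (i = 0; i < SAMPLES; i++)        /* loop bound uses SAMPLES */
        sample_times[i] = i * TIME_STEP; /* step size uses SAMPLES */

    return i;
}
```

Changing SAMPLES to another value and recompiling adjusts all three places at once.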
Function-like macros are also known as "parameterized macros". While object-like macros are extremely useful, it is frequently the case that what we need is something that is almost, but not quite, the same every time it is used. Function-like macros essentially allow us to provide a list of changes that we want made to the macro body before each expansion - from a practical standpoint, the preprocessor effectively performs two stages of text replacement.
Below are two typical calls to the function putc():
putc('H', stdout);
putc('e', stdout);
Notice that these two calls are nearly identical. In particular, the second argument is always the same and is not only inconvenient to type each time we use this function to write to the monitor screen, but it detracts a little bit of attention away from why the function was called each time. We could define a macro to hide this second parameter as long as we have a way to modify each occurrence in such a way that the first argument is preserved intact. A function-like macro permits this very handily:
#define PutC(c) putc(c, stdout)
In addition to the macro name, we now include a list of identifiers surrounded by parentheses - if there are multiple identifiers, we separate them with commas. When we use the macro PutC(), we will supply it with one argument for each identifier in the list. These arguments - which are still treated as just a sequence of characters - are then substituted into the macro body every place that the corresponding identifier is found. After this is done, the modified body is inserted into the code in place of the original occurrence of the macro name and its argument list.
Conceptually, you can think of the following steps being performed:
PutC('H');
#define TEMP putc('H', stdout)
TEMP;
putc('H', stdout);
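A short sketch of the PutC() macro in use follows. The function put_hello() is a made-up helper; it relies only on the fact that putc() returns EOF when a write fails.

```c
#include <stdio.h>

#define PutC(c) putc(c, stdout)

/* Write "Hello" one character at a time and count the successful writes.
   Each PutC(x) below is replaced by putc(x, stdout) before compilation. */
int put_hello(void)
{
    int written = 0;

    if (PutC('H') != EOF) written++;
    if (PutC('e') != EOF) written++;
    if (PutC('l') != EOF) written++;
    if (PutC('l') != EOF) written++;
    if (PutC('o') != EOF) written++;

    return written;
}
```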
It is also possible to undefine a macro name by using the following directive:
#undef macro_name
This is generally done to influence later directives that base decisions upon whether a given macro name is currently defined or not.
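The effect of #undef can be sketched as follows. DEBUG_LEVEL, DEBUG_IS_DEFINED and the two helper functions are hypothetical names, and the example assumes the standard #ifdef directive as the "later directive" that tests whether the name is defined.

```c
#define DEBUG_LEVEL (2)

/* While the name is defined, it expands as usual. */
int level_before_undef(void)
{
    return DEBUG_LEVEL;
}

#undef DEBUG_LEVEL

/* After #undef the name is no longer defined, so the #ifdef test fails. */
#ifdef DEBUG_LEVEL
#define DEBUG_IS_DEFINED (1)
#else
#define DEBUG_IS_DEFINED (0)
#endif

int defined_after_undef(void)
{
    return DEBUG_IS_DEFINED;
}
```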
Consider the following function-like macro definition:
#define linear(x,m,b) m*x + b
The idea is that we have a macro that allows us to retrieve the y-value associated with a given x-value on a line with a certain slope and y-intercept. Everything looks fine and if we use the macro as the programmer probably envisioned it we will be okay. Specifically, the macro was expected to be used in the following way:
y = linear(2,5,3);
Here we are using numeric constants just to make evaluation simple. The value that we would expect to have stored into the variable y would, of course, be 13. But what if we wrote instead:
y = linear(1+1,5,3);
While we would still expect to get a value of 13, we would actually get a value of 9. Why?
The reason is that the macro will expand to the following line - just as we told it to:
y = 5*1+1 + 3;
Since multiplication takes precedence over addition, the result will be 9. Clearly we wanted, and expected, each of the arguments to the macro to be fully evaluated before they were substituted into the function. We can force this to be the case by surrounding every occurrence of every parameter in the macro body with parentheses since these have the highest precedence:
#define linear(x,m,b) (m)*(x) + (b)
Now we can use arbitrarily complex expressions for each argument without concern:
y = linear(5-3,(21-6)/(10-7),-5+8);
Will become:
y = ((21-6)/(10-7))*(5-3) + (-5+8);
Which will evaluate properly and still yield the expected value of 13.
But now what happens if we include a call to the macro within another expression, such as:
y = 2 * linear(2,5,3);
We would expect it to yield a value of 26 but it will only yield a value of 23. Why?
Looking again at the expanded macro we see that it is doing exactly what we told it to:
y = 2 * (5)*(2) + (3);
Although each argument is guaranteed to be fully evaluated before it is used within the macro body, we have not ensured that the macro body is fully evaluated before it is used in a larger expression. We can force this to occur by surrounding the entire macro body with a pair of parentheses.
Our final macro definition thus looks like:
#define linear(x,m,b) ((m)*(x) + (b))
And illustrates the two "golden rules" of writing function-like macros: surround every occurrence of every parameter in the macro body with parentheses, and surround the entire macro body with parentheses.
The purpose is to ensure complete and proper evaluation of every macro argument and of the macro as a whole. It's true that not every function-like macro needs these steps in order to work properly. A case in point is our PutC() macro, which doesn't need either level of protection. But instead of picking and choosing when to apply these rules, simply apply them always - otherwise you will, sooner rather than later, make a mistake and ignore one of these rules where it really mattered.
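The three stages of the linear() macro can be compared side by side. In the sketch below the names linear_v1 and linear_v2 are invented so that all three versions can coexist in one file, and the small wrapper functions exist only to make each result easy to check.

```c
#define linear_v1(x,m,b) m*x + b              /* no protection */
#define linear_v2(x,m,b) (m)*(x) + (b)        /* parameters protected */
#define linear(x,m,b)    ((m)*(x) + (b))      /* parameters and body protected */

/* 5*2 + 3 = 13: the simple case works even with no protection. */
int v1_simple(void)    { return linear_v1(2,5,3); }

/* 5*1+1 + 3 = 9: the argument 1+1 is split apart by the multiplication. */
int v1_compound(void)  { return linear_v1(1+1,5,3); }

/* (5)*(1+1) + (3) = 13: parenthesized parameters fix the argument. */
int v2_compound(void)  { return linear_v2(1+1,5,3); }

/* 2 * (5)*(2) + (3) = 23: the body is still split by the outer expression. */
int v2_nested(void)    { return 2 * linear_v2(2,5,3); }

/* 2 * ((5)*(1+1) + (3)) = 26: both rules applied. */
int final_nested(void) { return 2 * linear(1+1,5,3); }
```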
In fact, some programmers make a practice of surrounding all macro definitions, including object-like macros, with parentheses unless there is a specific reason not to. The Style Standards for this course impose this as a requirement.
Consider the following macro:
#define cube(x) ((x)*(x)*(x))
Even though we have applied both of the rules mentioned above, we can still get into trouble if our arguments have any side effects. Remember that a side effect is anything that happens when an expression is evaluated beyond determining a value for the expression. The most common side effect is that associated with the assignment operators which, by design, change the value of one of the operands. Consider the following invocation:
y = cube(x+=0.1);
The intent is pretty clear - we want to increase x by 0.1 and then store the cube of the new value into y, probably because this line of code is within the body of a loop that is walking through a range of values of x. Were cube() implemented as a function, it would behave as expected because the argument would be evaluated a single time and the resulting value would be passed to the function. But in this case, the macro will expand to:
y = ((x+=0.1)*(x+=0.1)*(x+=0.1));
Because the argument is evaluated more than once, the side effect of increasing the value of x happens more than once. Worse, the value of the overall expression cannot be predicted: the language standard does not specify when each change to x takes effect, and modifying the same variable more than once within a single expression in this way is undefined behavior.
Writing a macro body such that it does not evaluate any of its arguments more than once is not always easy or even possible. If you wrote the macros that you use, you have knowledge of which macros are safe to use with arguments having side effects and which aren't - and it is a good idea to flag those that are not. If you find yourself using macros written by someone else, it is a good idea to avoid using arguments having side effects unless you verify that they are safe in this regard. Don't assume that the programmer writing the macro even considered the issue.
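The repeated evaluation can be observed safely by using a function call as the argument instead of an assignment. In this sketch next_value(), cube_fn() and the counter calls are hypothetical names introduced for the demonstration; the side effect being counted is simply the number of times next_value() runs.

```c
#define cube(x) ((x)*(x)*(x))

static int calls = 0;                 /* how many times next_value() has run */

/* The side effect here is incrementing the call counter. */
static double next_value(void)
{
    calls++;
    return 2.0;
}

/* Function version: its argument is evaluated exactly once. */
static double cube_fn(double x)
{
    return x * x * x;
}

/* Returns how many times the macro evaluated its argument. */
int calls_made_by_macro(void)
{
    calls = 0;
    (void)cube(next_value());         /* expands to three separate calls */
    return calls;
}

/* Returns how many times the function evaluated its argument. */
int calls_made_by_function(void)
{
    calls = 0;
    (void)cube_fn(next_value());      /* a single call */
    return calls;
}
```

The macro version runs the side effect three times and the function version once, which is one way to flag a macro as unsafe for arguments with side effects.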
Although many of the "functions" in the standard libraries are actually implemented as macros - including putc() - they are all written so that, with one exception, their arguments are never evaluated multiple times. That exception is when an argument is a stream pointer argument that is passed to many of the I/O functions. It is very seldom that any program would have reason to use an expression for this argument that has side effects - for situations that do, function versions of these macros are available. For instance, the function version of putc() is fputc().
One of the chief reasons that function-like macros are used instead of actual functions is that they are generally more efficient in terms of execution speed and memory usage. This is because there is overhead associated with every invocation of a function in terms of both time and memory. By using a macro, particularly in a loop that is executed thousands, perhaps millions, or even billions of times, even a tiny increase in performance can reduce execution time by a significant amount.
But this is not without potential penalty. One of the reasons that we use functions is to have a single block of code that gets called from many different places within the program. By not duplicating that code at every location, we can considerably reduce the size of our program. Macros do not permit this because they are fully expanded into the body of the code at each and every occurrence.