Pointers

A pointer is a data item whose value is an address. This address can refer to a data item of any type, including another pointer. A data item which is a pointer has both a value and an address, and they are almost always different. If they were the same, the pointer would be pointing to itself, which is not very useful. A pointer is also an operand of many of the computer's instructions.

Pointers give what is called indirect usage of the data item they point to. A pointer is also called an 'alias' for what it is (currently) pointing to, because the name of the pointer, when dereferenced by the * operator, is another name for the data item pointed to.

Use of pointers is called indirection because first the memory must be read at the address where the pointer is stored. The value stored in the pointer is another address which then must be used to refer to the object indirectly referenced (read or written) by the pointer.

A good analogy for indirection is the following. You want to know where a book is kept in a library so first you look it up in the card catalog where you find a card (which is an indirect reference) to where the book is. You do not look on the bookshelves by the title of the book. You must first look up the name of the book (the name of the pointer) to find its address (book number).

Another example is the address where you live. If this address is written on a piece of paper, that piece of paper is an indirect reference to your house. The address is not your house, it is only a pointer to your house.

With the use of pointers, it is possible to have 2 or more ways to refer to the same data item - e.g. the data item itself by name, and one or more pointers pointing to that same data item. Pointers have tremendous flexibility in the 'C' language. They also give the opportunity for errors which are hard to find.

A pointer is declared with the unary asterisk operator, *, as in the following example:

	int v, *ip, z;

where (in words): ip is a pointer to an integer.

ip can be set to point to the variable v by using the unary address of operator, &, as follows:
	ip = &v;

The unary asterisk operator, when used with a pointer in an expression, takes on a slightly different meaning - for example:

	*ip = 5;

The unary asterisk in an expression is also called the indirection operator. Its meaning is: the thing pointed to by the pointer. With ip set to the address of v, the meaning of the above statement would be to set the value of the variable v to 5. The meaning of the following statement:

	z = *ip + 2;

would be to assign z the value of the data item pointed to by ip (which is the value of v) plus 2.
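
The statements above can be collected into one small function. This is only a sketch: the function name indirect_demo is hypothetical, and v is set to 0 first so that every variable has a defined value.

```c
#include <stdio.h>

/* Hypothetical helper: repeats the steps from the text on local variables. */
int indirect_demo(void)
{
	int v, *ip, z;

	v = 0;
	ip = &v;        /* ip now holds the address of v */
	*ip = 5;        /* writes 5 into v through the pointer */
	z = *ip + 2;    /* reads v through the pointer, then adds 2 */
	printf("v=%d z=%d\n", v, z);
	return z;
}
```

Note that v is never assigned by name after the first line, yet it ends up holding 5, because *ip and v refer to the same data item.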

These two unary operators are mutual inverses. Thus:

	&*ip is the same as ip

in words: the address of the thing pointed to by ip. And

	*&v is the same as v

in words: the thing pointed to by the address of v.
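
Both identities can be checked directly. The following is a minimal sketch; the function name inverses_hold is hypothetical and not part of any library.

```c
/* Hypothetical check: returns 1 if both identities from the text hold. */
int inverses_hold(void)
{
	int v = 3;
	int *ip = &v;

	return (&*ip == ip)    /* address of the thing pointed to by ip */
	    && (*&v == v);     /* thing pointed to by the address of v */
}
```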

The expression:

	*v

would cause a compiler error because it asks for the data item that 'v' points to, but 'v' is not a pointer.

Ordinarily, when a pointer is declared, one specifies the type of data it points to. There are two special cases for the types of pointers which go beyond the ordinary. The first is a pointer to void, and the second is a pointer to a function.

A pointer to void, declared as in:

	void *ptr, *pv;

means that the programmer is not actually saying what type the thing pointed to is (although the compiler will assume it is some data type rather than a pointer to a function). This is quite different from the meaning of the type void when used as a function return type, which means the function returns nothing!

Apart from comparison for equality, the only operation allowed directly on pointers to void is assignment:

	pv = ptr;

To do any other kind of operation using a pointer to void it must be cast (see Operators) into a pointer of a specific type, as in the following:

	void *p;
	int v, w;
	p = (void *) &v;
	w = * (int *)p + 6; /* same as: w=v+6; */

This is because otherwise the compiler would not know the size of the data item being referred to. A cast is a unary operator consisting of a type within parentheses. Its meaning is to convert the operand on its right to that type. This is safe provided that all data pointers are of the same size no matter what they point to.
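
One common reason to use a pointer to void is to let a single function accept data of more than one type. The sketch below assumes the caller passes a tag saying what the pointer really points to; the function name as_double and the tag letters 'i' and 'd' are hypothetical choices made for this illustration.

```c
/* Hypothetical helper: reads the item p points to as a double.
   kind is 'i' if p really points to an int, 'd' if to a double. */
double as_double(const void *p, char kind)
{
	switch (kind) {
	case 'i':
		return (double) *(const int *)p;   /* cast, then dereference */
	case 'd':
		return *(const double *)p;
	default:
		return 0.0;                        /* unknown tag */
	}
}
```

The cast must agree with the real type of the data. If the tag is wrong, the compiler cannot detect it, and the function reads the wrong number of bytes.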

The second special kind is: pointer to function. When a function exists, either because it is a library function or has been defined as part of your program, the use of the identifier which is the function's name, e.g. sqrt or printf, has the meaning of pointer to function. This is not really a pointer to data at all - it is a pointer to instructions!

A pointer to a function is declared as in the following example:

	int (*fun)(void);

which in words is: 'fun' is a pointer to a function taking no parameters and returning a value of type int.

The first set of parentheses is necessary because:

	int *fun(void);

would mean: 'fun' is a function (not a pointer to a function) taking no parameters and returning a pointer to "int" - which is very different.

The use of a pointer to a function is identical to its declaration. Thus, if 'v' is an int, one could call the function declared above (assuming it had been set to point to some function) by:

	v = (*fun)();

The second set of parentheses is needed to make it clear that it is a function call. The first pair of parentheses is being used only to reorder the precedence of evaluation. In the final version of ANSI 'C', it was made syntactically legal to also write:

	v = fun();

This change was made to make consistent the use of library or module calls in both forms which were syntactically correct before:

	sqrt(2.0)      vs.     (*sqrt)(2.0)
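
The two call forms can be compared side by side. In this sketch the function seven and the wrapper call_both_ways are hypothetical names used only for illustration.

```c
/* Hypothetical function for the pointer to refer to. */
int seven(void)
{
	return 7;
}

/* Calls seven twice through a pointer, once in each syntax. */
int call_both_ways(void)
{
	int (*fun)(void);
	int a, b;

	fun = seven;     /* the function's name acts as a pointer to it */
	a = (*fun)();    /* explicit indirection */
	b = fun();       /* form also permitted by ANSI 'C' */
	return a + b;
}
```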

A pointer to a function is one of the most powerful features of the 'C' language, and is not found in many languages. Any feature like this, which gives you the same flexibility as machine language, also opens the possibility of new kinds of errors. Besides the obvious error of somehow corrupting the value of a pointer to function so that it points to data or anywhere other than the beginning of a function, there is one subtle error which can occur on PCs because data pointers and function pointers can be of different sizes. This occurs in what are called the different memory models. This is why it is not always safe to use a pointer to void to point to a function; whether it can be done depends on the memory model. Recent compilers usually take care of this.

The name of an existing function, rather than a data variable declared to be a pointer to a function, has the type of pointer to function. Thus sqrt(2.0) is an expression of type double, as that is what the function returns, but sqrt is of type: pointer to a function taking one parameter of type double and returning a value of type double.
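
Because a function's name has a pointer type, it can be passed as an argument to another function. The sketch below uses two hypothetical functions, twice and half, in place of a library function; apply is likewise a made-up name. A library function with the same parameter and return types, such as sqrt, could be passed in the same way.

```c
/* Two hypothetical functions with the same type: double -> double. */
double twice(double x)
{
	return 2.0 * x;
}

double half(double x)
{
	return x / 2.0;
}

/* f is a pointer to a function taking a double and returning a double. */
double apply(double (*f)(double), double x)
{
	return f(x);
}
```

A call such as apply(twice, 3.0) passes the name twice without parentheses after it, so the function is not called at that point. Only its address is handed to apply.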

The size of a pointer is machine dependent. In most machines, the size of a pointer is fixed no matter what type of item the pointer is pointing to. In some machines, such as the PC family, there are different memory models with different pointer sizes. As was mentioned above, this can be the source of very subtle errors because a part of an address can be either truncated or extended to take random data nearby in memory.

In either case, it is now understood that it is not good programming practice to mix up ints and pointers. Compilers now help in this by giving warnings or errors when such a confusion is made. If the intent is valid then a cast (see Operators) can be used to eliminate the warning. The real solution is just not to do it; it is not at all needed. The alternatives are better, more reliable, understandable and maintainable programs.

Pointer Arithmetic

Pointers can be used in expressions in a limited way. Besides the operations of assignment =, indirection * and address of & (see: Pointers and Arrays), a pointer can be used with restricted arithmetic expressions. Pointers can be compared: (p1==p2), (p1!=p2).

An int can be added to a pointer (or a pointer to an int). Similarly, an int can be subtracted from a pointer. In both cases, the int is implicitly multiplied by the size of the type of object pointed to by the pointer (see: Pointers and Arrays).

Lastly, two pointers may be subtracted. The result is a signed integer (of type ptrdiff_t in ANSI 'C'), also implicitly scaled (divided) by the size of the type of the data item pointed to. This is meaningless and an error for pointers to function, and is meaningless for pointers which do not both point to within the same data object.
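
The scaling can be seen by measuring the same distance twice, once in elements and once in bytes. This is a sketch only: the array a and the function names element_distance and byte_distance are hypothetical, and the results are converted to long purely for convenience.

```c
#include <stddef.h>

static int a[10];    /* both pointers stay inside this one object */

/* Distance between a[2] and a[7], counted in ints. */
long element_distance(void)
{
	int *p1 = &a[2];
	int *p2 = p1 + 5;          /* moves forward 5 ints, same as &a[7] */
	ptrdiff_t d = p2 - p1;     /* scaled back down by sizeof(int) */

	return (long) d;
}

/* The same distance counted in bytes, using pointers to char. */
long byte_distance(void)
{
	char *c1 = (char *) &a[2];
	char *c2 = (char *) &a[7];

	return (long) (c2 - c1);
}
```

The first function returns 5 on any machine, while the second returns 5 times the size of an int on that machine.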

The integer constant 0 is automatically (and without issuance of a warning) converted into a pointer of any type. This is called the NULL pointer (note 2 LL's vs 1 L for NUL byte). Similarly, a NULL pointer will always compare equal to 0 (i.e. it is logically false). The NULL pointer is supposed to be guaranteed not to point to any object.

In the PC this is almost true. Further, if the NULL pointer is indexed and then written as in:

	char *p=0; /* pointer to char initialized to NULL */
	int  z;
	...
	p[z]=...

the likelihood of trashing your program, the operating system, or at best of crashing your program (fatal exit) is fairly high!
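
A simple defence is to test the pointer before indexing it. The following is a minimal sketch; safe_store is a hypothetical name, and the function checks only for NULL, not whether z lies within the bounds of the object pointed to.

```c
#include <stddef.h>

/* Hypothetical helper: stores c at p[z] only if p is not NULL.
   Returns 1 if the store was done, 0 if it was refused. */
int safe_store(char *p, int z, char c)
{
	if (p == NULL)    /* could also be written: if (!p) */
		return 0;
	p[z] = c;
	return 1;
}
```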


© 1991-2008 Prem Sobel. All Rights Reserved.