2009 Mar 02
Use of Constant Numbers | Physical Dimension Confusion |
Macro Errors | Memory Management | Names | Clib Issues | Keyword Issues | Use of Casts |
Compiler Warnings | Comments | Special Strings | Debugging | White Space |
When variables which are the simple or fundamental types (e.g. int, or double) it is too easy to write a statement which has a physical dimensionality error that is invisible to the compiler. There are multiple kinds of such semantic mistakes:
temperature += distance;
size = inches + feet
velocity = time/distance;
area = length * width * height;
float length=5.0, width=3.4, height=7.2; float area, volume; float celsius, farenheight; float time1=116.3, time2=60.0; height = length + celsius; // mixed distance and temperature area = length * time2; |
The solution, which enables the compielr to enforce dimensionality checking, is to use a different struct with dimension in tag or field name, with fields of simple types.
Example:
#define calc(a,b,c) a+5*b-c int x, y, z; z = 7*calc(x,3+z,-y); Results in: z = 7*x+5*3+z--y; |
Corrected macro:#define calc(a,b,c) ((a)+5*(b)-(c)) Corrected Results: z = 7*((x)+5*(3+z)-(-y)); |
#define min(a,b) ( ((a)<(b)) ? (a) : (b) ) int x=4, y=5, z; z = min(--x, y++); Results in: z = ((--x) < (y++)) ? (--x) : (y++) )each parameter is incremented or decremented twice - not what is wanted or expected. Solution: never pass variables with ++ -- operators to a macro! |
Frequently test C program with valgrind for memory management errors. Always free up allocated memory (even if program is done, so that valgrind is happy and if code is used in a larger program there are no leaks.
Good variable names are not just descriptive and hopefully self explanatory. There are multiple issues beyond this which get more and more important for large bodies of code. C has a few distinct name spaces:
Although the compiler will not get confused by field names the same in multiple structs or unions or by function local or static function names which are the same, this is a big issues for us mere humans.
Even though a type, function or variable can be called the same if it is private (i.e. static) to a module this is a BAD practice because if an issue occurs it is helpful to grep for a unique name and not have to filter the unnecesaary. The larger issue which gives clarity and the baility to combine andresuse modules from different projects is that names in every module should have a unique (and hopefully fairly short) prefix for types, variables and functions. This also allows someone reading code to know where (i.e. in which module) that type, variable or function is defined.
Unless a type name is obvious like uint4 for unsigned int all types created by a typedef just end with suffix "_t" so it is clear to the reader that this identifier, this name, is a type.
Similarly it is essentially for every struct and union field to have a unique suffix. This not only makes it easy to use grep to find all references without extraneous noise, but it helps the programmer avoid confusion between fields. This is especially important for pointers, a common example of which is a linked list. Thus if we hat
It is best to not have anonymous tag names. Also tags must a suffix to identify union vs struct. After Combining the statements of these last paragraphs, the following is bad practice:
typedef struct foo { struct foo *next, *prev; float item; } foo; typdef struct { struct foo *next, *prev; int id; char *name; } node;
The above must be:
typedef struct foo_s { struct foo_s *foo_next, *foo_prev; float foo_item; } foo_t; typdef struct node_s { struct foo *node_foo_next, *node_foo_prev; int node_id; char *node_name; } node_t;
Example 1: Buffer Overflow |
// Naive code using strcat. // This will work provived BUF_SIZE is bigger than the sum of: // the field_name sizes + NUM_FIELDS + 1, // but may fail (i.e. overflow bug) if those sizes are unknown at // compile time and their maximum size is not known. #define NUM_FIELDS 10 #define BUF_SIZE 1024 char *field_name[NUM_FIELDS]={ "field1", "field2", ..., "field10" }; char buf[BUF_SIZE]; int first,x; buf[0] = 0; for(first=1,x=0; x<NUM_FIELDS; x++) { if(first) first = 0; else strcat(buf, ","); strcat(buf, field_name[x]); }
// Less naive code (actually seen in our code) using strncat, but incorrectly! #define NUM_FIELDS 10 #define BUF_SIZE 1024 char *field_name[NUM_FIELDS]; // field name sizes unknown at compile time char buf[BUF_SIZE]; int first,x; buf[0] = 0; for(first=1,x=0; x<NUM_FIELDS; x++) { if(first) first = 0; else strncat(buf, ",", BUF_SIZE); strncat(buf, field_name[x], BUF_SIZE); } // The above is wrong because the semantics for the third parameter to // strncat does not specify the size of the destination but the maximum // number of characters to copy from the source string: field_name[x]. // This is also sub-optimal in performance because the strncat has to // start at the beginning of the destination string, buf, each time.
// Correct and effecient code, notice the use of 'y' (offset into buf). #define NUM_FIELDS 10 #define BUF_SIZE 1024 char *field_name[NUM_FIELDS]; // field name sizes unknown at compile time char buf[BUF_SIZE]; int first,x,y,len; buf[y=0] = 0; for(first=1,x=0; x<NUM_FIELDS; x++) { if(first) first = 0; else { buf[y++] = ','; buf[y] = 0; } len = strlen(field_name[x]); // If this done more than once then // store this length in an array, // prepared in advance. assert(BUF_SIZE>y+len+2); // allows for a comma after if needed strcat(&buf[y], field_name[x]); y += len; }
Some of the C language keywords are adjectives modifying a type.
There is no reason for a C programmer to use the register modifier keyword. The compiler is far more competent in making this optimization. This modifier never has meaning for a parameter being passed to a function because that parameter is on the stack anyway.
The need for the volatile keyword can only be know to the programmer. It tells the compiler it can never optimize a variable so qualified. This occurs when the variable correspond, for example, to some memory mapped IO address and hence can change at any time. This is a hardware implementation issue unknown to the compiler.
Except for casts needed for pointers which are of type (void *) every cast is potentially an untruth which can be fatal. A common place where function parameters need to be (void *), or worse some other kind of pointer, is for callback functions which must have a fixed prototype but have multiple uses.
Every compiler warning is a potential source of error and must be fixed. Ideally a compiler switch can be set to make this so. There are other compiler flags that need to be set for tight code.
Need I mention: missing, wrong, or useless comments?
Debugging is the inevitable consequence of:
This is why the agile approach to software has been recognized as a successful approach. For agile to work well it must be flexible enough to stop developing and fix accumulated bugs.
The most important thing learned the hard way from decades of experience is that if we do not fix bugs as soon as they are discovered we will implement new functionality that will break when some of the bugs are fixed because that new functionality was made to work around those bugs. The cost of delaying bug fixes is very VERY high. Much more time is lost because the bugs multiply, and they are harder to find because many interacting bugs are hard to find and fix.
The consequences are we must:
One of the things we know is that we must have as near 100% test coverage as possible to have a stable, robust and user friendly software.
What is important is our attitude to bug fixing. It is an essential part of the satisfaction and joy of a job well done. It is puzzle solving. The frustration at having to fix bugs is the aspiration for perfection. I do not know a single programmer who is perfect and never makes a mistake. We learn by our mistakes and become better programmers. As we redesign and refactor our abilities as a programmer and designer grow qualitatively. I am still learning new things about programming which surprise me that I did not see earlier.
Without a clear sense of the big picture, the total architecture, hardware flow and software flow, oversights are going to happen, global optimizations will be missed. And we will paint ourselves into larger or smaller corner requiring redesign, refactoring, new interfaces, new modularization, and so on.
The use of TABs in source code is a bad practice because different editors and fonts will cause a variable alignment. Indenting should use a contant 4 spaces per indent.
Ideally line lengths should be less than 80 characters. This is because when printing a line will not wrap (or be trunkated). Also wrapping in a window will be well behaved.
2002-2009