Good and Bad Practices in C Code

2009 Mar 02

Use of Constant Numbers

Physical Dimension Confusion

Macro Errors

Use of Constant Numbers

Arrays defined with constant size or referenced with constant index (should be macro or const variable)
for(...) // with constants

Physical Dimension Confusion

When variables which are the simple or fundamental types (e.g. int, or double) it is too easy to write a statement which has a physical dimensionality error that is invisible to the compiler. There are multiple kinds of such semantic mistakes:

qualitatively incompatible numbers: temperature += distance;
incorrect use of scale: size = inches + feet
incorrect multiplication and divide: velocity = time/distance;
dimensionality calculation: area = length * width * height;

For example:

float length=5.0, width=3.4, height=7.2;
float area, volume;
float celsius, farenheight;
float time1=116.3, time2=60.0;

height = length + celsius; // mixed distance and temperature
area = length * time2;

The solution, which enables the compielr to enforce dimensionality checking, is to use a different struct with dimension in tag or field name, with fields of simple types.

Macro Errors

( missing parenthesis around expression and each parameter )

Example:

#define calc(a,b,c) a+5*b-c
int x, y, z;

z = 7*calc(x,3+z,-y);

Results in:

z = 7*x+5*3+z--y;

Corrected macro:

#define calc(a,b,c) ((a)+5*(b)-(c))

Corrected Results:

z = 7*((x)+5*(3+z)-(-y));

use of ++ or -- operator on a variable passed in when macro used (may be used more than once)

#define min(a,b) ( ((a)<(b)) ? (a) : (b) )
int x=4, y=5, z;
z = min(--x, y++);

Results in:

z = ((--x) < (y++)) ? (--x) : (y++) )

each parameter is incremented or decremented twice - not what is wanted or expected.
Solution: never pass variables with ++ -- operators to a macro!

Memory Management

Frequently test C program with valgrind for memory management errors. Always free up allocated memory (even if program is done, so that valgrind is happy and if code is used in a larger program there are no leaks.

Names

Good variable names are not just descriptive and hopefully self explanatory. There are multiple issues beyond this which get more and more important for large bodies of code. C has a few distinct name spaces:

Types, plus Variable and function names
struct & union field tag names
struct & union field names (unique per struct or union)
labels (unique per function)

Although the compiler will not get confused by field names the same in multiple structs or unions or by function local or static function names which are the same, this is a big issues for us mere humans.

Even though a type, function or variable can be called the same if it is private (i.e. static) to a module this is a BAD practice because if an issue occurs it is helpful to grep for a unique name and not have to filter the unnecesaary. The larger issue which gives clarity and the baility to combine andresuse modules from different projects is that names in every module should have a unique (and hopefully fairly short) prefix for types, variables and functions. This also allows someone reading code to know where (i.e. in which module) that type, variable or function is defined.

Unless a type name is obvious like uint4 for unsigned int all types created by a typedef just end with suffix "_t" so it is clear to the reader that this identifier, this name, is a type.

Similarly it is essentially for every struct and union field to have a unique suffix. This not only makes it easy to use grep to find all references without extraneous noise, but it helps the programmer avoid confusion between fields. This is especially important for pointers, a common example of which is a linked list. Thus if we hat

It is best to not have anonymous tag names. Also tags must a suffix to identify union vs struct. After Combining the statements of these last paragraphs, the following is bad practice:

typedef struct foo {
   struct foo *next, *prev;
   float item;
} foo;

typdef struct {
   struct foo *next, *prev;
   int id;
   char *name;
} node;

The above must be:

typedef struct foo_s {
   struct foo_s *foo_next, *foo_prev;
   float foo_item;
} foo_t;

typdef struct node_s {
   struct foo *node_foo_next, *node_foo_prev;
   int node_id;
   char *node_name;
} node_t;

Clib Issues

getchar( ) // has state!

scanf( ) // needs real parsing

sleep( ) // may need asynchronous handshake with state machines

sprintf( ) // possible buffer overflow, consider snprintf()

strcat( ) // may need strncat()

strcpy( ) // can have buffer overflow

strncpy( )
     // has no buffer overflow, but has two defects:
     // (1) does not guarantee NULL at end!
     // (2) fills destination with NULLs (if room) - waste of time
     // therefore: write your own

Examples

Example 1:
Buffer Overflow

Example 1: Buffer Overflow


// Naive code using strcat.
// This will work provived BUF_SIZE is bigger than the sum of:
// the field_name sizes + NUM_FIELDS + 1,
// but may fail (i.e. overflow bug) if those sizes are unknown at
// compile time and their maximum size is not known.
#define NUM_FIELDS 10
#define BUF_SIZE 1024
char *field_name[NUM_FIELDS]={
    "field1", "field2", ..., "field10"
};
char buf[BUF_SIZE];
int first,x;
    buf[0] = 0;
    for(first=1,x=0; x<NUM_FIELDS; x++) {
        if(first)
            first = 0;
        else
            strcat(buf, ",");
        strcat(buf, field_name[x]);
    }

// Less naive code (actually seen in our code) using strncat, but incorrectly!
#define NUM_FIELDS 10
#define BUF_SIZE 1024
char *field_name[NUM_FIELDS]; // field name sizes unknown at compile time
char buf[BUF_SIZE];
int first,x;
    buf[0] = 0;
    for(first=1,x=0; x<NUM_FIELDS; x++) {
        if(first)
            first = 0;
        else
            strncat(buf, ",", BUF_SIZE);
        strncat(buf, field_name[x], BUF_SIZE);
    }
    // The above is wrong because the semantics for the third parameter to
    // strncat does not specify the size of the destination but the maximum
    // number of characters to copy from the source string: field_name[x].
    // This is also sub-optimal in performance because the strncat has to
    // start at the beginning of the destination string, buf, each time.

// Correct and effecient code, notice the use of 'y' (offset into buf).
#define NUM_FIELDS 10
#define BUF_SIZE 1024
char *field_name[NUM_FIELDS]; // field name sizes unknown at compile time
char buf[BUF_SIZE];
int first,x,y,len;
    buf[y=0] = 0;
    for(first=1,x=0; x<NUM_FIELDS; x++) {
        if(first)
            first = 0;
        else {
            buf[y++] = ',';
            buf[y] = 0;
        }
        len = strlen(field_name[x]); // If this done more than once then
                                     // store this length in an array,
                                     // prepared in advance.
        assert(BUF_SIZE>y+len+2); // allows for a comma after if needed
        strcat(&buf[y], field_name[x]);
        y += len;
    }

Keyword Issues

Some of the C language keywords are adjectives modifying a type.

register

There is no reason for a C programmer to use the register modifier keyword. The compiler is far more competent in making this optimization. This modifier never has meaning for a parameter being passed to a function because that parameter is on the stack anyway.

volatile

The need for the volatile keyword can only be know to the programmer. It tells the compiler it can never optimize a variable so qualified. This occurs when the variable correspond, for example, to some memory mapped IO address and hence can change at any time. This is a hardware implementation issue unknown to the compiler.

Use of Casts

Except for casts needed for pointers which are of type (void *) every cast is potentially an untruth which can be fatal. A common place where function parameters need to be (void *), or worse some other kind of pointer, is for callback functions which must have a fixed prototype but have multiple uses.

Compiler Warnings

Every compiler warning is a potential source of error and must be fixed. Ideally a compiler switch can be set to make this so. There are other compiler flags that need to be set for tight code.

Comments

Need I mention: missing, wrong, or useless comments?

Special strings to put in comments (always highlighted by vi) with typical meaning:

XXX - Asks a question about a possible something unknown to investigate.
TODO - To be done and what is known, but time is not now.
FIXME - To be done but do not yet know how.

Debugging

Debugging is the inevitable consequence of:

Our imperfection
Incomplete or ambiguous or erroneous architecture or design specification
Evolving software not thought out completely in the beginning - which occurs because our limited minds cannot foresee everything (not surprising)

This is why the agile approach to software has been recognized as a successful approach. For agile to work well it must be flexible enough to stop developing and fix accumulated bugs.

The most important thing learned the hard way from decades of experience is that if we do not fix bugs as soon as they are discovered we will implement new functionality that will break when some of the bugs are fixed because that new functionality was made to work around those bugs. The cost of delaying bug fixes is very VERY high. Much more time is lost because the bugs multiply, and they are harder to find because many interacting bugs are hard to find and fix.

The consequences are we must:

Test, re-test, re-re-test, ...
Constantly refactor to improve: interfaces, implementations, algorithms, data structures
Fix bugs (due to inadequate testing) - we are not sufficiently but must design and implement our software for test

One of the things we know is that we must have as near 100% test coverage as possible to have a stable, robust and user friendly software.

What is important is our attitude to bug fixing. It is an essential part of the satisfaction and joy of a job well done. It is puzzle solving. The frustration at having to fix bugs is the aspiration for perfection. I do not know a single programmer who is perfect and never makes a mistake. We learn by our mistakes and become better programmers. As we redesign and refactor our abilities as a programmer and designer grow qualitatively. I am still learning new things about programming which surprise me that I did not see earlier.

Without a clear sense of the big picture, the total architecture, hardware flow and software flow, oversights are going to happen, global optimizations will be missed. And we will paint ourselves into larger or smaller corner requiring redesign, refactoring, new interfaces, new modularization, and so on.

White Space

The use of TABs in source code is a bad practice because different editors and fonts will cause a variable alignment. Indenting should use a contant 4 spaces per indent.

Ideally line lengths should be less than 80 characters. This is because when printing a line will not wrap (or be trunkated). Also wrapping in a window will be well behaved.

2002-2009