Standards and Style for Coding in ANSI C

Use the new ANSI Standard C

The C dialect we will be using is ANSI standard C, or C89. If you are not familiar with this dialect, buy the second edition of Kernighan and Ritchie's The C Programming Language and Harbison and Steele's C: A Reference Manual. Both of these books are required reading for any C programmer.

Fully utilize the major improvements to the C language made in this standard:

Header files should contain the full function prototypes of global functions. The return type and each function argument are to be explicitly declared.
Similarly, functions used only within a single module should have the static qualifier and should either be defined or fully declared before their first use in the module.
Use the new types, void and enum, wherever possible.
Although structures can now be passed as arguments and returned from functions, it is recommended to pass them as pointers.
Use the rich library that has been mandated and standardized, but beware of certain functions to be avoided (see below).

Please try to stay within ANSI standard C and the library as much as possible. If you need to write system-dependent code, try to modularize and isolate it as much as possible, perhaps with #if MS_WIN ... #endif or #if LINUX ... #endif.

Common Mistakes in C

Avoid:

Implicit declarations of int
Ambiguous declarations/definitions
Surprising precedence of operators - use parentheses

What follows are some controversial aspects of C which are nonetheless integral to the language.

Remember the syntax for type declarations follows use, as in:

int (*apfi[])();
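
Reading outward from the name, apfi is an array of pointers to functions returning int. The sketch below shows the same shape in use. The functions one and two and the wrapper call_nth are hypothetical names chosen for illustration, and the prototype form (void) stands in for the empty parameter list.

```c
#include <stddef.h>

int one(void) { return 1; }
int two(void) { return 2; }

/* apfi: array of pointers to functions returning int */
int (*apfi[])(void) = { one, two };

/* The call mirrors the declaration: index, dereference, call. */
int call_nth(size_t n) {
	return (*apfi[n])();
}
```

The expression (*apfi[n])() has type int, which is exactly what the declaration says.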

C is a language with pointers; don't go into denial over this. The fact that a[i] is equivalent to *(a + i) is one of the defining characteristics of C, and should be embraced.
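
As a small illustration of that equivalence, here is the same loop written twice, once with subscripts and once with pointer arithmetic. The function names are made up for this sketch.

```c
#include <stddef.h>

/* Sum n ints by subscripting. */
int sum_indexed(int *a, size_t n) {
	size_t i;
	int total;

	total = 0;
	for (i = 0; i < n; ++i)
		total += a[i];
	return total;
}

/* The same loop written with pointer arithmetic. */
int sum_pointer(int *a, size_t n) {
	int *end;
	int total;

	total = 0;
	for (end = a + n; a < end; ++a)
		total += *a;
	return total;
}
```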

Comments

Comments can add immensely to the readability of a program, but used heavily or poorly placed they can render good code completely incomprehensible. It is better to err on the side of too few comments than too many - at least then people can find the code! (An inaccurate or misleading comment hurts more than a good comment helps! Be sure that your comments stay right.)

That being said, good places to put comments are:

a broad overview at the beginning of a module
data structure definitions
global variable definition
at the beginning of a function
tricky steps within a function

If you do something weird, a comment to explain why can save future generations from wondering. It is also a subtle tip-off as to where to look first for bugs. Finally, avoid fancy layout or decoration.

/* single line comments look like this */

/*
 * Important single line comments look like multi-line comments.
 */

/*
 * Multiline comments look like this.  Put the opening and closing
 * comment sequences on lines by themselves.  Use complete sentences
 * with proper English grammar, capitalization, and punctuation.
 */

/* but you don't need to punctuate or capitalize one-liners */

// this new C++ style one-line comment is supported by most C compilers now too

The opening / of all comments should be indented to the same level as the code to which it applies, for example:

if (fubar()) {
	/*
	 * Fouled up beyond all recognition.  Print a nastygram
	 * and attempt to clean up.  If that doesn't work,
	 * die horribly, and try to crash the system while
	 * we're at it.
	 */
	...
}

The assert() macro is an excellent 'executable comment'.
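
A minimal sketch of assert() used this way follows. The function max_element is a hypothetical example; its asserts state the preconditions that a comment would otherwise only describe.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the largest element of a non-empty array.  The asserts
 * document the preconditions and check them at run time;
 * compiling with NDEBUG defined removes them.
 */
int max_element(int *a, size_t n) {
	size_t i;
	int max;

	assert(a != NULL);
	assert(n > 0);

	max = a[0];
	for (i = 1; i < n; ++i)
		if (a[i] > max)
			max = a[i];
	return max;
}
```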

Source File Organization

Use the following organization for source files:

includes of system headers

includes of local headers

type and constant definitions

global variables

functions

A reasonable variation might be to have several repetitions of the last three sections.

Within each section, order your functions in a 'bottom up' manner - defining functions before their use. The benefit of avoiding redundant (hence error-prone) forward declarations outweighs the minor irritation of having to jump to the bottom of the file to find main or the other top-level functions.

In header files, use the following organization:

type and constant definitions

external object declarations

external function declarations

Again, several repetitions of the above sequence might be reasonable. Every object declaration must be preceded by the keyword extern; do not instantiate variables via a header file. (See below for why.)

Beware of nested includes, because they complicate dependency tracking for make. Consider using the makedepend tool to help maintain your source file dependencies in your Makefile.

Declarations and Types

Avoid exporting names outside of individual C source files; i.e., declare as static every function and global variable that you possibly can.

When declaring a global function or variable in a header file, use an explicit extern. For functions, provide a full ANSI C prototype. For example:

extern int errno;
extern void free(void *);

Why the extern? It is OK to declare an object any number of times, but in all the source files there can be only one definition. The extern says 'This is only a declaration.' (A definition is something that actually allocates and initializes storage for the object.) Historically,

int foo;

was ambiguously treated as either a declaration or both declaration and definition depending on linker magic. However, ANSI C allows it to be an error for this to appear at file scope in more than one place in a program. Header files should never contain object definitions, only type definitions and object declarations. This is why we require extern to appear everywhere except on the real definition.
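
The split between declaration and definition might look like the sketch below. It is shown as one listing for brevity, with comments marking where each part would live. The names counter.h, counter.c, counter_total and counter_add are invented for the example.

```c
/* ---- counter.h: declarations only, safe to include anywhere ---- */
extern int counter_total;		/* no storage allocated here */
extern void counter_add(int amount);

/* ---- counter.c: the single real definition ---- */
int counter_total = 0;			/* storage allocated and initialized */

void counter_add(int amount) {
	counter_total += amount;
}
```

Every other source file includes counter.h and sees only the extern lines, so the program contains exactly one definition of counter_total.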

In function prototypes, beware of the use of const. Although the ANSI standard makes some unavoidable requirements in the standard library, we don't need to widen the problem any further. What we are trying to avoid here is a phenomenon known as "const poisoning", where the appearance of const in some prototype forces you to go through your code and add const all over the place.

Never rely on C's implicit int typing; don't say:

extern foo;

say:

extern int foo;

Similarly, don't declare a function with an implicit return type. If it returns a meaningful integer value, declare it int. If it returns no meaningful value, declare it void. (By the way, the C standard now requires you to declare main() as returning int.)

Provide typedefs for all struct and union types, and put them before the type declarations. Creating the typedef eliminates the clutter of extra struct and union keywords, and makes your structures look like first-class types in the language. Putting the typedefs before the type declarations allows them to be used when declaring circular types. It is also nice to have a list of all new reserved words up front.

typedef struct Foo Foo;
typedef struct Bar Bar;

struct Foo {
    Bar *bar;
};

struct Bar {
    Foo *foo;
};

This gives a particularly nice scheme of exporting opaque objects in header files.

In header.h:

typedef struct Foo Foo;

In source.c:

#include "header.h"

struct Foo { .. };

Then a client of header.h can declare a

Foo *x;

but cannot get at the contents of a Foo. In addition, the user cannot declare a plain (non pointer) Foo, and so is forced to go through whatever allocation routines you provide. We strongly encourage this modularity technique.
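
A fuller sketch of the technique is given below as a single listing, with comments marking the header and source parts. The routines foo_new, foo_value and foo_free and the member value are hypothetical; the original leaves the contents of Foo unspecified.

```c
#include <stdlib.h>

/* ---- header.h: clients see only the name ---- */
typedef struct Foo Foo;

extern Foo *foo_new(int value);
extern int foo_value(Foo *foo);
extern void foo_free(Foo *foo);

/* ---- source.c: the only place the layout is known ---- */
struct Foo {
	int value;
};

/* Allocate a Foo; returns a null pointer if malloc fails. */
Foo *foo_new(int value) {
	Foo *foo;

	foo = malloc(sizeof *foo);
	if (foo)
		foo->value = value;
	return foo;
}

int foo_value(Foo *foo) {
	return foo->value;
}

void foo_free(Foo *foo) {
	free(foo);
}
```

Because clients can only hold a Foo *, the layout of struct Foo can change without recompiling them.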

If an enum is intended to be declared by the user (as opposed to just being used as names for integer values), give it a typedef too. Note that the typedef has to come after the enum declaration.

Don't mix any declarations in with type definitions; i.e., don't say:

struct foo {
	int x;
} object;

Also don't say:

typedef struct {
	int x;
} type;

(It's important for all typedefs to stand out by themselves.)

Declare each field of a structure on a line by itself. Think about the order of the fields. Try to keep related fields grouped. Within groups of related fields, pick some uniform scheme for organizing them, for example alphabetically or by frequency of use. When all other considerations are equal, place larger fields first, as C's alignment rules may then permit the compiler to save space by not introducing "holes" in the structure layout.
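
The effect of field order on size can be checked with sizeof. The two structures below are invented for illustration, and the exact sizes depend on the implementation; on many machines with 8-byte doubles the first takes 24 bytes and the second 16.

```c
#include <stddef.h>

typedef struct Scattered Scattered;
typedef struct Ordered Ordered;

/* Small fields on either side of a large one invite padding. */
struct Scattered {
	char tag;
	double weight;
	char flag;
};

/* Largest field first; the two chars can sit side by side. */
struct Ordered {
	double weight;
	char tag;
	char flag;
};

/* Report both sizes so the layouts can be compared. */
void layout_sizes(size_t *scattered, size_t *ordered) {
	*scattered = sizeof (Scattered);
	*ordered = sizeof (Ordered);
}
```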

Use of the Preprocessor

For constants, consider using:

enum { Red = 0xF00, Blue = 0x0F0, Green = 0x00F };
static const float pi = 3.14159265358;

instead of #defines, which are rarely visible in debuggers.

Macros should avoid side effects. If possible, mention each argument exactly once (avoiding pre/post increment/decrement problems). Fully parenthesize all arguments. When the macro is an expression, parenthesize the whole macro body. If the macro is the inline expansion of some function, the name of the macro should be the same as that of the function, except fully capitalized. When continuing a macro across multiple lines with backslashes, line up the backslashes way over on the right edge of the screen to keep them from cluttering up the code.

#define OBNOXIOUS(X)					\
	(save = (X),					\
	 dosomethingwith(X),				\
	 (X) = save)

Try to write macros so that they are syntactically expressions. C's comma and conditional operators are particularly valuable for this. If you absolutely cannot write the macro as an expression, enclose the macro body in do { ... } while (0). This way the expanded macro plus a trailing semicolon becomes a syntactic statement.
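
Here is a sketch of the do { ... } while (0) form. SWAP_INT needs a temporary, so it cannot be written as an expression; the macro and the function order_pair are hypothetical names for this example.

```c
/* Statement macro: swap two int lvalues. */
#define SWAP_INT(A, B)					\
	do {						\
		int swap_tmp;				\
		swap_tmp = (A);				\
		(A) = (B);				\
		(B) = swap_tmp;				\
	} while (0)

/* Put the smaller value in *lo and the larger in *hi. */
void order_pair(int *lo, int *hi) {
	if (*lo > *hi)
		SWAP_INT(*lo, *hi);
	else
		;	/* already ordered */
}
```

Had the macro body been a bare { ... } block, the semicolon after SWAP_INT(*lo, *hi) would have ended the if statement and left the else dangling.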

If you think you need to use #ifdef, consider restricting the dependent code to a single module. For instance, if you need to have different code for Unix and MS-DOS, instead of having #ifdef unix and #ifdef dos everywhere, try to have files unix.c and dos.c with identical interfaces. If you can't avoid #ifdefs, make sure to document the end of the conditional code:

#ifdef FUBAR
	some code
#else
	other code
#endif /* FUBAR */

Some sanctioned uses of the preprocessor are:

Commenting out code: Use #if 0.
Using GNU C extensions: Surround with #ifdef __GNUC__.
Testing numerical limits: Feel free to conditionalize on the constants in the standard headers <float.h> and <limits.h>.

If you use an #if to test whether some condition holds that you know how to handle, but are too lazy to provide code for the alternative, protect it with #error, like this:

#include <limits.h>

#if INT_MAX > UCHAR_MAX
enum { Foo = UCHAR_MAX + 1, Bar, Baz, Barf };
#else
#error "need int wider than char"
#endif

(This example also illustrates a reasonable use of <limits.h>.)

Naming Conventions

Names should be meaningful in the application domain, not the implementation domain. This makes your code clearer to a reader who is familiar with the problem you're trying to solve, but is not familiar with your particular way of solving it. Also, the implementation may need to change some day. Note that well-structured code is layered internally, so your implementation domain is also the application domain for lower levels.

Names should be chosen to make sense when your program is read. Thus, all names should be parts of speech which will make sense when used with the language's syntactic keywords. Variables should be noun clauses. Boolean variables should be named for the meaning of their "true" value. Procedures (functions called for their side-effects) should be named for what they do, not how they do it. Function names should reflect what they return, and boolean-valued functions of an object should be named for the property their true value implies about the object. Functions are used in expressions, often in things like if's, so they need to read appropriately. For instance,

if (checksize(s))

is unhelpful because we can't deduce whether checksize returns true on error or non-error; instead

if (validsize(s))

makes the point clear and makes a future mistake in using the routine less likely.

Longer names contain more information than short names, but extract a price in readability. Compare the following examples:

for (elementindex = 0; elementindex < DIMENSION; ++elementindex)
	printf("%d\n", element[elementindex]);

for (i = 0; i < DIMENSION; ++i)
	printf("%d\n", element[i]);

In the first example, you have to read more text before you can recognize the for-loop idiom, and then you have to do still more hard work to parse the loop body. Since clarity is our goal, a name should contain only the information that it has to.

Carrying information in a name is unnecessary if the declaration and use of that name is constrained within a small scope. Local variables are usually being used to hold intermediate values or control information for some computation, and as such have little importance in themselves. For example, for array indices names like i, j, and k are not just acceptable, they are desirable.

Similarly, a global variable named x would be just as inappropriate as a local variable named elementindex. By definition, a global variable is used in more than one function or module (otherwise it would be static or local), so all of its uses will not be visible at once. The name has to explain the use of the variable on its own. Nevertheless there is still a readability penalty for long names: casefold is better than case_fold_flag_set_by_main.

As a rule make variable name size proportional to scope:

	length(name(variable)) ~ log(countlines(scope(variable)))

Use some consistent scheme for naming related variables. If the top of memory is called physlim, should the bottom be membase? Consider the suffix -max to denote an inclusive limit, and -lim to denote an exclusive limit.

Don't take this too far, though. Avoid "Hungarian"-style naming conventions which encode type information in variable names. They may be systematic, but they'll screw you if you ever need to change the type of a variable. If the variable has a small scope, the type will be visible in the declaration, so the annotation is useless clutter. If the variable has a large scope, the code should be modular against a change in the variable's type. In general, any deterministic algorithm for producing variable names will have the same effect.

There are weaknesses in C for large-scale programming - there is only a single, flat name scope level greater than the module level. Therefore, libraries whose implementations have more than one module can't guard their inter-module linkage from conflicting with any other global identifiers. The best solution to this problem is to give each library a short prefix that it prepends to all global identifiers.

Abbreviations or acronyms can shorten things up, but may not offer compelling savings over short full words. When a name has to consist of several words (and it often doesn't), separate words by underscores, not by BiCapitalization. It will look better to English-readers (the underscore is the space-which-is-not-a-space). Capitalization is reserved for distinguishing syntactic namespaces.

C has a variety of separately maintained namespaces, and distinguishing the names by capitalization improves the odds of C's namespaces and scoping protecting you from collisions while allowing you to use the same word across different spaces. C provides separate namespaces for:

Preprocessor Symbols

Since macros can be dangerous, follow tradition and fully capitalize them, otherwise following the conventions for function or variable names.

	#define NUSERTASKS 8
	#define ISNORMAL(S) ((S)->state == Normal)

Any fully capitalized names can be regarded as fair game for #ifdef, although perhaps not for #if.

Labels

Labels are limited to function scope, so give each one a short, lowercase name. Choose a meaningful name such that the corresponding goto statement can be read aloud, and name it for why you go there, not what you do when you get there. For instance,

	goto bounds_error;

is more helpful than

	goto restore_pointer;

Structure, Union, or Enumeration Tags

Having these as separate namespaces creates an artificial distinction between structure, union, and enumeration types and ordinary scalar types. That is, you can't simplify a struct type to a scalar type by replacing

	struct Foo { long bar; };

with

	typedef long Foo;

since you still have the "struct" keyword everywhere, even when the contents are not being examined. The useless "struct" keywords also clutter up the code. Therefore we advocate creating a typedef mirror of all struct tags:

	typedef struct Foo Foo;

Capitalize the tag name to match the typedef name.

Structure or Union Members

Each structure or union has a separate namespace for its members, so there is no need to add a distinguishing prefix. When used in expressions they will follow a variable name, so make them lowercase to make the code look nice. If the type of a member is an ADT, the name of the type is often a good choice for the name of the member (but in lowercase). Do not prefix the member names, as in:

		struct timeval { unsigned long tv_sec; long tv_usec; };

for they are already in a unique namespace.

Ordinary Identifiers

All other ordinary identifiers (declared in ordinary declarators, or as enumeration constants).

Typedef Names

Capitalized, with no _t suffix or other cutesy thing to say "I'm a type" - we can see that from its position in the declaration! (Besides, all names ending with _t are reserved by POSIX.) The capitalization is needed to distinguish type names from variable names - often both want to use the same application-level word.

Enumeration Constants

Capitalize. If absolutely necessary, consider a prefix.

		enum Fruit { Apples, Oranges, Kumquats };

Function Names

Lowercase. If they are static (and most should be), make the name short and sweet. If they are externally visible, try to give them a prefix unique to the module or library.

Function Parameters

Since they will be used as variables in the function body, use the conventions for variables.

Variables

Lowercase.

Lastly, develop some standard idioms to make names automatic. For instance:

int i, j, k;	/* generic indices */
char *s, *t;	/* string pointers */
char *buf;	/* character array */
double x, y, z;	/* generic floating-point */
size_t n, m, size;	/* results of sizeof or arguments to malloc */
Foo foo, *pfoo, **ppfoo;	/* sometimes a little hint helps */

Indentation and Layout

Avoid putting an opening curly brace on a line by itself, as this reduces the amount of code visible in a small edit window. Avoid unnecessary curly braces, but if one branch of an if is braced, then the other should be too, even if it is only a single line. If an inner nested block is braced, then the outer blocks should be too.

Some examples:

if (foo == 7) {
	bar();
} else if (foo == 9) {
	barf();
	bletch();
} else {
	boondoggle();
	frobnicate();
}

do {
	for (i = 0; i < n; ++i)
		a[i] = 0;
	plugh();
	xyzzy();
} while (!blurf());

In switch statements, be sure every case ends with either a break, continue, return, or /* fall through */ comment. Especially don't forget to put a break on the last case of a switch statement. If you leave it out, I promise someone will forget to add one someday when adding new cases. Indent the case with respect to the switch.

switch (phase) {
    case New:
	  printf("don't do any coding tonight\n");
	  break;
    case Full:
	  printf("beware lycanthropes\n");
	  break;
    case Waxing:
    case Waning:
	  printf("the heavens are neutral\n");
	  break;
    default:
	  /*
	   * Include occasional sanity checks in your code.
	   */
	  fprintf(stderr, "and here you thought this couldn't happen!\n");
	  abort();
}
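
When a case really is meant to run on into the next, the comment marks the spot. In the sketch below the enumeration constants and the function log_mask are hypothetical; each verbosity level enables everything the lower levels do.

```c
enum { Log_errors = 1, Log_warnings = 2, Log_info = 4 };

/* Build the set of message classes enabled at a verbosity level. */
int log_mask(int verbosity) {
	int mask;

	mask = 0;
	switch (verbosity) {
	    case 2:
		  mask |= Log_info;
		  /* fall through */
	    case 1:
		  mask |= Log_warnings;
		  /* fall through */
	    case 0:
		  mask |= Log_errors;
		  break;
	    default:
		  /* unknown level: report errors only */
		  mask = Log_errors;
		  break;
	}
	return mask;
}
```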

Use goto sparingly. Two harmless places to use it are to break out of a multilevel loop, or to jump to common function exit code. Often these are the same places.
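
A sketch of the multilevel-loop case follows. The function grid_find and the constants Rows and Cols are invented for this example; the label is named for why control arrives there.

```c
enum { Rows = 3, Cols = 4 };

/*
 * Search grid for value.  On success store its position in
 * *row and *col and return 1; return 0 if it is absent.
 */
int grid_find(int grid[Rows][Cols], int value, int *row, int *col) {
	int r, c;

	for (r = 0; r < Rows; ++r)
		for (c = 0; c < Cols; ++c)
			if (grid[r][c] == value)
				goto found;
	return 0;

found:
	*row = r;
	*col = c;
	return 1;
}
```

A single break would leave only the inner loop, so the alternative is an extra flag variable tested in both loop conditions.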

Do not use old-style function definitions where the args are declared outside the parameter list. Consider including a blank line between the local variable declarations and the code. Also feel free to include other blank lines, particularly to separate major blocks of code.

Avoid declarations in all but the most complex inner blocks. Avoid initializations of automatic variables in declarations, since they can be mildly disconcerting when stepping through code with a debugger. Don't declare external objects inside functions; declare them at file scope. Finally, don't try to go into denial over C's 'declaration by example' syntax. Say:

char *p;

not:

char* p;

Don't parenthesize things unnecessarily; say

return 7;

not

return (7);

and especially not

return(7);

Remember, return is the exact antonym of function call! The parsing precedence of the bitwise operations (&, |, ^, ~) can be surprising. See Ritchie's explanation for the reasons why. Always use full parentheses around these operators.
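
The surprise is easiest to see with a mask test, because == binds more tightly than &. In this sketch the constant Flag_ready and both function names are hypothetical. The second function spells out how the unparenthesized expression status & Flag_ready == 0 is actually parsed.

```c
enum { Flag_ready = 0x4 };

/* Intended test: is the ready bit clear? */
int ready_bit_clear(unsigned status) {
	return (status & Flag_ready) == 0;
}

/*
 * How  status & Flag_ready == 0  is parsed: the comparison
 * happens first and yields 0, so the result is always 0.
 */
int ready_bit_clear_as_parsed(unsigned status) {
	return (status & (Flag_ready == 0)) != 0;
}
```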

A C programmer should be able to recognize the language's idioms and be able to parse code like:

while (*s++ = *t++)
	;
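
For anyone still learning to read it, the idiom copies the string t into s, including the terminating nul. The longhand below is a sketch that does the same copying one step at a time; copy_string is a name made up for the example.

```c
/*
 * Copy the string t, including its terminating nul, into s.
 * Copies the same characters as:  while (*s++ = *t++) ;
 */
void copy_string(char *s, char *t) {
	for (;;) {
		*s = *t;		/* copy one character */
		if (*s == '\0')
			break;		/* stop once the nul is copied */
		++s;
		++t;
	}
}
```

The one-line form folds the copy, the test and both increments into the loop condition, which is why its body is empty.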

If an expression gets too long to fit in a line, break it next to a binary operator. Put the operator at the beginning of the next line (appropriately indented) to emphasize that it is continued from the previous line. This strategy leads to particularly nice results when breaking up complicated conditional expressions:

if (x == 2 || x == 3 || x == 5 || x == 7
    || x == 11 || x == 13 || x == 17 || x == 19)
	printf("x is a small prime\n");

This example also illustrates why you shouldn't add additional indenting when continuing a line - in this case, it could get confused with the condition body. Avoid breakpoints that will give the reader false notions about operator precedence, like this:

if (x == 2 || x > 10
    && x < 12 || x == 19)

If you're breaking an expression across more than two lines, try to use the same kind of breakpoint for each line. Finally, if you're getting into really long expressions, your code is probably in need of a rewrite.

Avoid sloppiness. Decide what your style is and follow it precisely. I often see code like this:

struct foo       {
	int baz ;
	int  barf;
	char * x, *y;
};

All those random extra spaces make me wonder if the programmer was even paying attention!

Expressions and Statements

In C, assignments are expressions, not statements. This allows multiple assignment

a = b = c = 1;

and assignment within expressions

if (!(bp = malloc(sizeof (Buffer)))) {
	perror("malloc");
	abort();
}

This capability can sometimes allow concise code, but at other times it can obscure important procedure calls and updates to variables. Use good judgement.

The C language lacks a true boolean type, therefore its logic operations (! == > < >= <=) and tests (in the conditional operator ?: and the if, while, do, and for statements) have some interesting semantics. Every boolean test is an implicit comparison against zero (0). However, zero is not a simple concept. It represents:

the integer zero, 0, for all integral types
the floating point zero, 0.0, (positive or negative)
the nul character
the null pointer

Consider making your intentions clear by explicitly showing the comparison with zero for all integers, floating-point numbers and characters. This gives us the tests

(i == 0)  (x != 0.0)  (c != '\0')

instead of

(!i)  (x)  (c)

An exception is made for pointers, since 0 is the only language-level representation for the null pointer. (The symbol NULL is not part of the core language - you have to include a special header file to get it defined.) In short, pretend that C has an actual boolean type which is returned by the logical operators and expected by the test constructs, and pretend that the null pointer is a synonym for false.

Never return from the function main(); explicitly use exit() instead. They are no longer equivalent - there is an important distinction when using the atexit() feature with objects declared locally to main(). Don't worry about the details, just use this fact to program consistently. This does spoil the potential for calling main() recursively, which is usually a silly thing to do.

The Standard Library

The standard library is your friend. There's no excuse for writing code which already exists there. Not only will the standard library's code be tested, often it will be more efficient, and will certainly be more familiar to your fellow programmers. The best book on the subject is Plauger's The Standard C Library. Some notes on using particular functions:

gets: Never use this. Use fgets instead so that you can be sure that you don't overflow your buffer.
malloc: Don't cast the return value of calls to malloc. It has type void *, so it converts to any object pointer type without a cast. K&R2, p. 142 gives contrary advice, but it has since been retracted by Dennis Ritchie:
In any case, now that I reread the stuff on p. 142, I think it's wrong; it's written in such a way that it's not just defensive against earlier rules, it misrepresents the ANSI rules.
(From the newsgroup comp.std.c on August 15, 1995.)
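
Both notes can be seen together in a short sketch. The function read_line is a hypothetical helper, not part of the standard library; it assumes size is at least 2 and leaves freeing the result to the caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read one line from fp into a freshly allocated buffer of
 * size bytes, dropping the trailing newline if there is one.
 * Returns a null pointer at end of file or on error.
 */
char *read_line(FILE *fp, size_t size) {
	char *buf, *nl;

	buf = malloc(size);	/* no cast: void * converts implicitly */
	if (!buf)
		return NULL;
	/* fgets stores at most size - 1 characters plus the nul */
	if (!fgets(buf, (int) size, fp)) {
		free(buf);
		return NULL;
	}
	nl = strchr(buf, '\n');
	if (nl)
		*nl = '\0';
	return buf;
}
```

A line longer than the buffer is truncated instead of overflowing it; the remainder stays in the stream for the next call.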

Beware of library functions that have state and are called multiple times, especially in a loop. These functions include, but are not limited to, strtok() and getchar().
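
As an illustration with strtok(), the sketch below counts the fields of a line. The function count_fields is a made-up name; the comments note the two side effects that make the function risky in loops.

```c
#include <string.h>

/*
 * Count the non-empty fields of line.  strtok() writes nul
 * characters into line and keeps its scan position in hidden
 * static storage, so line must be writable and nothing called
 * inside the loop may use strtok() itself.
 */
int count_fields(char *line, char *delims) {
	char *field;
	int n;

	n = 0;
	for (field = strtok(line, delims); field; field = strtok(NULL, delims))
		++n;
	return n;
}
```

After the call the caller's buffer holds only the first field as a string, since each delimiter that ended a field has been overwritten.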