Fully utilize the major improvements to the C language made in this standard:
static
qualifier and should either be defined or fully
declared before their first use in the module.
void
's and enum
's
wherever possible.
Avoid:
int
Remember the syntax for type declarations follows use, as in:
int (*apfi[])();
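One way to read the declaration above: because the expression (*apfi[i])() yields an int, apfi must be an array of pointers to functions returning int. The sketch below is only an illustration; the functions one and two and the helper call_entry are hypothetical names, and the parameter lists are written as (void) so that the prototypes are complete.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical functions used to populate the table. */
static int
one(void)
{
	return 1;
}

static int
two(void)
{
	return 2;
}

/*
 * Array of pointers to functions returning int.  The declarator
 * mirrors the expression used to call an element: (*apfi[i])().
 */
static int (*apfi[])(void) = { one, two };

/* Call the i'th entry exactly the way the declaration reads. */
static int
call_entry(size_t i)
{
	assert(i < sizeof apfi / sizeof apfi[0]);
	return (*apfi[i])();
}
```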
C is a language with pointers; don't go into denial over this.
The fact that a[i] is equivalent to *(a + i) is one of the defining characteristics of C, and should be embraced.
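As a small illustration of that equivalence, the two hypothetical functions below (sum_indexed and sum_pointer are names invented for this sketch) compute the same result, one with subscripts and one with explicit pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n elements using subscript notation. */
static int
sum_indexed(int *a, size_t n)
{
	size_t i;
	int sum;

	assert(a != NULL || n == 0);
	sum = 0;
	for (i = 0; i < n; ++i)
		sum += a[i];
	return sum;
}

/* The same loop written with explicit pointer arithmetic. */
static int
sum_pointer(int *a, size_t n)
{
	size_t i;
	int sum;

	assert(a != NULL || n == 0);
	sum = 0;
	for (i = 0; i < n; ++i)
		sum += *(a + i);
	return sum;
}
```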
That being said, good places to put comments are:
    /* single line comments look like this */

    /*
     * Important single line comments look like multi-line comments.
     */

    /*
     * Multiline comments look like this.  Put the opening and closing
     * comment sequences on lines by themselves.  Use complete sentences
     * with proper English grammar, capitalization, and punctuation.
     */

    /* but you don't need to punctuate or capitalize one-liners */

    // this new C++ style on line comment is supported by most C compilers now too

The opening / of all comments should be indented to the same level as the code to which it applies, for example:
    if (fubar()) {
        /*
         * Fouled up beyond all recognition.  Print a nastygram
         * and attempt to clean up.  If that doesn't work,
         * die horribly, and try to crash the system while
         * we're at it.
         */
        ...
    }

The assert() macro is an excellent 'executable comment'.
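A minimal sketch of assert() used as an executable comment: the preconditions are stated where a prose comment would otherwise go, and they are checked at run time. The function largest is a hypothetical example, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the largest of the n values in a.
 * The caller must pass a non-empty array.
 */
static int
largest(int *a, size_t n)
{
	size_t i;
	int max;

	assert(a != NULL);	/* documents and enforces the precondition */
	assert(n > 0);

	max = a[0];
	for (i = 1; i < n; ++i)
		if (a[i] > max)
			max = a[i];
	return max;
}
```

Compiling with NDEBUG defined removes the checks, so an assert() expression should never have side effects the program depends on.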
Within each section, order your functions in a 'bottom up' manner - defining functions before their use. The benefit of avoiding redundant (hence error-prone) forward declarations outweighs the minor irritation of having to jump to the bottom of the file to find the main or higher level or top functions.
In header files, use the following organization:
extern
; do not instantiate variables via a header file.
(See below for why.)
Beware of nested includes, because of make issues.
Consider using the makedepend tools to help maintain your source file dependencies in your Makefile.
Declare static every function and global variable that you possibly can.
When declaring a global function or variable in a header file, use an explicit extern. For functions, provide a full ANSI C prototype. For example:

    extern int errno;
    extern void free(void *);
Why the extern? It is OK to declare an object any number of times, but in all the source files there can be only one definition. The extern says 'This is only a declaration.' (A definition is something that actually allocates and initializes storage for the object.)
Historically,

    int foo;

was ambiguously treated as either a declaration or both declaration and definition depending on linker magic. However, ANSI C allows it to be an error for this to appear at file scope in more than one place in a program. Header files should never contain object definitions, only type definitions and object declarations. This is why we require extern to appear everywhere except on the real definition.
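The rule can be sketched in a single translation unit, with comments marking which lines would live in a header and which in exactly one source file. The names tick_count and next_tick, and the file name ticks.h, are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* What a header (say, ticks.h) would contain. */
extern int tick_count;		/* declaration only: allocates no storage */
extern int next_tick(void);	/* full prototype */

/* Repeating a declaration, as multiple includes would, is harmless. */
extern int tick_count;

/* What exactly one source file would contain. */
int tick_count = 0;		/* the single definition */

int
next_tick(void)
{
	assert(tick_count < INT_MAX);	/* guard against overflow */
	return ++tick_count;
}
```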
In function prototypes, beware of the use of const. Although the ANSI standard makes some unavoidable requirements in the standard library, we don't need to widen the problem any further. What we are trying to avoid here is a phenomenon known as ``const poisoning'', where the appearance of const in some prototype forces you to go through your code and add const all over the place.
Never rely on C's implicit int typing; don't say:

    extern foo;

say:

    extern int foo;

Similarly, don't declare a function with an implicit return type. If it returns a meaningful integer value, declare it int. If it returns no meaningful value, declare it void. (By the way, the C standard now requires you to declare main() as returning int.)
Provide typedefs for all struct and union types, and put them before the type declarations. Creating the typedef eliminates the clutter of extra struct and union keywords, and makes your structures look like first-class types in the language. Putting the typedefs before the type declarations allows them to be used when declaring circular types. It is also nice to have a list of all new reserved words up front.

    typedef struct Foo Foo;
    typedef struct Bar Bar;

    struct Foo {
        Bar *bar;
    };

    struct Bar {
        Foo *foo;
    };
This gives a particularly nice scheme for exporting opaque objects in header files. In header.h:

    typedef struct Foo Foo;

In source.c:

    #include "header.h"

    struct Foo { .. };

Then a client of header.h can declare a

    Foo *x;

but cannot get at the contents of a Foo. In addition, the user cannot declare a plain (non pointer) Foo, and so is forced to go through whatever allocation routines you provide. We strongly encourage this modularity technique.
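A slightly fuller sketch of the opaque-object scheme, collapsed into one file for brevity. The type Counter, its functions, and the file names counter.h and counter.c are all hypothetical; in real use the two halves would live in separate files.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- counter.h: everything a client gets to see ---- */
typedef struct Counter Counter;

extern Counter *counter_new(void);
extern void counter_free(Counter *);
extern int counter_bump(Counter *);

/* ---- counter.c: the only place the layout is known ---- */
struct Counter {
	int value;
};

/* Allocate a counter starting at zero; return NULL on failure. */
Counter *
counter_new(void)
{
	Counter *c;

	c = malloc(sizeof *c);
	if (c != NULL)
		c->value = 0;
	return c;
}

void
counter_free(Counter *c)
{
	free(c);
}

/* Increment the counter and return its new value. */
int
counter_bump(Counter *c)
{
	assert(c != NULL);
	return ++c->value;
}
```

A client that includes only the header part can hold and pass a Counter *, but any attempt to write c->value or to declare a plain Counter fails to compile because the type is incomplete there.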
If an enum is intended to be declared by the user (as opposed to just being used as names for integer values), give it a typedef too. Note that the typedef has to come after the enum declaration.
Don't mix any declarations in with type definitions; i.e., don't say:

    struct foo {
        int x;
    } object;

Also don't say:

    typedef struct {
        int x;
    } type;

(It's important for all typedefs to stand out by themselves.)
Declare each field of a structure on a line by itself. Think about the order of the fields. Try to keep related fields grouped. Within groups of related fields, pick some uniform scheme for organizing them, for example alphabetically or by frequency of use. When all other considerations are equal, place larger fields first, as C's alignment rules may then permit the compiler to save space by not introducing "holes" in the structure layout.
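The effect of field order on padding can be observed with sizeof. The two struct types below are hypothetical and hold the same fields; the actual sizes are implementation-defined, so the sketch only assumes that the reordered layout is no larger.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Scattered Scattered;
typedef struct Ordered Ordered;

/* Small fields on either side of a large one: padding is likely. */
struct Scattered {
	char tag;
	double weight;
	char flag;
};

/* Same fields, largest first: the small ones can sit together. */
struct Ordered {
	double weight;
	char tag;
	char flag;
};

/*
 * Bytes saved by the reordering.  The value is implementation-defined;
 * on many 64-bit ABIs it is 8 (24 bytes versus 16).
 */
static size_t
bytes_saved(void)
{
	assert(sizeof (Ordered) <= sizeof (Scattered));
	return sizeof (Scattered) - sizeof (Ordered);
}
```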
Use

    enum { Red = 0xF00, Blue = 0x0F0, Green = 0x00F };
    static const float pi = 3.14159265358;

instead of #defines, which are rarely visible in debuggers.
Macros should avoid side effects. If possible, mention each argument exactly once (avoiding pre/post increment/decrement problems). Fully parenthesize all arguments. When the macro is an expression, parenthesize the whole macro body. If the macro is the inline expansion of some function, the name of the macro should be the same as that of the function, except fully capitalized. When continuing a macro across multiple lines with backslashes, line up the backslashes way over on the right edge of the screen to keep them from cluttering up the code.
    #define OBNOXIOUS(X)                                        \
        (save = (X),                                            \
         dosomethingwith(X),                                    \
         (X) = save)

Try to write macros so that they are syntactically expressions. C's comma and conditional operators are particularly valuable for this. If you absolutely cannot write the macro as an expression, enclose the macro body in do { ... } while (0). This way the expanded macro plus a trailing semicolon becomes a syntactic statement.
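A minimal sketch of the do { ... } while (0) form. SWAP_INT is a hypothetical macro that needs a temporary variable and so cannot be an expression; sort_pair is a hypothetical caller showing that the macro behaves as a single statement even in an unbraced if/else. Each argument is evaluated more than once, so callers must pass expressions without side effects.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two int lvalues.  Arguments are evaluated twice each. */
#define SWAP_INT(A, B)						\
	do {							\
		int swap_tmp;					\
								\
		swap_tmp = (A);					\
		(A) = (B);					\
		(B) = swap_tmp;					\
	} while (0)

/* Order *lo and *hi so that *lo <= *hi. */
static void
sort_pair(int *lo, int *hi)
{
	assert(lo != NULL && hi != NULL);
	if (*lo > *hi)
		SWAP_INT(*lo, *hi);	/* the semicolon ends the do-while */
	else
		assert(*lo <= *hi);
}
```

Had the macro body been a bare { ... } block, the semicolon after SWAP_INT(...) would have terminated the if statement and left the else dangling.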
If you think you need to use #ifdef, consider restricting the dependent code to a single module. For instance, if you need to have different code for Unix and MS_DOS, instead of having #ifdef unix and #ifdef dos everywhere, try to have files unix.c and dos.c with identical interfaces. If you can't avoid them, make sure to document the end of the conditional code:

    #ifdef FUBAR
        some code
    #else
        other code
    #endif /* FUBAR */

Some sanctioned uses of the preprocessor are:
#if 0
.
#ifdef __GNUC__
.
<float.h>
and <limits.h>
.
If you use #if to test whether some condition holds that you know how to handle, but are too lazy to provide code for the alternative, protect it with #error, like this:

    #include <limits.h>

    #if INT_MAX > UCHAR_MAX
    enum { Foo = UCHAR_MAX + 1, Bar, Baz, Barf };
    #else
    #error "need int wider than char"
    #endif

(This example also illustrates a reasonable use of <limits.h>.)
Names should be meaningful in the application domain, not the implementation domain. This makes your code clearer to a reader who is familiar with the problem you're trying to solve, but is not familiar with your particular way of solving it. Also, the implementation may need to change some day. Note that well-structured code is layered internally, so your implementation domain is also the application domain for lower levels.
Names should be chosen to make sense when your program is read. Thus, all names should be parts of speech which will make sense when used with the language's syntactic keywords. Variables should be noun clauses. Boolean variables should be named for the meaning of their "true" value. Procedures (functions called for their side-effects) should be named for what they do, not how they do it. Function names should reflect what they return, and boolean-valued functions of an object should be named for the property their true value implies about the object. Functions are used in expressions, often in things like if's, so they need to read appropriately. For instance,
    if (checksize(s))

is unhelpful because we can't deduce whether checksize returns true on error or non-error; instead

    if (validsize(s))

makes the point clear and makes a future mistake in using the routine less likely.
Longer names contain more information than short names, but extract a price in readability. Compare the following examples:
    for (elementindex = 0; elementindex < DIMENSION; ++elementindex)
        printf("%d\n", element[elementindex]);

    for (i = 0; i < DIMENSION; ++i)
        printf("%d\n", element[i]);

In the first example, you have to read more text before you can recognize the for-loop idiom, and then you have to do still more hard work to parse the loop body. Since clarity is our goal, a name should contain only the information that it has to.
Carrying information in a name is unnecessary if the declaration and use of that name are confined to a small scope. Local variables are usually used to hold intermediate values or control information for some computation, and as such have little importance in themselves. For example, for array indices names like i, j, and k are not just acceptable, they are desirable. Similarly, a global variable named x would be just as inappropriate as a local variable named elementindex.
By definition, a global variable is used in more than one function or module (otherwise it would be static or local), so not all of its uses will be visible at once. The name has to explain the use of the variable on its own. Nevertheless there is still a readability penalty for long names: casefold is better than case_fold_flag_set_by_main.
As a rule make variable name size proportional to scope:
length(name(variable)) ~ log(countlines(scope(variable)))
Use some consistent scheme for naming related variables. If the top of memory is called physlim, should the bottom be membase? Consider the suffix -max to denote an inclusive limit, and -lim to denote an exclusive limit.
Don't take this too far, though. Avoid ``Hungarian''-style naming conventions which encode type information in variable names. They may be systematic, but they'll screw you if you ever need to change the type of a variable. If the variable has a small scope, the type will be visible in the declaration, so the annotation is useless clutter. If the variable has a large scope, the code should be modular against a change in the variable's type. In general, any deterministic algorithm for producing variable names will have the same effect.
There are weaknesses in C for large-scale programming - there is only a single, flat name scope level greater than the module level. Therefore, libraries whose implementations have more than one module can't guard their inter-module linkage from conflicting with any other global identifiers. The best solution to this problem is to give each library a short prefix that it prepends to all global identifiers.
Abbreviations or acronyms can shorten things up, but may not offer compelling savings over short full words. When a name has to consist of several words (and it often doesn't), separate words by underscores, not by BiCapitalization. It will look better to English-readers (the underscore is the space-which-is-not-a-space). Capitalization is reserved for distinguishing syntactic namespaces.
C has a variety of separately maintained namespaces, and distinguishing the names by capitalization improves the odds of C's namespaces and scoping protecting you from collisions while allowing you to use the same word across different spaces. C provides separate namespaces for:
    #define NUSERTASKS 8
    #define ISNORMAL(S) ((S)->state == Normal)

Any fully capitalized names can be regarded as fair game for #ifdef, although perhaps not for #if.
goto statement can be read aloud, and name it for why you go there, not what you do when you get there. For instance,

    goto bounds_error;

is more helpful than

    goto restore_pointer;
    struct Foo { long bar; };

with

    typedef long Foo;

since you still have the "struct" keyword everywhere, even when the contents are not being examined. The useless "struct" keywords also clutter up the code. Therefore we advocate creating a typedef mirror of all struct tags:

    typedef struct Foo Foo;

Capitalize the tag name to match the typedef name.
    struct timeval {
        unsigned long tv_sec;
        long tv_usec;
    };

for they are already in a unique namespace.
_t suffix or other cutesy thing to say ``I'm a type'' - we can see that from its position in the declaration! (Besides, all names ending with _t are reserved by POSIX.) The capitalization is needed to distinguish type names from variable names - often both want to use the same application-level word.
enum Fruit { Apples, Oranges, Kumquats };
static (and most should be), make the name short and sweet. If they are externally visible, try to give them a prefix unique to the module or library.
Lastly, develop some standard idioms to make names automatic. For instance:
    int i, j, k;                /* generic indices */
    char *s, *t;                /* string pointers */
    char *buf;                  /* character array */
    double x, y, z;             /* generic floating-point */
    size_t n, m, size;          /* results of sizeof or arguments to malloc */
    Foo foo, *pfoo, **ppfoo;    /* sometimes a little hint helps */
Avoid putting an opening curly brace on a line by itself, as this reduces the amount of code visible in a small edit window. Avoid unnecessary curly braces, but if one branch of an if is braced, then the other should be too, even if it is only a single line. If an inner nested block is braced, then the outer blocks should be too. Some examples:

    if (foo == 7) {
        bar();
    } else if (foo == 9) {
        barf();
        bletch();
    } else {
        boondoggle();
        frobnicate();
    }

    do {
        for (i = 0; i < n; ++i)
            a[i] = 0;
        plugh();
        xyzzy();
    } while (!blurf());
In switch statements, be sure every case ends with either a break, continue, return, or /* fall through */ comment. Especially don't forget to put a break on the last case of a switch statement. If you do, I promise someone will forget to add one someday when adding new cases. Indent the case with respect to the switch.

    switch (phase) {
        case New:
            printf("don't do any coding tonight\n");
            break;
        case Full:
            printf("beware lycanthropes\n");
            break;
        case Waxing:
        case Waning:
            printf("the heavens are neutral\n");
            break;
        default:
            /*
             * Include occasional sanity checks in your code.
             */
            fprintf(stderr, "and here you thought this couldn't happen!\n");
            abort();
    }
Use goto sparingly. Two harmless places to use it are to break out of a multilevel loop, or to jump to common function exit code. Often these are the same places.
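A sketch of the common-exit-code use. The function is_palindrome and its labels are hypothetical; the labels are named for why control arrives there, and every path, successful or not, passes through the same cleanup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if s reads the same forwards and backwards, 0 if it
 * does not, and -1 if s is null or memory runs out.
 */
static int
is_palindrome(char *s)
{
	char *copy;
	size_t i, n;
	int result;

	copy = NULL;
	result = -1;

	if (s == NULL)
		goto bad_argument;
	n = strlen(s);
	copy = malloc(n + 1);
	if (copy == NULL)
		goto out_of_memory;

	for (i = 0; i < n; ++i)
		copy[i] = s[n - 1 - i];
	copy[n] = '\0';
	assert(strlen(copy) == n);	/* sanity check on the reversal */
	result = (strcmp(s, copy) == 0);

bad_argument:
out_of_memory:
	free(copy);			/* free(NULL) does nothing */
	return result;
}
```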
Do not use old-style function definitions where the args are declared outside the parameter list. Consider including a blank line between the local variable declarations and the code. Also feel free to include other blank lines, particularly to separate major blocks of code.
Avoid declarations in all but the most complex inner blocks (they cost time). Avoid initializations of automatic variables in declarations, since they can be mildly disconcerting when stepping through code with a debugger. Don't declare external objects inside functions; declare them at file scope. Finally, don't try to go into denial over C's 'declaration by example' syntax. Say:

    char *p;

not:

    char* p;
Don't parenthesize things unnecessarily; say

    return 7;

not

    return (7);

and especially not

    return(7);

Remember, return is the exact antonym of function call! The parsing precedence of the bitwise operations (&, |, ^, ~) can be surprising. See Ritchie's explanation for the reasons why. Always use full parentheses around these operators.
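To make the surprise concrete: == binds more tightly than &, so status & Ready == 0 is parsed as status & (Ready == 0). The sketch below spells out both readings with explicit parentheses; the Ready bit and the function names are hypothetical.

```c
#include <assert.h>

enum { Ready = 0x4 };	/* hypothetical status bit */

/* Intended test: is the Ready bit clear? */
static int
ready_bit_clear(unsigned status)
{
	return (status & Ready) == 0;
}

/*
 * How the compiler parses `status & Ready == 0': the comparison
 * happens first and yields 0, so the result is always 0.
 */
static int
as_parsed(unsigned status)
{
	return status & (Ready == 0);
}

/* Executable summary of the difference. */
static void
check_precedence(void)
{
	assert(ready_bit_clear(0x3) != 0);
	assert(as_parsed(0x3) == 0);
}
```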
A C programmer should be able to recognize its idioms and be able to parse code like:
    while (*s++ = *t++)
        ;
If an expression gets too long to fit in a line, break it next to a binary operator. Put the operator at the beginning of the next line (appropriately indented) to emphasize that it is continued from the previous line. This strategy leads to particularly nice results when breaking up complicated conditional expressions:
    if (x == 2 || x == 3 || x == 5 || x == 7
    || x == 11 || x == 13 || x == 17 || x == 19)
        printf("x is a small prime\n");

This example also illustrates why you shouldn't add additional indenting when continuing a line - in this case, it could get confused with the condition body. Avoid breakpoints that will give the reader false notions about operator precedence, like this:

    if (x == 2 || x > 10
    && x < 12 || x == 19)

If you're breaking an expression across more than two lines, try to use the same kind of breakpoint for each line. Finally, if you're getting into really long expressions, your code is probably in need of a rewrite.
Avoid sloppiness. Decide what your style is and follow it precisely. I often see code like this:
    struct foo {
        int baz ;
        int barf;
        char * x, *y;
    };

All those random extra spaces make me wonder if the programmer was even paying attention!
    a = b = c = 1;

and assignment within expressions

    if (!(bp = malloc(sizeof (Buffer)))) {
        perror("malloc");
        abort();
    }

This capability can sometimes allow concise code, but at other times it can obscure important procedure calls and updates to variables. Use good judgement.
The C language lacks a true boolean type; therefore its logic operations (! == > < >= <=) and tests (in the conditional operator ?: and the if, while, do, and for statements) have some interesting semantics. Every boolean test is an implicit comparison against zero (0). However, zero is not a simple concept. It represents:
    (i == 0)    (x != 0.0)    (c != '\0')

instead of

    (!i)    (x)    (c)

An exception is made for pointers, since 0 is the only language-level representation for the null pointer. (The symbol NULL is not part of the core language - you have to include a special header file to get it defined.) In short, pretend that C has an actual boolean type which is returned by the logical operators and expected by the test constructs, and pretend that the null pointer is a synonym for false.
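A short sketch of the convention in use, with hypothetical functions count_chars and is_even: integers and characters are compared against an explicit zero of the right kind, while the pointer test relies on the stated exception.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the characters in s before the terminator, treating a
 * null pointer as the empty string.
 */
static size_t
count_chars(char *s)
{
	size_t n;

	n = 0;
	if (!s)				/* pointer: implicit test is fine */
		return n;
	while (*s != '\0') {		/* character: compare explicitly */
		++n;
		++s;
	}
	assert(*s == '\0');
	return n;
}

/* Return 1 if n is even, 0 otherwise. */
static int
is_even(int n)
{
	return (n % 2) == 0;		/* integer: compare explicitly */
}
```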
Never return from the function main(); explicitly use exit(). They are no longer equivalent - there is an important distinction when using the atexit() feature with objects declared locally to main(). Don't worry about the details, just use this fact to program consistently. This does spoil the potential for calling main() recursively, which is usually a silly thing to do.
fgets instead so that you can be sure that you don't overflow your buffer.
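A minimal sketch of bounded line input with fgets(). The wrapper read_line is a hypothetical helper, not a standard function; it passes the buffer size through so that fgets() can never write past the end.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one line from fp into buf, which holds size bytes, and strip
 * the trailing newline.  Return buf, or NULL at end of file or on
 * error.  A line longer than size - 1 characters is split across
 * calls rather than overflowing buf.
 */
static char *
read_line(FILE *fp, char *buf, int size)
{
	assert(fp != NULL && buf != NULL && size > 0);
	if (fgets(buf, size, fp) == NULL)
		return NULL;
	buf[strcspn(buf, "\n")] = '\0';
	return buf;
}
```

Typical use is a loop such as while (read_line(stdin, buf, sizeof buf) != NULL) with buf declared as a local character array.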
malloc. It has type void *, so it will be compatible with anything. K&R2, p. 142 gives contrary advice, but it has since been retracted by Dennis Ritchie:

    In any case, now that I reread the stuff on p. 142, I think it's wrong; it's written in such a way that it's not just defensive against earlier rules, it misrepresents the ANSI rules.

(From the newsgroup comp.std.c on August 15, 1995.)
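Because void * converts to any object pointer type on assignment, the result of malloc can be stored directly. A small sketch, with new_int_array as a hypothetical helper:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Allocate an array of n ints, each set to fill.  Return NULL if
 * n is 0 or the allocation fails.  The caller frees the result.
 */
static int *
new_int_array(size_t n, int fill)
{
	int *a;
	size_t i;

	if (n == 0 || n > (size_t)-1 / sizeof *a)
		return NULL;
	a = malloc(n * sizeof *a);	/* void * converts without a cast */
	if (a == NULL)
		return NULL;
	for (i = 0; i < n; ++i)
		a[i] = fill;
	assert(a[n - 1] == fill);
	return a;
}
```

Writing sizeof *a rather than sizeof (int) keeps the allocation correct if the element type of a ever changes.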
strtok() getchar()