Search Algorithms

2011 Nov 06

Search Criteria

The search method that is chosen depends on multiple factors:
Time Order: O(...) Memory Order: O(...)
min avg max min avg max
associative 1k/2k nnn
binary 1log2(n)/2log2(n) nnn
linear 1n/2n nnn
hash hh+kh+n nc*nc*n
table kkk nn*logb(n)n*logb(n)
tree 1logb(n)/2logb(n) nn*logb(n)n*logb(n)
h: cost of hash calculation.

Depending on the nature of the search problem (see: Search Criteria) the above methods can be combined into a linear sequence or tree of decision/search methods.

Depending on the size of the data, i.e. number of key values, all of the data can be in memory. If the data set is very large (>1GB) then the data may have to be on disk or some similar I/O media. For speed reasons there may be a tradeoff where part of the lookup data is in memory and the rest on disk.

Associative Lookup

Associative lookup requires special hardware, e.g. a Content Addressable memory (CAM).

This hardware does parallel lookup. If the entire set of key values fits in the CAM, then lookup time is O(1). If the CAM is of size 's' which is too small then the time is proprortional to k which is proprtional to n/s.

Binary Search

This method is a tradeoff of more speed but no extra memory. At any given moment when searching the table is of a known size 'num'. See example code:

int find_bin(int num, int *a, int key) {
// Returns -1 if not found else returns index in table a[].
// Requiers that table a[] be in ascending order of key values.
int bot, mid, top;
    bot = 0;
    top = num-1;
    mid = top>>1;
    while(1) {
        if(key == a[mid])
            return mid;
        if(key < a[mid]) {
            if(bot >= top)
                return -1; // not found
            top = mid-1;
            mid = (bot+top)>>1;
        } else {
            if(bot >= top)
                return -1; // not found
            bot = mid+1;
            mid = (bot+top)>>1;
    return -1;
} // find_bin

Linear Search

This is the slowest and most naive method (assuming someone does not go out of their way to be even slower :) since it can take time O(n). If the number of key values is constant then a fixed size table can be made in memory or disk. If the number is variable then the table can be of variable, i.e. dynamic size, or be implemented as a (singly or doubly) linked list can be used (which costs more memory - due to one or two link pointers).

Table Search

When the table is small or the maximum size of the table space (e.g. less than 65Kb) and sufficient memory is available, the entire table can be implemented allowing an answer in time of O(1). If the key value space usage is dense then a full tree of the space could be implemented, all or some in memory and all or some on disk. If the ley value space is sparse then a sparse tree of tables can be used.

Tree Search

There are many kinds of trees: binary trees, trees with a maximum number of children, balanced trees, clustered trees (e.g. rtrees). For parsing, for example, a tree of states based on input characters could be implemented, where each state adds to the a parse tree created as the input is read.

If the key value is a string, with characters values limited to ASCII (numbers in the range 0..127) then a tree of tables, each of size 128, can implemented giving an effective speed of O(k*log128(n)), where k is proportional to the maximum length of the key string.