2011 Nov 06
Search Method |
Time Order: O(...) | Memory Order: O(...) | ||||||
---|---|---|---|---|---|---|---|---|
min | avg | max | min | avg | max | |||
associative | 1 | k/2 | k | n | n | n | ||
binary | 1 | log2(n)/2 | log2(n) | n | n | n | ||
linear | 1 | n/2 | n | n | n | n | ||
hash | h | h+k | h+n | n | c*n | c*n | ||
table | k | k | k | n | n*logb(n) | n*logb(n) | ||
tree | 1 | logb(n)/2 | logb(n) | n | n*logb(n) | n*logb(n) |
Depending on the nature of the search problem (see: Search Criteria) the above methods can be combined into a linear sequence or tree of decision/search methods.
Depending on the size of the data, i.e. number of key values, all of the data can be in memory. If the data set is very large (>1GB) then the data may have to be on disk or some similar I/O media. For speed reasons there may be a tradeoff where part of the lookup data is in memory and the rest on disk.
Associative lookup requires special hardware, e.g. a Content Addressable memory (CAM).
This hardware does parallel lookup. If the entire set of key values fits in the CAM, then lookup time is O(1). If the CAM is of size 's' which is too small then the time is proprortional to k which is proprtional to n/s.
This method is a tradeoff of more speed but no extra memory. At any given moment when searching the table is of a known size 'num'. See example code:
int find_bin(int num, int *a, int key) { // Returns -1 if not found else returns index in table a[]. // Requiers that table a[] be in ascending order of key values. int bot, mid, top; bot = 0; top = num-1; mid = top>>1; while(1) { if(key == a[mid]) return mid; if(key < a[mid]) { if(bot >= top) return -1; // not found top = mid-1; mid = (bot+top)>>1; } else { if(bot >= top) return -1; // not found bot = mid+1; mid = (bot+top)>>1; } } return -1; } // find_bin
This is the slowest and most naive method (assuming someone does not go out of their way to be even slower :) since it can take time O(n). If the number of key values is constant then a fixed size table can be made in memory or disk. If the number is variable then the table can be of variable, i.e. dynamic size, or be implemented as a (singly or doubly) linked list can be used (which costs more memory - due to one or two link pointers).
When the table is small or the maximum size of the table space (e.g. less than 65Kb) and sufficient memory is available, the entire table can be implemented allowing an answer in time of O(1). If the key value space usage is dense then a full tree of the space could be implemented, all or some in memory and all or some on disk. If the ley value space is sparse then a sparse tree of tables can be used.
There are many kinds of trees: binary trees, trees with a maximum number of children, balanced trees, clustered trees (e.g. rtrees). For parsing, for example, a tree of states based on input characters could be implemented, where each state adds to the a parse tree created as the input is read.
If the key value is a string, with characters values limited to ASCII (numbers in the range 0..127) then a tree of tables, each of size 128, can implemented giving an effective speed of O(k*log128(n)), where k is proportional to the maximum length of the key string.
2007-2011