2011 Nov 06

- Can there be
**at most one**or**more than one**hit for a key value?

If more than one is it constant or variable and is there an upper bound? **How fast**does the lookup have to be?**How often**does or can a search happen?- What is the
**dynamic range**of the search key values? - What are the
**maximum number**of possible search key values? - What is the
**statistics and density**of possible or actual search key values in the search key value space? - How much is speed worth (i.e.
**cost in money and time**) to spend on: (RAM or disk) - e.g. for lookup tables?*Extra memory*(e.g. CAM)?*Special hardware*- Is there a
**constant**(static)**or variable**(dynamic) number of key values? - If the number is variable is there an actual or artificial
**constraint on the maximum**number of key values? - If the interval between changes in the number of key values is very long, then it may be possible to recompile or regenerate the tables or other data structures.
- Is the search code and the key data changes or does the code could change too? If only the data changes then there may be a way to trigger the code to reload the new data from a file or communications channel?

Search Method |
Time Order: O(...) | Memory Order: O(...) | ||||||
---|---|---|---|---|---|---|---|---|

min | avg | max | min | avg | max | |||

associative | 1 | k/2 | k | n | n | n | ||

binary | 1 | log_{2}(n)/2 | log_{2}(n) |
n | n | n | ||

linear | 1 | n/2 | n | n | n | n | ||

hash | h | h+k | h+n |
n | c*n | c*n | ||

table | k | k | k | n | n*log_{b}(n) | n*log_{b}(n) |
||

tree | 1 | log_{b}(n)/2 | log_{b}(n) |
n | n*log_{b}(n) | n*log_{b}(n) |

Depending on the nature of the search problem (see: Search Criteria) the above methods can be combined into a linear sequence or tree of decision/search methods.

Depending on the size of the data, i.e. number of key values, all of the data can be in memory. If the data set is very large (>1GB) then the data may have to be on disk or some similar I/O media. For speed reasons there may be a tradeoff where part of the lookup data is in memory and the rest on disk.

Associative lookup requires special hardware, e.g. a Content Addressable memory (CAM).

This hardware does parallel lookup. If the entire set of key values fits in the CAM, then lookup time is O(1). If the CAM is of size 's' which is too small then the time is proprortional to k which is proprtional to n/s.

This method is a tradeoff of more speed but no extra memory. At any given moment when searching the table is of a known size 'num'. See example code:

int find_bin(int num, int *a, int key) { // Returns -1 if not found else returns index in table a[]. // Requiers that table a[] be in ascending order of key values. int bot, mid, top; bot = 0; top = num-1; mid = top>>1; while(1) { if(key == a[mid]) return mid; if(key < a[mid]) { if(bot >= top) return -1; // not found top = mid-1; mid = (bot+top)>>1; } else { if(bot >= top) return -1; // not found bot = mid+1; mid = (bot+top)>>1; } } return -1; } // find_bin

This is the slowest and most naive method (assuming someone does not go out of their way to be even slower :) since it can take time O(n). If the number of key values is constant then a fixed size table can be made in memory or disk. If the number is variable then the table can be of variable, i.e. dynamic size, or be implemented as a (singly or doubly) linked list can be used (which costs more memory - due to one or two link pointers).

When the table is small or the maximum size of the table space (e.g. less than 65Kb) and sufficient memory is available, the entire table can be implemented allowing an answer in time of O(1). If the key value space usage is dense then a full tree of the space could be implemented, all or some in memory and all or some on disk. If the ley value space is sparse then a sparse tree of tables can be used.

There are many kinds of trees: binary trees, trees with a maximum number of children,
balanced trees, clustered trees (e.g. **rtrees**). For parsing, for example, a tree of
states based on input characters could be implemented, where each state adds to the a parse
tree created as the input is read.

If the key value is a string, with characters values limited to ASCII (numbers in the
range 0..127) then a tree of tables, each of size 128, can implemented giving an effective
speed of O(k*log_{128}(n)), where **k** is proportional to the maximum
length of the key string.

2007-2011