2011 Nov 06
Most of the algorithms below sort ascending and in place (i.e. they need very little extra memory).
Stable sorts guarantee the relative order of records with equal key values
(along with associated data - not shown in the examples below).
See Wikipedia.
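For example (a hypothetical record type, not taken from the programs below), a stable sort run on the key alone keeps records with equal keys in their input order:

    #include <stdio.h>

    typedef struct { int key; char tag; } rec;   // key plus associated data

    // Insertion sort on key only; the strict > comparison makes it stable,
    // so records with equal keys keep their original relative order.
    void insertion_sort_rec(int num, rec *r) {
        int i, j;
        rec v;
        for(i=1; i<num; i++) {
            v = r[i];
            for(j=i; j>0 && r[j-1].key > v.key; j--) r[j] = r[j-1];
            r[j] = v;
        }
    }

    int main() {
        rec r[4] = { {2,'a'}, {1,'b'}, {2,'c'}, {1,'d'} };
        int i;
        insertion_sort_rec(4, r);
        for(i=0; i<4; i++) printf("%d%c ", r[i].key, r[i].tag);
        printf("\n");   // prints: 1b 1d 2a 2c - equal keys stay in input order
        return 0;
    }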
Time and memory orders are O(...) of the expressions below.

| Name | Time: min | Time: avg | Time: max | Memory | Stable |
|---|---|---|---|---|---|
| bubble | n | n*n/2 | n*n | n | yes |
| cocktail | n | n*n/2 | n*n/2 | n | yes |
| comb | n*n | n*n | n*n | n | no |
| gnome | n*n/2 | n*n/2 | n*n/2 | n | yes |
| heap | n*log(n) | n*log(n) | n*log(n) | n | no |
| insertion | n | n*n/4 | n*n/2 | n | yes |
| merge | n*log(n) | n*log(n) | n*log(n) | 2*n | yes |
| quick | n*log(n) | n*log(n) | n*n/2 | n | no |
| radix | n*d+2*v | n*d+2*v | n*d+2*v | 2*n | yes |
| radix insert | c1*n*d+c2*v*d | [1] | c1*n*d+c2*v*2d | [1] | yes |
| selection | n*n/2 | n*n/2 | n*n/2 | n | no |
| shell | n*log(n)^2 | n*log(n)^2 | n*log(n)^2 | n | no |

- c1, c2, c3, c4: proportionality "constants" - depend on [1]
- b: number of bits per digit, b = log2(v): 1, 2, 4, or 8
- d: number of digits, e.g. for a 32-bit int: 32, 16, 8 or 4; for strings, d is the average string length
- n: number of items to sort
- v: number of values per digit, e.g. for a 32-bit int: 2, 4, 16 or 256
- [1]: depends on how sparse the data values are and how clustered; high clustering requires fewer malloc's. Table initialization time and space depend statistically on the ratio n/v.
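As a worked example of the radix cost: sorting n = 1,000,000 32-bit ints one byte at a time gives d = 4 digits with v = 256 values each, so n*d + 2*v is about 4,000,512 digit operations, versus roughly n*log2(n), about 20,000,000 comparisons, for the O(n*log(n)) sorts.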
    // swap: a[i] and a[j] (ignores associated data)
    static inline void swap(int *a, int i, int j) {
        int s;
        s = a[i];
        a[i] = a[j];
        a[j] = s;
    } // swap
    void bubble1_sort(int num, int *a) {
        // Stops when the array is sorted; in the best case one pass, O(n).
        int swapped, i;
        num--;
        do {
            swapped = 0;
            for(i=0; i<num; i++) {
                if(a[i] > a[i+1]) {   // compare
                    swap(a, i, i+1);
                    swapped = 1;
                }
            }
        } while(swapped);
    } // bubble1_sort
    void bubble2_sort(int num, int *a) {
        int i, j;
        for(i=0; i<num-1; i++) {
            for(j=i+1; j<num; j++) {
                if(a[i] > a[j]) {   // compare
                    swap(a, i, j);
                }
            }
        }
    } // bubble2_sort
    void cocktail_sort(int num, int *a) {
        int bottom, top, swapped, i;
        bottom = 0;
        top = num-1;
        swapped = 1;
        while(swapped) {   // if no elements have been swapped, the list is sorted
            swapped = 0;
            for(i=bottom; i<top; i++) {
                if(a[i] > a[i+1]) {   // are the two elements in the correct order?
                    swap(a, i, i+1);
                    swapped = 1;
                }
            }
            // decrease `top`: the element with the largest value in the
            // unsorted part of the list is now at position top
            top--;
            for(i=top; i>bottom; i--) {
                if(a[i] < a[i-1]) {
                    swap(a, i, i-1);
                    swapped = 1;
                }
            }
            // increase `bottom`: the element with the smallest value in the
            // unsorted part of the list is now at position bottom
            bottom++;
        }
    } // cocktail_sort
An in-place sort algorithm that repeatedly reorders pairs of items. On each pass it swaps pairs of items separated by the increment, or gap, where needed, then reduces the gap (divides it by about 1.3). The gap starts at a large fraction of the number of items (half, in the code below), and passes continue until the gap is 1 and a pass makes no swaps. Bubble sort is a comb sort whose gap is always 1. Comb sort does a single "bubbling" pass over each set for each gap or increment, whereas shell sort completely sorts each set.
    void comb_sort(int num, int *a) {
        int gap, swapped, i;
        gap = num>>1;
        do {
            swapped = 0;
            // update the gap value for the next comb (shrink by about 1.3)
            gap = (gap*10)/13;
            if((10==gap) || (9==gap)) {   // "rule of 11"
                gap = 11;
            }
            if(gap < 1) gap = 1;
            // a single "comb" over the input list
            for(i=0; i+gap<num; i++) {
                if(a[i] > a[i+gap]) {
                    swap(a, i, i+gap);
                    swapped++;
                }
            }
        } while((gap>1) || swapped);
    } // comb_sort
    void gnome_sort(int num, int *a) {
        int i, j;
        i = 1;
        j = 2;
        while(i < num) {
            if(a[i-1] <= a[i]) {
                i = j++;
            } else {
                swap(a, i-1, i);
                if(!--i) i = 1;
            }
        }
    } // gnome_sort
Heap sort is slower than the other O(n*log(n)) sorting algorithms, but unlike merge and quick sorts it doesn't require massive recursion or multiple arrays to work. This makes it the most attractive option for very large data sets of millions of items.
It begins by building a heap out of the data set, and then removing the largest item and placing it at the end of the sorted array. After removing the largest item, it reconstructs the heap and removes the largest remaining item and places it in the next open position from the end of the sorted array. This is repeated until there are no items left in the heap and the sorted array is full. Elementary implementations require two arrays - one to hold the heap and the other to hold the sorted elements.
To do an in-place sort and save the space the second array would require, the algorithm below uses the same array to store both the heap and the sorted array. Whenever an item is removed from the heap, it frees up a space at the end of the array that the removed item can be placed in.
    // Sift the value at `root` down until a[root..bottom] satisfies the
    // max-heap property; the children of node i are at 2*i+1 and 2*i+2.
    void heapify(int *a, int root, int bottom) {
        int maxc;
        while(root*2+1 <= bottom) {
            maxc = root*2+1;                              // left child
            if((maxc < bottom) && (a[maxc] < a[maxc+1]))  // right child larger?
                maxc++;
            if(a[root] < a[maxc]) {
                swap(a, root, maxc);
                root = maxc;
            } else {
                return;
            }
        }
    } // heapify

    void heap_sort(int num, int *a) {
        int i;
        for(i=(num/2)-1; i>=0; i--)   // build a max-heap
            heapify(a, i, num-1);
        for(i=num-1; i>=1; i--) {     // move the largest item to the end,
            swap(a, 0, i);            // then restore the heap
            heapify(a, 0, i-1);
        }
    } // heap_sort
    void insertion_sort(int num, int *a) {
        int i, j, v;
        for(i=1; i<num; i++) {
            v = a[i];   // insert a[i] into the sorted prefix a[0..i-1]
            for(j=i; j>0; j--) {
                if(a[j-1] <= v) break;
                a[j] = a[j-1];
            }
            a[j] = v;
        }
    } // insertion_sort
The merge sort splits the list to be sorted into two equal halves, and places them in separate arrays. Each array is recursively sorted, and then merged back together to form the final sorted list. Like most recursive sorts, the merge sort has an algorithmic complexity of O(n log n).
Elementary implementations of the merge sort make use of three arrays - one for each half of the data set and one to store the sorted list in. The algorithm below does the merge between the data array and a single scratch array, so only two arrays are required. There are non-recursive versions of the merge sort, but they don't yield any significant performance enhancement over the recursive algorithm on most machines.
Although it is one of the fastest stable sorting algorithms, merge sort can be undesirable because it needs that scratch array, which is the same size as the input.
    void merge(int *a, int *t, int left, int mid, int right) {
        int i, left_end, num, tmp_pos;
        left_end = mid - 1;
        tmp_pos = left;
        num = right - left;
        while((left <= left_end) && (mid <= right)) {   // merge the two runs
            if(a[left] <= a[mid]) {
                t[tmp_pos++] = a[left++];
            } else {
                t[tmp_pos++] = a[mid++];
            }
        }
        while(left <= left_end) {   // copy any remainder of the first run
            t[tmp_pos++] = a[left++];
        }
        while(mid <= right) {       // copy any remainder of the second run
            t[tmp_pos++] = a[mid++];
        }
        for(i=0; i<=num; i++) {     // copy the merged run back into a[]
            a[right] = t[right];
            right--;
        }
    } // merge

    void m_sort(int *a, int *t, int left, int right) {
        int mid;
        if(right > left) {
            mid = (left+right)>>1;
            m_sort(a, t, left, mid);        // first half
            m_sort(a, t, mid+1, right);     // second half
            merge(a, t, left, mid+1, right);
        }
    } // m_sort

    static inline void merge_sort(int *a, int *t, int num) {
        m_sort(a, t, 0, num-1);
    } // merge_sort

    int main() {
        enum { NUM = 1000 };   // example size
        int a[NUM], t[NUM];
        // fill array a[] ...
        merge_sort(a, t, NUM);   // call merge sort on the array
        // ... use sorted array
        return 0;
    } // main
The recursive algorithm consists of four steps (which closely resemble the merge sort):

1. If there are one or fewer elements in the array, return immediately.
2. Pick an element in the array to serve as the "pivot" point (usually the left-most element).
3. Split the array into two parts - one with elements smaller than the pivot and the other with elements larger than the pivot.
4. Recursively repeat the algorithm for both halves of the original array.
The efficiency of the algorithm is heavily influenced by which element is chosen as the pivot point. The worst case, O(n*n/2), occurs when the list is already sorted and the left-most element is chosen as the pivot. If the pivot is chosen at random rather than always taking the left-most element, quick sort behaves well even on non-random data, with an expected complexity of O(n*log2(n)). The high degree of recursion can still make quick sort a bad choice.
    int partition(int *a, int left, int right, int pivot) {
        int pv, si, i;
        pv = a[pivot];
        swap(a, pivot, right);   // move pivot to the end
        si = left-1;             // si: last index of the "small" region
        for(i=left; i<right; i++) {
            if(a[i] <= pv) {
                si++;
                swap(a, si, i);
            }
        }
        swap(a, si+1, right);    // move pivot to its final place
        return si+1;
    } // partition

    void q_sort(int *a, int left, int right) {
        int pvi, pni;
        if(right > left) {
            pvi = left;   // select a pivot index (e.g. the left-most element)
            pni = partition(a, left, right, pvi);
            q_sort(a, left, pni-1);
            q_sort(a, pni+1, right);
        }
    } // q_sort

    void quick_sort(int num, int *a) {
        q_sort(a, 0, num-1);
    } // quick_sort
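If the input may already be (nearly) sorted, the pivot index can be chosen at random instead; a minimal sketch using the standard rand() (it assumes srand() was seeded once at startup):

    #include <stdlib.h>

    // Return a random pivot index in [left, right].
    int random_pivot(int left, int right) {
        return left + rand() % (right - left + 1);
    }

In q_sort the line `pvi = left;` then becomes `pvi = random_pivot(left, right);`.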
Radix sort can start with either the most significant digit (MSD) or the least significant digit (LSD). There is greater complexity (and hence a performance hit) if the length of the items, i.e. the number of digits, is not constant (e.g. strings).
Radix sort orders a list of fixed-size numbers of SIZE bits in O(n*SIZE) time by treating them as bit strings. The list is first sorted on the least significant digit with a stable sort, preserving the relative order of items with equal digits; then on the next digit, and so on from LSB to MSB, after which the list is sorted. For an unsigned int SIZE is typically 32 (bits); the digits can be single bits or groups of them, e.g. one unsigned char, i.e. 8 bits with 256 values.
    void radix(int num, int byte, int *src, int *dst) {
        int count[256], index[256], i;
        for(i=0; i<256; i++) count[i] = 0;
        for(i=0; i<num; i++) count[((src[i])>>(byte*8))&0xff]++;   // histogram
        index[0] = 0;
        for(i=1; i<256; i++) index[i] = index[i-1]+count[i-1];     // prefix sums
        for(i=0; i<num; i++) dst[index[((src[i])>>(byte*8))&0xff]++] = src[i];
    } // radix

    void radix_sort(int num, int *a) {
        // assumes 4 bytes/int and non-negative values (the sign byte is not
        // remapped); needs <stdlib.h> for malloc/free
        int *t;
        t = malloc(num*sizeof(int));
        radix(num, 0, a, t);   // least significant byte first,
        radix(num, 1, t, a);   // ping-ponging between a[] and t[]
        radix(num, 2, a, t);
        radix(num, 3, t, a);
        free(t);
    } // radix_sort
Initially the data structure is empty. It is a tree of tables of depth d (the number of digits). As each value to be sorted (with its associated data) arrives, a path is built through the tree starting with the first digit (least or most significant). A table is allocated, and initialized to 0, for each digit value not yet seen, and an entry is made for every digit actually seen.
The advantage of this method shows best when a large range of sparse and highly clustered values occurs. When the data is sparse but not clustered (i.e. uniformly distributed), a good hash insertion method is faster. When the data is not sparse, a good hash is faster provided the hash table is big enough. When the data is highly clustered, a hash does poorly if the table size is small, since it needs long overflow lists.
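A minimal sketch of the idea (not the implementation measured above), assuming d = 4 digits of 8 bits (v = 256) over 32-bit unsigned values, inserted most significant digit first; tables are allocated lazily, so clustered values share most of their tables:

    #include <stdlib.h>

    typedef struct node {
        struct node *child[256];   // next-digit tables, NULL until seen
        int count;                 // at depth 4: number of duplicate values
    } node;

    static node *new_node(void) {
        return calloc(1, sizeof(node));   // zeroed: no children, count 0
    }

    static void insert(node *root, unsigned v) {
        int d;
        for(d=3; d>=0; d--) {                 // most significant byte first
            int digit = (v >> (d*8)) & 0xff;
            if(!root->child[digit]) root->child[digit] = new_node();
            root = root->child[digit];
        }
        root->count++;                        // leaf: count duplicates
    }

    // An in-order walk emits the values in ascending order.
    static void walk(node *t, unsigned prefix, int depth, unsigned *out, int *n) {
        int i, c;
        if(depth == 4) {
            for(c=0; c<t->count; c++) out[(*n)++] = prefix;
            return;
        }
        for(i=0; i<256; i++)
            if(t->child[i])
                walk(t->child[i], (prefix<<8) | i, depth+1, out, n);
    }

    void radix_insert_sort(int num, unsigned *a) {
        node *root = new_node();
        int i, n = 0;
        for(i=0; i<num; i++) insert(root, a[i]);
        walk(root, 0, 0, a, &n);
        // associated data could be stored next to the leaf count;
        // freeing the trie is omitted for brevity
    } // radix_insert_sort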
    void selection_sort(int num, int *a) {
        // Minimizes swaps compared to bubble sort: at most one per pass.
        int i, j, min;
        for(i=0; i<num-1; i++) {
            min = i;
            for(j=i+1; j<num; j++) {   // find the smallest remaining item
                if(a[j] < a[min]) min = j;
            }
            if(min != i)               // this test is not strictly necessary
                swap(a, i, min);
        }
    } // selection_sort
Invented by Donald Shell in 1959, the shell sort is the most efficient of the O(n*n) class of sorting algorithms. Of course, the shell sort is also the most complex of the O(n*n) algorithms.
The shell sort is a "diminishing increment sort". The algorithm makes multiple passes through the list, and each time sorts a number of equally sized sets using the insertion sort. The size of the set to be sorted gets larger with each pass through the list, until the set consists of the entire list. (Note that as the size of the set increases, the number of sets to be sorted decreases.) This sets the insertion sort up for an almost-best-case run on each iteration, with a complexity that approaches O(n).
The items contained in each set are not contiguous - rather, if there are i sets then a set is composed of every i-th element. For example, if there are 3 sets then the first set would contain the elements located at positions 1, 4, 7 and so on. The second set would contain the elements located at positions 2, 5, 8, and so on; while the third set would contain the items located at positions 3, 6, 9, and so on.
The size of the sets used for each iteration has a major impact on the efficiency of the sort. Several Heroes of Computer Science™, including Donald Knuth and Robert Sedgewick, have come up with more complicated versions of the shell sort that improve efficiency by carefully calculating the best-sized sets to use for a given list.
    void shell_sort(int num, int *a) {
        int i, j, incr, temp;
        incr = num / 2;
        while(incr > 0) {
            for(i=incr; i<num; i++) {   // insertion sort within each set
                j = i;
                temp = a[i];
                while((j >= incr) && (a[j-incr] > temp)) {
                    a[j] = a[j-incr];
                    j -= incr;
                }
                a[j] = temp;
            }
            incr = (2==incr) ? 1 : (int)(incr/2.2);   // shrink the increment
        }
    } // shell_sort
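Knuth's increment sequence h = 3*h + 1 (1, 4, 13, 40, ...) is one of those carefully calculated choices; a sketch of the same loop using it:

    void shell_sort_knuth(int num, int *a) {
        int i, j, incr, temp;
        for(incr=1; incr<num/3; incr=3*incr+1)
            ;                              // largest of 1, 4, 13, 40, ...
        for(; incr>0; incr/=3) {           // descend the sequence back to 1
            for(i=incr; i<num; i++) {      // same insertion pass as above
                temp = a[i];
                for(j=i; (j>=incr) && (a[j-incr]>temp); j-=incr)
                    a[j] = a[j-incr];
                a[j] = temp;
            }
        }
    } // shell_sort_knuth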
Program to test and compare sort algorithms.
Output of sort program.
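The linked test program is not reproduced here; a hypothetical minimal harness (fixed seed, clock() timing, sortedness check) might look like this:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    void heap_sort(int num, int *a);     // from the listings above
    void shell_sort(int num, int *a);
    void insertion_sort(int num, int *a);

    // Fill an array with pseudo-random values, sort it, time it, verify it.
    void test_sort(const char *name, void (*sort)(int, int *), int num) {
        int *a = malloc(num*sizeof(int));
        int i;
        clock_t t0;
        srand(42);                        // fixed seed for comparable runs
        for(i=0; i<num; i++) a[i] = rand();
        t0 = clock();
        sort(num, a);
        printf("%-10s %8.3f s", name, (double)(clock()-t0)/CLOCKS_PER_SEC);
        for(i=1; i<num; i++)
            if(a[i-1] > a[i]) { printf("  NOT sorted!"); break; }
        printf("\n");
        free(a);
    }

    int main() {
        int num = 100000;
        test_sort("insertion", insertion_sort, num);
        test_sort("heap", heap_sort, num);
        test_sort("shell", shell_sort, num);
        return 0;
    }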
2006-2011