# Sort Algorithms

2011 Nov 06

Most of the algorithms below sort ascending and in place (meaning they need very little extra memory).
Stable sorts guarantee that records with equal key values keep their relative order
(along with associated data, not shown in the examples below). See Wikipedia for more background.
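To make the stability guarantee concrete, here is a small illustration (the record type and data are made up for this note): under a stable sort, records that compare equal on the key keep their original relative order, which the tag field tracks.

```c
// Records with equal keys keep their original relative order under a
// stable sort such as insertion sort; the tag records the input order.
typedef struct { int key; char tag; } rec;

static void insertion_sort_rec(int num, rec *a) {
    int i, j;
    for (i = 1; i < num; i++) {
        rec v = a[i];
        // shift only strictly greater keys, so equal keys never reorder
        for (j = i; j > 0 && a[j-1].key > v.key; j--)
            a[j] = a[j-1];
        a[j] = v;
    }
}
```

Sorting {2,'a'}, {1,'b'}, {2,'c'}, {1,'d'} yields 1b 1d 2a 2c: 'b' stays ahead of 'd' and 'a' ahead of 'c'. An unstable sort (e.g. selection sort) may swap those pairs.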

| Name         | Time: min   | Time: avg   | Time: max    | Memory   | Stable |
|--------------|-------------|-------------|--------------|----------|--------|
| bubble       | n           | n*n/2       | n*n          | n        | yes    |
| cocktail     | n           | n*n/2       | n*n/2        | n        | yes    |
| comb         | n*n         | n*n         | n*n          | n        | no     |
| gnome        | n*n/2       | n*n/2       | n*n/2        | n        | yes    |
| heap         | n*log(n)    | n*log(n)    | n*log(n)     | n        | no     |
| insertion    | n           | n*n/4       | n*n/2        | n        | yes    |
| merge        | n*log(n)    | n*log(n)    | n*log(n)     | 2*n      | yes    |
| quick        | n*log(n)    | n*log(n)    | n*n/2        | n        | no     |
| radix        | n*d+2*v     | n*d+2*v     | n*d+2*v      | 2*n      | yes    |
| radix (tree) | c3*n+c4*v*d | c3*n+c4*v*d | c3*n+c4*v*2d | see note | yes    |
| selection    | n*n/2       | n*n/2       | n*n/2        | n        | no     |
| shell        | n*log(n)^2  | n*log(n)^2  | n*log(n)^2   | n        | no     |

- c1, c2, c3, c4 : proportionality "constants" - depend on the implementation
- b : number of bits per digit, b = log2(v): 1, 2, 4 or 8
- d : number of digits, e.g. for a 32 bit int: 32, 16, 8 or 4; for a string d is the average string length
- n : number of items to sort
- v : number of values per digit, e.g. for a 32 bit int: 2, 4, 16 or 256

Note: the memory use of the tree-based radix variant (see the radix discussion below) depends on how sparse the data values are and how clustered; high clustering requires fewer malloc's. Table initialization time and space depend statistically on the ratio n/v.
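Since every routine below shares the same `(int num, int *a)` shape and sorts ascending in place, a tiny checker (the name `is_sorted` is invented for this note) is enough to exercise any of them:

```c
// Returns 1 if a[0..num-1] is in ascending order, else 0.
// Usable after calling any of the sorts below, e.g. bubble1_sort(num, a).
static int is_sorted(int num, const int *a) {
    int i;
    for (i = 1; i < num; i++)
        if (a[i-1] > a[i])
            return 0;
    return 1;
}
```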

```c
// swap: a[i] and a[j] (ignores associated data)
static inline void swap(int *a, int i, int j) {
    int s = a[i];
    a[i] = a[j];
    a[j] = s;
} // swap
```

### Bubble Sort

```c
void bubble1_sort(int num, int *a) {
    // Stops when the array is sorted; best case is one pass, O(n).
    int swapped, i;
    num--;
    do {
        swapped = 0;
        for (i = 0; i < num; i++) {
            if (a[i] > a[i+1]) { // compare
                swap(a, i, i+1);
                swapped = 1;
            }
        }
    } while (swapped);
} // bubble1_sort
```

```c
void bubble2_sort(int num, int *a) {
    int i, j;
    for (i = 0; i < num-1; i++) {
        for (j = i+1; j < num; j++) {
            if (a[i] > a[j]) { // compare
                swap(a, i, j);
            }
        }
    }
} // bubble2_sort
```

### Cocktail Sort

Cocktail sort, also known as bidirectional bubble sort, cocktail shaker sort, shaker sort (which can also refer to a variant of selection sort), ripple sort, shuttle sort or happy hour sort, is a variation of bubble sort that is both a stable sorting algorithm and a comparison sort.
```c
void cocktail_sort(int num, int *a) {
    int bottom, top, swapped, i;
    bottom = 0;
    top = num-1;
    swapped = 1;
    while (swapped) { // if no elements were swapped, the list is sorted
        swapped = 0;
        for (i = bottom; i < top; i++) {
            if (a[i] > a[i+1]) { // test whether the two elements are in order
                swap(a, i, i+1);
                swapped = 1;
            }
        }
        // decrease `top`: the largest value in the unsorted part
        // of the list is now at position top
        top--;
        for (i = top; i > bottom; i--) {
            if (a[i] < a[i-1]) {
                swap(a, i, i-1);
                swapped = 1;
            }
        }
        // increase `bottom`: the smallest value in the unsorted part
        // of the list is now at position bottom
        bottom++;
    }
} // cocktail_sort
```

### Comb Sort

An in-place sort algorithm that repeatedly reorders different pairs of items. On each pass it swaps pairs of items separated by the increment or gap, if needed, and then reduces the gap (divides it by about 1.3). The gap starts at a large fraction of the number of items (half, in the code below). This continues until the gap is 1 and a pass had no swaps. Bubble sort is a comb sort where the gap is always 1. Comb sort does a single "bubbling" pass over each set for each gap or increment, whereas Shell sort completely sorts each set.

```c
void comb_sort(int num, int *a) {
    int gap, swapped, i;
    gap = num >> 1;
    do {
        swapped = 0;
        // shrink the gap for the next comb (divide by about 1.3)
        gap = (gap*10)/13;
        if ((10 == gap) || (9 == gap)) { // "rule of 11"
            gap = 11;
        }
        if (gap < 1)
            gap = 1;
        // a single "comb" over the input list
        for (i = 0; i+gap < num; i++) {
            if (a[i] > a[i+gap]) {
                swap(a, i, i+gap);
                swapped++;
            }
        }
    } while ((gap > 1) || swapped);
} // comb_sort
```

### Gnome Sort

```c
void gnome_sort(int num, int *a) {
    // walks the array, swapping out-of-order neighbours and stepping
    // back after each swap; j remembers where to resume
    int i, j;
    i = 1;
    j = 2;
    while (i < num) {
        if (a[i-1] <= a[i]) {
            i = j++; // in order: jump forward to the saved position
        } else {
            swap(a, i-1, i);
            if (!--i)
                i = 1;
        }
    }
} // gnome_sort
```

### Heap Sort

Heap sort is usually somewhat slower than the other O(n*log(n)) sorting algorithms, but unlike merge and quick sort it doesn't require deep recursion or a second array. This makes it an attractive option for very large data sets of millions of items.

It begins by building a heap out of the data set, and then removing the largest item and placing it at the end of the sorted array. After removing the largest item, it reconstructs the heap and removes the largest remaining item and places it in the next open position from the end of the sorted array. This is repeated until there are no items left in the heap and the sorted array is full. Elementary implementations require two arrays - one to hold the heap and the other to hold the sorted elements.

To do an in-place sort and save the space the second array would require, the algorithm below uses the same array to store both the heap and the sorted array. Whenever an item is removed from the heap, it frees up a space at the end of the array that the removed item can be placed in.

```c
void heapify(int num, int *a, int root, int bottom); // sift-down helper

void heap_sort(int num, int *a) {
    int i;
    for (i = (num/2)-1; i >= 0; i--)    // build the max-heap
        heapify(num, a, i, num-1);
    for (i = num-1; i >= 1; i--) {      // move the largest item to the end
        swap(a, 0, i);
        heapify(num, a, 0, i-1);
    }
} // heap_sort

// Sift a[root] down through the max-heap stored in a[0..bottom]
// (0-based: the children of node r are 2*r+1 and 2*r+2).
void heapify(int num, int *a, int root, int bottom) {
    int maxc;
    while (root*2+1 <= bottom) {
        maxc = ((root*2+1 == bottom) || (a[root*2+1] > a[root*2+2]))
               ? root*2+1 : root*2+2;
        if (a[root] < a[maxc]) {
            swap(a, root, maxc);
            root = maxc;
        } else
            return;
    }
} // heapify
```

### Insertion Sort

```c
void insertion_sort(int num, int *a) {
    int i, j, v;
    for (i = 1; i < num; i++) {
        v = a[i]; // insert a[i] into the sorted prefix a[0..i-1]
        for (j = i; j > 0; j--) {
            if (a[j-1] <= v)
                break;
            a[j] = a[j-1];
        }
        a[j] = v;
    }
} // insertion_sort
```

### Merge Sort

The merge sort splits the list to be sorted into two equal halves, and places them in separate arrays. Each array is recursively sorted, and then merged back together to form the final sorted list. Like most recursive sorts, the merge sort has an algorithmic complexity of O(n log n).

Elementary implementations of the merge sort make use of three arrays - one for each half of the data set and one to store the sorted list in. The below algorithm merges the arrays in-place, so only two arrays are required. There are non-recursive versions of the merge sort, but they don't yield any significant performance enhancement over the recursive algorithm on most machines.

Although merge sort is one of the fastest stable sorting algorithms, it can be undesirable because it needs a scratch array of the same size as the input.

```c
void m_sort(int *a, int *t, int left, int right);
void merge(int *a, int *t, int left, int mid, int right);

void merge_sort(int *a, int *t, int num) {
    m_sort(a, t, 0, num-1);
} // merge_sort

void m_sort(int *a, int *t, int left, int right) {
    int mid;
    if (right > left) {
        mid = (left+right) >> 1;
        m_sort(a, t, left, mid);     // first half
        m_sort(a, t, mid+1, right);  // second half
        merge(a, t, left, mid+1, right);
    }
} // m_sort

void merge(int *a, int *t, int left, int mid, int right) {
    int i, left_end, num, tmp_pos;
    left_end = mid - 1;
    tmp_pos = left;
    num = right - left;
    while ((left <= left_end) && (mid <= right)) {
        if (a[left] <= a[mid]) {
            t[tmp_pos++] = a[left++];
        } else {
            t[tmp_pos++] = a[mid++];
        }
    }
    while (left <= left_end) {
        t[tmp_pos++] = a[left++];
    }
    while (mid <= right) {
        t[tmp_pos++] = a[mid++];
    }
    for (i = 0; i <= num; i++) { // copy the merged run back into a[]
        a[right] = t[right];
        right--;
    }
} // merge

int main() {
    int num = 100;             // example size
    int a[num], t[num];
    // fill array a[] ...
    merge_sort(a, t, num);     // call to merge sort the array
    // ... use sorted array
    return 0;
} // main
```
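The non-recursive version mentioned above can be sketched as follows, using the same two-array scheme (the name `merge_sort_iter` is invented here): runs of width 1, 2, 4, ... are merged bottom-up until a single run covers the list.

```c
#include <string.h>

// Bottom-up (non-recursive) merge sort sketch: each pass merges
// adjacent runs of `width` items from a[] into t[], then copies back.
void merge_sort_iter(int num, int *a, int *t) {
    int width, left;
    for (width = 1; width < num; width *= 2) {
        for (left = 0; left < num; left += 2*width) {
            int mid = left + width, right = left + 2*width;
            if (mid > num) mid = num;
            if (right > num) right = num;
            int i = left, j = mid, k = left;
            while (i < mid && j < right)          // merge the two runs
                t[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
            while (i < mid)   t[k++] = a[i++];    // leftovers, first run
            while (j < right) t[k++] = a[j++];    // leftovers, second run
        }
        memcpy(a, t, num * sizeof(int));          // ready for the next pass
    }
}
```

On most machines this performs about the same as the recursive version; its main advantage is bounded stack use.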

### Quick Sort

The recursive algorithm consists of four steps (which closely resemble the merge sort):

1. If there are one or fewer elements in the array to be sorted, return immediately.
2. Pick an element in the array to serve as a "pivot" point. (Usually the left-most element in the array is used.)
3. Split the array into two parts - one with elements larger than the pivot and the other with elements smaller than the pivot.
4. Recursively repeat the algorithm for both halves of the original array.

The efficiency of the algorithm is largely determined by which element is chosen as the pivot point. The worst case, O(n*n/2), occurs when the list is already sorted and the left-most element is chosen. Randomly choosing a pivot point rather than using the left-most element is recommended if the data to be sorted isn't random. As long as the pivot point is chosen randomly, the quick sort has an average algorithmic complexity of O(n*log2(n)). The high degree of recursion can make quick sort a bad choice.

```c
int partition(int *a, int left, int right, int pivot) {
    int pv, si, i;
    pv = a[pivot];
    swap(a, pivot, right); // move pivot to end
    si = left-1;
    for (i = left; i < right; i++) {
        if (a[i] <= pv) {
            si++;
            swap(a, si, i);
        }
    }
    swap(a, right, si+1); // move pivot to its final place
    return si+1;
} // partition

void q_sort(int *a, int left, int right) {
    int pvi, pni;
    if (right > left) {
        pvi = left; // select a pivot index (a random one is safer, see above)
        pni = partition(a, left, right, pvi);
        q_sort(a, left, pni-1);
        q_sort(a, pni+1, right);
    }
} // q_sort

void quick_sort(int num, int *a) {
    q_sort(a, 0, num-1);
} // quick_sort
```
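For the randomized pivot recommended above, a hypothetical helper (not part of the code above; seed `rand()` with `srand()` once at startup) could look like this:

```c
#include <stdlib.h>

// Pick a pivot index uniformly at random in [left, right] to dodge the
// sorted-input worst case of always taking the left-most element.
static int choose_pivot(int left, int right) {
    return left + rand() % (right - left + 1);
}
```

In `q_sort` one would then write `pvi = choose_pivot(left, right);` instead of `pvi = left;`.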

### Radix Sort

Radix sort can start with either the most significant digit or the least significant digit. There is greater complexity (and hence a performance hit) if the length of the items, i.e. the number of digits, is not constant (e.g. strings).

Radix sort sorts a list of fixed-size keys of SIZE bits in O(n*SIZE/b) time by treating them as bit strings. The list is first sorted by the least significant digit while preserving relative order using a stable sort, then by the next digit, and so on from lsb to msb, after which the list is sorted. For an unsigned int SIZE is typically 32 bits; the code below uses one byte per digit, so b=8, v=256 values per digit and d=4 passes.

```c
// requires <stdlib.h> for malloc/free; assumes non-negative int keys
void radix(int num, int byte, int *src, int *dst) {
    // one stable counting-sort pass on the given byte
    int count[256], index[256], i;
    for (i = 0; i < 256; i++)
        count[i] = 0;
    for (i = 0; i < num; i++)
        count[((src[i]) >> (byte*8)) & 0xff]++;
    index[0] = 0;
    for (i = 1; i < 256; i++)
        index[i] = index[i-1] + count[i-1];
    for (i = 0; i < num; i++)
        dst[index[((src[i]) >> (byte*8)) & 0xff]++] = src[i];
} // radix

void radix_sort(int num, int *a) {
    // assumes 4 bytes/int; ping-pong between a[] and the scratch array
    int *t;
    t = malloc(num*sizeof(int));
    radix(num, 0, a, t);
    radix(num, 1, t, a);
    radix(num, 2, a, t);
    radix(num, 3, t, a);
    free(t);
} // radix_sort
```

Initially the data structure is empty. It is a tree of tables of depth d (the number of digits). As each value to be sorted (with its associated data) comes in, a path is built through the tree starting with the first digit (least or most significant). A table is allocated (and initialized to 0) for each digit value not yet seen, and an entry is made for each digit actually seen.

The advantage of this method shows best when a large range of sparse and highly clustered values occurs. When the data is sparse but not clustered (i.e. uniformly distributed), a good hash insertion method is faster. When the data is not sparse, a good hash is faster if the hash table is big enough. When the data is highly clustered, a hash does poorly if the table is small, since it needs long overflow lists.
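The tree-of-tables idea described above can be sketched as follows for 32-bit unsigned keys with one byte per digit, most significant digit first. All names here are invented for illustration; tables are allocated lazily, so clustered values share paths, and an in-order walk emits the values in ascending order (associated data, error handling, and freeing the tree are omitted).

```c
#include <stdlib.h>

// One node = one 256-entry table; leaves count value multiplicity.
typedef struct tnode { struct tnode *child[256]; int count; } tnode;

static tnode *tnode_new(void) { return calloc(1, sizeof(tnode)); }

static void tree_insert(tnode *t, unsigned v) {
    int level, digit;
    for (level = 3; level >= 0; level--) {     // msb digit first
        digit = (v >> (level*8)) & 0xff;
        if (!t->child[digit])
            t->child[digit] = tnode_new();     // table not yet seen
        t = t->child[digit];
    }
    t->count++;                                // leaf: multiplicity
}

static void tree_emit(tnode *t, unsigned prefix, int level,
                      unsigned *out, int *pos) {
    int i;
    if (level < 0) {                           // leaf: emit the value
        while (t->count--)
            out[(*pos)++] = prefix;
        return;
    }
    for (i = 0; i < 256; i++)                  // ascending digit order
        if (t->child[i])
            tree_emit(t->child[i], (prefix << 8) | i, level-1, out, pos);
}

void radix_tree_sort(int num, unsigned *a) {
    tnode *root = tnode_new();
    int i, pos = 0;
    for (i = 0; i < num; i++)
        tree_insert(root, a[i]);
    tree_emit(root, 0, 3, a, &pos);            // overwrite a[] in order
}
```

Each insert touches at most d=4 tables, giving the c3*n+c4*v*d cost model from the table above; memory depends on how many distinct tables the data forces into existence.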

### Selection Sort

```c
void selection_sort(int num, int *a) {
    // minimizes the number of swaps (at most num-1)
    int i, j, min;
    for (i = 0; i < num-1; i++) {
        min = i;
        for (j = i+1; j < num; j++) {
            if (a[j] < a[min])
                min = j;
        }
        if (min != i) // this test is not strictly necessary
            swap(a, i, min);
    }
} // selection_sort
```

### Shell Sort

Invented by Donald Shell in 1959, the shell sort is the most efficient of the O(n^2) class of sorting algorithms. Of course, it is also the most complex of the O(n^2) algorithms.

The shell sort is a "diminishing increment sort". The algorithm makes multiple passes through the list, and each time sorts a number of equally spaced sets using the insertion sort. The size of the sets grows with each pass through the list, until a single set covers the entire list. (Note that as the size of the sets increases, the number of sets to be sorted decreases.) This sets the insertion sort up for an almost-best-case run on each pass, with a complexity that approaches O(n).

The items contained in each set are not contiguous - rather, if there are i sets then a set is composed of every i-th element. For example, if there are 3 sets then the first set would contain the elements located at positions 1, 4, 7 and so on. The second set would contain the elements located at positions 2, 5, 8, and so on; while the third set would contain the items located at positions 3, 6, 9, and so on.

The size of the sets used for each iteration has a major impact on the efficiency of the sort. Several Heroes of Computer Science™, including Donald Knuth and Robert Sedgewick, have come up with more sophisticated versions of the shell sort that improve efficiency by carefully calculating the best-sized sets to use for a given list.

```c
void shell_sort(int num, int *a) {
    int i, j, incr, temp;
    incr = num / 2;
    while (incr > 0) {
        for (i = incr; i < num; i++) { // insertion sort within each set
            j = i;
            temp = a[i];
            while ((j >= incr) && (a[j-incr] > temp)) {
                a[j] = a[j-incr];
                j -= incr;
            }
            a[j] = temp;
        }
        incr = (2 == incr) ? 1 : (int)(incr/2.2);
    }
} // shell_sort
```

2006-2011