Introduction to Computer Algorithms

Fall 2004

taught by Grzegorz Malewicz

How do you design an algorithm that solves a computational problem? What techniques can help you discover an efficient solution? The Introduction to Computer Algorithms course will present a range of techniques that algorithm designers can use when trying to develop an efficient algorithm for a computational problem. The techniques will be illustrated with many common computational problems. You will deepen your understanding of the course material by solving theoretical problems and by implementing algorithms on a computer. The course will be fairly rigorous (i.e., theorems will be proven).


Course syllabus
Detailed list of topics covered
Deliverables
Results of a student competition

Course content


Lecture notes

See the compiled algorithms lecture notes (about 100 pages).


List of topics

The text in square brackets refers to chapters and sections of the required textbook: Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms (2nd Edition), MIT Press (2001); see the book web page.

Foundations. Introduction [Ch 1]. Asymptotic notation [Sec 3.1]. Recurrences and how they arise in the analysis of divide-and-conquer algorithms. Methods for solving recurrences: substitution [Sec 4.1], recursion-tree [Sec 4.2], and the master theorem [Sec 4.3], along with examples of how each method is applied.
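As a quick worked example (in the notation used throughout this page), the master theorem dispatches the merge-sort recurrence of Sec 2.3.2 in one step:

    T(n) = 2T(n/2) + Theta(n),    so a = 2, b = 2, and f(n) = Theta(n).

Since n^(log_b a) = n^(log_2 2) = n and f(n) = Theta(n^(log_b a)), case 2 of the master theorem applies, giving T(n) = Theta(n log n).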

Sorting. Insertion Sort as an example of the technique of iterative improvement. The loop invariant technique for proving correctness of algorithms, and a proof that Insertion Sort solves the sorting problem [Sec 2.1]. Analysis of the worst-case running time of Insertion Sort [Sec 2.2]. Merge Sort as an example of the divide-and-conquer technique for designing algorithms. The Merge procedure and the proof of its correctness using the loop invariant technique. A proof that Merge Sort sorts, as an example of applying mathematical induction to prove correctness of divide-and-conquer algorithms [Sec 2.3.1]. An analysis, using the substitution method, showing that the worst-case running time of Merge Sort is O(n log n) [Sec 2.3.2]. A proof of correctness of the Partition procedure used by the Quicksort algorithm [Sec 7.1]; intuition for the average-case running time of O(n log n) of Quicksort, and a proof of its worst-case running time of O(n^2) [Sec 7.2]. The concept of a randomized algorithm. Randomized Quicksort as a way to make it likely that the subsegments produced by Partition are well-balanced, so that the expected running time of Quicksort is O(n log n) on any input [Sec 7.3]. An analysis of the O(n log n) expected running time of Randomized Quicksort, as an example of analyzing randomized algorithms using linearity of expectation [Sec 7.4]. A lower bound of Omega(n log n) on the worst-case running time of any deterministic comparison sort, as an example showing that it is sometimes impossible to design an algorithm as fast as one would like [Sec 8.1]. How to beat the lower bound? Counting Sort, which sorts in time O(n+k) and space O(n+k), but assumes that keys come from {0,...,k} and reads their values rather than comparing them [Sec 8.2]. Radix Sort, which can sort larger numbers in time O(n), for example when the numbers have a constant number of digits and each digit is in {0,...,n} [Sec 8.3].
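To make the loop invariant technique concrete, here is a minimal Python sketch of Insertion Sort, with the invariant of Sec 2.1 stated as a comment (the function name and the in-place interface are choices made for this sketch, not prescribed by the textbook):

def insertion_sort(a):
    """Sort list a in place into nondecreasing order."""
    for j in range(1, len(a)):
        # Loop invariant: a[0..j-1] consists of the elements originally
        # in a[0..j-1], but in sorted order.
        key = a[j]
        i = j - 1
        # Shift elements of the sorted prefix that exceed key one slot right.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a

For example, insertion_sort([5, 2, 4, 6, 1, 3]) returns [1, 2, 3, 4, 5, 6].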

Dynamic Programming. The notion of an optimization problem. Assembly-line scheduling as an example of such a problem [Sec 15.1]. The structure of an optimal solution to the problem, where a route through the factory with the shortest total routing time can be composed by extending a shortest route through an earlier station. How this structure gives rise to a recurrence, and a proof that the recurrence is correct. A method for solving the recurrence in time O(n), based on the observation that the recurrence "recomputes" solutions to subproblems, which can instead be stored in an array to avoid costly recomputation. Matrix-chain multiplication as an example of an optimization problem with a "richer" structure of optimal solutions [Sec 15.2]. A proof that an optimal solution can be composed of optimal solutions to two subproblems selected from a wide range of possibilities. A proof that a recurrence on the minimal cost is correct, using the "cut-and-paste" argument and reasoning by contradiction. A way to sequence the calculation of best costs for subchains so that whenever a cost needs to be calculated, the required costs have already been calculated; this yields an algorithm with running time O(n^3). The longest common subsequence problem [Sec 15.4], its optimal structure, the resulting recurrence, and how to order the evaluation of the recurrence so as not to recompute values, yielding an algorithm with running time O(mn).
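To make the evaluation order concrete, here is a minimal Python sketch of the O(mn) table computation for the longest common subsequence problem (the function and variable names are choices made for this sketch):

def lcs_length(x, y):
    """Return the length of a longest common subsequence of x and y."""
    m, n = len(x), len(y)
    # c[i][j] = length of an LCS of the prefixes x[:i] and y[:j].
    c = [[0] * (n + 1) for _ in range(m + 1)]
    # Fill the table row by row, so that c[i-1][j], c[i][j-1], and
    # c[i-1][j-1] are already computed when c[i][j] is needed.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[m][n]

For example, lcs_length("ABCBDAB", "BDCABA") returns 4 (one LCS is "BCBA").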

Graph Algorithms and Greediness. Basic definitions from graph theory [Sec B.4 and B.5]. Adjacency-list and adjacency-matrix representations of a graph [Sec 22.1]. The breadth-first search (BFS) graph traversal algorithm [Sec 22.2]. A proof that its running time is O(|V|+|E|), using the technique of aggregate counting. A proof that BFS finds the distance from a given vertex to every vertex in the graph. A shortest path to a vertex can be recovered by following the "parent pointers"; these pointers form a tree. The depth-first search (DFS) graph traversal algorithm [Sec 22.3]. Proofs of its running time of Theta(|V|+|E|), of the nested structure of "discovery" and "finish" times (the parenthesis theorem), and of the reachability of an ancestor along a "white" path (the white path theorem); the classification of edges and a way to detect "back" edges (the back edge lemma). Directed graphs as a model of tasks and their dependencies. The notion of a topological sort of a directed graph, and a DFS-based algorithm for producing a topological sort of a directed acyclic graph [Sec 22.4]. The Minimum Spanning Tree (MST) problem [Sec 23.1] and the "safe edge" lemma that shows how to "grow" a tree in a "greedy" way. Prim's algorithm, which constructs an MST in time O(|E| log |E|), as an application of the safe edge lemma [Sec 23.2].
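As a minimal sketch of BFS on the adjacency-list representation (the dictionary-based graph encoding and the function name are choices made for this sketch):

from collections import deque

def bfs(adj, s):
    """Breadth-first search from source s.

    adj maps each vertex to the list of its neighbors. Returns (dist,
    parent), where dist[v] is the number of edges on a shortest s-v path,
    and following parent pointers from v back to s traces one such path.
    """
    dist = {s: 0}
    parent = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:          # v is still "white" (undiscovered)
                dist[v] = dist[u] + 1
                parent[v] = u
                q.append(v)
    return dist, parent

Each vertex enters the queue at most once and each adjacency list is scanned at most once, which is exactly the aggregate counting behind the O(|V|+|E|) bound.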


Deliverables

Date out    Problem statement    Due date
8/30/04     H01.pdf              9/7/04
9/10/04     H02.pdf              9/17/04
9/20/04     H03.pdf              9/27/04
9/26/04     H04.pdf              10/1/04
10/4/04     01_midterm.pdf
10/8/04     H05.pdf              10/15/04
10/18/04    H06.pdf              11/8/04
10/27/04    H07.pdf              11/3/04
11/5/04     H08.pdf              11/10/04
11/12/04    02_midterm.pdf
11/22/04    H09.pdf              11/30/04
12/6/04     H10.pdf              12/10/04
12/13/04    01_final.pdf


Competition results

There was a competition among students for the fastest program that sorts a large file. Programs submitted by students were tested on three files of different sizes: 120KB, 1.5MB, and 31.5MB. The table below lists the running times in seconds on each file, from the smallest time to the largest, omitting results for programs that crashed or ran for what seemed too long. To protect their privacy, the identities of the participants are not revealed. Note that the fastest program on one file is not necessarily the fastest on another.

small.txt (120KB):   1, 1, 1, 1, 1, 5, 5, 5, 6, 7, 10, 13, 15, 40, 58, 240
med.txt (1.5MB):     1, 11, 28, 45, 71, 82, 187, 189, 563, 629, 1000
big.txt (31.5MB):    10, 1182, 2370