taught by Grzegorz Malewicz
How do you design an algorithm that solves a computational problem? What techniques can help you discover an efficient solution? The Introduction to Computer Algorithms course will present a range of techniques that algorithm designers can use when trying to develop an efficient algorithm for a computational problem. The techniques will be illustrated on many common computational problems. You will improve your understanding of the course material by solving theoretical problems and implementing algorithms on a computer. The course will be fairly rigorous (i.e., theorems will be proven).
Lecture notes
See the compiled 100 pages of algorithms lecture notes.


List of topics
The text in square brackets refers to chapters and sections of the required textbook: Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms (2nd Edition). MIT Press (2001) (book web page).

Foundations. Introduction [Ch 1]. Asymptotic notation [Sec 3.1]. Recurrences and how they arise in the analysis of divide-and-conquer algorithms. Methods for solving recurrences: substitution [Sec 4.1], recursion-tree [Sec 4.2] and the master theorem [Sec 4.3], along with examples of how the methods are applied.

Sorting. Insertion Sort as an example of the technique of iterative improvement. The loop-invariant technique for proving correctness of algorithms, and a proof that Insertion Sort solves the sorting problem [Sec 2.1]. Analysis of the worst-case running time of Insertion Sort [Sec 2.2]. Mergesort as an example of the divide-and-conquer technique for designing algorithms. The Merge procedure and a proof of its correctness using the loop-invariant technique. A proof that Mergesort sorts, as an example of applying mathematical induction to prove correctness of divide-and-conquer algorithms [Sec 2.3.1]. An analysis, using the substitution method, showing that the worst-case running time of Mergesort is O(n log n) [Sec 2.3.2]. A proof of correctness of the Partition procedure used by the Quicksort algorithm [Sec 7.1]; intuition for the average-case running time of O(n log n) of Quicksort; a proof of its worst-case running time of O(n^2) [Sec 7.2]. The concept of a randomized algorithm. Randomized Quicksort as a way to make it likely that the subsegments produced by Partition are well balanced, so that the expected running time of Quicksort is O(n log n) on any input [Sec 7.3]. Analysis of the O(n log n) expected running time of Randomized Quicksort, as an example of the analysis of randomized algorithms and an application of linearity of expectation [Sec 7.4].
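To make the sorting discussion concrete, here is a minimal sketch of Randomized Quicksort in Python, using the Partition procedure of [Sec 7.1] and the random-pivot idea of [Sec 7.3]; the function and variable names are this sketch's own choices, not from the textbook.

```python
import random

def partition(a, lo, hi):
    # Partition as in [Sec 7.1]: a[hi] is the pivot; on return,
    # elements <= pivot are left of the returned index, larger
    # elements are to its right.
    pivot = a[hi]
    i = lo - 1
    for j in range(lo, hi):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1

def randomized_quicksort(a, lo=0, hi=None):
    # Choosing the pivot uniformly at random [Sec 7.3] makes
    # well-balanced splits likely, so the expected running time
    # is O(n log n) on any input.
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        r = random.randint(lo, hi)      # swap a random element into the pivot slot
        a[r], a[hi] = a[hi], a[r]
        q = partition(a, lo, hi)
        randomized_quicksort(a, lo, q - 1)
        randomized_quicksort(a, q + 1, hi)
    return a
```

Note that the randomness affects only the running time, never the result: every pivot choice yields a correctly sorted array.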
A lower bound of Omega(n log n) on the worst-case running time of any deterministic comparison sort, as an example that sometimes it is impossible to design a very fast algorithm [Sec 8.1]. How to beat the lower bound: Counting Sort, which sorts in time O(n+k) and space O(n+k), but assumes that keys come from {0,...,k} and reads their values rather than comparing them [Sec 8.2]. Radix Sort, which can sort larger numbers in time O(n), for example when the numbers have a constant number of digits and each digit is in {0,...,n} [Sec 8.3].

Dynamic Programming. The notion of an optimization problem. Assembly-line scheduling as an example of such a problem [Sec 15.1]. The structure of an optimal solution to the problem, in which a route through the factory with the shortest total time can be composed by extending a shortest route through an earlier station. How this structure gives rise to a recurrence, and a proof that the recurrence is correct. A method for solving the recurrence in time O(n), by observing that the recurrence "recomputes" solutions to subproblems, which can instead be stored in an array to avoid costly recomputation. Matrix-chain multiplication as an example of an optimization problem with a "richer" structure of optimal solutions [Sec 15.2]. A proof that an optimal solution can be composed of optimal solutions to two subproblems selected from a wide range of possibilities. A proof that a recurrence on the minimal cost is correct, using the "cut-and-paste" argument and reasoning by contradiction. A way to sequence the calculation of best costs for subchains so that whenever a cost needs to be calculated, the required costs have already been calculated; an algorithm with running time O(n^3). The longest common subsequence problem [Sec 15.4], its optimal structure, its recurrence, and how to order the evaluation of the recurrence so as not to recompute values, yielding an algorithm with running time O(mn).

Graph Algorithms and Greediness. Basic definitions from graph theory [Sec B.4 and B.5].
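As an illustration of the dynamic-programming pattern described above, here is a sketch of the longest-common-subsequence length computation of [Sec 15.4]; the function name and table layout are choices of this sketch, not of the textbook.

```python
def lcs_length(x, y):
    # Dynamic programming over the recurrence of [Sec 15.4]:
    #   c[i][j] = 0                          if i == 0 or j == 0
    #           = c[i-1][j-1] + 1            if x[i-1] == y[j-1]
    #           = max(c[i-1][j], c[i][j-1])  otherwise
    # Filling the (m+1) x (n+1) table row by row ensures every
    # subproblem is solved before it is needed, so nothing is
    # recomputed, giving O(mn) time.
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[m][n]
```

A naive recursive evaluation of the same recurrence would take exponential time; storing subproblem answers in the table is exactly the "avoid costly recomputation" step the course emphasizes.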
Adjacency-list and adjacency-matrix representations of a graph [Sec 22.1]. The Breadth-first search (BFS) graph-traversal algorithm [Sec 22.2]. A proof that its running time is O(V+E) using the technique of aggregate counting. A proof that BFS finds the distance from a given vertex to every vertex in the graph; a shortest path to a vertex can be found by following the "parent pointers", and the pointers form a tree. The Depth-first search (DFS) graph-traversal algorithm [Sec 22.3]. Proofs of its running time of Theta(V+E), of the nested structure of "discovery" and "finish" times (the parenthesis theorem), of reachability of a descendant along a "white" path (the white-path theorem), and the classification of edges together with a way to determine "back" edges (the back-edge lemma). A directed graph as a model of tasks and their dependencies. The notion of a topological sort of a directed graph, and a DFS-based algorithm for producing a topological sort of a directed acyclic graph [Sec 22.4]. The Minimum Spanning Tree (MST) problem [Sec 23.1] and the "safe edge" lemma that shows how to "grow" a tree in a "greedy" way. Prim's algorithm, which constructs an MST in O(E log E) time, as an application of the safe-edge lemma [Sec 23.2].
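A minimal sketch of BFS over an adjacency-list representation [Sec 22.1, 22.2] may help fix the ideas; representing the graph as a dict from each vertex to its list of neighbours is an assumption of this sketch.

```python
from collections import deque

def bfs(adj, s):
    # adj: adjacency list as a dict {vertex: [neighbours]} (an
    # assumption of this sketch); s: the source vertex.
    # Returns (dist, parent): dist[v] is the distance from s to v,
    # and following parent pointers from v back to s traces a
    # shortest path; the parent pointers form a breadth-first tree.
    dist = {s: 0}
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # v is "white": not yet discovered
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent
```

Each vertex is enqueued at most once and each adjacency list is scanned at most once, which is the aggregate-counting argument behind the O(V+E) bound.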

There was a competition among students for the fastest program that sorts a
large file. Programs submitted by students were tested on three files of
different sizes: 120KB, 1.5MB, and 31.5MB. The table contains the running times
in seconds on these files, from the smallest time to the largest, omitting
results for programs that crashed or ran for seemingly too long. Due to privacy
concerns, the identities of the participants are not revealed. The fastest
program on one file is not necessarily the fastest program on a different file.

