
Algorithms Part 1 - Stanford University

Rahul Guin
Course by Rahul Guin, updated more than 1 year ago

Description

Part 1

Module Information

Communication - Shortest-Path Algorithms. Cryptography - Number-Theoretic Algorithms. Computer Graphics - Geometric Algorithms. Database Indices - Balanced Search Tree data structures. Computational Biology - Dynamic Programming.
-----------------------------------------------------------------------------------------------------------
KARATSUBA MULTIPLICATION: Divide each number into two parts, the first part larger by one digit if the number of digits is odd. i.e. for x*y: x = 10^(n/2)a + b, y = 10^(n/2)c + d. Therefore: x*y = 10^n(ac) + 10^(n/2)(ad + bc) + bd. Three recursive calls: for the middle term, compute (a+b)(c+d) recursively and subtract the ac and bd computed earlier. The way we measure time in algorithms is not by how they perform on one input, but by how they scale with larger inputs. Randomized Algorithms - different outputs over the same input.
----------------------------------------------------------------------------------------------------------
MERGESORT ALGORITHM: Two recursive calls to sort the first and second halves of the array, then merge both sorted halves in ascending order. Claim: merge sort takes <= 6n*lg(n) + 6n operations. Proof: work done at each level = 6n; number of levels = lg(n) + 1; multiply. Think of lg(n) as the number of times we need to divide n by the base (in this case, 2) to reach 1. Recursion Tree Method: total depth of the recursion tree in merge sort is lg(n) + 1. Number of subproblems at level j = 2^j. Array size at level j = n/2^j.
ASYMPTOTIC ANALYSIS: Suppress constant factors and lower-order terms. Formal definition: T(n) = O(f(n)) if T(n) <= c*f(n) for all n >= n0, where n0 represents "sufficiently large" and c is a constant.
------------------------------------------------------------------------------------------------------------
DIVIDE AND CONQUER PARADIGM: Divide. Conquer via recursive calls. Combine the solutions of the subproblems. Inversions in an array: a pair (i, j) with i < j and A[i] > A[j]. Used to measure how different two arrays are from each other based on ranking.
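A minimal Python sketch of the Karatsuba scheme above (the function name and the base-10 splitting are illustrative; real implementations fall back to builtin multiplication below a size cutoff):

```python
def karatsuba(x, y):
    """Karatsuba multiplication with three recursive calls (base-10 splitting)."""
    if x < 10 or y < 10:
        return x * y                       # base case: single-digit operand
    n = max(len(str(x)), len(str(y)))
    half = n // 2                          # first part is larger when n is odd
    a, b = divmod(x, 10 ** half)           # x = 10^half * a + b
    c, d = divmod(y, 10 ** half)           # y = 10^half * c + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    # (a+b)(c+d) - ac - bd = ad + bc, saving the fourth recursive call
    middle = karatsuba(a + b, c + d) - ac - bd
    return 10 ** (2 * half) * ac + 10 ** half * middle + bd
```

For example, `karatsuba(123, 456)` returns 56088, the same as `123 * 456`.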
Inversion Algo:
sort_and_count(Array A, int n) {
   if (n == 1) return 0;
   int x = sort_and_count(first half of A, n/2);
   int y = sort_and_count(second half of A, n/2);
   int z = merge_and_countSplitInv(A, n);
   return x + y + z;
}
NOTE: If there are no split inversions, every element of the left half B is smaller than every element of the right half C. The number of split inversions involving an element y in C is exactly the number of elements remaining in B when y is copied to the output array D. Hence we piggyback on merge sort for inversion counting, so the running time is the same as merge sort: O(n log n).
---------------------------------------------------------------------------------------------------------------------------------
STRASSEN'S MATRIX MULTIPLICATION METHOD: Covered in class before; check online. Very straightforward: seven recursive multiplications instead of eight.
---------------------------------------------------------------------------------------------------------------------------------
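The piggybacking above can be sketched as follows (names are illustrative; the function returns the sorted array together with the count):

```python
def sort_and_count(a):
    """Count inversions (pairs i < j with a[i] > a[j]) by piggybacking on merge sort.
    Returns (sorted list, inversion count)."""
    n = len(a)
    if n <= 1:
        return a, 0
    left, x = sort_and_count(a[:n // 2])
    right, y = sort_and_count(a[n // 2:])
    merged, z = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            # right[j] is smaller: it forms a split inversion with every
            # element still remaining in the left half
            merged.append(right[j]); j += 1
            z += len(left) - i
    merged += left[i:] + right[j:]
    return merged, x + y + z
```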
QUICKSORT: Pick a pivot. Everything left of the pivot should be less than the pivot; everything right of it greater. Partitioning can be done in O(n) time, reduces the problem size, and needs no extra memory.
PARTITION SUBROUTINE: If the pivot is not the first element, swap it with the first element. Divide the array into two regions: what we have seen and what we haven't. The seen region is kept partitioned the right way, i.e. everything smaller than the pivot is to the pivot's left. j - boundary between what we have seen and what we haven't. i - boundary between less-than-pivot and greater-than-pivot; hence it points to the last element smaller than the pivot.
QuickSort(Array A, int n) {
   if (n == 1) return;
   p = choose_pivot(A, n);
   partition A around p;
   recursively sort the first part;
   recursively sort the second part;
}
Running time depends on how well the pivot element is chosen. NOTE: If a sorted array is passed to QuickSort with the first element as pivot, the running time is O(n^2). With random pivots, for every input array of size n the average running time is O(n log n). The best case is to choose the median every time, but even a 25-75 split is good enough for O(n log n) time.
-----------------------------------------------------------
MASTER THEOREM: A recursive algorithm has a base case, work done in the recursive calls, and work done outside the recursive calls. Assumptions: all subproblems have the same size; n is a power of b.
Master format: T(n) <= a*T(n/b) + O(n^d), or T(n) <= a*T(n/b) + c*n^d.
a -> number of recursive calls, or RATE OF SUBPROBLEM PROLIFERATION (RSP). (Evil)
b -> input-size shrinkage factor.
d -> exponent of the work done outside the recursive calls.
b^d -> RATE OF WORK SHRINKAGE (RWS). (Good)
Work at level j <= c*n^d * (a/b^d)^j (derive this as with merge sort), i.e. total work <= c*n^d (work outside recursion) * sum from j = 0 to log_b(n) of (a/b^d)^j.
RSP = RWS: the same amount of work is done at each level, like merge sort.
Hence expect O(n^d * log n) (because a/b^d = 1 in the sum; for merge sort d = 1, giving O(n log n)). RSP < RWS: less and less work at every level, so the maximum work is done at the root; root time dominates the rest, so expect O(n^d). RSP > RWS: more work at each level, so the maximum work is at the leaves and leaf time dominates. The depth of the leaves is log_b(n), and a^(log_b(n)) = n^(log_b(a)) = number of leaves in the recursion tree, giving O(n^(log_b(a))).
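The partition subroutine and random-pivot QuickSort above can be sketched as (a sketch, not a production sort; the helper names are assumptions):

```python
import random

def partition(a, lo, hi):
    """Partition a[lo..hi] around a[lo] (the pivot); return the pivot's final index.
    i marks the boundary of the < pivot region; j scans the unseen part."""
    p = a[lo]
    i = lo + 1
    for j in range(lo + 1, hi + 1):
        if a[j] < p:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[lo], a[i - 1] = a[i - 1], a[lo]   # place pivot between the two regions
    return i - 1

def quicksort(a, lo=0, hi=None):
    """In-place QuickSort with a uniformly random pivot (expected O(n log n))."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    k = random.randint(lo, hi)
    a[lo], a[k] = a[k], a[lo]           # move the chosen pivot to the front
    j = partition(a, lo, hi)
    quicksort(a, lo, j - 1)
    quicksort(a, j + 1, hi)
```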
NTH ORDER STATISTIC: the nth smallest element in the array. Median: element (n+1)/2 when n is odd, element n/2 when n is even.
RSelect(Array A, length n, order statistic i) {
   if (n == 1) return A[1];
   choose pivot p from A uniformly at random;
   partition A around p, and let j be the order of p (p is the jth smallest);
   if (j == i) return p;
   if (j > i) return RSelect(first part of A, j - 1, i);
   if (j < i) return RSelect(second part of A, n - j, i - j);
}
Expected running time (over the random pivot choices, for every input): O(n). Worst case: O(n^2). ANALYSIS: RSelect is in phase j if the current array size is between (0.75)^(j+1)*n and (0.75)^j*n.
----------------------------------------------------------------------------------------------------------------------------------------------------------------
GRAPHS: Remember the basic notation: V - vertices, E - edges. Parallel edges: more than one direct connection between two nodes. Connected: in one piece. Fewest number of edges in a connected graph with no parallel edges: n - 1. Maximum: n(n-1)/2. Basic notation: n - number of vertices, m - number of edges. SPARSE VS DENSE GRAPHS: in a sparse graph, m is O(n) or close to it; in a dense graph, m is O(n^2) or close to it. ADJACENCY LISTS VS MATRICES: usage depends on graph density. Adjacency matrix space: O(n^2). Adjacency list space: O(m + n).
MIN CUT PROBLEM: Cut a graph into two pieces such that they have the minimum number of crossing edges.
RANDOM CONTRACTION ALGORITHM:
While there are more than two vertices:
   Pick a remaining edge (u, v) uniformly at random.
   Merge u and v into a single vertex.
   Remove self-loops.
Return the cut represented by the final two vertices.
Conditions: say the desired output is supernodes A and B, with the crossing edges represented as F. We only get the desired output if no edge from F is ever contracted.
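A sketch of RSelect, assuming distinct elements (partitioning via list comprehensions rather than in place, for brevity):

```python
import random

def rselect(a, i):
    """Return the ith smallest element (1-indexed) by randomized selection.
    Expected O(n) time. Assumes distinct elements; duplicates equal to
    the pivot are not handled in this sketch."""
    if len(a) == 1:
        return a[0]
    p = random.choice(a)
    left = [x for x in a if x < p]      # elements smaller than the pivot
    right = [x for x in a if x > p]     # elements larger than the pivot
    j = len(left) + 1                   # the pivot is the jth smallest
    if i == j:
        return p
    if i < j:
        return rselect(left, i)
    return rselect(right, i - j)
```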
GRAPH CONNECTIVITY. The goal of searching a graph: O(m + n) time.
Generic Search Algorithm (given graph G, initial vertex s):
   Initially s is explored, all other vertices unexplored.
   While possible:
      Choose an edge (u, v) with u explored and v unexplored.
      Mark v explored.
BFS (explores nodes in layers, uses a queue): can compute shortest paths and the connected components of an undirected graph.
DFS (explores aggressively, uses a stack): can compute a topological ordering of a directed acyclic graph and the strongly connected components of a directed graph.
BFS(Graph G, start vertex s):
   Mark s explored.
   Q = queue, initialized with s.
   While Q is not empty:
      Remove the first node of Q, call it v.
      For each edge (v, w):
         If w is unexplored, mark w explored and add it to Q.
COMPUTING SHORTEST PATHS: use BFS with extra code. Initialize dist(v) = 0 if v == s, infinity if v != s. When considering edge (v, w): if w is unexplored, set dist(w) = dist(v) + 1.
UNDIRECTED CONNECTIVITY: to compute all components:
   Mark all nodes unexplored.
   For i = 1 to n (n = number of nodes):
      If i is not explored: BFS(G, i).
Running time: O(m + n).
----------------------------------------------------------------------------------------------------
DFS: mimics BFS but uses a stack; can also be written recursively. Try it out.
DFS(Graph G, vertex s):
   Mark s explored.
   For every edge (s, v):
      If v is unexplored:
         DFS(G, v).
   Set f(s) = current_label; current_label--;   (only for topological sort)
------------------------------------------------------------------------------------------------------------
TOPOLOGICAL ORDERING: a labelling f of G's nodes with {1, 2, ..., n} such that for every edge (u, v), f(u) < f(v). If G has a cycle, there can be no topological ordering. Every directed acyclic graph has a sink vertex. Straightforward approach:
   Let v be a sink vertex of graph G.
   Set f(v) = n.
   Delete v and recurse on G - {v}.
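The BFS shortest-path augmentation above can be sketched as follows (representing the graph as an adjacency-list dict is an assumption of this sketch):

```python
from collections import deque

def bfs_shortest_paths(graph, s):
    """BFS from s; returns dist(v) = fewest edges from s to v for reachable v.
    `graph` maps each vertex to a list of its neighbours (adjacency list)."""
    dist = {s: 0}
    q = deque([s])
    while q:
        v = q.popleft()                 # remove the first node of Q
        for w in graph[v]:
            if w not in dist:           # w is unexplored
                dist[w] = dist[v] + 1
                q.append(w)
    return dist

# Hypothetical example graph (undirected, stored as symmetric adjacency lists).
g = {'s': ['a', 'b'], 'a': ['s', 'c'], 'b': ['s', 'c'],
     'c': ['a', 'b', 'd'], 'd': ['c']}
```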
DFS_LOOP(Graph G):
   Mark all nodes unexplored.
   current_label = n.
   For each vertex v in G:
      If v is not explored:
         DFS(G, v).
---------------------------------------------------------------------------------------------------------
STRONGLY CONNECTED COMPONENTS: A graph is strongly connected if you can get from any node to any other node. Problem: a single DFS finds an SCC only if it starts from the right node.
KOSARAJU'S TWO-PASS ALGORITHM:
1) Reverse all the edges of the graph, giving G^rev.
2) Run DFS-LOOP on the reverse graph from the highest node label to the lowest (it computes a "magical" ordering of the nodes). Let f(v) be the finishing time of each vertex.
3) Run DFS-LOOP on the original graph in decreasing order of finishing times (processed in this magical ordering, it reveals the SCCs one by one). SCCs = all nodes that have the same leader.
DFS-Loop(Graph G):
   global variable t = 0 (number of nodes finished so far; for the first pass).
   global variable s = NULL (most recent source vertex; for the second pass).
   Assume nodes are labelled 1 to n.
   For i = n down to 1:
      If i is unexplored:
         s := i;
         DFS(G, i);
DFS(Graph G, vertex i):
   Mark i explored.
   Set leader(i) = s;   // second pass
   For each arc (i, j):
      If j is unexplored:
         DFS(G, j);
   t++;                 // first pass
   Set f(i) = t;        // first pass
SCCs can have complex structures, but if we zoom out and treat every SCC as a single node, they form a simple directed acyclic graph. If SCC1 has an edge to SCC2, then max f(SCC2) > max f(SCC1). The leader of an SCC is the vertex with the highest finishing time in it.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
STRUCTURE OF THE WEB (OPTIONAL): Vertices: web pages. Directed edges: hyperlinks.
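The two-pass algorithm above can be sketched as follows (recursive DFS, so only suitable for small graphs; vertices labelled 1..n as assumed in the notes):

```python
from collections import defaultdict

def kosaraju(n, edges):
    """Kosaraju's two-pass SCC algorithm. First pass: DFS on the reversed graph
    records a finishing order. Second pass: DFS on the original graph in decreasing
    finishing order; all nodes reached from a source s get leader s, and nodes
    sharing a leader form one SCC. Returns the leader map."""
    g, grev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        g[u].append(v)
        grev[v].append(u)

    def dfs(adj, i, explored, out):
        explored.add(i)
        for j in adj[i]:
            if j not in explored:
                dfs(adj, j, explored, out)
        out.append(i)                   # appended at finishing time

    explored, order = set(), []
    for i in range(1, n + 1):           # first pass, on G^rev
        if i not in explored:
            dfs(grev, i, explored, order)

    explored, leader = set(), {}
    for s in reversed(order):           # second pass, decreasing finishing time
        if s not in explored:
            comp = []
            dfs(g, s, explored, comp)
            for v in comp:
                leader[v] = s
    return leader
```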
DIJKSTRA'S ALGORITHM:
Initialize: X = {s}  // vertices processed so far
A[s] = 0  // A contains the computed shortest-path distances
B[s] = empty path  // B holds the shortest paths themselves; just for understanding, not coding
Consider two groups: X and V - X.
MAIN LOOP:
   While X != V:
      Among all edges (v, w) with v in X and w in V - X,
      pick the one that minimizes A[v] + l_vw.
      Add w to X.
      Set A[w] = A[v] + l_vw.
Dijkstra doesn't work with negative edge lengths. Naive running time: O(mn). For a super-fast implementation, use heaps.
HEAP IMPLEMENTATION: Elements of the heap = vertices of V - X. For every v in V - X, key[v] = smallest value A[u] + l_uv over all edges (u, v) that end at v and start in X. If a vertex v in V - X has no incident edge starting in X, its key is infinity. Every iteration, we extract the minimum from the heap and move it from V - X to X. When a new vertex w is added to X, update the key values of the vertices still in V - X that have edges from w, as they might be lower now.
Algorithm: when w is extracted from the heap and added to X:
   For each edge (w, v) where v is in V - X:
      Delete v from the heap.
      key[v] = min{key[v], A[w] + l_wv}.
      Re-insert v.
-------------------------------------------------------------------------------------------------------------------------------------
HEAPS: A container for objects that have keys. Supports mainly two operations, Insert and Extract-Min, both in O(log n) time (n = number of objects in the heap). Other operations: 1) Heapify: to initialize a heap; call Insert n times, or do it in a batch, which takes linear time. 2) Delete (any element): in logarithmic time. When to use a heap? Notice whether your program does repeated minimum computations.
HEAP SORT: Insert all elements into a heap, then Extract-Min one by one. Running time: O(n log n). Applications: 1) Event records with timestamps. 2) Maintaining the median after every new number arrives: use two heaps, Hlow (a max-heap) and Hhigh (a min-heap).
Hlow holds the smallest n/2 elements; Hhigh holds the largest n/2. Hence the roots of Hlow and Hhigh are the two medians when n is even, and whichever heap has one more element holds the median when n is odd. If the sizes of Hlow and Hhigh ever differ by two, extract one element from the larger heap and insert it into the smaller one. 3) Speeding up Dijkstra's algorithm, as above.
Heap insertion: insert into the next free slot and bubble up. Deletion: remove the root, move the last item into its position, and bubble down to restore the heap property.
---------------------------------------------------------------------------------------------------------------------------------------------------------
BALANCED BINARY SEARCH TREES: In a sorted array we can implement a lot of operations in reasonable time, but only for static data. When the data is stored dynamically, INSERTION and DELETION become expensive. Hence we need balanced binary search trees: BBST = sorted array + logarithmic-time insertion and deletion.
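The two-heap median trick above can be sketched with Python's heapq (a min-heap, so the max-heap Hlow stores negated keys; yields the lower median when the count is even):

```python
import heapq

def running_medians(stream):
    """Maintain the median with two heaps: hlow (max-heap via negated keys) keeps
    the smaller half, hhigh (min-heap) keeps the larger half. Yields the median
    after each new number."""
    hlow, hhigh = [], []                    # hlow stores negated values
    for x in stream:
        if hlow and x > -hlow[0]:
            heapq.heappush(hhigh, x)
        else:
            heapq.heappush(hlow, -x)
        # Rebalance so the sizes never differ by two.
        if len(hlow) > len(hhigh) + 1:
            heapq.heappush(hhigh, -heapq.heappop(hlow))
        elif len(hhigh) > len(hlow) + 1:
            heapq.heappush(hlow, -heapq.heappop(hhigh))
        yield -hlow[0] if len(hlow) >= len(hhigh) else hhigh[0]
```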
OPERATION                    SORTED ARRAY    BBST
Search                       O(log n)        O(log n)
Select                       O(1)            O(log n)
Min/Max                      O(1)            O(log n)
Predecessor/Successor        O(1)            O(log n)
Rank                         O(log n)        O(log n)
Output in sorted order       O(n)            O(n)
Insert                       O(n)            O(log n)
Delete                       O(n)            O(log n)
NOTE: The height of the tree determines how fast the operations are going to be; the height can vary between log(n) and n.
OPERATIONS:
In-order traversal: prints the keys in sorted order, O(n) time.
Minimum of a tree: follow left pointers continuously. Maximum of a tree: follow right pointers continuously.
Predecessor, two cases:
   1) Node has a left child: find the maximum of its left subtree (go left once, then keep going right).
   2) Node has no left child: keep going up until a parent is smaller than the node.
Deletion of a key k, three cases:
   1) Easy case: the node has no children; just delete it.
   2) Medium case: the node has one child; attach that child to the node's parent.
   3) Difficult case: the node has two children:
      a) Compute k's predecessor, which is easy: go left once, then keep going right until you can't anymore.
         Name it l.
      b) Swap k and l.
      c) Delete k at its new position. If k has a left child there (it cannot have a right child, since that child would have been the predecessor), apply the medium case.
(Running time: O(height).)
SELECT AND RANK: For these we need to store some extra data. Let size(x) = number of nodes in x's subtree, including x itself, so size(x) = 1 + size(left subtree) + size(right subtree).
ith ORDER STATISTIC: Start at the root x, with left child y and right child z:
   Let a = size(y) (a = 0 if x has no left child).
   If a == i - 1, return x.
   If a >= i, recurse on the left child with order statistic i.
   If a < i - 1, recurse on the right child with order statistic i - a - 1.
RANK: Given a key, find how many keys are smaller than or equal to it. Similar to SELECT.
---------------------------------------------------------------------------------------------------------------------------------------------------
RED-BLACK TREES: 1) Every node has one extra bit of information: red or black. 2) The root is always black. 3) No two reds in a row. 4) Every path from the root to a null pointer passes through exactly the same number of black nodes. Why? In a red-black tree with n nodes, the height is always <= 2*log(n+1).
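The size-augmented SELECT above can be sketched as follows (insert is shown without rebalancing, so this is a plain BST rather than a BBST; the logic is the same):

```python
class Node:
    """Binary search tree node augmented with its subtree size for Select/Rank."""
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def insert(root, key):
    if root is None:
        return Node(key)
    root.size += 1                       # key is added somewhere below root
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def select(x, i):
    """Return the ith smallest key (1-indexed) in the subtree rooted at x."""
    a = x.left.size if x.left else 0     # number of keys smaller than x
    if i == a + 1:
        return x.key
    if i <= a:
        return select(x.left, i)
    return select(x.right, i - a - 1)
```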
HASH FUNCTION: Maps each key to a bucket number. Purpose: maintain a possibly evolving set of stuff. Operations: Insert, Delete, Lookup (all in O(1) time). NOTE: Hash tables have no ordering among their keys. Hash tables speed up programs that require repeated lookups, or that keep track of stuff.
Applications:
1) Deduplication: given a stream of objects, remove duplicates by looking each one up in a hash table.
2) The 2-SUM problem: given an unsorted array of integers, find two values with target sum t. Solution: insert the elements into a hash table H; for each x, look up t - x in H.
IMPLEMENTATION: High-level idea: consider a universe U (a REALLY BIG set). Goal: maintain a feasible, evolving subset S of U. Structure: take an array of size n and consider each spot a bucket, so we have n buckets; then use a hash function to map keys to buckets. Collisions can occur even for small data sets (the birthday paradox).
RESOLVING COLLISIONS:
Chaining: maintain a list in every bucket.
Open addressing: no lists; every bucket holds one item. If a bucket is full, try another bucket, and keep trying until success. E.g. linear probing, or double hashing (a second hash function gives an offset that is added repeatedly to the first hash value until an empty bucket is found).
Pros and cons: chaining takes up extra space; deletion is difficult with open addressing.
GOOD HASH FUNCTIONS: Should spread data evenly and be computable in constant time; try to approximate a completely random function. Quick-and-dirty hash functions: convert the key (e.g. a string) into an integer, then apply a compression function to fit it to the number of buckets. How to choose the number of buckets n? 1) Let n be a prime number. 2) Not too close to a power of 2 or 10.
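The 2-SUM solution above can be sketched as a one-pass hash-set variant (Python's set is backed by a hash table):

```python
def two_sum(nums, t):
    """Hash-table 2-SUM: return a pair (x, t - x) from nums summing to t, else None.
    One pass: for each x, look up t - x among the values seen so far."""
    seen = set()                 # hash-table lookups in O(1) expected time
    for x in nums:
        if t - x in seen:
            return (t - x, x)
        seen.add(x)
    return None
```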
LOAD FACTOR of a hash table: alpha = (# of objects in the table) / (# of buckets). alpha = O(1) is a necessary condition for the operations to run in constant time. EVERY HASH FUNCTION HAS A PATHOLOGICAL DATA SET. Solutions: 1) Use a cryptographic hash function. 2) Use randomization over a family of hash functions.
-------------------------------------------------------------------------------------------------------------------------------------------------
BLOOM FILTER: 1) More space-efficient than a hash table. 2) Allows for errors. 3) Doesn't remember (store) the object itself, just whether it has been seen. 4) Basic versions have no deletion. 5) Possible mistakes: false positives - the filter may say an object is present even though it was never inserted. Applications: early spell checkers, lists of forbidden passwords, network routers. Ingredients: an array holding only 0s and 1s, and k hash functions. Insertion: run each of the k hash functions on the object to get an index, and set the bit at that index to 1 regardless of its previous value. Lookup: run the k hash functions again and check that all the corresponding bits are 1. Optimal k ~ (ln 2)*b, where b = bits per object.
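A minimal Bloom-filter sketch (deriving the k hash functions by salting SHA-256 is an illustrative choice of this sketch, not part of the notes):

```python
import hashlib

class BloomFilter:
    """Bloom filter: an n-bit array plus k hash functions. Insertion sets k bits;
    lookup checks them. False positives are possible, false negatives are not."""
    def __init__(self, n_bits, k):
        self.bits = bytearray(n_bits)   # one byte per bit, for simplicity
        self.n, self.k = n_bits, k

    def _indices(self, item):
        # Derive k indices by salting one base hash (an assumed construction).
        for salt in range(self.k):
            h = hashlib.sha256(f"{salt}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.n

    def add(self, item):
        for i in self._indices(item):
            self.bits[i] = 1            # set regardless of previous value

    def __contains__(self, item):
        return all(self.bits[i] for i in self._indices(item))
```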