Dijkstra's Algorithm:
Initialize:
X = {s} // vertices processed so far
A[s] = 0 // A contains computed shortest-path distances
B[s] = [empty path] // B contains the shortest paths themselves; just for understanding, not coding
Consider two groups: X and V-X (call it X1).
MAIN LOOP:
while X != V:
    among all edges (v,w) with v in X and w in X1,
    pick the one that minimizes A[v] + l_vw,
    add w to X,
    set A[w] = A[v] + l_vw.
Dijkstra doesn't work with negative edge lengths.
Naive Dijkstra running time: O(mn) (m edges, n vertices).
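A minimal sketch of the naive main loop above, assuming the graph is given as an adjacency dict `{vertex: [(neighbor, edge_length), ...]}` (a representation chosen here, not fixed by the notes):

```python
# Naive Dijkstra: each of up to n-1 iterations scans all edges crossing
# the frontier X -> V-X, giving the O(mn) running time noted above.
# Assumes every vertex appears as a key in `graph` (even with no edges).
def dijkstra_naive(graph, s):
    A = {s: 0}                      # shortest-path distances computed so far
    X = {s}                         # vertices processed so far
    while X != set(graph):
        best = None
        for v in X:
            for w, lvw in graph[v]:
                if w not in X and (best is None or A[v] + lvw < best[0]):
                    best = (A[v] + lvw, w)
        if best is None:            # remaining vertices are unreachable
            break
        dist, w = best
        X.add(w)                    # add w to X
        A[w] = dist                 # set A[w] = A[v] + l_vw
    return A
```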
For a super fast implementation- Use heaps.
HEAP IMPLEMENTATION:
Every iteration, we move one vertex from {V-X} to X. When a new vertex k is added to X, update the key values of the vertices that have an edge from k and are still in {V-X}, since their keys might be lower now.
Algorithm:
When w is extracted from the heap and added to X:
    for each edge (w,v) with v in {V-X}:
        delete v from the heap,
        key[v] = min{key[v], A[w] + l_wv},
        re-insert v into the heap with its new key.
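The steps above can be sketched with Python's `heapq`. Since `heapq` has no arbitrary-delete operation, this sketch substitutes the common "lazy deletion" trick for the delete/update/re-insert step: push a fresh (key, vertex) pair and skip stale entries on extraction. The graph is assumed to be an adjacency dict `{vertex: [(neighbor, edge_length), ...]}`.

```python
import heapq

# Heap-based Dijkstra. Instead of deleting v from the heap and
# re-inserting it with a smaller key, we push a new (key, v) pair and
# ignore stale entries when they surface ("lazy deletion").
def dijkstra_heap(graph, s):
    A = {}                          # final shortest-path distances
    heap = [(0, s)]
    while heap:
        key, w = heapq.heappop(heap)
        if w in A:                  # stale entry: w was already extracted
            continue
        A[w] = key                  # w leaves V-X and joins X
        for v, lwv in graph[w]:     # update keys of w's neighbors in V-X
            if v not in A:
                heapq.heappush(heap, (A[w] + lwv, v))
    return A
```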
-------------------------------------------------------------------------------------------------------------------------------------
HEAPS:
A container for objects that have keys. Supports two main operations, Insert and Extract-Min, both in O(log n) time (n = number of objects in the heap).
Other Ops:
1) Heapify: to initialize a heap. We can call Insert n times (O(n log n)), or build it in one batch, which takes linear time.
2) Delete (any element): in logarithmic time.
When to use heap? Notice if your program uses repeated minimum computations.
Heap Sort: 1) Insert all n elements into a heap. 2) Extract the min one by one. Running time: O(n log n).
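Heap sort as just described, sketched with Python's built-in `heapq`:

```python
import heapq

# Heap sort exactly as in the note: insert everything, then extract the
# minimum n times. Each operation is O(log n), so the total is O(n log n).
def heap_sort(items):
    heap = []
    for x in items:
        heapq.heappush(heap, x)               # n inserts
    return [heapq.heappop(heap) for _ in range(len(heap))]  # n extract-mins
```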
Applications:
1) Can be used for event records with timestamps.
2) Can be used to compute the median after every new number arrives (median maintenance):
-Use two heaps. The first, Hlow, is a max-heap; the second, Hhigh, is a min-heap. Hlow holds the
smallest n/2 elements and Hhigh the largest n/2. When n is even, the roots of Hlow and Hhigh are
the two medians; when n is odd, the root of whichever heap has one more element is the median.
- If the size difference between Hlow and Hhigh reaches two, extract the root of the larger heap
and insert it into the smaller one.
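A sketch of this two-heap scheme in Python. `heapq` only provides a min-heap, so Hlow stores negated values to act as a max-heap (a standard workaround, not from the notes):

```python
import heapq

class RunningMedian:
    def __init__(self):
        self.low = []    # max-heap via negation: the smaller half
        self.high = []   # min-heap: the larger half

    def add(self, x):
        if self.low and x > -self.low[0]:
            heapq.heappush(self.high, x)
        else:
            heapq.heappush(self.low, -x)
        # rebalance as soon as the size difference reaches two
        if len(self.low) > len(self.high) + 1:
            heapq.heappush(self.high, -heapq.heappop(self.low))
        elif len(self.high) > len(self.low) + 1:
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        # root of the heap with one more element; for even n this
        # returns the lower of the two medians
        if len(self.low) >= len(self.high):
            return -self.low[0]
        return self.high[0]
```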
3) Can be used to speed up Dijkstra's algorithm (as above).
Insertion: insert into the next free slot and bubble up.
Extract-Min: delete the root, move the last leaf into its position, and bubble down to restore the heap property.
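The insertion and extract-min mechanics just described, sketched on a plain Python list used as the heap array:

```python
# Array-backed min-heap: node i has parent (i-1)//2 and children
# 2i+1 and 2i+2.
def heap_insert(h, x):
    h.append(x)                       # next free slot
    i = len(h) - 1
    while i > 0 and h[(i - 1) // 2] > h[i]:
        h[i], h[(i - 1) // 2] = h[(i - 1) // 2], h[i]   # bubble up
        i = (i - 1) // 2

def heap_extract_min(h):              # assumes a non-empty heap
    root = h[0]
    h[0] = h[-1]                      # move the last leaf to the root
    h.pop()
    i = 0
    while True:                       # bubble down
        smallest, l, r = i, 2 * i + 1, 2 * i + 2
        if l < len(h) and h[l] < h[smallest]:
            smallest = l
        if r < len(h) and h[r] < h[smallest]:
            smallest = r
        if smallest == i:
            return root
        h[i], h[smallest] = h[smallest], h[i]
        i = smallest
```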
---------------------------------------------------------------------------------------------------------------------------------------------------------
BALANCED BINARY SEARCH TREES:
In a sorted array, we can implement a lot of operations in reasonable time. But this is only for static data. When the data is stored dynamically, INSERTION and DELETION become expensive. Hence, we need balanced binary search trees.
Hence, BBST = Sorted array + logarithmic time insertion and deletion.
OPERATION                  SORTED ARRAY    BBST
Search                     O(logN)         O(logN)
Select                     O(1)            O(logN)
Min/Max                    O(1)            O(logN)
Predecessor/Successor      O(1)            O(logN)
Rank                       O(logN)         O(logN)
Output in sorted order     O(N)            O(N)
Insert                     O(N)            O(logN)
Delete                     O(N)            O(logN)
NOTE: The height of the tree determines how fast the operations are.
Height can vary between log2(N) (perfectly balanced) and N (a chain).
Operations:
Inorder Traversal: To print in sorted order. O(n) time.
Minimum of a tree: follow left-child pointers until you can't.
Maximum of a tree: follow right-child pointers until you can't.
Predecessor: Two cases:
1) Has a left child: find the max element of its left subtree. (Go left once and then keep going right.)
2) Doesn't have a left child: keep going up until you reach a parent smaller than the item.
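The minimum, maximum, and predecessor operations above, sketched on a hypothetical `Node` class with parent pointers (the notes don't fix a representation):

```python
class Node:
    def __init__(self, key, left=None, right=None, parent=None):
        self.key, self.left, self.right, self.parent = key, left, right, parent

def tree_min(x):                 # follow left pointers
    while x.left:
        x = x.left
    return x

def tree_max(x):                 # follow right pointers
    while x.right:
        x = x.right
    return x

def predecessor(x):
    if x.left:                   # case 1: max of the left subtree
        return tree_max(x.left)
    p = x.parent                 # case 2: climb until the parent is smaller
    while p and x is p.left:
        x, p = p, p.parent
    return p                     # None if x is the minimum
```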
Deletion of a key:
3 cases:
1) Easy case: the node has no children. Just remove it.
2) Medium case: the node has one child. Splice the node out and attach its child to the node's parent.
3) Difficult case: the node has two children.
1) Compute k's predecessor. This is easy because you go left once and then keep going right until you can't. Call it l.
2) Swap k and l.
3) Delete k at its new position. If k still has a left child (it can't have a right child, since that child would have been the predecessor), apply the medium case. (Running time: O(height).)
SELECT AND RANK:
For this, we need to store some extra data at each node x: size(x) = number of nodes in x's subtree, including x itself. So size(x) = 1 + size(left subtree) + size(right subtree).
ith Order Statistic (0-indexed, i.e. exactly i keys are smaller): start at root x with children y (left) and z (right).
Let a = size(y) (a = 0 if x has no left child).
If a = i, return x.
If a > i, recurse on the left child with the same i.
If a < i, recurse on the right child for the (i - a - 1)th order statistic.
Rank: given a key, find how many keys are smaller than or equal to it. Similar to SELECT.
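SELECT with subtree sizes, as above, sketched in Python. `SNode` is a hypothetical node class whose size field is filled at construction (no rebalancing shown):

```python
class SNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        # size = 1 + size(left subtree) + size(right subtree)
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def select(x, i):
    """Return the key with exactly i smaller keys (i is 0-indexed)."""
    a = x.left.size if x.left else 0   # keys smaller than x in this subtree
    if a == i:
        return x.key
    if a > i:
        return select(x.left, i)
    return select(x.right, i - a - 1)  # skip the left subtree and x itself
```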
---------------------------------------------------------------------------------------------------------------------------------------------------
RED BLACK TREES:
1) Every node has one bit of information. Either red or black.
2) Root is always black.
3) No two reds in a row (a red node has only black children).
4) Every path from the root to a null pointer passes through exactly the same number of black nodes.
Use? In a red-black tree with N nodes, the height is always <= 2log2(N+1), so all operations run in O(logN).
-------------------------------------------------------------------------------------------------------------------------------------
HASH TABLES:
Purpose: Maintain a possibly evolving set of stuff.
Hash function: maps each key to a position (bucket) in the table.
Operations: Insert, Delete, Lookup (all in O(1) expected time, for a well-implemented table and non-pathological data).
NOTE: Hash tables do not have ordering among its keys.
Hash tables speed up programs that require repeated lookups. Or for keeping track of stuff.
Applications:
1) Deduplication: Given a stream of objects, remove duplicates by looking up in a hash table.
2) The 2-SUM problem: given an unsorted array of integers, find two values with target sum T.
Sol: insert the elements into a hash table H. For each x, look up T - x in H.
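A sketch of this solution using a Python set (which is hash-based). It assumes the two values must be distinct, as in the usual statement of the problem:

```python
# 2-SUM via a hash table: one pass to build the set, one pass of
# O(1)-expected lookups, so O(n) overall instead of the naive O(n^2).
def two_sum(nums, t):
    seen = set(nums)                 # Python sets are hash tables
    for x in nums:
        y = t - x
        if y in seen and y != x:     # require two distinct values
            return (x, y)
    return None
```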
Implementation:
High-level idea: consider a universe U (A REALLY BIG SET).
Goal: maintain an evolving subset S of U of reasonable size.
Structure: consider an array of size n and treat each slot as a bucket, so we have n buckets. Use a hash function to map each object to a bucket. Even for modestly sized data sets, collisions occur (birthday paradox).
Resolving Collisions:
Chaining: Maintain a list in every bucket.
Open Addressing: no lists; every bucket holds at most one item. If the bucket is full, try another bucket, and keep trying until you succeed.
Eg: linear probing (scan forward from the hash position), double hashing (a second hash function gives an offset that is repeatedly added to the first hash value until an empty bucket is found).
Pros and cons: chaining takes up extra space for the lists; deletion is difficult with open addressing.
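A minimal sketch of chaining: an array of buckets, each a Python list of (key, value) pairs. The bucket count 97 and the use of Python's built-in `hash()` are implementation choices, not from the notes:

```python
class ChainedHashTable:
    def __init__(self, n_buckets=97):          # 97: a prime, per the notes
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):                    # hash + compression mod n
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        b = self._bucket(key)
        for i, (k, _) in enumerate(b):
            if k == key:
                b[i] = (key, value)            # overwrite an existing key
                return
        b.append((key, value))                 # append to the bucket's list

    def lookup(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):                     # easy with chaining
        b = self._bucket(key)
        b[:] = [(k, v) for k, v in b if k != key]
```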
Good hash function: should spread the data evenly across the buckets and be computable in constant time. Ideally it should behave like a completely random function.
Quick and dirty hash functions: simple hash functions that are still useful. Basically, convert the object (e.g., a string) into an integer (the hash code), then apply a compression function (e.g., mod n) to map it to one of the n buckets.
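One possible quick-and-dirty hash of this shape: a rolling polynomial turns the string into a big integer, then mod n compresses it to a bucket index. The base 31 is a conventional choice, not from the notes:

```python
def string_hash(s, n_buckets):
    h = 0
    for ch in s:
        h = h * 31 + ord(ch)         # string -> (possibly huge) integer
    return h % n_buckets             # compression to a bucket index
```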
How to choose number of buckets n?
1)Let n be a prime number.
2) Not too close to a power of 2 or a power of 10.
Load factor of a hash table (alpha): (# of objects in the table) / (# of buckets).
alpha = O(1) is necessary for the operations to run in constant time; open addressing additionally requires alpha < 1.
EVERY HASH FUNCTION HAS A PATHOLOGICAL DATA SET.
Solutions? 1) Use a cryptographic hash function. 2) Use randomization: pick a hash function at random from a family of hash functions (universal hashing).
-------------------------------------------------------------------------------------------------------------------------------------------------
BLOOM FILTER:
1) More space-efficient than hash tables.
2) But they allow for errors.
3) A Bloom filter doesn't remember (store) the object itself; it just remembers whether it has seen it or not.
4) Basic ones have no deletions.
5) Possible mistakes: false positives. Despite an object never having been inserted, the filter may say it's there.
Applications: early spell checkers, lists of forbidden passwords, network routers.
Ingredients: Array and hash functions. Array only has 0 or 1.
Consider an object x and k hash functions. For insertion, run each hash function on x to get an index, and set the bit at that index to 1, regardless of its previous value. For lookup, run the same k hash functions and check that all the corresponding bits are 1.
Optimal number of hash functions: k ≈ b * ln2 (b = bits per object).
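A minimal Bloom filter along these lines. Deriving the k hash functions from salted SHA-256, and the default sizes, are implementation assumptions, not from the notes:

```python
import hashlib

class BloomFilter:
    def __init__(self, n_bits=1024, k=7):
        self.bits = [0] * n_bits     # the filter stores only bits, not objects
        self.k = k                   # number of hash functions

    def _indices(self, item):
        # k hash functions, built by salting SHA-256 with i = 0..k-1
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % len(self.bits)

    def insert(self, item):
        for idx in self._indices(item):
            self.bits[idx] = 1       # set to 1 regardless of previous value

    def lookup(self, item):
        # True may be a false positive; False is always correct
        return all(self.bits[idx] for idx in self._indices(item))
```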