Everything I learn in my undergraduate course on algos. There is only one final exam for this course, so that is essentially what you're preparing for throughout the semester. I have heard a lot of bad things about this exam, so I must try my best 🫡
# Divide and Conquer
We did not start off with this. I started taking the course seriously when we began covering this in class. Before this, we covered greedy algorithms: Prim's, Kruskal's, Dijkstra's, the Gale-Shapley (GS) algorithm, etc.
- Divide input into several parts, solve problem in each part recursively, combine into an overall solution
- Analysis of runtime involves solving a recurrence relation that bounds runtime recursively in terms of running time on smaller instances
- Works well when you combine with other algorithm design techniques (Dynamic Programming, Randomization, etc.)
- Difference between brute force and improved solution involving DnC will not always be exponential vs. polynomial
> Divide the input into two pieces of equal size; solve the two subproblems on these pieces separately by recursion; and then combine the two results into an overall solution, spending only linear time for the initial division and final recombination
For any algorithm that fits the above pattern, let $T(n)$ denote the worst-case running time on input instances of size $n$. Assuming $n$ is even, the algorithm spends $O(n)$ time to divide the input into two pieces of size $\frac{n}{2}$, then $T(\frac{n}{2})$ time to solve each one, and finally $O(n)$ time to combine the solutions from the two recursive calls. We have the following recurrence relation:
$T(n) \leq 2T(\frac{n}{2})+cn$ when $n > 2$ and $T(2) \leq c$
> The design of an efficient divide and conquer algorithm is heavily intertwined with an understanding of how a recurrence relation determines a running time. To obtain an explicit asymptotic bound for a recurrence relation, $T(n)$ must only appear on the left hand side of the inequality. It is worth knowing how to solve recurrence relations for these reasons.
> Any function $T(.)$ satisfying the above recurrence relation is bounded by $O(n\log{n})$. We arrived at this bound by one of two methods: unrolling and strong induction. I won't write it down here. Too tedious, much faster to write all of that logic on paper.
> We can use the expression $T(n) \leq 2T(\frac{n}{2})+cn$ and replace 2 with $q$ to explore more generalized conditions. (q > 2, q = 1, q = 2)
> When q = 1, we have a linear running time: $O(n)$ (most of the work is being done at top of the recursion)
> When q = 2, an equal amount of work is done across all levels: $O(n\log{n})$
> When q > 2, we have most of the work being done at the bottom of the recursion
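A quick way to see these three regimes is to unroll the recurrence numerically. The sketch below is my own (not from lecture) and takes $c = 1$ with base case $T(1) = 1$:

```python
# Numerically unroll T(n) = q*T(n/2) + n for different q (taking c = 1).
# q = 1 should grow linearly, q = 2 like n log n, q = 3 like n^(log2 3).

def T(n, q):
    """Unroll T(n) = q*T(n/2) + n with base case T(1) = 1."""
    if n <= 1:
        return 1
    return q * T(n // 2, q) + n

for q in (1, 2, 3):
    # Compare T at two input sizes to estimate the growth rate:
    # doubling n should roughly double T for q <= 2, and triple it for q = 3.
    ratio = T(2**16, q) / T(2**15, q)
    print(f"q={q}: T grows by a factor of ~{ratio:.2f} when n doubles")
```

Watching the ratio move from about 2 (linear, top-heavy) to about 3 (bottom-heavy, $O(n^{\log_2 3})$) matches the three cases above.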
### Breaking down QuickSelect's recurrence relation
https://chat.openai.com/share/0376c5d1-0ee6-463d-a216-5eddb9eea729
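For reference, here is a minimal QuickSelect sketch (my own, not from the chat above); the comments note where the recurrence comes from:

```python
import random

def quickselect(a, k):
    """Return the k-th smallest element (0-indexed) of list a.

    Each call does O(n) partitioning work and then recurses into exactly
    ONE side, so the recurrence is T(n) = T(size of recursed side) + cn.
    With a random pivot the recursed side has expected size ~n/2,
    which solves to expected O(n) total time.
    """
    pivot = random.choice(a)
    lo = [x for x in a if x < pivot]   # strictly smaller than pivot
    eq = [x for x in a if x == pivot]  # equal to pivot
    hi = [x for x in a if x > pivot]   # strictly larger than pivot
    if k < len(lo):
        return quickselect(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    return quickselect(hi, k - len(lo) - len(eq))
```

The list-comprehension partition copies the input for clarity; an in-place partition gives the same recurrence without the extra memory.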
### Inversion Count
- Read the textbook and develop an understanding from first principles, then think about what happens when the condition changes. For example, what if the condition is $a_i > 2a_j$ or $a_i > 3a_j$? How does the algorithm become more complicated?
- The check just becomes explicit. In the standard merge step, once an element from the right half is smaller than the current left element, it is smaller than *all* remaining left elements, so you count a whole block of inversions at once. With $a_i > 2a_j$, that counting comparison no longer coincides with the merge comparison, so you need a separate counting pass (with its own pointer) over the sorted halves before merging.
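A sketch of the standard merge-sort-based counter (my own, following the textbook approach):

```python
def count_inversions(a):
    """Count pairs i < j with a[i] > a[j] in O(n log n) via merge sort.

    Returns (sorted copy of a, inversion count).
    """
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, cl = count_inversions(a[:mid])
    right, cr = count_inversions(a[mid:])
    merged, i, j, count = [], 0, 0, cl + cr
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            # right[j] is smaller than ALL remaining left elements,
            # so it forms an inversion with each of them at once.
            count += len(left) - i
            merged.append(right[j]); j += 1
    merged += left[i:] + right[j:]
    return merged, count
```

For the significant-inversion variant ($a_i > 2a_j$), the block count above would be replaced by a separate two-pointer pass over the sorted halves before the merge.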
# Spring break prep
- Graphs
- Trees
- BFS
- "Flooding" the graph
- DFS
- Going all the way
- Digraphs
- Edges with direction
- Strong connectivity
- u -> v, v -> u. Graph is strongly connected when every pair of nodes is _mutually reachable_.
- Strongly connected component
- The strong component containing a node s in a digraph is the set of all v such that s and v are mutually reachable
- Kosaraju's algorithm for finding all SCCs in a digraph
- Populate a stack S by running DFS on G, pushing each node onto S when its DFS call finishes (i.e., in order of finishing times)
- Build $G^{REV}$ for $G$, and set all nodes to unvisited
- While S is not empty, pop node v from S and run DFS on $G^{REV}$ from v to extract an SCC
- DAGs
- Directed graphs with no cycle
- If G has a topological ordering, it is a DAG
- If G is a DAG, it has a topological ordering
- Key property to prove here is that in every DAG G, there is a node v with no incoming edges
- Greedy
- Proof techniques
- Stays Ahead
- Greedy algo "stays ahead" of the optimal algo at each step and is therefore optimal
- Exchange Argument
- Considers any possible solution to the problem and gradually transforms it into the solution found by the greedy algorithm without hurting its quality
- Well known applications
- Shortest path in graph
- Minimum spanning tree problem
- Constructing Huffman codes for data compression
- Interval scheduling
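The Kosaraju steps listed above can be sketched as follows (my own version; it assumes the digraph is an adjacency-list dict with every node as a key):

```python
def kosaraju_sccs(graph):
    """Find the strongly connected components of a digraph.

    graph: dict mapping node -> list of successor nodes.
    Returns a list of SCCs, each a set of nodes.
    """
    # Pass 1: DFS on G, pushing each node onto the stack when it finishes.
    visited, stack = set(), []

    def dfs_finish_order(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                dfs_finish_order(v)
        stack.append(u)

    for u in graph:
        if u not in visited:
            dfs_finish_order(u)

    # Build G^REV and set all nodes back to unvisited.
    rev = {u: [] for u in graph}
    for u in graph:
        for v in graph[u]:
            rev.setdefault(v, []).append(u)
    visited.clear()

    # Pass 2: pop nodes in reverse finishing order; each DFS on G^REV
    # that starts at an unvisited node extracts exactly one SCC.
    sccs = []
    while stack:
        u = stack.pop()
        if u in visited:
            continue
        component, frontier = set(), [u]
        visited.add(u)
        while frontier:
            x = frontier.pop()
            component.add(x)
            for y in rev.get(x, []):
                if y not in visited:
                    visited.add(y)
                    frontier.append(y)
        sccs.append(component)
    return sccs
```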
### Network Flow
- https://www.youtube.com/watch?v=oHy3ddI9X3o (video for Max Flow Min Cut)
- Question 10 from Kleinberg and Tardos
- https://ac.informatik.uni-freiburg.de/teaching/ws15_16/algo1516/solutions/solution08.pdf
- ChatGPT explanation
- https://chat.openai.com/share/e985e222-683a-4031-ad09-5b9c3238d66c
- Pretty good, the second one did it for me. Now I understand. I struggled a lot with this question.
- Edmonds-Karp algorithm
- Ford-Fulkerson method
- Max flow is min cut
- Node demands, lower bounds
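The bullets above fit in one routine: Ford-Fulkerson driven by BFS augmenting paths is exactly Edmonds-Karp. A sketch of my own (capacities given as a dict keyed by edge):

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: Ford-Fulkerson with BFS (shortest) augmenting paths.

    capacity: dict mapping (u, v) -> edge capacity.
    Returns the max-flow value. Runs in O(V * E^2).
    """
    # Residual capacities; add reverse edges at 0.
    residual = dict(capacity)
    adj = {}
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: the flow is maximum
        # Recover the path, find its bottleneck, and augment along it.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck
```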
### Max-Flow Min-Cut Theorem Basics:
The **max-flow min-cut theorem** is a fundamental result in network flow theory. It states two key things:
1. **The value of the maximum flow in a network is equal to the capacity of the minimum cut of the network.**
2. **No flow value can exceed the capacity of any cut.**
### Understanding the Components:
**Max Flow:**
- This is the greatest amount of flow that can possibly be pushed from the source node to the sink node in a flow network. The flow must satisfy certain constraints: it cannot exceed the capacities of the edges, and the inflow and outflow of every node (except the source and sink) must be equal.
**Min Cut:**
- A cut in a network is a partition of the vertices into two disjoint subsets such that one contains the source and the other contains the sink. The cut-set of a cut is the set of edges that have one endpoint in each subset of the partition. The capacity of a cut is the sum of the capacities of the edges in the cut-set.
- The "minimum cut" is the cut whose cut-set has the minimum total capacity among all cuts in the network.
### Intuitive Explanation:
Imagine the network as a system of water pipes, with the edges as pipes having certain thickness (capacity) and the nodes as junctions. The source is where water is pumped into the system, and the sink is where water exits.
**Max Flow:**
- To find the maximum flow from the source to the sink, you want to send as much water as possible through the network. You adjust the flow through various paths, ensuring that no pipe carries more water than it can hold and that each junction balances the incoming and outgoing water. The maximum flow is the total amount of water per unit time that can pass from the source to the sink without any pipe overflowing.
**Min Cut:**
- To find the minimum cut, imagine you have to shut down the system by cutting certain pipes so that no water can get from the source to the sink. You want to do this by cutting the smallest total thickness of pipes (to save cost, for example). The minimum cut is thus the least capacity of pipes you need to cut to stop all flow from the source to the sink.
### Why Are They Equivalent?
The equivalence of the maximum flow and the minimum cut can be visualized as follows:
- **If you push water (flow) up to the limit imposed by the smallest cut, you're maximizing what the network can handle without overflowing any pipe.**
- **Conversely, the minimum cut represents the weakest link in the system—the bottleneck—which dictates the maximum flow possible.**
Thus, the max-flow value tells you the maximum stress (flow) the network can withstand based on its weakest link (min-cut). This theorem is powerful because it links these two seemingly different concepts (flow and cut) through the network's inherent structure and constraints, providing a dual perspective on the network’s capacity to transport material (like data, water, traffic) from a source to a sink.
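To make the duality concrete, here is a small brute-force check on a made-up four-node network (the nodes and capacities are my own choices): enumerating every s-t cut finds the minimum cut capacity, which by the theorem equals the max-flow value.

```python
from itertools import combinations

# A tiny made-up network: s = source, t = sink, capacities keyed by edge.
capacity = {('s', 'a'): 10, ('s', 'b'): 10, ('a', 'b'): 2,
            ('a', 't'): 8, ('b', 't'): 10}
nodes = {'s', 'a', 'b', 't'}

def cut_capacity(S):
    """Capacity of the cut (S, V - S): sum over edges LEAVING S.
    Edges entering S do not count toward the cut's capacity."""
    return sum(c for (u, v), c in capacity.items() if u in S and v not in S)

# Enumerate every partition with s on one side and t on the other.
middle = nodes - {'s', 't'}
cuts = []
for r in range(len(middle) + 1):
    for extra in combinations(sorted(middle), r):
        S = {'s'} | set(extra)
        cuts.append((cut_capacity(S), sorted(S)))

print(min(cuts))  # the minimum cut; its capacity equals the max-flow value
```

Brute-force enumeration only works because there are $2^{|V|-2}$ cuts on a tiny graph; on real instances you read the min cut off the residual graph of a max flow instead.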
### Divide n Conquer
- MergeSort
- Proofs of correctness are via induction
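A minimal MergeSort sketch (my own), with the induction structure of the correctness proof noted in the docstring:

```python
def merge_sort(a):
    """Return a sorted copy of list a.

    Correctness is by induction on len(a): the base case (len <= 1) is
    trivially sorted; the inductive step assumes both recursive calls
    return sorted halves and shows the merge preserves sortedness.
    """
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```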
### Greedy
- MST algorithms
- Kruskal's algorithm
- Prim's algorithm
- Reverse-Kruskal's (Reverse-Delete) algorithm
- Shortest path algorithm
- Dijkstra's
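A Dijkstra sketch using a binary heap (my own; assumes non-negative weights and an adjacency-list dict):

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths with non-negative edge weights.

    graph: dict mapping u -> list of (v, weight) pairs.
    Returns a dict of shortest distances from source to each reachable node.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (tentative distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # stale entry: u was already settled via a shorter path
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist
```

Rather than a decrease-key operation, this version pushes duplicate heap entries and skips the stale ones, which is the idiomatic Python approach.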
### Dynamic Programming
- MIT OCW longest increasing subsequence dynamic programming
- This search will reveal a playlist
- Problems in DP: Edit Distance, Knapsack, Subset Sum, Max Subarray, Longest Increasing Subsequence, Shortest Paths. I don't like my lecturer's explanations, so I will watch Abdul Bari and these lectures for DP. This should hopefully help me solve questions more efficiently.
- Algorithms/Problems
- Weighted Interval Scheduling
- Longest Increasing Subsequence
- Max subarray
- Subset Sum and Knapsack (Subset Sum is a special case of Knapsack)
- Edit Distance
- Shortest Path
- Sequence Alignment
- Least Squares
- RNA structure
- Longest Common Subsequence
- Extra, part of OCW lectures
- Should cover this
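As a representative of the list above, here is the standard $O(n^2)$ DP for Longest Increasing Subsequence (my own sketch; the recurrence in the docstring is the one the OCW lecture derives):

```python
def longest_increasing_subsequence(a):
    """Length of the longest strictly increasing subsequence, O(n^2) DP.

    dp[i] = length of the longest increasing subsequence ENDING at index i.
    Recurrence: dp[i] = 1 + max(dp[j] for j < i with a[j] < a[i]),
    or 1 if no such j exists. The answer is max(dp).
    """
    if not a:
        return 0
    dp = [1] * len(a)
    for i in range(1, len(a)):
        for j in range(i):
            if a[j] < a[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)
```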
### Intractability
-