Preface Of Algorithms

LET'S HAVE A TALK...

Like all software engineering processes, the study of the analysis and design of algorithms is driven by the need to solve efficiently the many practical computational problems and applications encountered in almost every intra- and interdisciplinary aspect of computer science and engineering. The word 'efficiently' deserves a few more lines here. As a first note, algorithm efficiency means how much space (both auxiliary and immediate) and computing time the program (algorithm) needs from the beginning of its execution to its completion. The point worth stressing is that both computing time and space in computer memory are bounded resources. Our digital computers, however fast, are not infinitely fast, and storage, however large and cheap, is neither unlimited nor free. Besides these performance constraints, the software engineering process has its own concerns of design, documentation, data abstraction, modularity, error handling and so on. An efficient algorithm is one that takes care of all these issues. A badly conceived algorithm, on the other hand, can take far more space, or terminate in a time that far exceeds expectations.

Deciding which algorithm is best for a particular problem is the central theme of the design and analysis of algorithms. It is not rare for several different algorithms to exist for a single problem. In such cases we cannot simply stick to one solution; instead, a deeper analysis of the efficiency of all the candidate solutions, as well as a much deeper analysis of the problem itself, is required. For example, both insertion sort and merge sort solve the same problem of sorting a sequence of n numbers. Which one should we opt for? Insertion sort uses an incremental approach, while merge sort uses divide and conquer to sort the given array. Insertion sort takes worst-case time roughly c1*n^2, i.e. on the order of n^2, while merge sort's worst-case upper bound is c2*n*lg n, i.e. O(n lg n), where c1 and c2 are constants that do not depend on n. Insertion sort usually has a smaller constant factor than merge sort, so for sufficiently small inputs insertion sort is faster. But once the size of the input grows, merge sort shows the advantage of its smaller 'lg n' factor against the much larger 'n' factor in insertion sort's running time, so on larger inputs merge sort dominates. This analysis of both the problem and the algorithms reveals that it is clever to use insertion sort inside merge sort, so that when the subproblems grow small enough, the faster insertion sort takes over (see the sketch below).

Even when we have an efficient solution (on the space and time scale) for a problem, choosing it straight away can do more harm than good. Software engineering experience in building large systems has shown that the difficulty of implementation and the quality and performance of the final result depend heavily on carefully choosing the best data structures as well. Once the data structures are chosen, the algorithm to be used often becomes obvious. Sometimes, however, it is the algorithm that governs the choice of data structure, since some algorithms work best only with particular data structures. This point will become clearer when we study data structures for disjoint sets.
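Coming back to the insertion-sort/merge-sort example, here is a minimal sketch of such a hybrid. It is my own Python illustration, not code from any particular textbook; the cutoff value of 16 is an arbitrary choice for demonstration.

    def insertion_sort(a, lo, hi):
        # Sort a[lo:hi] in place by inserting each element into the
        # already-sorted prefix.
        for i in range(lo + 1, hi):
            key = a[i]
            j = i - 1
            while j >= lo and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key

    def hybrid_merge_sort(a, lo=0, hi=None, cutoff=16):
        # Divide and conquer, but fall back to insertion sort once the
        # subarray is small enough for its lower constant factor to win.
        if hi is None:
            hi = len(a)
        if hi - lo <= cutoff:
            insertion_sort(a, lo, hi)
            return
        mid = (lo + hi) // 2
        hybrid_merge_sort(a, lo, mid, cutoff)
        hybrid_merge_sort(a, mid, hi, cutoff)
        # Merge the two sorted halves.
        merged = []
        i, j = lo, mid
        while i < mid and j < hi:
            if a[i] <= a[j]:
                merged.append(a[i]); i += 1
            else:
                merged.append(a[j]); j += 1
        merged.extend(a[i:mid])
        merged.extend(a[j:hi])
        a[lo:hi] = merged

    # Example: data = [5, 2, 9, 1, 7]; hybrid_merge_sort(data)  ->  [1, 2, 5, 7, 9]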

APPLICATIONS

What makes this subject the most interesting discipline in all of computer engineering (at least according to me) is the fact that there is no line drawn in any engineering discipline beyond which you can say "from this point, algorithms are of no use." Algorithms are encountered almost everywhere there is any phenomenon related to computers and programming.

In operating systems, as early as the 1960s when multitasking operating systems were developed, problems of process synchronization and scheduling arose. Solutions to the classical synchronization problems, including the producer-consumer problem, the critical section problem, the readers-writers problem and the dining philosophers problem, have been carefully studied, and efficient algorithms were devised to solve them. CPU and disk scheduling are two other main aspects of time-sharing systems, achieved using efficient scheduling algorithms such as SJF and round robin. Choosing among algorithms becomes more complex when we deal with multiprocessor systems; we use parameters such as throughput to compare algorithms and evaluate them analytically using several deterministic models. Some applications of graphs, such as the Banker's algorithm, are also used while handling deadlocks.

Speaking of graphs, many graph problems, such as covering and partitioning problems, graph colouring, shortest paths, vertex ordering and isomorphism, also seek algorithms for their solution, while graphs themselves have their own interesting history of providing good solutions to some of the longest-standing problems of computer science. The teleprinter's problem, a long-standing problem of communication theory, was solved using an application of Euler digraphs in 1940; the older Königsberg bridge problem has a similar history.

In computer networks, needless to say, the role of algorithms is indispensable. Issues of channel allocation, congestion control, routing (using shortest-path algorithms such as Dijkstra and Bellman-Ford; a small Dijkstra sketch follows this section), rendering maps and interpolating addresses are vital to every network design. In compiler construction, error detection and handling, code optimization, symbol-table mapping, parsing and other design issues are answered by clever algorithms. Storing and retrieving the astronomical volumes of data held on the internet in different databases, and the tools for analysing that data, require sophisticated algorithms. Public-key cryptography and digital signature techniques in e-commerce are among the chief technologies that make use of numerical algorithms and number theory. Hardware design in the electronics, manufacturing and IT industries uses algorithms; the design of GUIs relies on algorithms; computational biology makes use of algorithms; and so on.

Theoretically we see the algorithm as a tool for solving well-defined computational problems, but the place of algorithms in modern computing is no less important than advanced technologies such as fast hardware with high clock rates, pipelining and superscalar architectures, interactive GUIs, object-oriented systems, and advanced LAN and WAN networking technologies. Even applications that do not seem to require algorithms at the application level rely on them extensively underneath: all of these applications are written in one language or another that is finally translated into machine language and processed by compilers, interpreters or assemblers, all of which use algorithms at the core of their functioning.
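Since shortest-path routing was mentioned above, here is a minimal sketch of Dijkstra's algorithm. It is my own Python illustration, assuming the graph is given as an adjacency list of (neighbour, weight) pairs with non-negative weights.

    import heapq

    def dijkstra(graph, source):
        # graph: dict mapping node -> list of (neighbour, weight) pairs.
        # Returns a dict of shortest distances from source.
        dist = {source: 0}
        pq = [(0, source)]                     # min-heap of (distance, node)
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float("inf")):
                continue                       # stale queue entry
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(pq, (nd, v))
        return dist

    # Example: dijkstra({"a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}, "a")
    # -> {"a": 0, "b": 2, "c": 3}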
There are, however, problems in theoretical computer science for which no efficient algorithms are known. An important subset of these are the NP-complete problems. For such problems we either try to break them into subproblems for which efficient algorithms exist, or look for near-optimal solutions on a smaller class of inputs. We shall take up this topic later.

PREREQUISITES

As already said, an important prerequisite for studying and designing algorithms is a strong mathematical background in various concepts, among other things. A good (although not necessarily exhaustive) list is prepared for you below.

• Concepts from discrete mathematics: sets and their laws, relations and groups.
• Functions: monotonicity, floors and ceilings, exponentials, logarithms, the iterated logarithm, factorials, modular arithmetic, interpolation and approximation.
• Polynomials and polynomial equations:
  Polynomials: asymptotic behaviour of polynomials, the DFT and the use of the FFT to reduce the time of polynomial multiplication, the convolution theorem, the FFT using modular arithmetic.
  Polynomial equations:
  • Single variable
    1. Linear, algebraic and transcendental equations. Direct methods: quadratic, cubic (Cardan's and Horner's methods) and biquadratic equations. Indirect (approximation or iterative numerical) methods, generally for higher-degree and transcendental equations: bisection, iteration, regula falsi, secant, Newton-Raphson, Lin-Bairstow's, Muller's and Graeffe's root-squaring method.
  • Multiple variables (two or more), i.e. systems of simultaneous algebraic equations. We usually represent such a system using matrices. In connection with matrices, two numerical problems arise besides solving the equations by the methods below: the first is finding the inverse of a matrix (Gauss elimination, Gauss-Jordan, Crout's method, the partition method), and the other is finding the eigenvalues and corresponding eigenvectors of a matrix, where we come across methods such as the Cayley-Hamilton theorem, the power method, Jacobi's method, Givens' method and Householder's method. Computation of eigenvalues and the solution of simultaneous linear equations are required in many engineering and scientific problems.
    1. Simultaneous non-linear equations. Indirect methods: the method of iteration, complex roots by Newton-Raphson.
    2. Simultaneous linear equations. Direct methods: Cramer's rule, matrix inversion, Gauss elimination, Gauss-Jordan, Crout's (LU) method. Iterative methods: Jacobi's method, Gauss-Seidel, relaxation, and ill-conditioned systems of equations.
• Series and summations: logarithmic, exponential and binomial series.
• Elementary number-theoretic notions: divisibility, modular equivalence, unique factorization, the GCD and Euclid's algorithm, Lamé's theorem, concepts of modular arithmetic (finite groups, e.g. abelian groups, Lagrange's theorem, the Chinese remainder theorem), and integer factorization (a tiny sketch of Euclid's algorithm follows this list).
• Combinatorics: permutations and combinations, binomial coefficients, Stirling's formula.
• Probability: the axioms of probability, probability distributions (discrete and continuous uniform distributions), Bayes' theorem, Boole's inequality, discrete random variables, probability density functions, linearity of expectation, variance and standard deviation of random variables, and the geometric and binomial distributions.
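As a tiny, concrete example of one of these building blocks, here is Euclid's algorithm for the greatest common divisor, written as a Python sketch of my own:

    def gcd(a, b):
        # Euclid's algorithm: gcd(a, b) = gcd(b, a mod b), until b reaches 0.
        while b:
            a, b = b, a % b
        return a

    # Example: gcd(1071, 462) -> 21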

Numerical analysis is the study of algorithms for the problems of continuous mathematics (as distinguished from discrete mathematics). It naturally finds applications in all fields of engineering and the physical sciences, but in the 21st century the life sciences and even the arts have adopted elements of scientific computation. Ordinary differential equations appear in the movement of heavenly bodies (planets, stars and galaxies); optimization occurs in portfolio management; numerical linear algebra is essential to quantitative psychology; stochastic differential equations and Markov chains are essential in simulating living cells for medicine and biology.

ANALYSIS OF ALGORITHMS

We chiefly focus on studying and analysing the asymptotic efficiency of algorithms, i.e. how the running time of an algorithm increases with the size of the input in the limit, as the input size grows without bound. For large inputs we choose algorithms that are asymptotically more efficient. To make this study easier, several asymptotic notations have been developed. Aho, Hopcroft and Ullman also advocated the use of this kind of analysis for comparing the relative performance of algorithms.

DESIGN STRATEGIES

Dynamic programming is suited to problems that exhibit optimal substructure with overlapping subproblems, so that recomputation of sub-subproblems is avoided; this is unlike divide and conquer, where the subproblems are independent and are therefore computed recursively each time. The running time of such algorithms is obtained in the form of recurrences (a function's value expressed in terms of its values at smaller inputs), which are solved using various techniques. Interestingly, recurrences were studied as early as 1202 by L. Fibonacci, for whom the Fibonacci numbers are named. In dynamic programming the recomputation is avoided by maintaining a table of results, a technique better known as memoization (see the sketch just below). Its great advantage is that it reduces the exponential complexity of many problems to polynomial complexity, but unfortunately it is not a solution to all hard problems. Assembly-line scheduling, matrix-chain multiplication and the longest common subsequence are some examples best understood in the context of dynamic programming. An important and simpler variant of divide and conquer is decrease and conquer, used by the binary search algorithm.

Another approach, related to dynamic programming, is the greedy method, where we do not look for an exact solution to each subproblem; instead a greedy choice is made of whatever looks best at the moment. Greedy is not always the best strategy, but when it works correctly it is the fastest. Kruskal's algorithm for finding a minimum spanning tree is perhaps the best-known example; others we will study include activity selection, task scheduling and Huffman codes. Greedy and dynamic programming are in fact two independent methods for solving a larger class of problems called optimization problems. These problems go through a series of steps with a set of choices at each step.
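Here is the memoized-Fibonacci sketch promised above, my own Python illustration of memoization in the simplest possible setting: the naive recurrence takes exponential time, while the version with a result table takes only a linear number of distinct calls.

    from functools import lru_cache

    def fib_naive(n):
        # Plain recurrence: recomputes the same subproblems over and over,
        # giving exponential running time.
        if n < 2:
            return n
        return fib_naive(n - 1) + fib_naive(n - 2)

    @lru_cache(maxsize=None)
    def fib_memo(n):
        # Same recurrence, but results are kept in a table (the cache),
        # so each subproblem is solved only once.
        if n < 2:
            return n
        return fib_memo(n - 1) + fib_memo(n - 2)

    # fib_memo(80) returns instantly; fib_naive(80) would take far too long.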

Optimization finds its application in the design of Huffman codes for data compression, optimal solutions over matroids, minimum spanning trees, single-source shortest paths, the travelling salesman problem, 0-1 knapsack and Chvátal's set-covering heuristic, to name a few. An avid reader might ask how it is possible to know whether a problem is best solved by a greedy approach or by dynamic programming. It is of course not known in advance; typically we first develop a dynamic-programming solution and then show that greedy choices always lead to an optimal solution, so that a greedy algorithm suffices.

We also study other analysis and design methods, such as amortized analysis, where the cost of a sequence of operations is averaged using techniques such as the accounting and potential methods. Linear programming is used for problems like maximum flow, and integer programming, a harder variant of linear programming, restricts the solution space to integers. Linear programming and randomization are, besides greedy methods, useful techniques in the design of approximation algorithms: for example, randomization gives an approximation for the MAX-3-CNF problem, and the vertex cover problem can be approximated using linear programming. TSP, set covering and subset-sum are other problems that exploit either approximation or greedy techniques. Some of them are NP-complete problems which, as we shall see, leave only one practical route: finding a near-optimal solution with an approximation algorithm.

Yet other problems use reduction, or transform and conquer, where the goal is to transform a problem into a simpler one such that the complexity of the transformation does not dominate the complexity of solving the transformed problem. Search and enumeration, as the name suggests, is related to counting. In backtracking, we retreat along the path of choices whenever the current option does not lead to the solution we are looking for. In hill climbing, the basic idea is to start with a poor solution and repeatedly apply improvements until it becomes optimal or meets some other requirement; it is used in network flow problems.

Probabilistic analysis is another powerful way of analysing algorithms; it usually assumes some random behaviour in the distribution of inputs, and algorithms that themselves make random choices are called randomized algorithms, studied through examples such as the hiring problem and the birthday paradox. Probability is also used in analysis itself, for example in the average-case analysis of quicksort, bucket sort and the order-statistic algorithm, although using probability places assumptions on the input that are not always easy to justify. Randomization and probabilistic analysis have become fundamental tools in modern computer science, with applications ranging from combinatorial optimization to machine learning, cryptography, complexity theory and the design of protocols for communication networks. Randomized algorithms are often more efficient, and conceptually simpler and more elegant, than their deterministic counterparts. Probabilistic and heuristic approaches are by far my favourite design strategies and will be taken up in somewhat more detail later.

Other, simpler notions include recursion and iteration, which are often treated as the same but are not exactly so. Every recursive version has an equivalent iterative version and vice versa; which implementation to use depends entirely on the problem. The Tower of Hanoi, for example, is most naturally analysed recursively (a small recursive sketch follows).
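As a reminder of how naturally recursion fits that problem, here is a short recursive Tower of Hanoi sketch, again my own Python illustration; it moves n discs from one peg to another in 2^n - 1 moves.

    def hanoi(n, source, target, spare):
        # Move n discs from source to target, using spare as the auxiliary peg.
        if n == 0:
            return
        hanoi(n - 1, source, spare, target)    # move n-1 discs out of the way
        print(f"move disc {n} from {source} to {target}")
        hanoi(n - 1, spare, target, source)    # stack them onto the largest disc

    # hanoi(3, "A", "C", "B") prints the 7 moves for three discs.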
Besides straightforward algorithm design strategies such as divide and conquer and dynamic programming, we also frequently use data structures as a tool for devising efficient algorithms. Heapsort is a perfect example: it relies entirely on the heap property. Most often, the algorithms we design turn out to require operations that some data structure already supports, so we use that data structure directly and implement its operations as sub-steps of the algorithm (see the heap-based sketch below).
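For instance, here is a sketch of sorting by repeatedly using a binary heap's insert and extract-min operations. It is my own Python illustration and uses the standard-library heapq module rather than a hand-built heap.

    import heapq

    def heap_sort(items):
        # Build a min-heap, then repeatedly extract the minimum element.
        # The algorithm itself is trivial; the work is done by the heap's
        # insert (heappush) and extract-min (heappop) operations.
        heap = []
        for x in items:
            heapq.heappush(heap, x)
        return [heapq.heappop(heap) for _ in range(len(heap))]

    # Example: heap_sort([5, 2, 8, 1]) -> [1, 2, 5, 8]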

For example, when an algorithm uses a data structure, say a binary heap, it inherently uses one or more of the operations (such as insertion, deletion or search) of that underlying data structure in its steps. The total time is then the time taken by the algorithm plus the time taken by those operations. This brings out the indispensability of asymptotically analysing all the operations of our data structures, in addition to analysing the algorithms themselves.

Consider another example. There are several operations we perform on dynamic sets, such as insertion, deletion, membership tests, successor and predecessor. Which data structure we use to represent these sets depends on which set of operations we need. We may use stacks, queues, linked lists or rooted trees (e.g. binary search trees and their variants) to represent dynamic sets, but when we only need the dictionary operations (insert, delete and search), we use hash tables, since hashing works out to much better expected efficiency. In many other cases it has been shown that significant improvements in efficiency can be obtained by selecting (or designing) appropriate representations for the data structures employed and the algorithms for manipulating them, without requiring any changes to the underlying algorithms. Another important use of data structures is in the implementation of disjoint sets, since some applications involve grouping n distinct elements into a collection of disjoint sets. One of the many applications of the disjoint-set data structure is determining the connected components of an undirected graph (a small sketch follows).
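A minimal disjoint-set (union-find) sketch for that last application, once more my own Python illustration, using path compression but omitting union by rank for brevity:

    def connected_components(n, edges):
        # parent[i] is the representative of i's set; initially every vertex
        # is its own component.
        parent = list(range(n))

        def find(x):
            # Follow parent pointers to the representative, compressing the
            # path as we go.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for u, v in edges:
            # Union: merge the two components containing u and v.
            parent[find(u)] = find(v)

        # Vertices with the same representative lie in the same component.
        return len({find(v) for v in range(n)})

    # Example: connected_components(5, [(0, 1), (1, 2), (3, 4)]) -> 2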
CLASSIFICATION

From one point of view, algorithms can be classified as deterministic or non-deterministic, depending on whether or not they make use of guesses and heuristics; the latter definitely use heuristics and other techniques for making typically accurate guesses. In practice, however, computer scientists classify problems (not algorithms) into equivalence classes based on the complexity of the best possible algorithms for them. Computational complexity theory, a branch of the theory of computation in computer science, investigates the resources required to execute algorithms (e.g. execution time) and the inherent difficulty of providing efficient algorithms for specific computational problems.

Much of complexity theory deals with (1) decision problems. A decision problem is a problem where the answer is always yes or no. Complexity theory distinguishes between problems for which yes answers can be verified and problems for which no answers can be verified; the problem obtained by reversing the yes and no answers of another problem is called its complement. Computational complexity theory places decision problems into three main complexity classes: P (polynomial), NP (non-deterministic polynomial) and NPC (NP-complete). A complexity class is the set of all computational problems that can be solved using a certain amount of a certain computational resource.

Class P, the tractable or "easy" problems, is the set of problems solvable in polynomial time by a deterministic machine, i.e. on inputs of size n their worst-case running time is O(n^k) for some constant k. The complexity class NP is the set of decision problems that can be solved by a non-deterministic machine in polynomial time; it includes the Boolean satisfiability problem, the Hamiltonian path problem and the vertex cover problem, among others.

Class NPC, a subset of NP, is the most interesting set of decision problems, because no polynomial-time algorithm is known for them, nor have we been able to prove that no such algorithm exists. These problems have two interesting properties. First, if any single NP-complete problem can be solved in polynomial time, then every problem in NP can be solved in polynomial time. So, if we ever find a polynomial-time solution to any NPC problem, then by this first property every problem in NP can be quickly solved. Whether every problem in NP can be quickly solved is the famous P = NP question. Nobody has yet been able to determine conclusively whether NP-complete problems are in fact solvable in polynomial time, making this one of the great unsolved problems of mathematics; the Clay Mathematics Institute is offering a $1 million prize to anyone with a formal proof that P = NP or that P ≠ NP. The second property is that their solutions can be verified in polynomial time. But where does the solution we are verifying come from? We are somehow given a 'certificate' of the solution, and we then check that the certificate is correct in time polynomial in the input size. NP-complete problems are studied because the ability to verify 'solutions' quickly (in polynomial time) using certificates seems to correlate with the ability to solve the problem itself quickly.

There is another class of problems, (2) the intractable or hard problems (Hopcroft et al., 2007: 368), which require super-polynomial time. To see why exponential-time solutions are unusable in practice, consider a problem that requires 2^n operations to solve, where n is the size of the input. For a relatively small input size of n = 100, and assuming a computer that can perform 10^12 operations per second, a solution would take about 4*10^10 years, much longer than the current age of the universe (a short calculation at the end of this section makes this concrete). Complexity theory treats problems that lack polynomial-time solutions as intractable for all but the smallest inputs. Problems known to be intractable in this sense include those that are EXPTIME-complete. If NP is not the same as P, then the NP-complete problems are also intractable in this sense; what this means "in practice" is open to debate. Finally, there are (3) problems that cannot be solved by any computer, no matter how much time is provided, such as Turing's famous halting problem.

Decision problems stand in contrast to optimization problems, in which we try to find the best (optimal) solution rather than quickly answering 'yes' or 'no' to a particular question; in this sense NP-completeness applies directly to decision problems but not to optimization problems. I would love to build this entire manual around the theory of NP-completeness, but that is not feasible, so more details and lemmas will come later.
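Here is that back-of-envelope check of the 2^100 figure, written as a few lines of Python of my own:

    # 2^100 operations at 10^12 operations per second, expressed in years.
    operations = 2 ** 100
    ops_per_second = 10 ** 12
    seconds_per_year = 60 * 60 * 24 * 365
    years = operations / ops_per_second / seconds_per_year
    print(f"{years:.2e} years")   # roughly 4.02e+10 years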
IMPLEMENTATION OF ALGORITHMS

Algorithms are ultimately implemented in one language or another (a high-level language such as Perl, Java or C++, or a low-level language) and are designed to run in serial, parallel or distributed environments. Algorithms designed for serial environments execute only one instruction at a time, while those used in distributed environments incur the additional cost of communication overhead. Parallel models have the benefit that several processors work on the same problem at the same time; parallel and distributed techniques divide a problem into more or less symmetric subproblems and then combine the results.

There are three major types of machine models, depending on how they represent data in memory:

• Pointer machines - these machines represent data only through pointers; no arrays are defined on them.

• RAM model - these machines represent data as arrays. In all the lectures it is assumed that we are working with this model; the algorithms given here are all based on the RAM model of computation, which is treated as a separate study.
• PRAM model - this model is used in parallel computing and is similar to the RAM model.

After all this talking, it is needless to say that algorithms are an essential part of EVERY computer system (the capitalization does matter), and so is the need for EVERY CS student to master their concepts (the capitalization does matter again). This is important because, whenever we think of any innovation, we should not be held back by the fundamentals on which it really begins. I hope you have enjoyed talking with me, and enjoy developing your concepts too. Thanks!

Amit Nigam, Author
[email protected]
