DATA STRUCTURE SEMINAR REPORT ON
Submitted by : Raj S J Class : MCA – II Roll number : 07
SEARCHING Let DATA be a collection of data elements in memory and suppose a specific ITEM of information is given. Searching refers to the operation of finding the location LOC of ITEM in DATA or printing some message like ITEM does not appear there. The search is said to be successful if ITEM does appear in DATA and unsuccessful otherwise. There are many different searching algorithms. The algorithm that one chooses generally depends on the way information in DATA is arranged. Two well-known and most used algorithms in searching are 1. LINEAR SEARCH 2. BINARY SEARCH. The complexity of searching algorithms is measured in terms of number f(n) of comparisons required to find ITEM in data where DATA contains n elements.
LINEAR SEARCH Suppose DATA is a linear array with n elements. Given no other information about DATA, The most intuitive way to search for a given ITEM in DATA is to compare ITEM with each element of DATA one by one. That is, first we test whether DATA[1] = ITEM, and then we test whether DATA[2] = ITEM, and so on. This method, which traverses DATA sequentially to locate ITEM, is called linear search or sequential search. Let us see an example below were DATA is an array with five names and ITEM is the info whose location LOC we are searching in DATA. In this case array DATA has 5 elements. We are searching for ITEM = “ KAREN “
Here the ITEM “ KAREN “ is checked with first element of DATA i.e. “ MARY “. Since first element of DATA is not equal to ITEM, now 2nd element of DATA “ JANE” is checked and so on until either the array is completely traversed or the LOC of ITEM is found. Here on the fifth comparison the LOC of ITEM is found to be 5 in DATA. In the following form of algorithm to simplify the matter, we first assign ITEM to DATA[N+1].The position following the last element of data. Then the outcome LOC = N+1 Where LOC denotes the location where ITEM first occurs in data, signifies search is unsuccessful. The purpose of initial assignment is to avoid repeatedly testing whether or not we have reached the end of array DATA. This way, the search must eventually succeed.
LINEAR( DATA, N, ITEM, LOC ) Here DATA is a linear array with n elements, and ITEM is a given item of information. This algorithm finds the location of ITEM in DATA, or sets LOC:=0 if search is unsuccessful. 1. [Insert ITEM at the end of DATA] Set DATA[N+1]:=ITEM 2. [Initialize counter.] Set LOC: = 1. 3. [Search for ITEM] Repeat while DATA[LOC] ≠ ITEM: Set LOC: =LOC+1 [End of loop] 4. [Successful?] IF LOC=N + 1, the set LOC: = 0
5.Exit
Observe that step 1 guaranty that the loop in step 3 must terminate. Without step 1 the repeat statement in step 3 must be replaced by the following statement, which involves two comparisons, Repeat while LOC ≤ N and DATA[LOC] ≠ ITEM: On the other hand in order to use step 1, one must guarantee that there is an unused memory location at the end of the array DATA; otherwise one must use the linear search algorithm given below
LINEAR(DATA, N, K, ITEM, LOC) A linear array DATA with N elements and a specific ITEM of information is given. This algorithm finds the location LOC of ITEM in the array of DATA or sets LOC=0.
1. [Initialize] Set K: =1 and LOC: =0 2. Repeat Steps 3 and 4 while LOC = 0 and K ≤ N. 3. If ITEM = DATA[K], then set LOC:=K 4. Set K: = K + 1 [Increment counter] [End of step 2 loop] 5. [Successful?] If LOC = 0, then: Write: ITEM is not in the array DATA Else Write: LOC is the location of ITEM in DATA [End of if structure] 6.Exit
Complexity Of linear search algorithm
Worst case: Clearly the worst case occurs when ITEM is the last element in the array DATA or is not there at all. In either situation we have C(n) = n Accordingly C(n) = n is the worst case complexity of the linear search algorithm.
Average case: Here we assume that ITEM does appear in DATA, and that is equally likely to occur at any position in the array. Accordingly the number of comparisons can be any of the number 1,2,3……….n and each number occurs with the probability p = 1/n. Then C (n) = 1.(1/n) + 2. .(1/n) + 3 .(1/n) ………….+n.(1/n) = ( 1 + 2 + 3 ……………….+ n) .(1/n) = n.(n+1)/2.(1/n) = (n+1)/2
Binary Search Suppose the array DATA is sorted in increasing numerical order or alphabetically. In that case there is an efficient Searching algorithm called Binary search. The Binary search algorithm applied to our data works as follows. During each stage of our algorithm, our search for ITEM is reduced to a segment of elements of DATA. DATA[BEG],DATA[BEG+1],DATA[BEG+2]….DATA[END] The variables BEG and END denote respectively the beginning and end locations of the segment under consideration. The algorithm compares ITEM with the middle element DATA [MID] of the segment, where MID is obtained by MID=INT ((BEG+END)/2).. ….[INT stands for integer value] If DATA[MID] = ITEM then the search is successful and we set LOC:=MID. Otherwise a new segment of DATA is obtained as follows a) If ITEM < DATA[MID] , then ITEM can appear only in the left half of the segment: DATA[BEG],DATA[BEG+1]……………..DATA[MID-1] So we reset END : = MID – 1 and begin searching again. b) If ITEM > DATA[MID] , then ITEM can appear only in the right half of the segment: DATA[MID+1],DATA[MID+2] ……………..DATA[END] So we reset BEG : = MID + 1 and begin searching again. Initially we begin with the entire array DATA ie we begin with BEG = 1 and END = n, or more. Generally BEG = UB and END = LB
If ITEM is not in DATA then we get END < BEG This condition signals that the search is unsuccessful. Example Let DATA be the following sorted 13-element array: DATA: 11 22 30 33 40 44 55 60 66 77 80 88 99 We apply the binary search to DATA for different values of ITEM. (a)
Suppose ITEM = 40. The search for ITEM in the array DATA is pictured where the values of the DATA[BEG] and DATA[END] in each stage of the algorithm is indicated by circles and the value of DATA[MID] by a square. (1)
Initially BEG = 1 and END = 13 Hence MID = INT(1+13)/2 = 7 so DATA[MID] = 55
(2)
Since 40 < 55 , END has its value changed END = MID-1=6 HENCE MID = INT((1+6)/2) = 3 so DATA{MID] = 30
Since 40 < 30 BEG has its value changed by BEG = MID+1 = 4 Hence MID = INT[(4+6)/2]=5 and so DATA[MID]=40
We found ITEM in location LOC = MID =5
(b)
Suppose ITEM = 85.
(1) Again initially BEG=1,END=13 ,MID=7 and DAT [MID] = 55
(2) Since 85>55 BEG has its value changed by BEG = MID + 1= 8. Hence MID = INT[(11+13)/2]=10 and so DATA[MID] = 77
(3) Since 85 > 77 , BEG has its value changed by BEG = MID+1 = 11 Hence MID = INT[(11+13)/2]=12 and so DATA[MID] =88
(4) Since 85 < 88, END has its value changed by END = MID – 1=11. Hence MID = INT[(11+11)/2]=11 and so DATA[MID]=80
Observe that here BEG = MID = END = 11 Since 85 >80, BEG has its value changed BEG = MID + 1 = 12 . But now BEG > END. Hence ITEM does not belong to DATA.
BINARY( DATA, LB, UB, ITEM, LOC) Here DATA is a sorted array with lower bound Lb and upper bound UB and ITEM is a given item of information. The variables BEG, END and MID denote respectively the beginning, end and middle locations of a segment of elements of DATA. This algorithm finds the location LOC of ITEM in DATA or sets LOC : = NULL 1. [Initialize segment variable] Set BEG : = LB , END : = UB and MID = INT((BEG + END)/2) 3. Repeat steps 3 and steps 4 while BEG ≤ END and DATA[MID] ≠ ITEM 3. If ITEM < DATA[MID] then Set END : = MID – 1 Else Set BEG : = MID + 1 [End of If structure ] 4. Set MID:=INT((BEG+ END)/2). [End of step 2 loop ] 5. If DATA[MID] = ITEM, Then : Set LOC: = MID Else Set LOC: =NULL [End of If structure] 6.Exit
Complexity of Binary search In Binary search each comparison reduces the sample size in half. Hence we require at most f(n) comparisons to locate ITEM where F(n) = [ log2n ] + 1 ie the running time of worst case is approximately equal to log2n.Running time for average case is approximately equal to the running time for worst case.
COMPARISON BETWEEN LINEAR AND BINARY SEARCH LINEAR SEARCH
BINARY SEARCH
Very easy to Implement
Comparatively complex to implement
Time taken to find item increases when n is large I.e. algorithm efficiency decreases as number of elements n becomes large
Comparatively much faster. It requires only 20 comparisons with an initial list of 1000000, elements
Complexity is n Can work on any array or list
Complexity is Log 2 n Need the array to be sorted This is the major disadvantage as it is costly to keep array of data sorted if there is frequent insertion or deletion
Ideally, the choice between Linear and Binary Search should be taken based on the number of search operations that are likely to be performed over the List. As the number increases Binary search becomes more efficient than linear search.