FUNCTIONAL TESTING OF A MICROPROCESSOR THROUGH LINEAR CHECKING METHOD

Prof. Dr. Pervez Akhtar, National University of Science & Technology, Karachi Campus, Pakistan
[email protected]
Prof. Dr. M. Altaf Mukati, Hamdard Institute of Information Technology, Hamdard University, Karachi, Pakistan
[email protected]

ABSTRACT
Gate-level testing, also called low-level testing, is generally appropriate at design time and for small circuits. Chip-level and board-level testing, also called high-level testing, are preferred when circuit complexity is too high to perform low-level testing in a reasonable amount of time. The cost of low-level testing is also generally very high; such cost and time are only justified when design changes are required. In this paper, a quick high-level checking method, known as the Linear Checking Method, is presented, which can be used to qualify the functionality of a microprocessor. It can also be used to check hard faults in memory chips.

Keywords: Microprocessors, ALU, Control Unit, Instructions.
1 INTRODUCTION
Due to advances in integrated circuit technology, more and more components are being fabricated onto a tiny chip. Since the number of pins on each chip is limited by its physical size, the problem of testing has become more difficult than ever. This problem is aggravated by the fact that, in nearly all cases, integrated circuit manufacturers do not release the detailed circuit diagram of the chip to the users [1]. Users are generally more interested in knowing whether the chip is functionally working and can be relied upon; if not, the whole chip is replaced with a new one. This is in contrast to gate-level testing of a digital circuit, which is used to diagnose faulty gates in the given circuit in case of failure. The case for functional testing is also strengthened by the fact that, in the event of a functional failure caused by a fault in the chip, the user cannot repair the chip. Hence users have only two choices: either to continue using the chip with a particular failing function, knowing that the failing function will not be used in the given application, or to replace the whole chip. Functional modeling is done at a higher level of abstraction than gate-level modeling. It in fact sits between gate-level modeling and behavioral modeling, which is the highest level of
UbiCC Journal - Volume 3
abstraction [2]. Functional fault modeling should imitate the physical defects that cause a change in function or behavior. For example, the function of a synchronous binary up-counter is to advance one stage higher in binary value on each clock pulse; a physical defect that alters this function can be modeled in terms of its effect on the function. Such defect-finding is extremely important at design time, or if design changes are required at a later stage. What if a microprocessor does not produce the correct result for one or more functions? From the user's perspective, it is enough to know which function is failing; from the designer's perspective, however, the cause of the failure is also important to know, so that design changes may be carried out if necessary. This is certainly a time-consuming process: for example, gate-level simulation of the Intel 8085 microprocessor took 400 hours of CPU time and provided only 70% fault coverage [3]. High-level functional verification of complex Systems-on-Chip (SoCs) and microprocessors has become a key challenge. Functional verification and Automatic Test Pattern Generation (ATPG) is one synergetic area that has evolved significantly in recent years due to the blossoming of a wide array of test and verification techniques. This area will continue to be a key focus of future Microprocessor Test and Verification (MTV) [4].
Functional failures can be caused by single or multiple stuck-at faults in any functional block. Functional testing, which refers to the selection of tests that verify the functional operation of a device, is an efficient way of dealing with faults in a processor. Functional testing can also be carried out at a smaller scale: a functional test of a flip-flop, for example, might verify whether it can be set and reset, and whether it can hold its state. Similarly, other MSI chips such as multiplexers, encoders, decoders, counters, hardwired multipliers, binary adders and subtractors, comparators, parity checkers, registers and similar circuits can be verified for their required functionality. Some designers and manufacturers now provide built-in self-test (BIST), in which the test is generated on the chip and the responses are checked within the chip itself. However, the widespread use of such testability techniques is hampered by a lack of tools to support the designer, by the additional cost in chip area, and by the degradation in performance [5]. For example, the Intel 80386 microprocessor employs about 1.8% area overhead for BIST to test portions of the circuit [6]. The ever-increasing complexity, combined with the advanced technology used in the design of modern microprocessors, has led to two major problems in producing cost-effective, high-quality chips:
1. Verification: validating the correctness of the complex design. Simulation is the primary means of design validation used today. In the case of processor design validation, the sequences are either written manually or generated automatically by a random sequence generator [7].
2. Testing: checking the manufactured chips for realistic defects. A variety of test generation and design-for-testability (DFT) techniques are used to ensure that the manufactured chips are defect-free.
Both design verification and testing therefore depend on the test sequences used to expose either design faults or manufacturing defects. It has also been found that manufacturing test pattern generation can be used for design verification [8] and that design verification techniques can be used to find better manufacturing tests [9]. However, finding effective test patterns for either purpose is not simple, owing to the high complexity of microprocessors. Hence the only effective method left is to develop functional tests. Considerable work has been done in the field of microprocessor functional testing; one such method, known as the 'Linear Checking Method', is presented in this paper. Before performing functional testing, the functional
description of the chip must be known. In the case of a microprocessor, this can be obtained from its instruction set. The two most important functional blocks of any microprocessor are the CU (Control Unit) and the ALU (Arithmetic Logic Unit). All instructions, at low level, are composed of op-codes and operands. The op-code, also called the macro-instruction, goes to the CU, which decodes each macro-instruction into a unique set of micro-instructions. The operands go to the ALU, which processes them according to the tasks defined by the micro-instructions. Between these functional blocks there exist several registers for the temporary storage of op-codes, decoded instructions and operands. Faults may occur at various places in the processor, causing it to function incorrectly. Some of the common faults are: register decoding faults; micro-operation decoding faults (possibly caused by an internal defect in the CU); data storage faults (possibly caused by a stuck-at fault or a pattern-sensitive fault in the memory inside the microprocessor); data transfer faults (possibly caused by a stuck-at fault or a bridging fault on the buses connecting the various functional blocks); and ALU faults (caused by an internal defect in the ALU). In each case, the microprocessor produces one or more incorrect functions. In the subsequent sections, functional verification is first described in general, and then the Linear Checking Method is presented through several examples. Based on the results obtained, a conclusion is drawn and further work is proposed.
2 FUNCTIONAL VERIFICATION
The micro-instructions from the CU and the operands of an instruction are sent to the ALU simultaneously. The ALU then carries out the intended task or function, as shown in the block diagram of Fig. 1.
Figure 1: Functional testing

Typical instructions are ADD, SUB, MUL, SHL, SHR, ROTL, ROTR, INC, DEC, COMPL, AND, OR, XOR and many others.

3 LINEAR CHECKING METHOD
This method can be used to test and verify not only the functionality of a microprocessor (more specifically, its ALU) but memories as well. Linear checking is based on computing the value of a constant K using Equation (1):
K = f_i(x, y) + f_i(x, ȳ) + f_i(x̄, y) + f_i(x̄, ȳ)    (1)

where x̄ and ȳ denote the bitwise complements of x and y.
Equation (1) is called the 'basic equation'. The variables x and y are the operands, and i identifies the instruction. The value of K does not depend on the values of x and y; it depends only on the instruction and on the operand size (the number of bits n in the operands). This means the value of K is practically unique for every instruction: there is little chance that two instructions will share the same constant value of K. An 8-bit and a 16-bit ALU have different values of K for the same instruction. Hence, in this method, K is used as a reference value to verify the functionality of an individual instruction.
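The basic equation is easy to exercise in software. The following sketch (Python, with illustrative function names that are not part of the paper) evaluates Equation (1) for a given instruction on an n-bit ALU and confirms that K is independent of the operands:

```python
def linear_check_k(f, x, y, n):
    """Evaluate the basic equation: sum f over (x,y), (x,~y), (~x,y), (~x,~y)."""
    mask = (1 << n) - 1          # 0b1111 for n = 4
    xb, yb = x ^ mask, y ^ mask  # n-bit bitwise complements of the operands
    return f(x, y) + f(x, yb) + f(xb, y) + f(xb, yb)

add = lambda a, b: a + b

# K for ADD on a 4-bit ALU is 4(2^n - 1) = 60, whatever the operands:
assert linear_check_k(add, 5, 3, 4) == 60
assert linear_check_k(add, 9, 10, 4) == 60
```

Any operand pair yields the same K, which is what makes K usable as a per-instruction reference value.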
3.1 Examples of functional verification
Consider a 4-bit ALU (n = 4), and suppose the instruction is ADD(x, y) = x + y. Let x = 5 (0101) and y = 3 (0011); therefore x̄ = 10 (1010) and ȳ = 12 (1100). The value of K is obtained from Equation (1) as follows:

ADD(5,3) + ADD(5,12) + ADD(10,3) + ADD(10,12) = 8 + 17 + 13 + 22 = 60

Hence, for a 4-bit ALU, the ADD instruction will always be tested against its reference value of 60, regardless of which values of x and y are taken. For example, taking x = 9 (1001) and y = 10 (1010) instead, so that x̄ = 6 (0110) and ȳ = 5 (0101):

ADD(9,10) + ADD(9,5) + ADD(6,10) + ADD(6,5) = 19 + 14 + 16 + 11 = 60

A generalized formula can also be developed to give the value of K for the ADD instruction, for any size of ALU:

K+(n) = 4(2^n − 1)

where the subscript of K denotes the function. This yields the same value as above: for n = 4, K+(4) = 4(15) = 60. Similarly, the value of K for any instruction can be obtained, provided its functional description is known. For the other frequently used instructions below, again assume a 4-bit ALU, taking x = 10 (1010) and y = 12 (1100), so that x̄ = 5 (0101) and ȳ = 3 (0011) in all the computations.

3.1.1 Multiply instruction (f_i(x, y) = MPY(x, y) = x * y)
From Equation (1), the value of K is obtained as follows:
MPY(10,12) + MPY(10,3) + MPY(5,12) + MPY(5,3) = 120 + 30 + 60 + 15 = 225
Generalized form: K*(n) = (2^n − 1)^2

3.1.2 Transfer instruction (f(x) = x)
This is a single-valued function; hence only one operand is used in the computation of K, and y is ignored in Equation (1):
K = x + x + x̄ + x̄ = 10 + 10 + 5 + 5 = 30
Generalized form: K(n) = 2(2^n − 1)

3.1.3 Shift-Right instruction (f(x) = SHR(x))
This is also a single-valued function: (a) f_i(x, y) and f_i(x, ȳ) reduce to f_i(x), and (b) f_i(x̄, y) and f_i(x̄, ȳ) reduce to f_i(x̄). With x = 1010, f_i(x) represents the value of x after the SHR operation, i.e. 1010 → 0101, and f_i(x̄) represents the value of x̄ after the SHR operation, i.e. 0101 → 0010. Hence:
K = 0101 + 0101 + 0010 + 0010 = 5 + 5 + 2 + 2 = 14
Generalized form: K(n) = 2(2^(n−1) − 1)
3.1.4 Shift-Left instruction (f(x) = SHL(x))
With the same reasoning as in Section 3.1.3, Equation (1) becomes:
K = f_i(x) + f_i(x) + f_i(x̄) + f_i(x̄)
Hence, with x = 1010 → f_i(x) = 0100, and x̄ = 0101 → f_i(x̄) = 1010:
K = 4 + 4 + 10 + 10 = 28
Generalized form: K(n) = 2(2^n − 2)

3.1.5 Logical-OR instruction (f(x, y) = x OR y)
K = (x ∨ y) + (x ∨ ȳ) + (x̄ ∨ y) + (x̄ ∨ ȳ)
  = 1110 + 1011 + 1101 + 0111
  = 14 + 11 + 13 + 7 = 45
Generalized form: K(n) = 3(2^n − 1)

3.1.6 Logical-AND instruction (f(x, y) = x AND y)
K = (x ∧ y) + (x ∧ ȳ) + (x̄ ∧ y) + (x̄ ∧ ȳ)
  = 1000 + 0010 + 0100 + 0001
  = 8 + 2 + 4 + 1 = 15
Generalized form: K(n) = 2^n − 1

3.1.7 Logical-XOR instruction (f(x, y) = x XOR y)
K = (x ⊕ y) + (x ⊕ ȳ) + (x̄ ⊕ y) + (x̄ ⊕ ȳ)
  = 0110 + 1001 + 1001 + 0110
  = 6 + 9 + 9 + 6 = 30
Generalized form: K(n) = 2(2^n − 1)

3.1.8 Increment instruction (f(x) = INC(x) = x + 1)
This is also a single-valued function:
K = f_i(x) + f_i(x) + f_i(x̄) + f_i(x̄)
  = 1011 + 1011 + 0110 + 0110 = 11 + 11 + 6 + 6 = 34
Generalized form: K(n) = 2(2^n + 1)

3.1.9 Decrement instruction (f(x) = DEC(x) = x − 1)
K = f_i(x) + f_i(x) + f_i(x̄) + f_i(x̄)
  = 1001 + 1001 + 0100 + 0100 = 9 + 9 + 4 + 4 = 26
Generalized form: K(n) = 2(2^n − 3)

3.1.10 Complement instruction (f(x) = x̄)
K = f_i(x) + f_i(x) + f_i(x̄) + f_i(x̄)
  = 0101 + 0101 + 1010 + 1010 = 5 + 5 + 10 + 10 = 30
Generalized form: K(n) = 2(2^n − 1)

3.1.11 2's complement instruction (f(x) = x̄ + 1)
K = f_i(x) + f_i(x) + f_i(x̄) + f_i(x̄)
  = 0110 + 0110 + 1011 + 1011 = 6 + 6 + 11 + 11 = 34
Generalized form: K(n) = 2(2^n + 1)

3.2 Memory error correction

Linear checks can also be used to verify memories. For example, suppose the multiplication function x * y is stored in memory. Let the operands be 4 bits long, with x = 1010 and y = 0010; then x̄ = 0101 and ȳ = 1101. All four components of Equation (1) are computed and the results are stored in memory, as shown in Table 1.

Table 1: Linear checks on memories

Component    Decimal    Binary
f(x, y)      20         00010100
f(x, ȳ)      130        10000010
f(x̄, y)      10         00001010
f(x̄, ȳ)      65         01000001
K            225        11100001

If the sum of the four components is not equal to the value of K, there must be a fault in the memory. Similarly, any of the preceding functions can be used to verify memories. The testing can be done more accurately if the contents of f(x, y) are stored at address (x, y). In the above example, the contents can be stored at the corresponding addresses as shown in Table 2. If the addition of the contents does not equal the value of K, this indicates a fault in the memory; moreover, the location of the fault is also obtained.

Table 2: Address versus contents in memory testing

Address x    Address y    Contents
1010         0010         00010100
1010         1101         10000010
0101         0010         00001010
0101         1101         01000001
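The address-based memory check can be sketched as follows (Python, illustrative names only; the memory is modeled as a simple dictionary mapping an (x, y) address pair to the stored value of f(x, y)). The four linearly related locations are read back and their sum is compared with the reference value K:

```python
def memory_linear_check(mem, x, y, n, k_ref):
    """Read the four linearly related locations and compare their sum with K."""
    mask = (1 << n) - 1
    xb, yb = x ^ mask, y ^ mask
    total = mem[(x, y)] + mem[(x, yb)] + mem[(xb, y)] + mem[(xb, yb)]
    return total == k_ref   # False signals a fault in one of the four cells

# The multiply table of Section 3.2: x = 1010, y = 0010 (so complements 0101, 1101)
mem = {(0b1010, 0b0010): 20, (0b1010, 0b1101): 130,
       (0b0101, 0b0010): 10, (0b0101, 0b1101): 65}
assert memory_linear_check(mem, 0b1010, 0b0010, 4, 225)      # fault-free: 225 = K
mem[(0b0101, 0b0010)] ^= 0b0100                              # inject a stuck bit
assert not memory_linear_check(mem, 0b1010, 0b0010, 4, 225)  # fault detected
```

A failing check narrows the fault down to one of the four addressed cells, which is the localization property mentioned above.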
4 RESULTS
All the computations of the previous section are summarized in Tables 3 and 4, for n = 4 and n = 8 respectively.

Table 3: Values tabulated through the linear checking method for n = 4

Instruction i    f_i(x, y)                  K_i(n)            K_i(4)
Clear            0                          0                 0
Transfer         x                          2(2^n − 1)        30
Add              x + y                      4(2^n − 1)        60
Multiply         x * y                      (2^n − 1)^2       225
Subtract         x − y                      0                 0
Logical OR       x ∨ y                      3(2^n − 1)        45
Logical AND      x ∧ y                      2^n − 1           15
Logical XOR      x ⊕ y                      2(2^n − 1)        30
Complement       x̄                          2(2^n − 1)        30
2's comp.        x̄ + 1                      2(2^n + 1)        34
Increment        x + 1                      2(2^n + 1)        34
Decrement        x − 1                      2(2^n − 3)        26
Shift-Left       (x2, x3, ..., xn, 0)       2(2^n − 2)        28
Shift-Right      (0, x1, x2, ..., xn−1)     2(2^(n−1) − 1)    14
Rotate-Left      (x2, x3, ..., xn, x1)      2(2^n − 1)        30
Rotate-Right     (xn, x1, x2, ..., xn−1)    2(2^n − 1)        30

Table 4: Values tabulated through the linear checking method for n = 8

Instruction i    f_i(x, y)                  K_i(n)            K_i(8)
Clear            0                          0                 0
Transfer         x                          2(2^n − 1)        510
Add              x + y                      4(2^n − 1)        1020
Multiply         x * y                      (2^n − 1)^2       65025
Subtract         x − y                      0                 0
Logical OR       x ∨ y                      3(2^n − 1)        765
Logical AND      x ∧ y                      2^n − 1           255
Logical XOR      x ⊕ y                      2(2^n − 1)        510
Complement       x̄                          2(2^n − 1)        510
2's comp.        x̄ + 1                      2(2^n + 1)        514
Increment        x + 1                      2(2^n + 1)        514
Decrement        x − 1                      2(2^n − 3)        506
Shift-Left       (x2, x3, ..., xn, 0)       2(2^n − 2)        508
Shift-Right      (0, x1, x2, ..., xn−1)     2(2^(n−1) − 1)    254
Rotate-Left      (x2, x3, ..., xn, x1)      2(2^n − 1)        510
Rotate-Right     (xn, x1, x2, ..., xn−1)    2(2^n − 1)        510
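The tabulated values can be reproduced mechanically. The sketch below (Python, illustrative names) evaluates the basic equation directly for several of the listed instructions and checks the results against the generalized forms, for both n = 4 and n = 8. By the argument of Section 3, the particular operands chosen are immaterial:

```python
def k_value(f, n, x=0b1010, y=0b1100, single=False):
    """Compute K by direct evaluation of the basic equation for an n-bit ALU."""
    mask = (1 << n) - 1
    x, y = x & mask, y & mask
    xb, yb = x ^ mask, y ^ mask          # bitwise complements
    if single:                           # single-valued functions ignore y
        return f(x) + f(x) + f(xb) + f(xb)
    return f(x, y) + f(x, yb) + f(xb, y) + f(xb, yb)

for n in (4, 8):
    m = (1 << n) - 1                     # 2^n - 1
    assert k_value(lambda x, y: x + y, n) == 4 * m             # Add
    assert k_value(lambda x, y: x * y, n) == m * m             # Multiply
    assert k_value(lambda x, y: x - y, n) == 0                 # Subtract
    assert k_value(lambda x, y: x | y, n) == 3 * m             # Logical OR
    assert k_value(lambda x, y: x & y, n) == m                 # Logical AND
    assert k_value(lambda x, y: x ^ y, n) == 2 * m             # Logical XOR
    assert k_value(lambda x: x, n, single=True) == 2 * m       # Transfer
    assert k_value(lambda x: x >> 1, n, single=True) == 2 * ((1 << (n - 1)) - 1)  # SHR
```

Running this confirms, for instance, K = 60 and 1020 for Add, and K = 225 and 65025 for Multiply, matching the last columns of Tables 3 and 4.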
5 CONCLUSION

It has been shown that the value of K can be obtained for any given instruction. The 'CLR' (clear) instruction is a special one: since it has no operand, all four components of Equation (1) are taken as 0. Note that almost all the values obtained for K are unique, except for the 'Transfer', 'Complement', 'Logical XOR' and 'Rotate-Left/Right' instructions. This means that if instructions are transformed into one another by a fault in the CU (or in any associated circuit), these particular functional failures cannot be distinguished from each other, but the processor as a whole can still be declared faulty. The last column of Tables 3 and 4 can be obtained directly from the generalized forms; this column is stored in memory along with the relevant function.

6 FUTURE WORK
Further research is proposed on the given method, especially for the case in which the reference value K of two or more functions comes out the same, i.e. how to distinguish or identify an individual failing function when reference values coincide, as mentioned in the conclusion.
7 REFERENCES
[1] S. Y. H. Su, T. S. Lin, L. Shen: Functional Testing of LSI/VLSI Digital Systems, Defense Technical Information Center, Final Technical Report, School of Engineering, Applied Science and Technology, Binghamton, NY, August 1984.
[2] A. Mukati: Fault Diagnosis and Testing of Digital Circuits with an Introduction to Error Control Coding, Higher Education Commission Pakistan, ISBN: 969-417-095-8, 2006.
[3] J. Shen, J. A. Abraham: Native mode functional test generation for processors with applications to self test and design validation, Proc. International Test Conference, 18-23 Oct. 1998, pp. 990-999.
[4] M. S. Abadir, L.-C. Wang, J. Bhadra: Microprocessor Test and Verification (MTV 2006), Common Challenges and Solutions, Seventh International Workshop, Austin, Texas, USA, IEEE Computer Society, 4-5 December 2006, ISBN: 978-0-7695-2839-7.
[5] J. Shen, J. A. Abraham: Native Mode Functional Test Generation for Processors with Applications to Self Test and Design Validation, Computer Engineering Research Center, The University of Texas at Austin, 1998.
[6] P. P. Gelsinger: Design and Test of the 80386, IEEE Design and Test of Computers, Vol. 4, pp. 42-50, June 1987.
[7] C. Montemayor et al.: Multiprocessor Design Verification for the PowerPC 620 Microprocessor, Proc. Intl. Conf. on Computer Design, pp. 188-195, 1995.
[8] M. S. Abadir, J. Ferguson, T. Kirkland: Logic design verification via test generation, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 7, pp. 138-148, 1988.
[9] D. Moundanos, J. A. Abraham, Y. V. Hoskote: A Unified Framework for Design Validation and Manufacturing Test, Proc. Intl. Test Conf., pp. 875-884, 1996.
OPERATING SYSTEMS FOR WIRELESS SENSOR NETWORKS: AN OVERVIEW

Daniele De Caneva, Pier Luca Montessoro and Davide Pierattoni
DIEGM – University of Udine, Italy
{daniele.decaneva; montessoro; pierattoni}@uniud.it
ABSTRACT
The technological trend of recent years has led to the emergence of complete systems on a single chip with integrated low-power communication and transducer capabilities. This has opened the way for wireless sensor networks: a paradigm of hundreds or even thousands of tiny, smart sensors with transducer and communication capabilities. Managing such a complex network, which has to work unattended for months or years while remaining aware of the limited power resources of battery-supplied nodes, is a challenging task. Meeting that challenge requires an adequate software platform: in other words, an operating system specifically suited for wireless sensor networks. This paper presents a brief overview of the best-known operating systems, highlighting the key challenges that have driven their design.

Keywords: wireless sensor networks, operating systems.
1 INTRODUCTION
Thanks to the well-known "Moore's Law", integrated circuits are becoming smaller, cheaper and less power-consuming. This trend has led to the emergence of complete systems on a chip with integrated low-power communication and transducer capabilities. The consequence is the opening of the ubiquitous computing era, in which electronic systems will be all around us, providing all kinds of information services to users in a distributed, omnipresent but nearly invisible fashion. One of the most important applications that new technologies are enabling is the paradigm of Wireless Sensor Networks (WSNs), where hundreds or even thousands of tiny sensors with communication capabilities organize themselves to collect important environmental data or monitor areas for security purposes. The hardware for WSNs is ready and many applications have become a reality; nevertheless, the lack of a commonly accepted system architecture and methodology is a curb on the expansion and improvement of such technologies. Aware of this, many research groups around the world have proposed their own system architectures. The key point in all these proposals is the capability of the software to manage a considerable number of sensors. In particular, there is a tradeoff between the responsiveness of the system and the extremely scarce resources of the nodes in terms of power supply, memory and computational capability. This article presents an overview of the best-known operating systems designed for WSNs. Without proposing direct comparisons, we describe the key features of these architectures and the challenges that drove their development, with the aim of helping the reader choose the system that best suits his or her purposes.

2 OVERVIEW
2.1 TinyOS
TinyOS [1] is virtually the state of the art among sensor operating systems. The Berkeley researchers who designed it aimed to face two issues: the first was managing the concurrency-intensive nature of nodes, which need to keep several flows of data moving simultaneously; the second was achieving efficient modularity, in the belief that hardware and software components must snap together with little processing and storage overhead. The researchers also intended to develop a system that would easily scale with current technology trends, supporting smaller devices as well as the crossover of software components into hardware. Considering power the most precious resource, and trying to achieve high levels of concurrency, they designed the system following an event-based approach, which avoids reserving stack space for each execution context. This design guideline was drawn from a parallel with high-performance computing, where event-based programming is the key to achieving high performance in concurrency-intensive applications.
In TinyOS neither blocking nor polling operations are permitted, and the CPU does not waste time actively looking for interesting events; on the contrary, unused CPU cycles are spent in a sleep state. The system configuration can be summarized as a tiny scheduler and a set of components. The scheduler is a simple FIFO that uses a bounded-size scheduling data structure for efficiency, although a more sophisticated scheduling policy could be implemented. When the task queue is empty, the CPU is forced into the sleep state, waiting for a hardware event to trigger the scheduling of the event-associated tasks. Tasks in the TinyOS architecture are atomic and run to completion, although they can be preempted by events. This task semantics allows the allocation of a single stack, an important feature in memory-constrained systems. Three types of components constitute the TinyOS architecture. The first type is the "hardware abstraction" components, which map physical hardware such as I/O devices into the component model. The second type is called "synthetic hardware" and simulates the behavior of advanced hardware; synthetic hardware often sits on top of the hardware abstraction components. The last type is the "high-level software" components, which perform control, routing and all data transformations such as data aggregation and manipulation. This abstraction of hardware and software in the component model is intended to ease the exploitation of tradeoffs between the scale of integration, the power requirements and the cost of the system. Every component owns a fixed-size frame that is statically allocated: this makes it possible to know the exact memory requirements of a component at compile time and prevents the overhead associated with dynamic allocation. TinyOS was originally developed in C, giving the system the capability of targeting multiple CPU architectures.
However, the system was later re-implemented in nesC, a programming language specific to networked embedded systems whose key focus is a holistic approach to design. Remarkably, a byte-code interpreter has been developed for TinyOS that makes the system accessible to non-expert programmers and enables quick and efficient programming of a whole WSN. This interpreter, called Maté, represents a program's code as a set of capsules. Thanks to the beaconless, ad-hoc routing protocol implemented in Maté, when a sensor node receives a newer version of a capsule it installs it; through hop-by-hop code injection, Maté can update the code of the entire network.
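The scheduling model described above can be illustrated with a toy sketch. Python is used here purely for illustration (real TinyOS components are written in nesC, and the names below are not TinyOS APIs): event handlers post tasks to a bounded FIFO queue, each task runs atomically to completion, and when the queue drains a real node would put the CPU to sleep.

```python
from collections import deque

class TinyScheduler:
    """Toy model of a bounded FIFO, run-to-completion task queue."""
    def __init__(self, capacity=8):
        self.queue = deque()
        self.capacity = capacity      # bounded, for a static memory footprint

    def post(self, task):
        """Called by event handlers; fails instead of blocking when full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(task)
        return True

    def run(self):
        """Run every queued task to completion; a real node would now sleep."""
        while self.queue:
            task = self.queue.popleft()
            task()                    # atomic: no other task runs until it returns

log = []
sched = TinyScheduler()
sched.post(lambda: log.append("sample"))   # e.g. posted by a sensor interrupt
sched.post(lambda: log.append("send"))     # e.g. posted by the radio component
sched.run()
assert log == ["sample", "send"]           # FIFO order, each task ran once
```

The single shared stack of TinyOS follows directly from this run-to-completion semantics: since at most one task is ever mid-execution, no per-task stack is needed.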
2.2 MANTIS
The MultimodAl system for NeTworks of In-situ wireless Sensors [3] was developed around two key design goals: a small learning curve for users, and flexibility. The first objective drove fundamental choices in the architecture of the system and in the programming language used for its implementation. To lower the entry barrier, the researchers decided to adopt a widely familiar design methodology: the classical structure of a multithreaded operating system. For this reason MANTIS includes features such as multithreading, preemptive scheduling with time slices, I/O synchronization via mutual exclusion, a standard network stack and device drivers. The second choice, also associated with flattening the learning curve, determined the use of standard C as the development language for the kernel and the API. The choice of C additionally brings cross-platform support and the reuse of a vast legacy code base. The MANTIS kernel resembles UNIX-style schedulers, providing services for a subset of POSIX threads along with priority-based scheduling. The thread table is statically allocated, so it can be adjusted only at compile time. The scheduler receives notifications to trigger context switches from a hardware timer interrupt. This is the only kind of hardware interrupt handled by the kernel: all other interrupts are sent directly to device drivers. Context switches are triggered not only by timer events but also by system calls and semaphore operations. Besides drivers and user threads, MANTIS has a special low-priority idle thread created by the kernel at startup. This thread can be used to implement power-aware scheduling: thanks to its position, it can detect patterns of CPU utilization and adjust kernel parameters to conserve energy.
The MANTIS researchers considered wireless networking management a critical matter, so they developed the layered network stack as a set of user-level threads: different layers are implemented in different threads, a choice that promotes flexibility at some cost in performance. This flexible structure is useful in particular for dynamic reprogramming, because it enables application developers to reprogram network functionality such as routing by simply starting, stopping and deleting user-level threads. Drawing on their experience with WSNs, the developers of MANTIS gave their system a set of sophisticated features such as dynamic reprogramming of sensor nodes via wireless communication, remote debugging and multimodal prototyping. The MANTIS prototyping environment provides a framework for testing devices and applications
across heterogeneous platforms. It extends beyond simulation by permitting the coexistence of both virtual and physical nodes in the same network. This feature derives directly from the system's code architecture, which can run without modification on virtual nodes within an x86 architecture. Dynamic reprogramming in MANTIS is implemented as a system call library built into the kernel. There are different granularities of reprogramming: reflashing the entire kernel, reprogramming a single thread, and changing variables within a thread. Along with dynamic reprogramming, another important feature has been developed: the Remote Shell and Command Server, which allows a user to "log in" to a node and take control of it. The server is implemented as an application thread and gives the user the ability to alter the node's configuration, run or kill programs, and inspect and modify the inner state of the node.
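A toy model of the priority-based thread selection described above, with the idle thread as fallback, might look as follows (Python, purely illustrative; MANTIS itself is written in C and its scheduler is considerably more elaborate):

```python
class Thread:
    """Minimal stand-in for an entry in a statically sized thread table."""
    def __init__(self, name, priority, ready=True):
        self.name, self.priority, self.ready = name, priority, ready

def pick_next(threads, idle):
    """Choose the highest-priority ready thread; fall back to the idle thread."""
    ready = [t for t in threads if t.ready]
    return max(ready, key=lambda t: t.priority) if ready else idle

idle = Thread("idle", priority=-1)           # created by the kernel at startup
threads = [Thread("driver", 2), Thread("app", 1)]
assert pick_next(threads, idle).name == "driver"
threads[0].ready = False                     # driver blocks on a semaphore
assert pick_next(threads, idle).name == "app"
threads[1].ready = False                     # nothing runnable ...
assert pick_next(threads, idle).name == "idle"   # ... so the idle thread runs
```

It is precisely because the idle thread runs exactly when nothing else is runnable that it is a natural place to observe CPU utilization and trigger power-saving measures.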
2.3 Contiki
Contiki is an operating system based on a lightweight event-driven kernel. It was developed, drawing on previous operating systems work, with the goal of adding features such as run-time loading and linking of libraries, programs and device drivers, as well as support for preemptive multithreading. Event-based systems have shown good performance for many kinds of WSN applications; however, purely event-based systems are unable to respond to external events during long-lasting computations. A partial solution to this problem is adding multithreading support to the system, but this causes additional overhead. To address these problems, the Contiki researchers compromised by developing an event-driven kernel and implementing preemptive multithreading as a library that is optionally linked with programs that explicitly require it. The Contiki operating system can be divided into three main components: an event-driven kernel that provides basic CPU multiplexing and contains no platform-specific code, a program loader, and a set of libraries that provide higher-level functionality. From a structural point of view, a system running Contiki can be partitioned into a core and a set of loadable programs. The core is compiled into a single binary image and is unmodifiable after the nodes' deployment. The programs are loaded into the system by the program loader, which may obtain the binaries either from the communication stack (and thus from the network) or from the system's EEPROM memory. Shared libraries, like user programs, may be replaced in deployed systems by using dynamic
linking. Dynamic linking is based on synchronous events: a library function is invoked by issuing an event generated by the caller program. The event broadcasts the request to all the libraries, and a rendezvous protocol is used to find the library that implements the required function. When the correct library has completed the call, control returns to the calling process. Since dynamic linking bases its functioning on synchronous events, it is essential that the context-switching overhead be as small as possible in order to obtain good system performance. The Contiki developers have ensured this by implementing processes as event handlers that run without separate protection domains. The flexible mechanism of dynamic linking allowed the Contiki researchers to implement multithreading as a library optionally linked with programs. Another important component based on a shared library is the communication stack. Implementing the communication stack as a library allows its dynamic replacement; more precisely, if the stack is split into different libraries, it becomes easy to replace a single communication layer at run time.
2.4 PicOS
PicOS is an operating system written in C and specifically aimed at microcontrollers with limited on-chip RAM. In an attempt to ease the implementation of applications on resource-constrained hardware platforms, the PicOS creators leaned towards a programming environment consisting of a collection of functions for organizing the multiple activities of "reactive" applications. This environment provides services such as a flavor of multitasking and tools for inter-process communication. Each process is modeled as a finite state machine (FSM) that changes its state according to events. This approach is very effective for reactive applications, whose primary role is to respond to events rather than to process data or crunch numbers. CPU multiplexing happens only at state boundaries: in other words, FSM states can be viewed as checkpoints at which PicOS processes can be preempted. Owing to the fact that processes are preemptible only at clearly defined points, potentially problematic operations on counters and flags are always atomic. On the other hand, the non-preemptible character of PicOS processes makes the system not well suited for real-time applications. In PicOS, active processes that need to wait for some event may release the CPU by issuing a "wait request", which defines the conditions necessary for their resumption. This way, CPU resources can be devoted to other processes. The PicOS system is also equipped with several advanced features, like a memory allocator capable
of organizing the heap area into a number of different disjoint pools, and a set of configurable device drivers including serial ports, LCD displays and Ethernet interfaces.
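The FSM-process model can be sketched as follows. This is a hypothetical Python illustration of the idea (PicOS itself is written in C, and these class and method names are assumptions): preemption happens only between states, and a wait request releases the CPU until the awaited event occurs.

```python
# Sketch of PicOS-style FSM processes: the scheduler multiplexes the CPU
# only at state boundaries, so work done inside a state is atomic.

class FSMProcess:
    def __init__(self, name):
        self.name = name
        self.state = "init"
        self.waiting_for = None   # event name set by a "wait request"

    def step(self, event=None):
        # Runs exactly one state to completion; must be overridden.
        raise NotImplementedError

class Blinker(FSMProcess):
    def __init__(self):
        super().__init__("blinker")
        self.toggles = 0

    def step(self, event=None):
        if self.state == "init":
            self.state = "on"
        elif self.state == "on":
            self.toggles += 1           # atomic w.r.t. other processes
            self.state = "off"
            self.waiting_for = "timer"  # wait request: release the CPU
        elif self.state == "off" and event == "timer":
            self.waiting_for = None
            self.state = "on"

def run(processes, events):
    # Multiplex the CPU at state boundaries only; a waiting process is
    # stepped only when its awaited event arrives.
    for ev in events:
        for p in processes:
            if p.waiting_for is None or p.waiting_for == ev:
                p.step(ev)

b = Blinker()
run([b], [None, None, "timer", None])
print(b.toggles)   # 2
```

Because `step` always runs to the next state boundary, no lock is needed around `self.toggles`; this is the atomicity property the text attributes to PicOS, at the cost of unsuitability for hard real-time workloads.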
2.5 MagnetOS
Applications often need to adapt not only to external changes but also to internal changes initiated by the applications themselves. An example may come from a battlefront application that modifies its behavior when switching from defensive to offensive mode: such an application could change its communication pattern and reorganize the deployed components. Focusing on this point, researchers at Cornell University argued that network-wide energy management is best provided by a distributed, power-aware operating system. These researchers developed MagnetOS with the following four goals. The first was adaptability to resource and network changes. The second was to follow efficient policies in terms of power consumption. The third was to give the OS general-purpose characteristics, allowing it to execute applications over networks of nodes with heterogeneous capabilities and to handle different hardware and software choices. The fourth was to provide the system with facilities for deploying, managing and modifying executing applications. The result was a system providing an SSI, namely a Single System Image. In this abstraction, the entire network is presented to applications as a single unified Java virtual machine. The system, which follows the Distributed Virtual Machine paradigm, may be partitioned into a static and a dynamic component. The static component rewrites regular Java applications into objects that can be distributed across the network. The dynamic component provides, on each node, services for application monitoring and for object creation, invocation and migration. In order to achieve good performance, an auxiliary interface is provided by the MagnetOS runtime that overrides the automatic object-placement decisions and allows programmers to explicitly direct object placement.
MagnetOS uses two online power-aware algorithms to reduce application energy consumption and to increase system survival by moving application components within the network. In practice, these protocols try to move the communication endpoints in order to conserve energy. The first of them, called NetPull, works at the physical layer, whereas the second, called NetCenter, works at the network layer.
2.6 EYES
This operating system was developed within the
EYES European project and tries to address the problems of scarce resources, in terms of both memory and power supply, and the need for distribution and reconfiguration capabilities. The researchers addressed these problems by developing an event-driven system. In fact, the EYES OS is structured in modules that are executed as responses to external events, leaving the system in a power-saving mode when there is no external event to serve. Every module can ask for several tasks to be performed; each task in turn defines a certain block of code that runs to completion. In this paradigm, no blocking operation is permitted and no polling operation should be instantiated: the programmer instead uses interrupts to wake up the system when the needed input becomes available. The system provides a scheduler, which can be implemented as a simple FIFO or as a more sophisticated algorithm. Interrupts are also seen as tasks scheduled and ready to be executed. In the EYES architecture there are two system layers of abstraction. The first is the "Sensor and Networking Layer", which provides an API for the sensor nodes and the network protocols. The second is the "Distributed Services Layer", which exposes an API supporting mobile sensor applications. In particular, two services belong to this layer: the "Lookup Service" and the "Information Service". The first supports mobility, instantiation and reconfiguration, while the latter deals with collecting data. On top of the cited layers stand the user applications. The EYES OS provides a four-step procedure for code distribution, designed to update the code on the nodes, including the operating system itself. This procedure is resilient to packet losses during the update, uses as few communication and local resources as possible, and halts node operations only for a short period.
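The run-to-completion, FIFO-scheduled task model described above can be sketched as follows (a hypothetical Python illustration, not the actual EYES OS API):

```python
# Sketch of an EYES-style event-driven executive: tasks are queued and each
# runs to completion; interrupts simply enqueue tasks; no blocking or polling.

from collections import deque

class Scheduler:
    def __init__(self):
        self.queue = deque()   # simple FIFO, as in the EYES description
        self.log = []

    def post(self, task):
        # Used both by modules and by interrupt handlers to request work.
        self.queue.append(task)

    def run(self):
        while self.queue:
            task = self.queue.popleft()
            task(self)            # runs to completion; may post follow-ups
        self.log.append("sleep")  # nothing left: enter power-saving mode

def sample(sched):
    sched.log.append("sample")
    sched.post(transmit)          # follow-up work, still non-blocking

def transmit(sched):
    sched.log.append("transmit")

s = Scheduler()
s.post(sample)    # e.g. posted by a sensor interrupt handler
s.run()
print(s.log)      # ['sample', 'transmit', 'sleep']
```

Splitting work into short posted tasks, instead of blocking inside one task, is what lets the node drop into its power-saving mode as soon as the queue drains.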
3  CONCLUSIONS
The operating systems described here present different approaches to the common problems of WSNs. It is not the aim of this article to pass judgment on the presented systems; nevertheless, some general guidelines can be drawn from the work of the cited researchers. We now present some guidelines for the development of the next generation of WSN operating systems, which should help both researchers and users. The constrained nature of resources in embedded systems is definitely evident, so small, efficient code is a primary goal, and power-aware policies are an obligatory
Table 1: summary of WSN OS features.

TinyOS
  Objectives: manage concurrent data flows; scale easily with technology; modularity.
  Structure: event-based approach; tiny scheduler and a set of components.
  Special features: no blocking or polling; developed in nesC; a byte code interpreter for non-expert programmers.

Mantis
  Objectives: small learning curve.
  Structure: multithreaded OS with a UNIX-style scheduler; statically-allocated thread table; developed in C.
  Special features: specific idle task that adjusts kernel parameters to conserve energy; remote debugging and reprogramming.

Contiki
  Objectives: preemptive multithreading support; run-time loading and linking of libraries.
  Structure: lightweight event-driven kernel; multithreading features as an optionally linked library.
  Special features: capable of changing the communication layer on the run.

PicOS
  Objectives: aimed at microcontrollers with tiny RAM.
  Structure: each process modeled as a FSM; multiplexing at state boundaries; written in C.
  Special features: memory allocator; a set of configurable device drivers.

MagnetOS
  Objectives: adaptability to resource and network changes; manage nodes with heterogeneous capabilities.
  Structure: Single System Image, the entire network is a unified Java virtual machine.
  Special features: two on-line special algorithms to reduce energy consumption.

EYES
  Objectives: address problems of scarce memory and power supply.
  Structure: event-driven OS; structured in modules executed as responses to external events; each task runs to completion.
  Special features: two layers of abstraction with specific APIs for applications and physical support; four-step procedure to update the code.
Table 2: the seven expected features of the next generation WSN operating systems.

1. Power-aware policies
2. Self organization
3. Easy interface to expose data
4. Simple way to program, update and debug network applications
5. Power-aware communication protocols
6. Portability
7. Easy programming language for non-tech users
condition for efficiency in WSN applications. To ensure the proper functioning of a network constituted by unattended nodes, possibly deployed in a harsh environment, the operating system must provide a mechanism for self-organization and for reorganization in case of node failures. A WSN, especially one composed of a huge number of nodes, must behave as a distributed system, exposing an interface through which data and processes are accessible and manageable, as happens with databases. A large number of nodes also carries the need for an easy yet power-efficient way to program the network, which should remain usable after deployment and without affecting normal functioning. Such a programming (and reprogramming) procedure must be robust to interference and to all other causes of transmission failure during the dissemination of code chunks. While entirely reprogramming the core of the system may not be necessary, the applications must be patched, updated or even totally changed if the main purpose of the WSN changes. This leads to the preference, where possible, for different levels of reprogramming granularity. The operating system must treat wireless communication interfaces as special resources, providing a set of different power-aware communication protocols. The system has to choose the proper protocol according to the current environment state and application needs. The operating system should be portable to different platforms: this is necessary both for the possible presence of nodes with different tasks and for the opportunity of post-deployment of newer sensors, which could be placed in order to reintegrate the network node set. The operating system should provide a platform for fast prototyping, testing and debugging of application programs. In this context it is remarkable that, if the WSN paradigm spreads into a kaleidoscopic set of applications, touching many aspects of our lives,
then program developers will not be just communication and computer engineers. It appears clear that, in order to support non-technical developers, a really simple API, or even an application-typology programming language, must be provided alongside the "normal", more efficient API. Making WSNs easy to use will make them more attractive and step up their diffusion.

4  REFERENCES
[1] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, K. Pister: System Architecture Directions for Networked Sensors, ASPLOS (2000).
[2] K. Sohraby, D. Minoli, T. Znati: Wireless Sensor Networks: Technology, Protocols and Applications, John Wiley & Sons Inc. (2007).
[3] H. Abrach, S. Bhatti, J. Carlson, H. Dai, J. Rose, A. Sheth, B. Shucker, J. Deng, R. Han: MANTIS: System Support for MultimodAl NeTworks of In-situ Sensors, Proceedings of the 2nd ACM International Conference on Wireless Sensor Networks and Applications (2003).
[4] A. Dunkels, B. Grönvall, T. Voigt, J. Alonso: The Design for a Lightweight Portable Operating System for Tiny Networked Sensor Devices, SICS Technical Report (2004).
[5] E. Akhmetshina, P. Gburzynski, F. Vizecoumar: PicOS: A Tiny Operating System for Extremely Small Embedded Platforms, Proceedings of the Conference on Embedded Systems and Applications ESA'02 (2002).
[6] R. Barr, J. Bicket, D. S. Dantas, B. Du, T. W. D. Kim, B. Zhou, E. Sirer: On the Need for System-Level Support for Ad Hoc and Sensor Networks, SIGOPS Oper. Syst. Rev. (2002).
[7] S. Dulman, P. Havinga: Operating System Fundamentals for the EYES Distributed Sensor Network, Proceedings of Progress'02 (2002).
Performance Evaluation of Deadline Monotonic Policy over 802.11 protocol Ines El Korbi and Leila Azouz Saidane National School of Computer Science University of Manouba, 2010 Tunisia Emails:
[email protected] [email protected]
ABSTRACT Real-time applications are characterized by their delay bounds. To satisfy the Quality of Service (QoS) requirements of such flows over wireless communications, we enhance the 802.11 protocol to support the Deadline Monotonic (DM) scheduling policy. We then evaluate the performance of DM in terms of throughput, average medium access delay and medium access delay distribution. To do so, we develop a Markov chain based analytical model and derive expressions for the throughput, the average MAC layer service time and the service time distribution. We then validate the mathematical model and extend the analytical results to a multi-hop network by simulation, using the ns-2 network simulator. Keywords: Deadline Monotonic, 802.11, Performance evaluation, Average medium access delay, Throughput, Probabilistic medium access delay bounds.
1  INTRODUCTION
Supporting applications with QoS requirements has become an important challenge for all communications networks. In wireless LANs, the IEEE 802.11 protocol [5] has been enhanced, and the IEEE 802.11e protocol [6] was proposed, to support quality of service over wireless communications. In the absence of a coordination point, IEEE 802.11 defines the Distributed Coordination Function (DCF), based on the Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) protocol. IEEE 802.11e proposes the Enhanced Distributed Channel Access (EDCA) as an extension of DCF. With EDCA, each station maintains four priorities called Access Categories (ACs). The quality of service offered to each flow depends on the AC to which it belongs. Nevertheless, the granularity of service offered by 802.11e (at most four priorities) cannot satisfy the requirements of real-time flows, where each flow is characterized by its own delay bound. Therefore, we propose in this paper a new medium access mechanism based on the Deadline Monotonic (DM) policy [9] to schedule real-time flows over 802.11. Indeed, DM is a real-time scheduling policy that assigns static priorities to flow packets according to their deadlines, the packet with the shortest deadline being assigned the highest priority. To support the DM policy over 802.11, we
use a distributed scheduling and introduce a new medium access backoff policy. Therefore, we focus on performance evaluation of the DM policy in terms of achievable throughput, average MAC layer service time and MAC layer service time distribution. Hence, we follow these steps: First, we propose a Markov Chain framework modeling the backoff process of n contending stations within the same broadcast region [1]. Due to the complexity of the mathematical model, we restrict the analysis to n contending stations belonging to two traffic categories (each traffic category is characterized by its own delay bound). From the analytical model, we derive the throughput achieved by each traffic category. Then, we use the generalized Z-transforms [3] to derive expressions of the average MAC layer service time and the service time distribution. As the analytical model was restricted to two traffic categories, analytical results are extended by simulation to different traffic categories. Finally, we consider a simple multi-hop scenario to deduce the behavior of the DM policy in a multi hop environment.
The rest of this paper is organized as follows. In section 2, we review the state of the art of the IEEE 802.11 DCF, QoS support over 802.11 (mainly the IEEE 802.11e EDCA) and real-time scheduling over 802.11. In section 3, we present the distributed scheduling and introduce the new medium access backoff policy to support DM over 802.11. In section 4, we present our mathematical model based on Markov chain analysis. Sections 5 and 6 present the throughput and service time analyses, respectively. Analytical results are validated by simulation using the ns-2 network simulator [16]. In section 7, we extend our study by simulation, first to take different traffic categories into consideration, and second to study the behavior of the DM algorithm in a multi-hop environment where factors like interference and routing protocols come into play. Finally, we conclude the paper in section 8.

2  LITERATURE REVIEW
2.1 The 802.11 protocol
2.1.1 Description of the IEEE 802.11 DCF
Using DCF, a station shall ensure that the channel is idle when it attempts to transmit. It then selects a random backoff in the contention window [0, CW-1], where CW is the current window size, varying between the minimum and maximum contention window sizes. If the channel is sensed busy, the station suspends its backoff until the channel becomes idle for a Distributed Inter Frame Space (DIFS) after a successful transmission, or for an Extended Inter Frame Space (EIFS) after a collision. The packet is transmitted when the backoff reaches zero. A packet is dropped if it collides after the maximum number of retransmission attempts. The two-way handshaking packet transmission procedure described above is called the basic access mechanism. DCF also defines a four-way handshaking technique, called Request To Send/Clear To Send (RTS/CTS), to prevent the hidden station problem: a station S_j is said to be hidden from S_i if S_j is within the transmission range of the receiver of S_i and out of the transmission range of S_i.
2.1.2 Performance evaluation of the 802.11 DCF
Different works have been proposed to evaluate the performance of the 802.11 protocol based on Bianchi's work [1]. Indeed, Bianchi proposed a Markov chain based analytical model to evaluate the saturation throughput of the 802.11 protocol. By saturation conditions, it is meant that contending stations always have packets to transmit. Several works extended Bianchi's model, either to suit more realistic scenarios or to evaluate other performance parameters. Indeed, the authors of [2] incorporate the frame retry limits into Bianchi's model and show that Bianchi overestimates the
maximum achievable throughput. The native model is also extended in [10] to a non-saturated environment. In [12], the authors derive the average packet service time at an 802.11 node. A new generalized Z-transform based framework has been proposed in [3] to derive probabilistic bounds on the MAC layer service time; with it, probabilistic end-to-end delay bounds can be provided in a wireless network.
2.2 Supporting QoS over 802.11
2.2.1 Differentiation mechanisms over 802.11
Emerging applications like audio and video require quality of service guarantees in terms of throughput, delay, jitter, loss rate, etc. Transmitting such flows over wireless links requires supporting service differentiation mechanisms over such networks. Many medium access schemes have been proposed to provide QoS enhancements over the IEEE 802.11 WLAN. Indeed, [4] assigns different priorities to the incoming flows. Priority classes are differentiated according to one of three 802.11 parameters: the backoff increase function, the Inter Frame Spacing (IFS) and the maximum frame length. Experiments show that all three differentiation schemes offer better guarantees for the highest-priority flow, but the backoff increase function mechanism does not perform well with TCP flows because ACKs affect the differentiation mechanism. In [7], an algorithm is proposed to provide service differentiation using two parameters of IEEE 802.11: the backoff interval and the IFS. With this scheme, high-priority stations are more likely to access the medium than low-priority ones. The research efforts described above led to the standardization of a new protocol that supports QoS over 802.11, the IEEE 802.11e protocol [6].
2.2.2 The IEEE 802.11e EDCA
The IEEE 802.11e proposes a new medium access mechanism called the Enhanced Distributed Channel Access (EDCA), which enhances the IEEE 802.11 DCF. With EDCA, each station maintains four priorities called Access Categories (ACs).
Each access category is characterized by minimum and maximum contention window sizes and an Arbitration Inter Frame Spacing (AIFS). Different analytical models have been proposed to evaluate the performance of 802.11e EDCA. In [17], Xiao extends Bianchi's model to the prioritized schemes provided by 802.11e by introducing multiple ACs with distinct minimum and maximum contention window sizes, but the AIFS differentiation parameter is lacking in Xiao's model. Recently, Osterbo et al. have proposed
different works to evaluate the performance of the IEEE 802.11e EDCA [13], [14], [15]. They proposed a model that takes into consideration all the differentiation parameters of EDCA, especially the AIFS. Moreover, different QoS parameters have been evaluated, such as throughput, average service time, service time distribution and probabilistic response time bounds, for both the saturated and non-saturated cases. Although the IEEE 802.11e EDCA classifies traffic into four prioritized ACs, there is still no guarantee of a real-time transmission service. This is due to the lack of a satisfactory scheduling method for the various delay-sensitive flows. Hence, we need a scheduling policy dedicated to such flows.
2.3 Real-time scheduling over 802.11
A distributed solution for the support of real-time sources over IEEE 802.11, called Blackburst, is discussed in [8]. This scheme modifies the MAC protocol to send short transmissions in order to gain priority for real-time service. It is shown that this approach is able to support bounded delays. The main drawback of this scheme is that it requires constant intervals for high-priority traffic; otherwise the performance degrades significantly. In [18], the authors introduced a distributed priority scheduling over 802.11 to support a class of dynamic priority schedulers such as Earliest Deadline First (EDF) or Virtual Clock (VC). Indeed, the EDF policy schedules real-time flows according to their absolute deadlines, where the absolute deadline is the node arrival time plus the delay bound. To realize distributed scheduling over 802.11, the authors of [18] used a priority broadcast mechanism where each station maintains an entry for the highest-priority packet of all other stations. Thus, stations can adjust their backoff according to other stations' priorities. The overhead introduced by the priority broadcast mechanism is negligible.
This is due to the fact that priorities are exchanged using native DATA and ACK packets. Nevertheless, the authors of [18] proposed a generic backoff policy that can be used by a whole class of dynamic priority schedulers, whether the scheduler targets delay-sensitive or rate-sensitive flows. In this paper, we focus on delay-sensitive flows and propose to support the fixed-priority Deadline Monotonic (DM) policy over 802.11 to schedule them. To this end, we use a priority broadcast mechanism similar to [18] and introduce a new medium access backoff policy where the
backoff value is inferred from the deadline information.

3  SUPPORTING DEADLINE MONOTONIC (DM) POLICY OVER 802.11
With DCF, all the stations share the same transmission medium. Hence, the HOL (Head of Line) packets of all the stations (their highest-priority packets) contend for the channel with the same priority, even if they have different deadlines. Introducing DM over 802.11 allows stations having packets with short deadlines to access the channel with higher priority than those having packets with long deadlines. Providing such QoS requires a distributed scheduling and a new medium access policy.

3.1 Distributed scheduling over 802.11
To realize a distributed scheduling over 802.11, we introduce a priority broadcast mechanism similar to [18]. Indeed, each station maintains a local scheduling table with entries for the HOL packets of all other stations. Each entry in the scheduling table of node S_i comprises two fields (S_j, D_j), where S_j is the source node MAC address and D_j is the deadline of the HOL packet of node S_j. To broadcast the HOL packet deadlines, we propose to use the two-way handshake DATA/ACK access mode. When a node S_i transmits a DATA packet, it piggybacks the deadline of its HOL packet. Nodes hearing the DATA packet add an entry for S_i in their local scheduling tables by filling the corresponding fields. The receiver of the DATA packet copies the priority of the HOL packet into the ACK before sending the ACK frame. All the stations that did not hear the DATA packet add an entry for S_i using the information in the ACK packet.

3.2 DM medium access backoff policy
Let us consider two stations S_1 and S_2 transmitting two flows with the same deadline D_1 (D_1 is expressed as a number of 802.11 slots). The two stations, having the same delay bound, can access the channel with the same priority using the native 802.11 DCF. Now, suppose that S_1 and S_2 transmit flows with different delay bounds D_1 and D_2 such that D_1 < D_2, and generate two packets at time instants t_1 and t_2. If S_2 had the same delay bound as S_1, its packet would have been generated at time t'_2 such that t'_2 = t_2 + D_21, where D_21 = D_2 - D_1. At that time, S_1 and S_2 would have the same priority and transmit their packets according to the
802.11 protocol. Thus, to support DM over 802.11, each station uses a new backoff policy where the backoff is composed of:
- the random backoff selected in [0, CW-1] according to the 802.11 DCF, referred to as the BAsic Backoff (BAB);
- the DM Shifting Backoff (DMSB): the additional backoff slots that a station with low priority (whose HOL packet has a large deadline) adds to its BAB in order to have the same priority as the station with the highest priority (whose HOL packet has the shortest deadline).
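The composition of the two backoff components can be sketched as follows. This is a hypothetical Python illustration: the helper names and the CWmin value are assumptions, and real stations count deadlines in 802.11 slots.

```python
# Sketch of the DM backoff composition: the whole backoff adds a deadline-
# derived shifting component (DMSB) to the usual random backoff (BAB).

import random

def dmsb(hol_deadline, scheduling_table):
    # DMSB = own HOL deadline minus the minimum HOL deadline known in the
    # local scheduling table (equation (1) in the text).
    dt_min = min(scheduling_table.values())
    return hol_deadline - dt_min

def whole_backoff(hol_deadline, scheduling_table, cw_min=32):
    # WHB = DMSB + BAB (equation (2)); cw_min is an illustrative value.
    bab = random.randint(0, cw_min - 1)
    return dmsb(hol_deadline, scheduling_table) + bab

# Two stations: S1 has the shortest HOL deadline, so its DMSB is 0 and it
# contends with plain 802.11 priority; S2 is shifted by D21 = D2 - D1 slots.
table = {"S1": 10, "S2": 25}      # HOL deadlines in slots (assumed values)
print(dmsb(10, table))            # 0  -> highest priority
print(dmsb(25, table))            # 15 -> shifted by 15 slots
```

With these numbers, S2's whole backoff is its random draw plus 15 slots, which is exactly the behavior the two-station example above describes: S2 competes as if its packet had arrived D_21 slots later.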
Whenever a station S_i sends an ACK or hears an ACK on the channel, its DMSB is reevaluated as follows:

DMSB(S_i) = DeadlineHOL(S_i) - DTmin(S_i)    (1)

where DTmin(S_i) is the minimum of the HOL packet deadlines present in the scheduling table of S_i, and DeadlineHOL(S_i) is the HOL packet deadline of node S_i. Hence, when S_i has to transmit its HOL packet with a delay bound D_i, it selects a BAB in the contention window [0, CWmin - 1] and computes the WHole Backoff (WHB) value as follows:

WHB(S_i) = DMSB(S_i) + BAB(S_i)    (2)

The station S_i decrements its BAB when it senses an idle slot. Now, suppose that S_i senses the channel busy. If a successful transmission is heard, S_i reevaluates its DMSB when a correct ACK is heard, and adds the new DMSB value to its current BAB as in equation (2). Whereas, if a collision is heard, S_i reinitializes its DMSB and adds it to its current BAB, to allow the colliding stations to contend with the same priority as for their first transmission attempt. S_i transmits when its WHB reaches 0. If the transmission fails, S_i doubles its contention window size and repeats the above procedure until the packet is successfully transmitted or dropped after the maximum number of retransmission attempts.

4  MATHEMATICAL MODEL OF THE DM POLICY OVER 802.11

In this section, we propose a mathematical model to evaluate the performance of the DM policy using Markov chain analysis [1]. We consider the following assumptions:

Assumption 1: The system under study comprises n contending stations hearing each other's transmissions.
Assumption 2: Each station S_i transmits a flow F_i with a delay bound D_i. The n stations are divided into two traffic categories C1 and C2: C1 comprises n1 nodes transmitting flows with delay bound D_1, and C2 comprises n2 nodes transmitting flows with delay bound D_2, such that D_1 < D_2, D_21 = D_2 - D_1 and n1 + n2 = n.
Assumption 3: We operate in saturation conditions: each station immediately has a packet available for transmission after the service completion of the previous packet [1].
Assumption 4: A station selects a BAB in a constant contention window [0, W-1], independently of the transmission attempt. This is a simplifying assumption that limits the complexity of the mathematical model.
Assumption 5: We are in stationary conditions, i.e. the n stations have already sent at least one packet each.

Depending on the traffic category to which it belongs, each station S_i will be modeled by a Markov chain representing its whole backoff (WHB) process.

4.1  Markov chain modeling a station of category C1
Figure 1 illustrates the Markov chain modeling a station S_1 of category C1. The states of this Markov chain are described by the quadruplet (R, i, i-j, -D_21), where:
- R: takes two values, denoted C2 and ~C2. When R = ~C2, the n2 stations of category C2 are decrementing their shifting backoff (DMSB) during D_21 slots and do not contend for the channel. When R = C2, the D_21 slots have already elapsed and the stations of category C2 will contend for the channel.
- i: the value of the BAB selected by S_1 in [0, W-1].
Figure 1: Markov chain modeling a category C1 Station
- i-j: the current backoff of the station S_1.
- -D_21: corresponds to -(D_2 - D_1). We choose the negative notation -D_21 for stations of C1 to express the fact that only stations of category C2 have a positive DMSB, equal to D_21.

Initially, S_1 selects a random BAB and is in one of the states (~C2, i, i, -D_21), i = 0..W-1. During the first D_21 - 1 slots, S_1 decrements its backoff if none of the n1 - 1 remaining stations of category C1 transmits. Indeed, during these slots, the n2 stations of category C2 are decrementing their DMSB and do not contend for the channel. When S_1 is in one of the states (~C2, i, i-(D_21 - 1), -D_21), i = D_21..W-1, and senses the channel idle, it decrements its backoff during the D_21-th slot. But S_1 knows that henceforth the n2 stations of category C2 can contend for the channel (the D_21 slots have elapsed). Hence, S_1 moves to one of the states (C2, i, i-D_21, -D_21), i = D_21..W-1.

However, when the station S_1 is in one of the states (~C2, i, i-j, -D_21), for i = 1..W-1, j = 0..min(D_21 - 1, i - 1), and at least one of the n1 - 1 remaining stations of category C1 transmits, the stations of category C2 reinitialize their DMSB and do not contend for the channel during an additional D_21 slots. Therefore, S_1 moves to the state (~C2, i-j, i-j, -D_21), i = 1..W-1, j = 0..min(D_21 - 1, i - 1).

Now, if S_1 is in one of the states (C2, i, i-D_21, -D_21), i = D_21 + 1..W-1, and at least one of the n - 1 remaining stations (either a category C1 or a category C2 station) transmits, then S_1 moves to one of the states (~C2, i-D_21, i-D_21, -D_21), i = D_21 + 1..W-1.

4.2  Markov chain modeling a station of category C2
Figure 2 illustrates the Markov chain modeling a station S_2 of category C2. Each state of the S_2 Markov chain is represented by the quadruplet (i, k, D_21 - j, D_21), where:
- i: the BAB value selected by S_2 in [0, W-1].
- k: the current BAB value of S_2.
- D_21 - j: the current DMSB of S_2, j = 0..D_21.
- D_21: corresponds to D_2 - D_1.

When S_2 selects a BAB, its DMSB equals D_21 and S_2 is in one of the states (i, i, D_21, D_21), i = 0..W-1. During D_21 slots, only the n1 stations of category C1 contend for the channel. If S_2 senses the channel idle during the D_21 slots, it moves to one of the states (i, i, 0, D_21), i = 0..W-1, where it ends its shifting backoff.
Figure 2: Markov chain modeling a category C2 Station

When S_2 is in one of the states (i, i, 0, D_21), i = 0..W-1, the n2 - 1 other stations of category C2 have also decremented their DMSB and can contend for the channel. Thus, S_2 decrements its BAB and moves to the state (i, i-1, 0, D_21), i = 2..W-1, only if none of the n - 1 remaining stations transmits. If S_2 is in one of the states (i, i-1, 0, D_21), i = 2..W-1, and at least one of the n - 1 remaining stations transmits, the n2 stations of category C2 reinitialize their DMSB and S_2 moves to the state (i-1, i-1, D_21, D_21), i = 2..W-1.

4.3  Blocking probabilities in the Markov chains
According to the explanations given in sections 4.1 and 4.2, the states of the Markov chains modeling stations S_1 and S_2 can be divided into the following groups:
1 : the set of states of S 1 where none of the n 2 stations of category C 2 contends for the channel (blue states in figure 1). 1 ~ C 2 , i , i j , D 21 , i 0..W 1, j 0.. minmax0 , i 1, D 21 1 1 : the set of states of S 1 where stations of category C 2 can contend for the channel (pink states in figure 1). 1 C 2 , i , i D 21 , D 21 , i D 21 ..W 1
2 : the set of states of S 2 where stations of category C 2 do not contend for the channel (blue states in figure 2).
UbiCC Journal - Volume 3
2 i , i , D 21 j , D 21 , i 0..W 1, j 0..D 21 1 2 : the set of states of S 2 , where stations of category C 2 contend for the channel (pink states in figure 2). 2 i , i ,0 , D21 , i 0..W 1 i , i 1,0 , D 21 , i 2..W 1
Therefore, when stations of category C1 are in one the states of 1 , stations of category C 2 are in one of the states of 2 . Similarly, when stations of category C1 are is in one of the states of 1 , stations of category C 2 are in one of the states of 2. Hence, we derive the expressions of blocking probabilities figure 1 as follows:
p 11 and
S1 p 12 shown in
p 11 : the probability that S 1 is blocked given that S 1 is in one of the states of 1 . p 11 is the probability that at least a station S 1' of
the other n1 1 stations of C1 transmits given that S 1' is in one of the states of 1 .
p 11 1 1 11 n1 1
(3)
where 11 is the probability that a station S 1' of C1 transmits given that S 1' is in one of the states of 1 :
11 Pr S 1' transmits 1
~ C2 ,0 ,0 , D21
1
W 1 min max 0 ,i 1,D21 1 1~ C2 ,i ,i j , D21 i 0 j 0
(4)
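Once the transmission probabilities τ11, τ12 and τ22 are known, the four blocking probabilities are immediate. The following Python transcription of equations (3), (5), (8) and (9) is a sketch of ours (the function names are not from the paper):

```python
# Blocking probabilities of Section 4.3, given the transmission probabilities
# tau11, tau12, tau22 and the station counts n1 (category C1) and n2 (category C2).

def p11(tau11, n1):
    # S1 is blocked if at least one of the other n1-1 C1 stations transmits (eq. 3).
    return 1.0 - (1.0 - tau11) ** (n1 - 1)

def p12(tau12, tau22, n1, n2):
    # In the pink states, the n2 C2 stations may also transmit (eq. 5).
    return 1.0 - (1.0 - tau12) ** (n1 - 1) * (1.0 - tau22) ** n2

def p21(tau11, n1):
    # While S2 still counts down its DMSB, only the n1 C1 stations can block it (eq. 8).
    return 1.0 - (1.0 - tau11) ** n1

def p22(tau12, tau22, n1, n2):
    # After the DMSB, the other n2-1 C2 stations contend as well (eq. 9).
    return 1.0 - (1.0 - tau12) ** n1 * (1.0 - tau22) ** (n2 - 1)
```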
π1(R,i,i-j,D21) is defined as the probability of the state (R, i, i-j, D21) in the stationary conditions, and Π1 = (π1(R,i,i-j,D21)) is the probability vector of a category C1 station.

p12: the probability that S1 is blocked given that S1 is in one of the states of Ω̄1. p12 is the probability that at least one station S1' of the other n1-1 stations of C1 transmits given that S1' is in one of the states of Ω̄1, or that at least one station S2' of the n2 stations of C2 transmits given that S2' is in one of the states of Ω̄2:

p12 = 1 - (1 - τ12)^(n1-1) (1 - τ22)^(n2)    (5)

where τ12 is the probability that a station S1' of C1 transmits given that S1' is in one of the states of Ω̄1:

τ12 = Pr(S1' transmits | Ω̄1) = π1(C2,D21,0,D21) / Σ_{i=D21..W-1} π1(C2,i,i-D21,D21)    (6)

and τ22 is the probability that a station S2' of C2 transmits given that S2' is in one of the states of Ω̄2:

τ22 = Pr(S2' transmits | Ω̄2) = π2(0,0,0,D21) / (Σ_{i=0..W-1} π2(i,i,0,D21) + Σ_{i=2..W-1} π2(i,i-1,0,D21))    (7)

π2(i,k,D21-j,D21) is defined as the probability of the state (i, k, D21-j, D21) in the stationary conditions, and Π2 = (π2(i,k,D21-j,D21)) is the probability vector of a category C2 station.

In the same way, we evaluate p21 and p22, the blocking probabilities of station S2 shown in figure 2:

p21: the probability that S2 is blocked given that S2 is in one of the states of Ω2:

p21 = 1 - (1 - τ11)^(n1)    (8)

p22: the probability that S2 is blocked given that S2 is in one of the states of Ω̄2:

p22 = 1 - (1 - τ12)^(n1) (1 - τ22)^(n2-1)    (9)

The blocking probabilities described above allow deducing the state transition probabilities and building the transition probability matrix Pi of a station of traffic category Ci. Therefore, we can evaluate the state probabilities by solving the following system [11]:

Πi Pi = Πi,  Σ_j πij = 1    (10)

4.4 Transition probability matrices

4.4.1 Transition probability matrix of a category C1 station

Let P1 be the transition probability matrix of the station S1 of category C1. P1(s, s') is the probability to transit from state s to state s'. We have:

P1[(~C2,i,i-j,D21), (~C2,i,i-j-1,D21)] = 1 - p11,  i = 2..W-1, j = 0..min(i-2, D21-2)    (11)
P1[(~C2,i,1,D21), (~C2,0,0,D21)] = 1 - p11,  i = 1..min(W-1, D21-1)    (12)
P1[(~C2,i,i-D21+1,D21), (C2,i,i-D21,D21)] = 1 - p11,  i = D21..W-1    (13)
P1[(~C2,i,i-j,D21), (~C2,i-j,i-j,D21)] = p11,  i = 2..W-1, j = 1..min(i-1, D21-1)    (14)
P1[(~C2,i,i,D21), (~C2,i,i,D21)] = p11,  i = 1..W-1    (15)
P1[(C2,i,i-D21,D21), (~C2,i-D21,i-D21,D21)] = p12,  i = D21+1..W-1    (16)
P1[(C2,i,i-D21,D21), (C2,i-1,i-1-D21,D21)] = 1 - p12,  i = D21+1..W-1    (17)
P1[(~C2,0,0,D21), (~C2,i,i,D21)] = 1/W,  i = 0..W-1    (18)

If D21 ≤ W-1, then:

P1[(C2,D21,0,D21), (~C2,i,i,D21)] = 1/W,  i = 0..W-1    (19)

Combining the stationary distributions of the two Markov chains with equations (4), (6) and (7) yields the following system of non-linear equations:

τ11 = f1(τ11, τ12, τ22)
τ12 = f2(τ11, τ12, τ22)    (28)
τ22 = f3(τ11, τ12, τ22)

under the constraints τ11 ≥ 0, τ12 ≥ 0, τ22 ≥ 0, τ11 ≤ 1, τ12 ≤ 1, τ22 ≤ 1.
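System (28) is solved numerically. The Python sketch below (ours, not from the paper) shows one common numerical pattern, a damped fixed-point iteration; the right-hand sides in `rhs` are hypothetical placeholders standing in for the true maps f1, f2, f3 obtained from equations (4), (6) and (7), which are too long to reproduce here:

```python
def solve_fixed_point(f, x0, damping=0.5, tol=1e-10, max_iter=10000):
    """Damped fixed-point iteration x <- (1-d)*x + d*f(x) on a vector of floats."""
    x = list(x0)
    for _ in range(max_iter):
        fx = f(x)
        nxt = [(1 - damping) * a + damping * b for a, b in zip(x, fx)]
        if max(abs(a - b) for a, b in zip(x, nxt)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("fixed-point iteration did not converge")

# Placeholder right-hand sides standing in for f1, f2, f3 of system (28);
# each tau must remain in [0, 1], as required by the constraints of (28).
def rhs(t):
    t11, t12, t22 = t
    return (0.3 / (1.0 + t12),
            0.2 / (1.0 + t11 + t22),
            0.25 / (1.0 + t11))

sol = solve_fixed_point(rhs, [0.1, 0.1, 0.1])
```

Damping is not strictly required for a contraction like this toy map, but it helps convergence when the real right-hand sides are steep.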
By replacing p11 and p12 by their values in equations (3) and (5), and by replacing P1 and Π1 in (10) and solving the resulting system, we can express π1(R,i,i-j,D21) as a function of τ11, τ12 and τ22, given respectively by equations (4), (6) and (7).

4.4.2 Transition probability matrix of a category C2 station

Let P2 be the transition probability matrix of the station S2 belonging to the traffic category C2. The transition probabilities of S2 are:

P2[(i,i,D21-j,D21), (i,i,D21-j-1,D21)] = 1 - p21,  i = 0..W-1, j = 0..D21-1    (20)
P2[(i,i,D21-j,D21), (i,i,D21,D21)] = p21,  i = 0..W-1, j = 0..D21-1    (21)
P2[(i,i,0,D21), (i,i-1,0,D21)] = 1 - p22,  i = 2..W-1    (22)
P2[(i,i,0,D21), (i,i,D21,D21)] = p22,  i = 1..W-1    (23)
P2[(i,i-1,0,D21), (i-1,i-1,D21,D21)] = p22,  i = 2..W-1    (24)
P2[(i,i-1,0,D21), (i-1,i-2,0,D21)] = 1 - p22,  i = 3..W-1    (25)
P2[(1,1,0,D21), (0,0,0,D21)] = 1 - p22    (26)
P2[(0,0,0,D21), (i,i,D21,D21)] = 1/W,  i = 0..W-1    (27)

By replacing p21 and p22 by their values in equations (8) and (9), and by replacing P2 and Π2 in (10) and solving the resulting system, we can express π2(i,k,D21-j,D21) as a function of τ11, τ12 and τ22, given respectively by equations (4), (6) and (7). Moreover, by replacing π1(R,i,i-j,D21) and π2(i,k,D21-j,D21) by their values in equations (4), (6) and (7), we obtain the system of non-linear equations (28). Solving this system allows deducing the expressions of τ11, τ12 and τ22, and deriving the state probabilities of the Markov chains modeling category C1 and category C2 stations.

5 THROUGHPUT ANALYSIS

In this section, we propose to evaluate Bi, the normalized throughput achieved by a station of traffic category Ci [1]. Hence, we define:

Pi,s: the probability that a station Si belonging to traffic category Ci transmits a packet successfully. Let S1 and S2 be two stations belonging respectively to traffic categories C1 and C2. We have:

P1,s = Pr(S1 transmits successfully)
     = Pr(S1 transmits successfully | Ω1) Pr(Ω1) + Pr(S1 transmits successfully | Ω̄1) Pr(Ω̄1)
     = τ11 (1 - p11) Pr(Ω1) + τ12 (1 - p12) Pr(Ω̄1)    (29)

P2,s = Pr(S2 transmits successfully | Ω̄2) Pr(Ω̄2) = τ22 (1 - p22) Pr(Ω̄2)    (30)

Pidle: the probability that the channel is idle. The channel is idle if the n1 stations of category C1 do not transmit given that these stations are in one of the states of Ω1, or if the n stations (both category C1 and category C2 stations) do not transmit given that stations of category C1 are in one of the states of Ω̄1. Thus:

Pidle = (1 - τ11)^(n1) Pr(Ω1) + (1 - τ12)^(n1) (1 - τ22)^(n2) Pr(Ω̄1)    (31)

Hence, the expression of the throughput of a category Ci station is given by:
Bi = (Pi,s Tp) / (Pidle Te + Ps Ts + Pc Tc)    (32)

where Ps = Σ_{i=1,2} ni Pi,s and Pc = 1 - Pidle - Ps corresponds to the probability of a collision. Te denotes the duration of an empty slot, while Ts and Tc denote respectively the durations of a successful transmission and of a collision. Finally, Tp denotes the average time required to transmit the packet data payload. We have:

Ts = TPHY + TMAC + Tp + TD + SIFS + TPHY + TACK + TD + DIFS    (33)
Tc = TPHY + TMAC + Tp + TD + EIFS    (34)

where TPHY, TMAC and TACK are the durations of the PHY header, the MAC header and the ACK packet [1], [13], and TD is the time required to transmit the two bytes of deadline information. Stations hearing a collision wait during EIFS before resuming their packets.

For the numerical results, stations transmit 512-byte data packets using 802.11b MAC and PHY layer parameters (given in table 1) with a data rate equal to 11 Mbps. For the simulation scenarios, the propagation model is a two-ray ground model. The transmission range of each node is 250 m. The distance between two neighbors is 5 m. The EIFS parameter is set to ACKTimeout as in ns-2, where:

ACKTimeout = DIFS + TPHY + TACK + TD + SIFS    (35)

Table 1: 802.11b parameters.

Data Rate          11 Mb/s
Slot               20 µs
SIFS               10 µs
DIFS               50 µs
PHY Header         192 µs
MAC Header         272 µs
ACK                112 µs
Short Retry Limit  7
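The durations entering (32)-(35) follow directly from Table 1. The Python sketch below is ours: it assumes the two-byte deadline field TD is counted with both the data frame and the ACK, and all identifier names are our own.

```python
# Durations from Table 1 (802.11b), in microseconds.
SLOT, SIFS, DIFS = 20.0, 10.0, 50.0
T_PHY, T_MAC, T_ACK = 192.0, 272.0, 112.0
DATA_RATE = 11e6  # bits per second

def tx_time(nbytes, rate=DATA_RATE):
    """Transmission time of nbytes at the given data rate, in microseconds."""
    return nbytes * 8 / rate * 1e6

T_P = tx_time(512)  # 512-byte payload
T_D = tx_time(2)    # two bytes of deadline information

# Success duration (33), ACK timeout (35) and, since EIFS = ACKTimeout as in
# ns-2, the collision duration (34).
T_S = T_PHY + T_MAC + T_P + T_D + SIFS + T_PHY + T_ACK + T_D + DIFS
ACK_TIMEOUT = DIFS + T_PHY + T_ACK + T_D + SIFS
EIFS = ACK_TIMEOUT
T_C = T_PHY + T_MAC + T_P + T_D + EIFS
```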
For all the scenarios, we consider that we are in the presence of n contending stations, with n/2 stations for each traffic category. In figure 3, n is fixed to 8 and we depict the throughput achieved by the different stations present in the network as a function of the contention window size W, with D21 = 1. We notice that the throughput achieved by category C1 stations (stations numbered from S11 to S14) is greater than the one achieved by category C2 stations (stations numbered from S21 to S24).

Figure 3: Normalized throughput as a function of the contention window size (D21 = 1, n = 8)

Analytically, stations belonging to the same traffic category have the same throughput, given by equation (32). Simulation results validate the analytical results and show that stations belonging to the same traffic category (either category C1 or category C2) have nearly the same throughput. Thus, we conclude the fairness of DM between stations of the same category. For the subsequent throughput scenarios, we focus on one representative station of each traffic category.

Figure 4 compares the category C1 and category C2 station throughputs to the one obtained with 802.11. Curves are represented as a function of W and for different values of D21. Indeed, as D21 increases, the category C1 station throughput increases, whereas the category C2 station throughput decreases. Moreover, as W increases, the difference between station throughputs is reduced. This is due to the fact that the shifting backoff becomes negligible compared to the contention window size. Finally, we notice that the category C1 station obtains better throughput with DM than with
802.11, but the opposite scenario happens to the category C 2 station.
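Given the solution of the blocking and transmission probabilities, equations (29)-(31) are direct to evaluate. In the Python sketch below (ours), `pr_omega1` stands for Pr(Ω1), assumed already obtained from the stationary distribution, and we use the state-group correspondence Pr(Ω̄1) = 1 - Pr(Ω1):

```python
def success_and_idle(tau11, tau12, tau22, p11, p12, p22, pr_omega1, n1, n2):
    """Per-station success probabilities (29)-(30) and idle probability (31)."""
    pr_omega1_bar = 1.0 - pr_omega1
    # (29): S1 succeeds from the blue states or from the pink states.
    p1s = tau11 * (1 - p11) * pr_omega1 + tau12 * (1 - p12) * pr_omega1_bar
    # (30): S2 can only transmit once its shifting backoff has elapsed.
    p2s = tau22 * (1 - p22) * pr_omega1_bar
    # (31): nobody transmits, conditioning on which group of states is active.
    p_idle = ((1 - tau11) ** n1 * pr_omega1
              + (1 - tau12) ** n1 * (1 - tau22) ** n2 * pr_omega1_bar)
    return p1s, p2s, p_idle
```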
Figure 4: Normalized throughput as a function of the contention window size (different D21 values)

In figure 5, we generalize the results for different numbers of contending stations and fix the contention window size W to 32.

Figure 5: Normalized throughput as a function of the number of contending stations

All the curves show that DM performs service differentiation over 802.11 and offers better throughput for category C1 stations, independently of the number of contending stations.

6 SERVICE TIME ANALYSIS

In this section, we evaluate the average MAC layer service time of category C1 and category C2 stations using the DM policy. The service time is the time interval from the time instant that a packet becomes the head of the queue and starts to contend for transmission to the time instant that either the packet is acknowledged for a successful transmission or dropped [3].

We propose to evaluate the Z-transform of the MAC layer service time [3], [14], [15] to derive an expression of the average service time and the service time distribution. The service time depends on the duration of an idle slot Te, the duration of a successful transmission Ts and the duration of a collision Tc [1], [3], [14]. As Te is the smallest event duration, the duration of every event will be given by T_event / Te.

6.1 Z-Transform of the MAC layer service time

6.1.1 Service time Z-transform of a category C1 station

Let TS1(Z) be the service time Z-transform of a station S1 belonging to traffic category C1. We define:

H1(R,i,i-j,D21)(Z): the Z-transform of the time already elapsed from the instant S1 selects a basic backoff in [0, W-1] (i.e. being in one of the states (~C2, i, i, D21)) to the time it is found in the state (R, i, i-j, D21). Moreover, we define:

Psuc11: the probability that S1 observes a successful transmission on the channel, while S1 is in one of the states of Ω1:

Psuc11 = (n1 - 1) τ11 (1 - τ11)^(n1-2)    (36)

Psuc12: the probability that S1 observes a successful transmission on the channel, while S1 is in one of the states of Ω̄1:

Psuc12 = (n1 - 1) τ12 (1 - τ12)^(n1-2) (1 - τ22)^(n2) + n2 τ22 (1 - τ22)^(n2-1) (1 - τ12)^(n1-1)    (37)

We evaluate H1(R,i,i-j,D21)(Z) for each state of the S1 Markov chain as follows:

H1(~C2,i,i,D21)(Z) = [1/W + (Psuc11 Z^(Ts/Te) + (p11 - Psuc11) Z^(Tc/Te)) Σ_{k=i+1..min(i+D21-1,W-1)} H1(~C2,k,i,D21)(Z)
                      + (Psuc12 Z^(Ts/Te) + (p12 - Psuc12) Z^(Tc/Te)) Ĥ1(C2,i+D21,i,D21)(Z)]
                     / (1 - Psuc11 Z^(Ts/Te) - (p11 - Psuc11) Z^(Tc/Te))    (38)
where:

Ĥ1(C2,i+D21,i,D21)(Z) = H1(C2,i+D21,i,D21)(Z) if i + D21 ≤ W - 1, and Ĥ1(C2,i+D21,i,D21)(Z) = 0 otherwise.    (39)

We also have:

H1(~C2,i,i-j,D21)(Z) = ((1 - p11) Z)^j H1(~C2,i,i,D21)(Z),  i = 2..W-1, j = 1..min(i-1, D21-1)    (40)

H1(C2,i,i-D21,D21)(Z) = ((1 - p11) Z)^(D21) H1(~C2,i,i,D21)(Z) + (1 - p12) Z H1(C2,i+1,i+1-D21,D21)(Z),  i = D21..W-2    (41)

H1(C2,W-1,W-1-D21,D21)(Z) = ((1 - p11) Z)^(D21) H1(~C2,W-1,W-1,D21)(Z)    (42)

H1(~C2,0,0,D21)(Z) = [1/W + Σ_{i=1..min(W-1,D21-1)} (1 - p11) Z H1(~C2,i,1,D21)(Z)]
                     / (1 - Psuc11 Z^(Ts/Te) - (p11 - Psuc11) Z^(Tc/Te))    (43)

If the transmission state of S1 is (~C2,0,0,D21), the transmission will be successful only if none of the n1-1 remaining stations of C1 transmits, whereas when the transmission state of S1 is (C2,D21,0,D21), the transmission occurs successfully only if none of the n-1 remaining stations (either a category C1 or a category C2 station) transmits. If the transmission fails, S1 tries another transmission. After m retransmissions, if the packet is not acknowledged, it will be dropped. Thus, the Z-transform of the station S1 service time is:

TS1(Z) = Z^(Ts/Te) [(1 - p11) H1(~C2,0,0,D21)(Z) + (1 - p12) H1(C2,D21,0,D21)(Z)]
         × Σ_{i=0..m} [Z^(Tc/Te) (p11 H1(~C2,0,0,D21)(Z) + p12 H1(C2,D21,0,D21)(Z))]^i
         + [Z^(Tc/Te) (p11 H1(~C2,0,0,D21)(Z) + p12 H1(C2,D21,0,D21)(Z))]^(m+1)    (44)

6.1.2 Service time Z-transform of a category C2 station

In the same way, let TS2(Z) be the service time Z-transform of a station S2 of category C2. We define:

H2(i,k,D21-j,D21)(Z): the Z-transform of the time already elapsed from the instant S2 selects a basic backoff in [0, W-1] (i.e. being in one of the states (i, i, D21, D21)) to the time it is found in the state (i, k, D21-j, D21). Moreover, we define:

Psuc21: the probability that S2 observes a successful transmission on the channel, while S2 is in one of the states of Ω2:

Psuc21 = n1 τ11 (1 - τ11)^(n1-1)    (45)

Psuc22: the probability that S2 observes a successful transmission on the channel, while S2 is in one of the states of Ω̄2:

Psuc22 = n1 τ12 (1 - τ12)^(n1-1) (1 - τ22)^(n2-1) + (n2 - 1) τ22 (1 - τ22)^(n2-2) (1 - τ12)^(n1)    (46)

We evaluate H2(i,k,D21-j,D21)(Z) for each state of the S2 Markov chain as follows:

H2(i,i,D21,D21)(Z) = 1/W,  i = 0 and i = W-1    (47)

H2(i,i,D21,D21)(Z) = 1/W + (Psuc22 Z^(Ts/Te) + (p22 - Psuc22) Z^(Tc/Te)) H2(i+1,i,0,D21)(Z),  i = 1..W-2    (48)
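The conditional success probabilities (36), (37), (45) and (46) that appear inside these transforms are simple closed forms; a Python transcription (function names are ours) follows:

```python
def psuc11(tau11, n1):
    # (36): exactly one of the other n1-1 C1 stations transmits.
    return (n1 - 1) * tau11 * (1 - tau11) ** (n1 - 2)

def psuc12(tau12, tau22, n1, n2):
    # (37): exactly one of the other n1-1 C1 stations transmits and no C2
    # station does, or exactly one of the n2 C2 stations transmits and no
    # other C1 station does.
    return ((n1 - 1) * tau12 * (1 - tau12) ** (n1 - 2) * (1 - tau22) ** n2
            + n2 * tau22 * (1 - tau22) ** (n2 - 1) * (1 - tau12) ** (n1 - 1))

def psuc21(tau11, n1):
    # (45): exactly one of the n1 C1 stations transmits (S2 is still shifting).
    return n1 * tau11 * (1 - tau11) ** (n1 - 1)

def psuc22(tau12, tau22, n1, n2):
    # (46): one successful transmitter among the n1 C1 stations or among the
    # other n2-1 C2 stations.
    return (n1 * tau12 * (1 - tau12) ** (n1 - 1) * (1 - tau22) ** (n2 - 1)
            + (n2 - 1) * tau22 * (1 - tau22) ** (n2 - 2) * (1 - tau12) ** n1)
```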
To compute H2(i,i,D21-j,D21)(Z), we define Tdec^j(Z) such that:

Tdec^0(Z) = 1    (49)

Tdec^j(Z) = [(1 - p21) Z + Psuc21 Z^(Ts/Te) + (p21 - Psuc21) Z^(Tc/Te)] Tdec^(j-1)(Z),  j = 1..D21    (50)

So:

H2(i,i,D21-j,D21)(Z) = H2(i,i,D21,D21)(Z) Tdec^j(Z),  i = 0..W-1, j = 1..D21, (i, j) ≠ (0, D21)    (51)

And:

H2(i,i-1,0,D21)(Z) = (1 - p22) Z [H2(i+1,i,0,D21)(Z) + H2(i,i,0,D21)(Z)]
                     / (1 - (Psuc22 Z^(Ts/Te) + (p22 - Psuc22) Z^(Tc/Te)) Tdec^(D21)(Z)),  i = 2..W-2    (52)

H2(W-1,W-2,0,D21)(Z) = (1 - p22) Z H2(W-1,W-1,0,D21)(Z)
                       / (1 - (Psuc22 Z^(Ts/Te) + (p22 - Psuc22) Z^(Tc/Te)) Tdec^(D21)(Z))    (53)

According to figure 2 and using equation (51), we have:

H2(0,0,0,D21)(Z) = [H2(0,0,D21,D21)(Z) Tdec^(D21)(Z) + (1 - p22) Z H2(1,1,0,D21)(Z)]
                   / (1 - (Psuc22 Z^(Ts/Te) + (p22 - Psuc22) Z^(Tc/Te)) Tdec^(D21)(Z))    (54)

Therefore, we can derive an expression of the S2 service time Z-transform as follows:

TS2(Z) = (1 - p22) Z^(Ts/Te) H2(0,0,0,D21)(Z) Σ_{i=0..m} [p22 Z^(Tc/Te) H2(0,0,0,D21)(Z)]^i
         + [p22 Z^(Tc/Te) H2(0,0,0,D21)(Z)]^(m+1)    (55)

6.2 Average Service Time

From equation (44) (respectively equation (55)), we derive the average service time of a category C1 station (respectively a category C2 station). The average service time of a category Ci station is given by:

X̄i = TSi'(1)    (56)

where TSi'(Z) is the derivative of the service time Z-transform of a category Ci station [11].

By considering the same configuration as in figure 3, we depict in figure 6 the average service time of category C1 and category C2 stations as a function of W. As for the throughput analysis, stations belonging to the same traffic category have nearly the same average service time. Simulation service time values coincide with the analytical values given by equation (56). These results confirm the fairness of DM in serving stations of the same category.
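Because the Z-transforms are easy to evaluate near Z = 1, the derivative in (56) can be approximated numerically when a closed form is inconvenient. A Python sketch of ours, using a toy service-time transform for illustration:

```python
def mean_from_transform(ts, h=1e-6):
    """Average service time X = TS'(1), as in equation (56), by central difference."""
    return (ts(1.0 + h) - ts(1.0 - h)) / (2 * h)

# Toy service-time transform: 3 slots with probability 0.6, 7 slots with 0.4.
def ts_toy(z):
    return 0.6 * z ** 3 + 0.4 * z ** 7

mean = mean_from_transform(ts_toy)  # expected: 0.6*3 + 0.4*7 = 4.6
```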
Figure 6: Average service time as a function of the contention window size (D21 = 1, n = 8)

In figure 7, we show that category C1 stations obtain a better average service time than the one obtained with the 802.11 protocol, whereas the opposite scenario happens for category C2 stations,
independently of n, the number of contending stations in the network.
Figure 7: Average service time as a function of the number of contending stations

6.3 Service Time Distribution

The service time distribution is obtained by inverting the service time Z-transforms given by equations (44) and (55). But we are most interested in probabilistic service time bounds, derived by inverting the complementary service time Z-transform given by [11]:

X̃i(Z) = (1 - TSi(Z)) / (1 - Z)

In figure 8, we depict analytical and simulation values of the complementary service time distribution of a category C1 and a category C2 station for different values of D21 and W = 32.

Figure 8: Complementary service time distribution for different values of D21 (W = 32)

All the curves drop gradually to 0 as the delay increases. Category C1 station curves drop to 0 faster than category C2 curves. Indeed, when D21 = 4 slots, the probability that the S1 service time exceeds 0.01 s equals 0.2%, whereas the S2 service time exceeds 0.01 s with probability 57.6%. Thus, DM offers better service time guarantees for the stations with the highest priority.

In figure 9, we double the contention window size and set it to 64. We notice that the category C1 and category C2 station service time curves become closer. Indeed, when W becomes large, the BAB values increase and the DMSB becomes negligible compared to the basic backoff. The whole backoff values of S1 and S2 become closer, and their service times accordingly.

Figure 9: Complementary service time distribution for different values of D21 (W = 64)

In figure 10, we depict the complementary service time distribution for both category C1 and category C2 stations and for different values of n, the number of contending nodes.

Figure 10: Complementary service time distribution for different numbers of contending stations
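When a service-time Z-transform is expanded into coefficients (index = delay in units of Te), the complementary distribution is just a running tail sum. A small Python sketch of ours, with a toy two-point distribution:

```python
def complementary_distribution(coeffs):
    """Pr(service time > k) for each delay k, from Z-transform coefficients."""
    total = sum(coeffs)
    tail, out = total, []
    for c in coeffs:
        tail -= c
        out.append(tail)
    return out

# Toy distribution: 3 slots with probability 0.6, 7 slots with probability 0.4.
coeffs = [0.0] * 8
coeffs[3], coeffs[7] = 0.6, 0.4
ccdf = complementary_distribution(coeffs)
```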
Analytical and simulation results show that the complementary service time curves drop faster when the number of contending stations is small, for both category C1 and category C2 stations. This means that all station service times increase as the number of contending nodes increases.

7 EXTENSIONS OF THE ANALYTICAL RESULTS BY SIMULATION

The mathematical analysis undertaken above showed that DM performs service differentiation over the 802.11 protocol and offers better QoS guarantees for highest priority stations. Nevertheless, the analysis was restricted to two traffic categories. In this section, we first generalize the results by simulation for different traffic categories. Then, we consider a simple multi-hop scenario and evaluate the performance of the DM policy when the stations belong to different broadcast regions.

7.1 Extension of the analytical results

In this section, we consider n stations contending for the channel in the same broadcast region. The n stations belong to 5 traffic categories, where n = 5m and m is the number of stations of the same traffic category. A traffic category Ci is characterized by a delay bound Di, and Dij = Di - Dj is the difference between the deadline values of category Ci and category Cj stations. We have:

Dij = (i - j) K    (57)

where K is the deadline multiplicity factor, given by:

K = D(i+1),i = D(i+1) - Di    (58)

Indeed, when K varies, Dij, the difference between the deadline values of category Ci and category Cj stations, also varies. Stations belonging to the traffic category Ci are numbered from Si1 to Sim.

In figure 11, we depict the throughput achieved by the different traffic category stations as a function of the minimum contention window size CWmin, such that CWmin is always smaller than CWmax, with CWmax = 1024 and K = 1. Analytical and simulation results show that throughput values increase with station priorities. Indeed, the station with the lowest delay bound has the maximum throughput.

Figure 11: Normalized throughput for different traffic category stations

Moreover, figure 12 shows that stations belonging to the same traffic category have the same throughput. For instance, when n is set to 15 (i.e. m = 3), the three stations of each traffic category have almost the same throughput.

Figure 12: Normalized throughput: different stations belonging to the same traffic category

In figure 13, we depict the average service time of the different traffic category stations as a function of K, the deadline multiplicity factor. We notice that the highest priority station average service time decreases as the deadline multiplicity factor increases, whereas the lowest priority station average service time increases with K.

Figure 13: Average service time as a function of the deadline multiplicity factor K

In the same way, the probabilistic service time bounds offered to S11 (the highest priority station) are better than those offered to station S51 (the lowest priority station). Indeed, the probability that the S11 service time exceeds 0.01 s equals 0.3%, but the S51 service time exceeds 0.01 s with a probability of 36%.
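Equations (57) and (58) simply state that the deadlines of the traffic categories are equally spaced. A short Python check (the function names and the choice of a base deadline d1 are ours):

```python
def deadline(i, d1, k):
    """Delay bound of traffic category C_i, assuming D_1 = d1 and step K = k (eq. 58)."""
    return d1 + (i - 1) * k

def d_diff(i, j, d1, k):
    """D_ij = D_i - D_j; by construction this equals (i - j) * K (eq. 57)."""
    return deadline(i, d1, k) - deadline(j, d1, k)
```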
Figure 14: Complementary service time distribution (CWmin = 32, n = 8)

The above results generalize the analytical model results and show once again that DM performs service differentiation over 802.11 and offers better guarantees in terms of throughput, average service time and probabilistic service time bounds for flows with short deadlines.

7.2 Simple multi-hop scenario

In the above study, we considered that contending stations belong to the same broadcast region. In reality, stations may not be within one hop of each other. Thus, a packet can go through several hops before reaching its destination. Hence, factors like routing protocols or interference may preclude the DM policy from working correctly. In the following paragraphs, we evaluate the performance of the DM policy in a multi-hop environment. Hence, we consider a 13-node simple multi-hop scenario, described in figure 15. Six flows are transmitted over the network.

Figure 15: Simple multi-hop scenario

Flow packets are routed using the Ad hoc On-Demand Distance Vector (AODV) protocol. Flows F1 and F2 are transmitted respectively by stations S1 and S2, with delay bounds D1 and D2 and D21 = D2 - D1 = 5 slots. Flows F3 and F4 are transmitted respectively by S3 and S4 and have the same delay bound. Finally, F5 and F6 are transmitted respectively by S5 and S6, with delay bounds D5 and D6 and D65 = D6 - D5 = 4 slots.

Figure 16 shows that the throughput achieved by F1 is greater than the one achieved by F2. Indeed, both flows cross nodes 6 and 7, where F1 gets a higher priority to access the medium than F2 when the DM policy is used. We obtain the same results for flows F5 and F6. Flows F3 and F4 have almost the same throughput since they have equal deadlines.

Figure 16: Normalized throughput using the DM policy

Figure 17 shows that the complementary service time distribution curves drop to 0 faster for flow F1 than for flow F2.

Figure 17: End-to-end complementary service time distribution
The same behavior is obtained for flows F5 and F6, where F5 has the shortest delay bound. Hence, we conclude that even in a multi-hop environment, the DM policy performs service differentiation over 802.11 and provides better QoS guarantees for flows with short deadlines.

8 CONCLUSION

In this paper, we proposed to support the DM policy over the 802.11 protocol. Therefore, we used a distributed backoff scheduling algorithm and introduced a new medium access backoff policy. Then we proposed a Markov chain based mathematical model to evaluate the performance of the DM policy in terms of throughput, average medium access delay and medium access delay distribution. Analytical and simulation results showed that DM performs service differentiation over 802.11 and offers better guarantees in terms of throughput, average service time and probabilistic service time bounds for flows with small deadlines. Moreover, DM achieves fairness between stations belonging to the same traffic category. Then, we extended by simulation the analytical results obtained for two traffic categories to a larger number of traffic categories. Simulation results showed that even if contending stations belong to K traffic categories, K > 2, the DM policy offers better QoS guarantees for the highest priority stations. Finally, we considered a simple multi-hop scenario and concluded that factors like routing messages or interference do not impact the behavior of the DM policy, and DM still provides better QoS guarantees for stations with short deadlines.
REFERENCES

[1] G. Bianchi: Performance Analysis of the IEEE 802.11 Distributed Coordination Function, IEEE J-SAC, Vol. 18, No. 3, (March 2000).
[2] H. Wu, Y. Peng, K. Long, S. Cheng, J. Ma: Performance of Reliable Transport Protocol over IEEE 802.11 Wireless LAN: Analysis and Enhancement, In Proceedings of IEEE INFOCOM'02, (June 2002).
[3] H. Zhai, Y. Kwon, Y. Fang: Performance Analysis of IEEE 802.11 MAC protocol in wireless LANs, Wireless Communications and Mobile Computing, (2004).
[4] I. Aad, C. Castelluccia: Differentiation mechanisms for IEEE 802.11, In Proc. of IEEE Infocom 2001, (April 2001).
[5] IEEE 802.11 WG: Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specification, IEEE (1999).
[6] IEEE 802.11 WG: Draft Supplement to Part 11: Wireless Medium Access Control (MAC) and physical layer (PHY) specifications: Medium Access Control (MAC) Enhancements for Quality of Service (QoS), IEEE 802.11e/D13.0, (January 2005).
[7] J. Deng, R. S. Chang: A priority Scheme for IEEE 802.11 DCF Access Method, IEICE Transactions on Communications, vol. 82-B, no. 1, (January 1999).
[8] J. L. Sobrinho, A. S. Krishnakumar: Real-time traffic over the IEEE 802.11 medium access control layer, Bell Labs Technical Journal, pp. 172-187, (1996).
[9] J. Y. T. Leung, J. Whitehead: On the Complexity of Fixed-Priority Scheduling of Periodic, Real-Time Tasks, Performance Evaluation (Netherlands), pp. 237-250, (1982).
[10] K. Duffy, D. Malone, D. J. Leith: Modeling the 802.11 Distributed Coordination Function in Non-saturated Conditions, IEEE/ACM Transactions on Networking (TON), Vol. 15, pp. 159-172, (February 2007).
[11] L. Kleinrock: Queueing Systems, Vol. 1: Theory, Wiley Interscience, (1976).
[12] P. Chatzimisios, V. Vitsas, A. C. Boucouvalas: Throughput and delay analysis of IEEE 802.11 protocol, In Proceedings of the 2002 IEEE 5th International Workshop on Networked Appliances, (2002).
[13] P. E. Engelstad, O. N. Osterbo: Delay and Throughput Analysis of IEEE 802.11e EDCA with Starvation Prediction, In Proceedings of the IEEE Conference on Local Computer Networks, LCN'05, (2005).
[14] P. E. Engelstad, O. N. Osterbo: Queueing Delay Analysis of 802.11e EDCA, In Proceedings of the Third Annual Conference on Wireless On-demand Network Systems and Services (WONS 2006), France, (January 2006).
[15] P. E. Engelstad, O. N. Osterbo: The Delay Distribution of IEEE 802.11e EDCA and 802.11 DCF, In Proceedings of the 25th IEEE International Performance Computing and Communications Conference (IPCCC'06), USA, (April 2006).
[16] The network simulator ns-2, http://www.isi.edu/nsnam/ns/.
[17] Y. Xiao: Performance analysis of IEEE 802.11e EDCF under saturation conditions, In Proceedings of the International Conference on Communications (ICC'04), Paris, France, (June 2004).
[18] V. Kanodia, C. Li: Distributed Priority Scheduling and Medium Access in Ad-hoc Networks, ACM Wireless Networks, Volume 8, (November 2002).
TEMPORAL INFORMATION SYSTEMS AND THEIR APPLICATIONS TO MOBILE AD HOC ROUTING V. Mary Anita Rajam, V.Uma Maheswari and Arul Siromoney Department of Computer Science and Engineering, Anna University, Chennai -- 600 025, India Contact email:
[email protected]
ABSTRACT A Temporal Information System, that incorporates temporal information into the traditional Information System of Rough Set Theory (RST) and Variable Precision Rough Sets (VPRS), is presented. Mobile Ad hoc Networks (MANETs) dynamically form a network without an existing infrastructure. The Dynamic Source Routing (DSR) protocol of MANETs is modified in this paper to use recent routes. Weighted elementary sets are introduced in temporal information systems and used to route packets in mobile ad hoc networks. Notions from VPRS are also brought into weighted temporal information systems and used in routing. The performance of these proposed routing protocols is studied. Keywords: Rough Set Theory, Mobile ad hoc networks, Temporal Information System, Routing.
1 INTRODUCTION
Rough Set Theory is a mathematical tool that deals with vagueness and uncertainty. Mobile ad hoc networks are a collection of wireless mobile nodes that can dynamically form a network. The state of each link in a mobile ad hoc network changes with time. This paper introduces temporal information systems that are then applied to mobile ad hoc routing. Each mobile node maintains a route cache of known routes. It is shown in this paper that giving more importance to the recent routes in the route cache is useful. This has led to the notion of weighted elementary sets in temporal information systems, where more recent elementary sets are given more importance. 1.1 Rough Set Theory In Rough Set Theory (RST) [21], introduced by Zdzislaw Pawlak, a data set is represented as a table, where each row represents an event or an object or an example or an entity or an element. Each column represents an attribute that can be measured for an element. This table is called an information system. The set of all elements is known as the universe. For example, if the information system describes a hospital, the elements may be patients; the attributes (condition attributes) may be
symptoms and tests; and the decisions (or decision attribute) may be diseases. In an information system, elements that have the same value for each attribute are indiscernible and are called elementary sets. Subsets of the universe with the same value of the decision attribute are called concepts. A positive element is an element of the universe that belongs to the concept. For each concept, the greatest union of elementary sets contained in the concept is called the lower approximation of the concept and the least union of elementary sets containing the concept is called the upper approximation of the concept. The set containing the elements from the upper approximation of the concept that are not members of the lower approximation is called the boundary region. The lower approximation of the concept is also known as the positive region. A set is said to be rough if the boundary region is non-empty. A set is said to be crisp if the boundary region is empty. Variable Precision Rough Sets (VPRS) [31], proposed by Ziarko, is a generalization of the rough set model, aimed at modelling classification problems involving uncertain or imprecise information. Classification with a controlled degree of uncertainty is possible with this model. It is possible to generalize conclusions obtained from a smaller set of observations to a larger population.
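The lower and upper approximations described above can be computed directly from the partition of the universe into elementary sets. A Python sketch of ours (the names and the toy hospital-style data are illustrative, not from the paper):

```python
def approximations(elementary_sets, concept):
    """Lower and upper approximation of a concept w.r.t. elementary sets."""
    concept = set(concept)
    lower, upper = set(), set()
    for e in elementary_sets:
        e = set(e)
        if e <= concept:
            lower |= e   # elementary set entirely contained in the concept
        if e & concept:
            upper |= e   # elementary set intersecting the concept
    return lower, upper

# Patients partitioned by identical attribute values; concept = "has the disease".
elem = [{1, 2}, {3, 4}, {5}]
lower, upper = approximations(elem, {1, 2, 3})
boundary = upper - lower  # non-empty boundary: the concept is rough
```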
In RST, the lower approximation of a concept is defined using an inclusion relation. Here in VPRS, the lower approximation is defined using a majority inclusion relation. The β-positive region is the union of elementary sets which are either completely contained in the concept or are almost contained in the concept, with a maximum error of 1 – β. The conditional probability of an element being positive in an elementary set is the probability that the element is positive, given that the element belongs to that elementary set. It is the ratio of the number of positive elements in that elementary set to the number of elements in that elementary set. When this conditional probability is greater than a threshold β (0.5 < β ≤ 1) the elementary set is said to fall in the β-positive region. 1.2 Rough Sets in Temporal Contexts A temporal system is a time based system which shows the temporal variation of some specific data or attribute. Time-series data are a kind of temporal data and are results of some observations usually ordered in time. Time-series data often possess content that is conflicting and redundant. The data may be imprecise; therefore a precise understanding of information cannot be derived from the data. Rough set theory offers a powerful toolset for confronting this situation. Analysis of time-series data and constructing suitable data from the time-series that can be used by rough sets are investigated in [15], [18], [11]. Reducts can be found from the original data and rules can be generated from the acquired reducts using rough sets [15]. For constructing suitable data from the time-series, different methods are tried. In the mobile window method [4], a window is moved along the time-series; the data points falling into the window are transferred into a rough sets object. The window method poses restrictions on how far back in time dependencies can be traced. 
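The β-positive region of VPRS replaces full inclusion by the majority inclusion test on this conditional probability. A Python sketch of ours (names and data are illustrative):

```python
def beta_positive_region(elementary_sets, concept, beta):
    """Union of the elementary sets whose conditional probability of being
    positive exceeds beta (0.5 < beta <= 1), per the majority inclusion relation."""
    concept = set(concept)
    region = set()
    for e in elementary_sets:
        e = set(e)
        # Conditional probability: positive elements of e divided by |e|.
        if len(e & concept) / len(e) > beta:
            region |= e
    return region

elem = [{1, 2, 3, 4}, {5, 6}, {7}]
pos = beta_positive_region(elem, {1, 2, 3, 5, 7}, beta=0.7)
# {1,2,3,4} qualifies (3/4 > 0.7), {5,6} does not (1/2), {7} does (1/1).
```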
UbiCC Journal - Volume 3
In the columnizing method [18], the time series are organized in columns, such that each column is an economic indicator and each row represents an object at a different point in time. In [11], a time series is represented using a series of events or states. An event is something that occurs, and is associated with a time. The values of attributes are trends, rather than values measured at points in time, so that dependencies can be traced back as far as required. A decision table for time series, called a time-series decision table, is proposed in [16]. A time stamp is introduced as an attribute, in addition to the condition attributes and the decision attribute of the traditional information system. A temporal information system is introduced by Bjorvand [29]. The temporal information system has a sequence attribute in addition to the condition attributes and the decision attribute present in the traditional information system. The value of the sequence attribute is an integer, based on the time of occurrence of the objects in the information system. Methods are proposed in [29] and [28] to convert this temporal information system into the traditional information system, so that rough set techniques can be applied. The method proposed in [29] depends on time intervals that must be fixed and defined in advance. Trend expressions [28] extend the traditional method of translating a temporal information system into an information system: a new attribute is added to the information system, with values set based on the trend. A real-time temporal information system is proposed in [29], in which the difference between the time of occurrence of the current row and the previous row (if rows are sorted according to time) is also stored in the information table. A linearly ordered universe, based on time, is another way of bringing the notion of time into a rough set information system. This temporal information system with a linearly ordered universe [19], [8], [1], [3], [20], [2] provides information about the behaviour of objects in time, where the state of an object is described by some attributes. The elements can be the behaviour over time of the same object, of multiple objects, or of objects independent of each other. Temporal templates [19], which are homogeneous patterns occurring in some periods, are extracted from temporal information systems. The temporal templates are then used to discover the behaviour of temporal features of objects.
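As an illustration of the trend-based translation mentioned above, the following Python fragment adds a trend attribute to time-ordered rows. The 'up'/'down'/'steady' encoding is an assumption in the spirit of the trend expressions of [28], not the exact method of that work.

```python
# Sketch: flatten a temporal information system by attaching a trend attribute
# that compares each row's value with the previous row's value.

def add_trend(rows):
    """rows: list of (time, value) pairs sorted by time.
    Returns rows extended with a trend attribute relative to the previous row."""
    out, prev = [], None
    for t, v in rows:
        if prev is None or v == prev:
            trend = "steady"
        elif v > prev:
            trend = "up"
        else:
            trend = "down"
        out.append((t, v, trend))
        prev = v
    return out

print(add_trend([(1, 5), (2, 7), (3, 7), (4, 4)]))
# [(1, 5, 'steady'), (2, 7, 'up'), (3, 7, 'steady'), (4, 4, 'down')]
```

The resulting table can then be processed with ordinary (atemporal) rough set techniques, since the time dependency is now encoded in the trend attribute.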
Using temporal templates [20], a temporal multiple information system [19] is introduced that describes many objects along the time axis, e.g. several users visiting a website. For each object, a sequence of temporal templates is found. Collections of patterns, called episodes, that appear frequently in sequences are found. New attributes are generated from the found frequent episodes. Temporal patterns that can potentially be specific to one musical instrument or a group of instruments are searched for in [8]. Such patterns are used as new descriptors. A time-window based technique is used to measure the values of the descriptors in time. Optimal temporal templates that respond to temporal patterns are then determined. From the temporal templates, episodes (collections of templates that occur together) are found. Information maps can be constructed from data represented by temporal information systems [1]. The temporal information system at the current time t is also viewed as a family of decision systems [1], where the universe of the decision system at time t1 is a subset of that of the decision system at time t2, for t1 < t2. Some work is based on how the data varies over a particular duration of time (say, two years or two months). Different attributes are assigned for each time period in [30]. A dynamic information system model based on time sequence is proposed in [13]. The attributes in the information system can have different values at different time points.

1.3 Routing in Mobile Ad Hoc Networks
Sending data from a source mobile node to a destination mobile node through a route (a sequence of intermediate mobile nodes) is routing. Routing is one of the most difficult issues in mobile ad hoc networks. Each node in an ad hoc network is responsible for routing. Hence each node should maintain the information necessary for routing, that is, the next hop or the path through which the data has to be routed to the destination. This information is either available even before it is needed (proactive) or is obtained only when necessary (reactive).
Proactive routing is usually done using table-driven protocols, where tables are formed initially and are updated either periodically or when some change is known. Reactive routing protocols are also known as on-demand routing protocols. In these protocols, a route is found for a destination only when there is a need to send information to the destination. Each node does not have knowledge of the location of all other nodes in the network. When nodes learn or use routes, they store the routes to the destinations in a routing table or a cache. Most of the on-demand routing protocols [5], [25], [9] discover routes when needed by flooding route request packets in the network. A route reply is sent back to the source node either by an intermediate node that has a route to the destination or by the destination.

1.4 Application of Rough Sets to Computer Networks
Very little work has been done in the application of rough set theory to mobile ad hoc networks and mobile ad hoc routing. A few papers have applied rough set theory to networks. Rough set theory is used in intrusion detection in computer networks [6], [14] and [32]. The rough set approach is also applied to the flow control of a UDP-based file transfer protocol [33] to accomplish a real-time data-transferring rate adjustment task; to network fault diagnosis [24]; and to achieve the rules for object recognition and classification for mobile robots [12]. Rough set logic is combined with artificial neural networks for failure domain exploration of a telecommunication network [7]. The basic concepts of rough set theory are illustrated using an example concerning churn modeling in telecommunications [23].

1.5 Overview of the Paper
This paper presents a Temporal Information System that brings temporal information into the traditional Information System of Rough Set Theory and Variable Precision Rough Sets. The DSR protocol is modified to study the use of recent routes. The paper then introduces the notion of weighted elementary sets in temporal information systems and uses this to route packets in mobile ad hoc networks. The paper then uses VPRS in weighted temporal information systems and applies this to routing. The performance of these proposed routing protocols is studied.

2
TEMPORAL INFORMATION SYSTEMS
2.1 Information Systems and Decision Systems
Consider a universe $U$ of elements. An information system $I$ is defined as $I = (U, A, V, \rho)$, where $A$ is a non-empty, finite set of attributes; $V = \bigcup_{a \in A} V_a$ is the set of attribute values of all attributes, where $V_a$ is the set of possible values of attribute $a \in A$; and $\rho: U \times A \rightarrow V$ is an information function, such that for every element $x \in U$ and attribute $a \in A$, $\rho(x, a) \in V_a$ is the value of attribute $a$ for element $x$. The information system can also be viewed as an information table, where each element corresponds to a row, and each attribute corresponds to a column. $I = (U, A, V, \rho)$ is known as a decision system when an attribute $d \in A$ is specified as the decision attribute. A decision system is used for predicting the value of the decision attribute. $C = A \setminus \{d\}$ is known as the set of condition attributes. The concept $X_v$ is the set of elements of $U$ that have a particular value (say, $v$) of the decision attribute $d$. That is, $X_v = \{x \in U : \rho(x, d) = v\}$. Normally, $d$ is a boolean attribute that takes one of two possible values. When $d$ takes more than two possible values, it is known as a multi-valued decision attribute. These definitions are based on the definition of the Rough Set Information System in [21, 22, 17].

2.2 Regions of the Universe
An equivalence relation $R_B$, called the indiscernibility relation, is defined on the universe $U$ for a subset of condition attributes $B \subseteq C$ as: $x \, R_B \, y$ if and only if $\rho(x, a) = \rho(y, a)$ for every $a \in B$. In the information system $I$, the elementary set containing the element $x$ with respect to the indiscernibility relation $R_B$ is $[x]_B = \{y \in U : x \, R_B \, y\}$.

The lower approximation of the concept $X_v$, with respect to $I$ and the equivalence relation $R_B$ on $U$, is the union of the elementary sets of $U$ with respect to $R_B$ that are contained in $X_v$, and is denoted as $\underline{B}X_v = \bigcup \{[x]_B : [x]_B \subseteq X_v\}$. The upper approximation of $X_v$ is the union of the elementary sets of $U$ with respect to $R_B$ that have a non-zero intersection with $X_v$, and is denoted as $\overline{B}X_v = \bigcup \{[x]_B : [x]_B \cap X_v \neq \emptyset\}$. The lower approximation of $X_v$ is also known as the Positive region of $X_v$. The set $\overline{B}X_v - \underline{B}X_v$ is called the Boundary region of $X_v$. The set $U - \overline{B}X_v$ is called the Negative region of $X_v$. The conditional probability that an element in an elementary set $[x]_B$ is positive is $P(X_v \mid [x]_B) = |[x]_B \cap X_v| \, / \, |[x]_B|$; the conditional probability that the element is negative is $1 - P(X_v \mid [x]_B)$. When the context is clear, the conditional probability of an elementary set $E$ is taken to be $P(X_v \mid E)$.

The $\beta$-positive region is the union of the elementary sets whose conditional probability is greater than or equal to $\beta$, where $\beta > 0.5$. The $\beta$-negative region is the union of the elementary sets whose conditional probability is less than $1 - \beta$, where $\beta > 0.5$. These are based on the definitions in [34]. The range of $\beta$ is (0.5, 1] in the original VPRS definition. This indicates probabilistically that the elementary set is positive, when the decision attribute is boolean. It appears that when the decision attribute is multi-valued with $k$ as the number of possible values, the range of $\beta$ is (1/k, 1].

2.3 Temporal Extensions
A Generic Temporal Information System (GTIS) is defined as a set of information tables, with each information table located at a time on the time axis. A time interval can also be considered instead of a time instance.

A special case is when the same universe $U$ of elements appears in each information table. A particular element $x$, for each attribute $a$, would then have a value at each time $t$. For example, patient $x$ has fever at $t_1$, and does not have fever at $t_2$. In this paper, this special case is treated as a single Temporal Information System defined as $(U, A \cup \{a_t\}, V, \rho)$, where $a_t$ is the time attribute with $V_{a_t}$ as a set of pairs $(t_i, t_{i+1})$, with $t_0, t_1, \ldots, t_n$ as a sequence of time instances. For each elementary set that is formed from the set of attributes $A$ (i.e. the set of attributes without the time information), there are now $n$ elementary sets, where the first elementary set consists of elements that occurred between time instances $t_0$ and $t_1$, the second elementary set of elements between $t_1$ and $t_2$, and the last elementary set of elements between $t_{n-1}$ and $t_n$. This can be pictured as vertical blocks of elementary sets along a time axis.

2.4 Information System in a Mobile Node
The use of an information system in mobile ad hoc routing was introduced in [26]. The information system was modified in [27] to represent the route better, by using the link information rather than the node information. A threshold was used in the identification of a good next hop.
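A minimal Python sketch of the $\beta$-positive region computation that underlies such thresholds; the elementary sets and the concept below are illustrative assumptions.

```python
# Sketch of the VPRS beta-positive region on toy data.
# For each elementary set E, P(concept | E) = |E ∩ concept| / |E|;
# E is included in the beta-positive region when this probability >= beta.

def beta_positive_region(elem_sets, concept, beta=0.6):
    region = set()
    for es in elem_sets:
        p = len(es & concept) / len(es)
        if p >= beta:
            region |= es
    return region

elem_sets = [{"a", "b", "c"}, {"d", "e"}, {"f"}]   # assumed partition
concept = {"a", "b", "d", "f"}
# Conditional probabilities: 2/3, 1/2 and 1 for the three elementary sets.
print(sorted(beta_positive_region(elem_sets, concept, beta=0.6)))
# ['a', 'b', 'c', 'f']
```

With beta = 1 the function reduces to the classical lower approximation, since only elementary sets wholly contained in the concept have conditional probability 1.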
Let $N$ be a set of mobile nodes. A route is a path through mobile nodes in $N$ and is denoted as a sequence of mobile nodes $(n_1, n_2, \ldots, n_m)$. Each mobile node $n$ maintains a route cache that stores all the routes that $n$ knows. Any route in the route cache of $n$ is a path starting from that mobile node, and so $n_1$, the first node in the route, is $n$ itself. Any route in the route cache is a simple path, where no node repeats; that is, for $i \neq j$ in the path, $n_i \neq n_j$. Each mobile node $n$ has an information table associated with it. Each row in the information table corresponds to a route in the route cache maintained by that mobile node $n$. Let $L$ be the set of all possible links between the nodes. Each condition attribute corresponds to a particular link in $L$, so the set of condition attributes is the same for every mobile node. Each condition attribute is a boolean attribute, set to 1 or 0 depending on whether or not that link is present in the route corresponding to that element. A mobile node knows a route either because the route is in a packet that passes through this mobile node, or because this mobile node is in promiscuous mode and the route is in a packet that passes between two nodes that are within range of this mobile node. When a mobile node learns a route, it is added to the route cache only if it is not identical to a path or a sub-path of any other route already present in the route cache. However, every time a mobile node learns a route, a row corresponding to this route is always added to its information table. Consider an element $x$ corresponding to a route. When a row is added to the information table, the values of the condition attributes corresponding to the links of the route are set to 1.
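The encoding of a route as boolean link attributes can be sketched as follows. The node names and the dictionary-based table row are illustrative assumptions, not the paper's implementation.

```python
# Sketch of one information-table row in a mobile node (section 2.4):
# every possible link is a boolean condition attribute, the next hop is
# the decision attribute.

def route_to_row(route, all_links):
    """Encode a route (sequence of node names) as boolean link attributes
    plus the next hop as the decision attribute."""
    links_in_route = set(zip(route, route[1:]))
    row = {link: int(link in links_in_route) for link in all_links}
    row["next_hop"] = route[1]   # decision attribute: hop after this node
    return row

nodes = ["n1", "n2", "n3"]
all_links = [(a, b) for a in nodes for b in nodes if a != b]
row = route_to_row(["n1", "n2", "n3"], all_links)
print(row[("n1", "n2")], row[("n2", "n3")], row["next_hop"])  # 1 1 n2
```

Rows built this way make routes with the same link pattern indiscernible, which is exactly what groups them into elementary sets.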
2.5 Decision System in a Mobile Node
In traditional Rough Set Theory and VPRS, the value of the decision attribute of a new element (or unknown element, or test case) is predicted based on which elementary set it falls into. The elementary set into which it falls is determined by the values of the attributes of that element. The decision system in a mobile node is used to predict the next hop for a particular destination. This next hop is called the predicted next hop. The decision attribute is taken as the next hop, and the predicted next hop is also known as the predicted value of the decision attribute. The destination can possibly be reached through several different sequences of intermediary nodes. In other words, several different combinations of attribute values make it possible for the destination to be reached. That is, elements in several different elementary sets correspond to routes that lead to this particular destination. Thus several elementary sets play a role in identifying the best next hop for a particular destination, so it is not possible to use a single elementary set to predict the value of the decision attribute. The union of these elementary sets is used: for a particular destination, the union is taken of all the elementary sets that correspond to valid routes from the current mobile node to the destination. A stringent method of predicting the next hop is the following: when all the elements in this union of elementary sets have the same value of the decision attribute, this value is taken as the predicted next hop. In other words, all known routes to this destination should have this particular node as the next hop. This can also be considered as that value of the decision attribute for which all these elementary sets are in its lower approximation. It is to be remembered that the decision attribute is a multi-valued attribute, and so the lower approximation is with respect to a value of the decision attribute. Another method is to take as the predicted next hop that value of the decision attribute for which the union of these elementary sets is in the $\beta$-positive region. The conditional probability is determined using the union of elementary sets, and not a single elementary set. The probability that a particular next hop occurs, given that the route leads to a particular destination, is taken as the conditional probability. This conditional probability should be greater than a threshold $\beta$.
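The threshold-based method over the union of elementary sets can be sketched as follows. The routes are hypothetical; in this simplified view, each known route to the destination contributes its next hop, and a hop is predicted only when its share of routes exceeds the threshold.

```python
# Sketch of threshold-based next-hop prediction (section 2.5, toy data).
# Among all cached routes to a destination, the predicted next hop is the
# one whose fraction of those routes exceeds a threshold beta.

def predict_next_hop(next_hops, beta=0.5):
    """next_hops: list of next hops, one per known route to the destination."""
    total = len(next_hops)
    counts = {}
    for hop in next_hops:
        counts[hop] = counts.get(hop, 0) + 1
    for hop, c in counts.items():
        if c / total > beta:
            return hop
    return None   # no hop is dominant enough

print(predict_next_hop(["n2", "n2", "n3", "n2"]))  # n2 (3/4 > 0.5)
```

With beta close to 1 this degenerates to the stringent method, where every known route must agree on the next hop.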
In other words, a large number of known routes to this destination have this particular node as the next hop.

2.6 Temporal Decision System in a Mobile Node
In a Temporal Information System (TIS) for a mobile node, each element (corresponding to a route) has a particular value of the time attribute; that is, each element falls in a particular time interval, determined by the time stamp of the next hop of the route that corresponds to this element. The Temporal Decision System (TDS) in a mobile node is used to predict the next hop. An appropriate method (as described in the previous section) is used to determine the predicted next hop in each time interval. The predicted next hop for the TDS is then determined based on the number of time intervals in which it is the predicted next hop. The predicted value of the decision attribute is determined from the TDS based on the probability of a particular value of the decision attribute being the predicted value in the different time intervals. This probability is the number of time intervals in which that value of the decision attribute is the predicted value divided by the total number of time intervals. The predicted value of the decision attribute is the value for which this probability is greater than a threshold $\beta'$. In other words, that particular next hop has been the predicted next hop in most of the time intervals. In Weighted Temporal Information Systems (WTIS) and Weighted Temporal Decision Systems (WTDS), weights $w_1, w_2, \ldots, w_n$ are assigned to the elementary sets between time instances $t_0$ and $t_1$, the elementary sets between $t_1$ and $t_2$, and the elementary sets between $t_{n-1}$ and $t_n$, respectively. The predicted value of the decision attribute is determined after associating weights with the time intervals: it is the value of the decision attribute for which the probability is greater than a threshold $\beta'$, where the probability is the sum of the weights of the time intervals in which that value of the decision attribute is the predicted next hop divided by the sum of the weights of all the time intervals. When the more recent time intervals play a more important role, the weight of a more recent time interval is higher than the weight of a less recent time interval.

3
MOBILE AD HOC ROUTING USING RECENT ROUTES
This section first describes the original Dynamic Source Routing (DSR) protocol, an on-demand routing protocol. The DSR protocol is then modified so that the most recent applicable route in the route cache is used at each intermediate node. The performance of the modified protocol is evaluated.

3.1 Dynamic Source Routing
In Dynamic Source Routing (DSR) [5], each mobile node has a route cache to store the routes that are known to that mobile node. The source node is the node that wants to send a data packet. It uses the shortest route present in its route cache to the destination. If there is no
route in the route cache, it initiates a route discovery and gets back a route reply with the route to the destination. This source route is placed in the data packet and the data packet is sent to the next hop in the route. When a data packet reaches an intermediate node, the source route in the data packet is used to forward the data packet to the next hop. If a node, while sending the data packet to the next hop, finds that the link does not exist, it uses the shortest route present in its route cache to that destination. If no route is found in the route cache, route discovery is done. When a route discovery is required, the node broadcasts route request packets. If any intermediate node receiving the route request has a route to the required destination in its route cache, it sends a route reply to the initiator of the route discovery. Otherwise, the node appends its own address to the route in the route request and re-broadcasts the route request. If the route request reaches the destination, the destination reverses the route and sends back a route reply to the initiator of the route discovery. When a path is to be added to the cache, the following steps are performed. If a prefix of the path to be added is present in the cache, the rest of this path is appended to the path present in the cache. If the whole of the new path to be added is not present in the cache, the path is added to the cache. If there is no free space in the cache, a victim entry is picked and the route to be added is put in its place. If any path or subpath in a cache entry is a prefix of, or the same as, the path that is added, the time stamps of those links are set equal to the time stamps of the links in the added path. When a link error occurs, a route error with information about the dead link is sent to the original sender of the data packet.
Paths, or subpaths starting with the given dead link, are removed from the cache in nodes that receive the route error. 3.2
DSRrecent
This section describes the proposed DSRrecent protocol and the modifications made to the existing DSR protocol. In the source node, DSR uses the shortest route in the route cache, whereas DSRrecent uses the route to the destination in the route cache that has the most recent next hop (using algorithm findRecentRoute).
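A runnable Python sketch of this recency-based selection; the cache representation as (timestamp, path) pairs is an assumption for illustration, while the paper's pseudocode findRecentRoute operates on its own cache structure.

```python
# Sketch of the route choice in DSRrecent: among cached routes to the
# destination, pick the one whose next hop carries the newest time stamp.

def find_recent_route(routes):
    """routes: list of (next_hop_timestamp, [node sequence]) pairs.
    Returns the path with the most recent next-hop time stamp."""
    best_time, best_route = 0, None
    for t, path in routes:
        if t > best_time:
            best_time, best_route = t, path
    return best_route

cache = [(10, ["n1", "n2", "n4"]),
         (25, ["n1", "n3", "n4"]),
         (7,  ["n1", "n5", "n4"])]
print(find_recent_route(cache))  # ['n1', 'n3', 'n4']
```

The shortest route is deliberately ignored here: a longer but recently confirmed path is assumed more likely to still exist than a shorter, stale one.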
In intermediate nodes, in DSR, the source route is used to determine the next hop. Only if there is a link error is the route cache used: the shortest route to the destination in the route cache is then placed in the data packet, instead of the original source route. However, in DSRrecent, the best route is determined from the route cache (using algorithm findRecentRoute). The route in the data packet is then modified such that the route from the route cache replaces the subpath (from the current node) in the route in the data packet.

If no route is found (to the destination) in the route cache of the intermediate node, or if the found route would result in a loop, the data packet is forwarded according to the existing route in the data packet. If a link error occurs in any node, the best route found in the route cache (using algorithm findRecentRoute) is used. If there is no route in the route cache, route discovery is done.

findRecentRoute() {
    besttime = 0; bestroute = NULL;
    foreach possible-route do
        t = time of next hop in possible-route;
        if t > besttime then
            besttime = t;
            bestroute = possible-route;
        end
    end
    return bestroute;
}

3.3 Performance Evaluation
The performance of DSRrecent is evaluated using the following metrics, which are normally used in such studies:
(i) Packet delivery ratio: The ratio of the data packets delivered to the application layer of the destination node to those sent by the application layer of the source node.
(ii) Normalized control overhead: The ratio of the number of control packets sent to the number of data packets received in the application layer.
(iii) Average end-to-end delay: The average delay from when a packet is sent by the source node until it is received by the destination node.
(iv) Average hop count: The average number of hops from the source node to the destination node.

The network simulator ns2 [10] is used for the experiments. The following parameters have often been used in such studies. The random waypoint mobility model is used in a rectangular field. Constant bit rate traffic sources are used. The radio model in the simulator is based on the Lucent Technologies WaveLAN 802.11, providing a 2 Mbps transmission rate. A transmission range of 250 m is used. The link layer modeled is the Distributed Coordination Function (DCF) of the IEEE 802.11 wireless LAN standard. The source-destination pairs (connections) are spread randomly over the network. 512-byte data packets are used. Nodes move in a field with dimensions 1500 m x 300 m with a maximum speed of 2 m/sec. The pause time is 20 seconds. The number of nodes is kept fixed at 50. The number of communicating source-destination pairs is varied from 5 to 40. Simulations are run for 1000 simulated seconds.

The packet delivery ratios for DSR and DSRrecent are very similar for 5 and 10 connections. When the number of connections is increased from 20 to 40, there is an improvement in the packet delivery ratio from 3% to 11% (Fig. 1). The normalized control overhead for DSRrecent is more than that for DSR when the number of sources is 5. With the increase in the number of connections from 10 to 40, there is an improvement of 4% to 25% over DSR (Fig. 2). In average hop count and average end-to-end delay, DSRrecent is seen to perform worse than DSR as the number of connections is increased (Fig. 3, Fig. 4).

Figure 1: Packet delivery ratio vs. number of connections for DSR and DSRrecent
Figure 2: Normalized control overhead vs. number of connections for DSR and DSRrecent
Figure 3: Average hop count vs. number of connections for DSR and DSRrecent
Figure 4: Average end-to-end delay vs. number of connections for DSR and DSRrecent

4
MOBILE AD HOC ROUTING USING WEIGHTED TEMPORAL INFORMATION SYSTEMS (WTIS)

In Temporal Information Systems, each elementary set is associated with a particular time interval. In Weighted Temporal Information Systems, elementary sets in different time intervals have weights. Since it is seen in the previous section that the recent route in the route cache is useful, more recent time intervals are assigned higher weights than less recent time intervals.

The use of the WTIS to predict the value of the decision attribute has already been described in section 2.6. This uses the predicted value of the decision attribute in different time intervals. The experiment described here uses a simple approach to determine the predicted value of the decision attribute in a particular time interval: a predicted value of the decision attribute in a time interval has at least one element, with that value of the decision attribute, in the union of elementary sets. That is, the union is in the upper approximation for that value of the decision attribute.

4.1 Routing Based on WTIS
Here, the route cache of the mobile node is used as the WTIS. Routes that are learnt and used are added to the cache of the mobile node. When routes are added, the time stamp of each link is added along with the routes. However, unlike DSR, even if the same route is already present in the cache, the new route is added with the new time stamps. So, the cache can contain the same route multiple times, but with different time stamps. In the source node, initially, as in DSR, the shortest route in the route cache, if available, is placed as the source route in the data packet. If no route is available, route discovery is done. Then, in the source node and in any intermediate forwarding node, the WTIS is used to determine the best next hop (using algorithm findWeightBasedHop). If a next hop is found and does not result in a loop, the data packet is forwarded to this next hop. If this next hop is different from the one in the source route that is already in the data packet, the new next hop is appended to the source route in the data packet at the current node and the route is invalidated by setting a flag in the data packet.

If a next hop cannot be determined from the WTIS, or if the next hop results in a loop, and if the source route in the data packet has not been invalidated earlier, the data packet is forwarded according to the source route. Else, a route discovery is done.

The total time is divided into time intervals. The list of next hops to the destination that are present in the route cache is found. For each possible next hop, from the current time interval back to the initial time interval, a weighted sum of the number of times that the particular next hop is used is computed. More weight is assigned if the next hop has been used in the recent past; that is, the weights assigned decrease for earlier time intervals.

findWeightBasedHop() {
    Find all possible next hops that will lead to the destination from this node;
    foreach possible next hop nh do
        timeInterval = currentInterval;
        weightedSum = 0; weight = maxWeight; totalWeight = 0;
        while timeInterval >= 0 do
            if nh is used as a next hop in timeInterval then
                weightedSum = weightedSum + weight;
            end
            timeInterval = timeInterval - 1; // previous time interval
            totalWeight = totalWeight + weight;
            weight = weight - 1;
        end
        ratio[nh] = weightedSum / totalWeight;
    end
    Find the next hop nh for which the value of ratio is the maximum and return it;
}

The ratio of the weighted sum of the usage of the node to the total weight is found. The node for which the ratio is greater than a threshold $\beta'$ is chosen as the next hop.

4.2 Performance Evaluation
The parameters used are the same as those given in section 3.3. The size of the time interval is taken as 40 seconds. The value of $\beta'$ used is 0.5. The proposed protocol used in this section is referred to as TIME_WT.

The packet delivery ratios for DSR and TIME_WT are nearly similar for 5 and 10 connections. When the number of connections is increased from 20 to 40, there is a slight improvement in the packet delivery ratio from 5% to 7% (Fig. 5). The normalized control overhead for TIME_WT is more than that for DSR when the number of sources is 5 and 10. With the increase in the number of connections from 20 to 40, there is an average improvement of 14% to 22% over DSR (Fig. 6). The average hop count and the average end-to-end delay for TIME_WT are more than those for DSR when the number of sources is 5 and 10. But when the number of connections is increased from 20 to 40, there is a slight improvement of about 2% in average hop count and of about 5% in average end-to-end delay over DSR (Fig. 7, Fig. 8).

Figure 5: Packet delivery ratio vs. number of connections for DSR and TIME_WT
Figure 6: Normalized control overhead vs. number of connections for DSR and TIME_WT
Figure 7: Average hop count vs. number of connections for DSR and TIME_WT
Figure 8: Average end-to-end delay vs. number of connections for DSR and TIME_WT
5
5.1
The routing protocol is similar to that of the previous section. The next hop is chosen using the notion of threshold $\beta$ ($\beta$--positive regions) as described in algorithm findVPRSWeightBasedHop().
MOBILE AD HOC ROUTING USING BETA-POSITIVE REGIONS IN WTIS
Performance Evaluation
The parameters used are the same as that given in section 3.3. The size of the time interval is taken as 40 seconds. The value of $\beta$, $\beta'$ used are 0.6,0.5 respectively. The proposed protocol used in this section is referred to as VPRS_WT. The packet delivery ratio for VPRS_WT is less than that for DSR for 5 and 10 connections. When the number of connections is increased from 20 to 40 there is a slight improvement in the packet delivery ratio from 2% to 6% (Fig. 9).
Routing Based on Beta-Positive Regions
The experiment described in this section determines the predicted next hop as that value of the decision attribute where the union of these elementary sets is in the $\beta$-positive region, as described in section 2.5. findVPRSWeightBasedHop(){ Find all possible next hops that will lead to the destination from this node; Foreach possible next hop nh do TimeInterval = currentInterval - 1; weightedSum = 0; weight = maxWeight; totalWeight = 0; while timeInterval >= currentInterval – k do nhopCount = the number of routes with next hop nh and willlead to the destination in timeInterval ; totalCount = the number of routes that will lead to the destination in timeInterval ratio1 = nhopCount/totalCount if ratio1 > $\beta$ then weightedSum = weightedSum + weight; end timeInterval = timeInterval -1; //previous timeInterval totalWeight = totalWeight + weight; weight = weight -1; end ratio[nh] = weightedSum / totalWeight; end Find the nexthop nh for which the value of ratio is greater than $\beta'$ }
UbiCC Journal - Volume 3
The normalized control overhead for VPRS_WT is higher than that for DSR when the number of sources is 5 or 10. As the number of connections increases from 20 to 40, there is an average improvement of 14% to 19% over DSR (Fig. 10). The average hop length and the average end-to-end delay for VPRS_WT are higher than those for DSR when the number of sources is 5 or 10, but when the number of connections is increased from 20 to 40 there is a slight improvement of about 4% in average hop length over DSR (Fig. 11). The average end-to-end delay for VPRS_WT is similar to that of DSR when the number of sources is 5; when the number of connections is increased from 10 to 40, there is an improvement of about 6% in average end-to-end delay over DSR (Fig. 12).
Figure 9: Packet delivery ratio vs. number of connections for DSR and VPRS_WT
Figure 10: Normalized control overhead vs. number of connections for DSR and VPRS_WT
Figure 11: Average hop count vs. number of connections for DSR and VPRS_WT
Figure 12: Average end-to-end delay vs. number of connections for DSR and VPRS_WT
6 CONCLUSIONS
This paper presents temporal extensions to Rough Set Theory and Variable Precision Rough Sets. These extensions are applied to Mobile Ad hoc routing. Illustrative experiments are described and the results are presented.
Using recent routes (DSRrecent) was found to improve packet delivery ratio and normalized control overhead. Temporal information was brought into information systems. Recent elementary sets were given more importance in the two proposed methods, TIME_WT and VPRS_WT. The VPRS_WT method uses notions from VPRS. It was seen that the control overhead is much better, while the packet delivery ratio, average hop length and average end-to-end delay are slightly better than those of DSR. It was also seen that the improvement in performance increases with the number of connections.
INTEGRATION OF FUZZY INFERENCE ENGINE WITH RADIAL BASIS FUNCTION NEURAL NETWORK FOR SHORT TERM LOAD FORECASTING
Ajay Shekhar Pandey, S. K. Sinha
Kamla Nehru Institute of Technology, Sultanpur, UP, INDIA
[email protected], [email protected]
D. Singh
Institute of Technology, Banaras Hindu University, Varanasi, UP, INDIA
ABSTRACT This paper proposes a fuzzy inference based neural network for the forecasting of short term loads. The forecasting model is the integration of a fuzzy inference engine and a neural network, known as a Fuzzy Inference Neural Network (FINN). A FINN initially creates a rule base from existing historical load data. The parameters of the rule base are then tuned through a training process, so that the output of the FINN adequately matches the available historical load data. Results show that the FINN can forecast future loads with an accuracy comparable to that of neural networks, while its training is much faster than that of neural networks. Simulation results indicate that the hybrid fuzzy neural network is one of the best candidates for the analysis and forecasting of electricity demand. A Radial Basis Function Neural Network (RBFNN) integrated with a fuzzy inference engine has been used to create a Short Term Load Forecasting model.
Keywords: STLF, RBFNN, Fuzzy Inference, Fuzzy Inference Neural Networks.

1 INTRODUCTION
Short term forecasts in particular have become increasingly important since the rise of the competitive market. Forecasting the power demand is an important task in power utility companies because accurate load forecasting results in economic, reliable and secure power system operation and planning. Short Term Load Forecasting (STLF) is important for optimum operation planning of power generation facilities, as it affects both system reliability and fuel consumption. The complex dependence of load on human behaviour, social and special events, and various environmental factors makes load forecasting a tedious job. It is an important function performed by utilities for planning, operation and control, and is primarily used for economic load dispatch, daily operation and control, system security and assurance of reliable power supply. The impacts of globalization and deregulation demand improved quality at competitive prices, which is why the development of advanced tools and methods for planning, analysis, operation and control is needed. Important decisions depend on load forecasts with lead times of minutes to months. The ability of ANNs to outperform the traditional STLF methods, especially during rapidly changing weather
conditions, and the short time required for their development, have made ANN based STLF models a very attractive alternative for on-line implementation in energy control centers. In this era of the competitive power market, the main concern is how to improve the accuracy of STLF. In recent years the use of intelligent techniques has increased noticeably. ANNs and fuzzy systems are two powerful tools that can be used in prediction and modeling. Load forecasting techniques such as ANN [4], [5], [6], [7], [11], [15], [18], expert systems [14], and fuzzy logic and fuzzy inference [2], [3], [10], [12], [13], [16] have been developed, showing more accurate and acceptable results as compared to conventional methods. A wide variety of conventional models for STLF have also been reported in the literature. They are based on various statistical methods such as regression [1], Box-Jenkins models [9] and exponential smoothing [19]. Conventional ANN model based STLF has several drawbacks, such as long training time and slow convergence. The RBF model is a very simple and yet intrinsically powerful network, which is widely used in many fields because of its extensive learning ability and high computing speed [6], [7]. A neuro-fuzzy approach has been applied successfully in a price sensitive environment [2]. Soft Computing (SC), introduced by Lotfi Zadeh [20], is an innovative approach to
construct computationally intelligent hybrid systems consisting of Artificial Neural Networks (ANN), Fuzzy Logic (FL), approximate reasoning and optimization methods. Fuzzy systems are another research area which is receiving increased attention. The pioneering work of Zadeh in fuzzy set theory has inspired work in many research areas with excellent results. A fuzzy expert system for STLF is developed in [15]. It uses fuzzy set theory to model imprecision in the load-temperature model and temperature forecasts, as well as the operator's heuristic rules. Fuzzy set theory proposed by Zadeh [20] provides a general way to deal with uncertainty and to express subjective knowledge about a process in the form of linguistic IF-THEN rules. Fuzzy systems exhibit complementary characteristics, offering a very powerful framework for approximate reasoning, as they attempt to model the human reasoning process at a cognitive level. A fuzzy system acquires knowledge from domain experts, encoded within the algorithm in terms of a set of IF-THEN rules. Fuzzy systems employ this rule-based approach and interpolative reasoning to respond to new inputs. They are suitable for dealing with problems caused by uncertainty, inexactitude and noise, so uniting fuzzy systems and neural networks can exploit their respective advantages.

In this paper, a fuzzy inference neural network is presented to improve the performance of STLF in electric power systems. A Fuzzy Inference Neural Network initially creates a fuzzy rule base from existing historical load data. The parameters of the rule base are then tuned through a training process so that the output of the network adequately matches the available historical load data. The fuzzy system combines fuzzy inference principles with the neural network structure and its learning ability into an integrated neural-network-based fuzzy decision system.
Combining the specific characteristic that power system load variation is non-linear, we set up a new short-term load forecasting model based on fuzzy neural networks and a fuzzy getting-smaller inference algorithm. The flexibility of the fuzzy logic approach, offering a logical set of IF-THEN rules which can be easily understood by an operator, might be a good solution for easy practical implementation and usage of STLF models. The hybrid FNN approach is finally used to forecast loads with greater accuracy than the conventional approaches used in a stand-alone mode.

2 RADIAL BASIS FUNCTION NEURAL NETWORK
Radial Basis Function (RBF) Network consists of two layers, a hidden layer with nonlinear neurons and an output layer with linear neurons. Thus, the transformation from the input space to the hidden
unit space is non-linear whereas the transformation from the hidden unit space to the output space is linear. The basis functions in the hidden layer produce a localized response to the input, i.e. each hidden unit has a localized receptive field. RBFNNs exhibit good approximation and learning ability, are easier to train and generally converge very fast. The network uses a linear transfer function for the output units and a Gaussian function (radial basis function) for the hidden units. The transfer function of the hidden layer is non-negative and nonlinear. In an RBF neural network, three sets of parameters need to be learned: the centers and the variances of the basis functions, and the weights connecting the hidden layer to the output layer. The RBF network has many training methods, according to the different methods of selecting the centers. In this paper, a self-organizing method of selecting the RBF centers is adopted. The method is a two-step procedure: the first step is self-organizing learning of the basis function centers and variances; the second step is supervised learning of the weights connecting the hidden layer to the output layer. An RBF neural network thus embodies both an unsupervised-learning-based classification layer and a supervised learning layer. The network is mainly a feed-forward neural network. Each hidden unit computes a radial basis function, similar to the Gaussian density function, whose output is given by

$o_i = \exp\left(-\sum_{j=1}^{r}\frac{(x_{jp} - W_{ij})^2}{\sigma_i^2}\right)$   (1)

where
$W_{ij}$ = center of the $i$th RBF unit for input variable $j$
$\sigma_i$ = spread of the $i$th RBF unit
$x_{jp}$ = $j$th variable of the input pattern $p$

The RBF neural network generalizes on the basis of pattern matching. The different patterns are stored in the network in the form of cluster centers of the hidden-unit neurons. The number of neurons determines the number of cluster centers that are stored in the network. The response of a particular hidden-layer node is maximum (i.e. 1) when the incoming pattern matches the cluster center of the neuron perfectly, and the response decays monotonically as the input pattern mismatches the cluster center; the rate of decay can be small or large depending on the value of the spread. Neurons with a large spread generalize more, as they give similar responses (closer to 1) even for wide variations between the input pattern and the cluster centers, whereas a small spread reduces the generalization property and works as a memory. Therefore, spread is
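The unsupervised first step of the two-step procedure above is not fully specified in the text; one common realization of self-organizing center selection is a plain k-means pass, sketched here. The random center initialization and the spread-from-cluster-scatter heuristic are assumptions for illustration, not the authors' exact method:

```python
import numpy as np

def select_rbf_centers(X, n_centers, n_iter=20, seed=0):
    """Self-organizing step: pick RBF centers W_ij by a k-means pass, then set
    each unit's spread sigma_i from its cluster scatter (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # initialize centers as a random subset of the training patterns (assumption)
    centers = X[rng.choice(len(X), n_centers, replace=False)]
    for _ in range(n_iter):
        # assign each pattern to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_centers):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    # spread: mean distance of member patterns to their center (variance proxy)
    spreads = np.array([
        np.linalg.norm(X[labels == c] - centers[c], axis=1).mean()
        if np.any(labels == c) else 1.0
        for c in range(n_centers)
    ])
    return centers, np.maximum(spreads, 1e-6)
```

The supervised second step (fitting the output weights) then operates on the hidden-layer responses produced with these centers and spreads.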
an important parameter and depends on the nature of the input pattern space. The output linear layer simply acts as an optimal combiner of the hidden layer neuron responses. The weights $w$ for this layer are found by the multiple linear regression technique. The output of the linear layer is given by

$y_{mp} = \sum_{i=1}^{N} w_{mi}\, o_i + b_m$   (2)

where
$N$ = number of hidden layer nodes (RBF units)
$y_{mp}$ = output value of the $m$th node in the output layer for the $p$th incoming pattern
$w_{mi}$ = weight between the $i$th RBF unit and the $m$th output node
$b_m$ = biasing strength of the $m$th output node
$o_i$ = $i$th input to the linear layer
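Equations (1) and (2) together define the forward pass, and the text states that the output weights are found by multiple linear regression. A compact sketch under those two equations (array shapes and names are assumptions):

```python
import numpy as np

def rbf_forward(X, centers, spreads, W_out, b_out):
    """Eq. (1): Gaussian hidden outputs o_i; Eq. (2): linear output layer."""
    # o_i = exp(-sum_j (x_jp - W_ij)^2 / sigma_i^2)   -- Eq. (1)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    O = np.exp(-d2 / spreads ** 2)
    # y_mp = sum_i w_mi * o_i + b_m                   -- Eq. (2)
    return O @ W_out.T + b_out

def fit_output_layer(O, Y):
    """Output weights and biases by multiple linear regression (least squares),
    as stated in the text for the linear output layer."""
    A = np.hstack([O, np.ones((len(O), 1))])          # append bias column
    sol, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return sol[:-1].T, sol[-1]                        # (W_out, b_out)
```

A pattern lying exactly on a center with unit spread yields a hidden response of 1, so the corresponding output is just that unit's weight plus the bias.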
The values of the different parameters of the RBF network are determined during training. These parameters are the spreads, the cluster centers, and the weights and biases of the linear layer. The number of neurons and the spread are determined through experimentation with a large number of combinations of spread and number of neurons. The best combination is the one which produces the minimum Sum Squared Error (SSE) on the testing data.
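The experimentation described above amounts to a grid search over spread and neuron count, scored by SSE on the testing data. A minimal sketch follows; taking the centers as a random subset of training patterns is an assumption made for illustration:

```python
import numpy as np

def sse(pred, target):
    """Sum Squared Error between predictions and targets."""
    return float(((pred - target) ** 2).sum())

def grid_search_rbf(X_tr, y_tr, X_te, y_te, spreads, neuron_counts, seed=0):
    """Return the (spread, n_neurons) pair that minimizes SSE on the test data."""
    rng = np.random.default_rng(seed)
    best = (None, np.inf)
    for n in neuron_counts:
        # centers as a random subset of training patterns (assumption)
        centers = X_tr[rng.choice(len(X_tr), n, replace=False)]
        for s in spreads:
            def hidden(X):
                d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
                return np.exp(-d2 / s ** 2)
            # fit the linear output layer by least squares on the training split
            O = np.hstack([hidden(X_tr), np.ones((len(X_tr), 1))])
            w, *_ = np.linalg.lstsq(O, y_tr, rcond=None)
            pred = np.hstack([hidden(X_te), np.ones((len(X_te), 1))]) @ w
            err = sse(pred, y_te)
            if err < best[1]:
                best = ((s, n), err)
    return best
```

The returned pair is the combination that the text's criterion (minimum SSE on the testing data) would select from the grid.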
3 FUZZY INFERENCE

Fuzzy inference is the process of formulating the mapping from a given input to the output using fuzzy logic. This process numerically evaluates the information embedded in the fuzzy rule base. The fuzzy rule base consists of "IF-THEN" type rules. For a set of input variables, there will be fuzzy membership in several fuzzy input variables. By using the fuzzy inference mechanism, the information is processed to evaluate the actual value from the fuzzy rule base. Good precision can be achieved by applying appropriate membership definitions along with well-defined membership functions. This is an information processing system that draws conclusions based on given conditions or evidence. A fuzzy inference engine is an inference engine using fuzzy variables. Fuzzy inference refers to a fuzzy IF-THEN structure. The fact that fuzzy inference engines evaluate all the rules simultaneously, and do not search for matching antecedents on a decision tree, makes them perfect candidates for parallel processing computers. A fuzzy set is a set without a crisp, clearly defined boundary, and can contain fuzzy variables with a partial degree of membership, which is represented by the membership functions within the range. There are two types of fuzzy models. The first kind is known as the Mamdani model [8]. In this model, both the fuzzy premise part and the consequence part are represented in linguistic terms. The other kind is the Takagi-Sugeno model [17], which uses linguistic terms only for the fuzzy premise part. In this paper the Takagi-Sugeno reasoning method is used.

The fuzzification interface is a mapping from the observed non-fuzzy input space $U \subseteq R^n$ to the fuzzy sets defined in $U$. Hence, the fuzzification interface provides a link between the non-fuzzy outside world and the fuzzy system framework. The fuzzy rule base is a set of linguistic rules or conditional statements in the form: "IF a set of conditions is satisfied, THEN a set of consequences is inferred". The fuzzy inference engine is a decision-making logic performing the inference operations of the fuzzy rules. Based on the fuzzy IF-THEN rules in the fuzzy rule base and the compositional rule of inference [14], the appropriate fuzzy sets are inferred in the output space. The mapping $\mu_A$ from the discussed region $U$ to the range [0, 1], $U \rightarrow [0,1]$, $x \rightarrow \mu_A(x)$, defines a fuzzy subset of $U$, named $A$; the mapping $\mu_A(x)$ is known as the membership function of $A$. The value of $\mu_A(x)$ shows the membership degree of $x$ in the fuzzy set $A$, called the membership degree for short. In practice, the membership function can be selected according to the characteristics of the object. Fuzzy inference based on fuzzy estimation is a method by which a new and approximate fuzzy estimation conclusion is inferred using fuzzy language rules. This paper adopts the composite fuzzy inference method, which is an inference method based on the fuzzy relation composition principle. A fuzzy inference engine can process mixed data. Input data received from the external world is analyzed for its validity before it is propagated into a fuzzy inference engine. The capability of processing mixed data is based on the membership function concept, by which all the input data are eventually transformed into the same unit before the inference computations. A fuzzy inference engine normally includes several antecedent fuzzy variables. If the number of antecedent variables is k, then there will be k pieces of information collected from the external world. Fuzzification and normalization are the two typical transformations. Another important property is that when an input data set is partially ambiguous or unacceptable, a fuzzy inference engine may still produce reasonable answers.

4 FUZZY INFERENCE NEURAL NETWORK

A fuzzy inference neural network approach,
which combines the important features of ANN and fuzzy logic using an inference mechanism, is proposed. This architecture is suggested for realizing cascaded fuzzy inference system and neural network modules, which are used as building blocks for constructing a load forecasting system. The fuzzy membership values of load and temperature are the inputs to the ANN, and the output comprises the membership value of the predicted load. To deal with linguistic values such as high, low and medium, an ANN architecture that can handle fuzzy input vectors is propounded. Each input variable is converted into a fuzzy membership value in the range [0-1] that corresponds to the degree to which the input belongs to a linguistic class. The RBFNN has been integrated with fuzzy inference to form a FINN for Short Term Load Forecasting. The RBFNN is used to extract the features of the input and output variables. It is noteworthy that the input variables are extended to include an output variable, so as to extract the relationship between the inputs and the output.

4.1 Input Variable Selection and Data Processing

The most important work in building our Short Term Load Forecasting (STLF) models is the selection of the input variables. It mainly depends on experience and is carried out almost entirely by trial and error. However, some statistical analysis can be very helpful in determining the variables which have significant influence on the system load. Normally, more input neurons worsen the performance of the neural network in many circumstances. Optimal input parameters result in a compact ANN with higher accuracy and, at the same time, good convergence speed. Parameters with an effect on hourly load can be categorized into day type, historical load data and weather information.
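The conversion of each input into a [0, 1] membership degree mentioned above can be illustrated with a generic triangular membership function; the triangular shape and its parameters are assumptions for illustration, since the paper does not give its exact membership functions:

```python
def triangular_membership(x, a, b, c):
    """Degree in [0, 1] to which x belongs to a triangular fuzzy set (a, b, c):
    membership rises linearly from a to the peak at b, then falls to zero at c.
    Generic sketch -- the paper's actual membership shapes are not specified."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)
```

A crisp load value then maps to partial memberships in the neighbouring linguistic classes, which is what the FINN receives as input.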
Temperature is the most effective weather information on hourly load. Data has been taken from the TransAlta Canada system. In order to keep the inference minimal, the input load is sorted into 5 categories, labeled low (L), low medium (LM), medium (M), medium high (MH) and high (H). The input temperature is also sorted into 5 categories in the same way. The design data consist of hourly data, integrated load data and the temperature of two places. Keeping in view the large geographical spread of the area the utility supplies, the hourly temperatures of two places have been taken in the historical data. First, the data are normalized. The n rows thus give, for each group, the values of m features denoting the characteristics of these groups. In the present work the features correspond to the characterization of the data model, i.e. hour, load two hours before, load one hour before, temp. 1 and temp. 2. In this paper, fuzzy IF-THEN rules of the form suggested by Takagi-Sugeno [17] are employed, where fuzzy sets are involved only in the premise part of the rules, while the consequent part is described by a non-fuzzy function of the input variables. The historical data are used to build design data, which are further fuzzified using IF-THEN rules. The data model involves five linguistic values for each crisp data type: low (L), low medium (LM), medium (M), medium high (MH) and high (H). For load, these five linguistic values are defined as L (3800 MW - 4280 MW), LM (4280.001 MW - 4760 MW), M (4760.001 MW - 5240 MW), MH (5240.001 MW - 5720 MW) and H (5720.001 MW - 6200 MW); the linguistic values for temperature are L (-370°C to -230°C), LM (-229.999°C to -90°C), M (-89.999°C to +50°C), MH (+50.001°C to +190°C) and H (+190.001°C to +330°C), using the IF-THEN rule.

Figure 1: Forecasting Model (input variables → fuzzy inference engine → radial basis function neural network → output, tuned by a learning algorithm)

These data are normalized and fuzzified using the inference engine, as shown in the demand table (Table 1). The five linguistic variables using IF-THEN rules for load as well as temperature are as follows:
If P1 is low (L) and P2 is low (L) then α = LL
If P1 is low (L) and P2 is low medium (LM) then α = LLM
If P1 is low (L) and P2 is medium (M) then α = LM
If P1 is low (L) and P2 is medium high (MH) then α = LMH
If P1 is low (L) and P2 is high (H) then α = LH
If P1 is low medium (LM) and P2 is low (L) then α = LML
If P1 is low medium (LM) and P2 is low medium (LM) then α = LMLM
and so on.

4.2 Forecasting Model

In the FINN, the RBFNN plays an important role in classifying input data into clusters, while the fuzzy inference engine handles the extraction of rules. Fig. 1 shows the structure of the FINN, which has two layers: an input/output layer and a rule layer. The input/output layer has input and output nodes. The input nodes of the input/output layer are connected to neurons on the topological map of the rule layer. The fuzzy membership functions are assigned to the weights between the input nodes and the rule layer. Also, the consequent constant is assigned between the output node and the rule layer. The parameter selection method can be considered as a rule base initialization process. Essentially, it performs a fuzzification of the selected input points within the premise space. The mean values of the memberships are centered directly at these points, while the membership deviations reflect the degree of fuzzification and are selected in such a way that a prescribed degree of overlapping exists between successive memberships. The fact that the initial parameters of the FINN are not randomly chosen as in neural networks, but are assigned reasonable values with physical meaning, gives the training of a FINN a drastic speed advantage over neural networks.
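As an executable illustration of the categories and rules above, crisp values can be mapped to their linguistic labels and the rule consequent formed by concatenation. Crisp binning is a simplification (the actual model uses fuzzy membership degrees), and taking 4280 MW as the upper bound of L, so that the five bins tile 3800-6200 MW evenly, is an assumption:

```python
# Linguistic ranges for load (MW), following the ranges listed in the text
# (L upper bound assumed to be 4280 MW so the bins are contiguous).
LOAD_BINS = [
    ("L", 3800.0, 4280.0),
    ("LM", 4280.0, 4760.0),
    ("M", 4760.0, 5240.0),
    ("MH", 5240.0, 5720.0),
    ("H", 5720.0, 6200.0),
]

def linguistic_label(value, bins):
    """Map a crisp value to its linguistic category (crisp binning sketch)."""
    for label, lo, hi in bins:
        if lo <= value <= hi:
            return label
    raise ValueError(f"{value} outside the defined range")

def rule_consequent(p1, p2, bins=LOAD_BINS):
    """IF P1 is X and P2 is Y THEN alpha = XY, as in the rule list above."""
    return linguistic_label(p1, bins) + linguistic_label(p2, bins)
```

For example, a 3900 MW value for P1 and a 4300 MW value for P2 fall in L and LM respectively, giving the consequent label LLM from the second rule in the list.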
By fusing the strengths of fuzzy logic and neural networks, a fuzzy inference neural network model which effectively makes use of their advantages has been developed. The training patterns for the ANN models are obtained from the historical loads by classifying the load patterns according to the day types of the special days and linearly scaling the load values. The block diagram of the proposed system and the flow chart of the forecasting process are shown in Fig. 1 and Fig. 2.
Figure 2: Flow chart of the forecasting process (data set → input data set (Load1, Load2, Temp1, Temp2, Hr-Load) → making of rule base → categorization and distribution of data set → data conversion and normalization → fuzzification of the crisp data → training and testing through RBFNN → forecasting → mean absolute percentage error against actual data)

Table 1: Demand table
5 SIMULATION RESULTS

The most widely used index for testing the performance of forecasters is the MAPE.
Table 2: Forecast errors in MAPE on seasonal transition weeks

           Winter (Jan 25-31)  Spring (May 17-23)  Summer (Jul 19-25)  Average
Day        Day     Week        Day     Week        Day     Week        Day     Week
           Ahead   Ahead       Ahead   Ahead       Ahead   Ahead       Ahead   Ahead
Monday     2.5711  2.5711      1.9990  1.9990      2.2050  2.2050      2.2584  2.2584
Tuesday    1.6763  1.5041      1.8121  1.8797      2.0467  1.9221      1.8450  1.7686
Wednesday  2.0342  2.0527      2.0369  1.9750      2.4277  1.9505      2.1663  1.9927
Thursday   2.4767  2.6438      2.2687  2.0208      1.5584  1.5206      2.1013  2.0617
Friday     2.9492  1.9225      1.8399  1.8356      1.5065  1.5079      2.0985  1.7553
Saturday   2.4953  2.3185      2.4913  2.3826      1.9120  1.9915      2.2995  2.2309
Sunday     2.7416  2.8998      2.6638  2.6110      1.6234  1.5122      2.3429  2.3410
Average    2.4206  2.2732      2.1588  2.1005      1.8971  1.8014      2.1588  2.0584
Table 3: Comparison with MLR and simple RBFNN (MAPE)

           Winter (Jan 25-31)        Spring (May 17-23)        Summer (Jul 19-25)
Day        MLR     RBFNN   FINN      MLR     RBFNN   FINN      MLR     RBFNN   FINN
Monday     2.3863  1.0776  2.5711    2.7664  1.0856  1.9990    2.8015  1.2466  2.2050
Tuesday    1.6070  1.0727  1.5041    2.8966  0.7082  1.8797    2.2284  2.2017  1.9221
Wednesday  2.2656  1.1105  2.0527    3.3757  0.9606  1.9750    2.6688  0.8057  1.9505
Thursday   1.8675  0.7494  2.6438    2.3315  2.2876  2.0208    3.0628  1.2365  1.5206
Friday     1.6801  1.1171  1.9225    2.9397  1.1114  1.8356    2.6345  0.9062  1.5079
Saturday   2.8921  1.6459  2.3185    1.0263  0.7726  2.3826    2.4133  1.0312  1.9915
Sunday     2.3560  1.5838  2.8998    2.2336  1.7412  2.6110    2.1984  1.1475  1.5122
Average    2.3228  1.1939  2.2732    2.5100  1.2310  2.1005    2.5725  1.2246  1.8014
Figure 3: Forecast for Winter (January 25-31)
Figure 4: Forecast for Summer (July 19-25)
Figure 5: Forecast for Spring (May 17-23)

The designed network is used to produce the day-ahead and week-ahead forecasts on an hourly basis. Forecasting has been done on one year of load data of the TransAlta electric utility for Alberta, Canada. The load varies from 3900 MW to 6200 MW. The FINN is trained using the last four weeks of hourly load data and is then used to forecast the load for the next 168 hours, i.e. one week. The results are reported for three weeks, one each for the winter, spring and summer seasons. This reflects the behaviour of the network during seasonal changes, and the corresponding results are shown in Table 2. It is observed that the performance of the day-ahead and week-ahead forecasts is equally good. Load shape curves for the three weeks are shown in Fig. 3, Fig. 4 and Fig. 5. The errors are tabulated in Table 2. It is observed from the figures that the forecaster captures the load shape quite accurately; the forecasting error on most week days is low, with slightly higher error on weekend days. For a comparative study, the proposed FINN method is compared with two other methods, conventional multiple linear regression (MLR) and RBF neural networks, for the same period of time. The results (Table 3) show that the average MAPE for FINN is better than MLR in all seasons, and the average MAPE for RBFNN is even better than FINN. But at the same time it is also noticeable that the training time required for forecasting through RBFNN integrated with fuzzy inference is approximately ten times less than the training time required for a simple RBFNN.
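The tables above report forecast errors as MAPE; the standard definition, which the text does not spell out, is $\mathrm{MAPE} = \frac{100}{N}\sum_{t=1}^{N}\frac{|A_t - F_t|}{|A_t|}$ for actual loads $A_t$ and forecasts $F_t$, and can be sketched as:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (standard definition)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)
```

For example, forecasting 110 and 190 against actual loads of 100 and 200 gives a MAPE of 7.5%.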
accurate as compared to MLR. The error depends on many factors such as homogeneity in data, network parameters, choice of model and the type of solution. The flexibility of the fuzzy logic offering a logical set of IF-THEN rules, which could be easily understood by an operator, will be a good solution for practical implementation. FINN training time was much faster and also effectively incorporated linguistic IF-THEN rules. Load forecasting results show that FINN is equally good for week ahead and day ahead forecasting and requires lesser training time as compared to other forecasting techniques, conventional regression MLR and simple RBF neural network. ACKNOWLEDGEMENT The authors would like to thank TransAlta, Alberta, Canada for providing the load data used in the studies. 7
REFERENCES
[1.]
A.D.Papalexopoulos, T.Hasterberg: A Regression based Approach to Short Term System Load Forecast , IEEE Trans. On Power Systems. Vol.5, No.4, pp 1535-1544, (1990).
[2.]
A. Khotanzad, E. Zhou and H.Elragal: A Neuro-Fuzzy approach to Short-term load forecasting in a price sensitive environment, IEEE Trans. Power Syst., vol. 17 no. 4, pp. 1273–1282, (2002).
[3.]
A. G. Bakirtzis, J. B. Theocharis, S. J. Kiartzis, and K. J. Satsios: Short-term load forecasting using fuzzy neural networks, IEEE Trans. Power Syst., vol. 10, pp. 1518–1524,(1995).
CONCLUSION
The benefit of the proposed structure is to utilize the advantages of both, i.e. the generalization capability of ANN and the ability of fuzzy inference for handling uncertain problems and formalizing the experience and knowledge of the forecasters. Load forecasting method proposed above is feasible and effective. A comparative study shows that FINN and RBFNN are more
UbiCC Journal - Volume 3
47
[4.]
C.N. Lu, H.T. Wu and S. Vemuri: Neural Network Based Short Term Load Forecasting , IEEE Transactions on Power Systems, Vol. 8, No 1, pp. 336-342, (1993). PAS-101, pp. 71-78. (1982)
[5.]
D.C.Park M.A.,El-Sharkawi, R.J.Marks, L.E.Atlas and M.J.Damborg: Electric Load Forecasting using an Artificial Neural Networks , IEEE Trans. on Power Systems, vol.6,No.2, pp. 442-449,(1991).
[6.]
D.K.Ranaweera, .F.Hubele and A.D.Papalexopoulos: Application of Radial Basis Function Neural Network Model for Short Term Load Forecasting , IEE Proc. Gener. Trans. Distrib., vol. 142, No.1, (1995).
Load Forecasting System Using Artificial Neural Networks and Fuzzy Expert Systems, IEEE Trans. on Power Systems, vol. 10, no. 3, pp. 1534–1539, ( 1995). [14.] K.L.Ho, Y.Y.Hsu, C.F.Chen, T.E.Lee, C.C.Liang, T.S.Lai and K.K.Chen : Short Term Load Forecasting of Taiwan Power System using a Knowledge Based Expert System, IEEE Trans.on Power Systems, vol.5, pp. 1214-1221, (1990). [15.] K.Y. Lee, Y.T. Cha, and J.H. Park: ShortTerm Load Forecasting Using An Artificial Neural Network,” IEEE Trans. on Power Systems, vol. 7, no. 1, pp. 124–132, (1992). [16.] Ranaweera D.K., Hubele N.F. and Karady G.G: Fuzzy logic for short-term load forecasting, Electrical Power and Energy Systems,” Vol. 18, No. 4, pp. 215-222, (1996).
[7.]
D. Singh and S.P. Singh: Self selecting neural network for short-term load forecasting , Jour. Of Electric Power. Component and Systems, vol. 29, pp 117130, (2001).
[8.]
E. H. Mamdani and S. Assilian: An experiment in linguistic synthesis with a fuzzy logic controller, Int. J. Man–Mach. Stud., vol. 7, no. 1, pp. 1–12, (1975).
[9.]
F. Meslier: New advances in short term load forecasting using Box and Jenkins approach , Paper A78 051-5, IEEUES Winter Meeting,( 1978).
[18.] T. S. Dillon, S. Sestito, and S. Leung: Short term load forecasting using an adaptive neural network, Elect. Power Energy Syst., vol. 13, (1991).
[10.] Hiroyuki Mori and Hidenori Kobayashi: Optimal fuzzy inference for short term load forecasting, IEEE Trans. on Power Systems, vol.11, No.2, pp. 390-396, (1996).
[19.] W.R.Christiaanse: Short Term Load Forecasting using general exponential smoothing, IEEE Trans. On Power Appar. Syst.,PAS-3,pp 900-911 (1988)
[11.] I. Mogram and S. Rahman : Analysis and evaluation of five short term load forecast techniques, IEEE Trans. On Power Systems. Vol.4, No.4, pp 1484-1491, (1989).
[20.] Zadeh L.A: Roles of Soft Computing and Fuzzy Logic in the Conception, Design and Deployment of Information /Intelligent Systems, Computational Intelligence, Soft Computing and Fuzzy-Neuro Integration with Applications, O Kaynak, LA Zadeh, B Turksen, IJ Rudas (Eds.), pp 1-9. (1998).
[12.] Kwang-Ho Kim, Hyoung-Sun Youn, YongCheol Kang: Short-tem Load Forecasting for Special Days in anomalous Load Conditions Using Neural Network and Fuzzy Inference Method, IEEE Trans. on Power Systems, Vol. 15, pp. 559-569, (2000).
[17.]
T. Takagi and M. Sugeno: Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Syst., Man, Cybern., vol. 15, pp. 116–132, (1985).
[13.] K.H. Kim, J.K. Park, K.J. Hwang, and S.H. Kim: Implementation of Hybrid Short-term
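The error comparisons above are in terms of MAPE (mean absolute percentage error). A minimal sketch of the metric, using illustrative load values rather than the actual TransAlta series:

```python
def mape(actual, forecast):
    """Mean absolute percentage error over paired hourly loads (MW)."""
    assert len(actual) == len(forecast) and actual
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

# Illustrative values only, not the TransAlta data.
actual = [5000.0, 5200.0, 4800.0, 4600.0]
forecast = [5050.0, 5148.0, 4848.0, 4554.0]
print(round(mape(actual, forecast), 2))  # 1.0
```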
EXPLORING PERFORMANCE LANDSCAPE OF UNSTRUCTURED SEARCH SCHEMES Hong Huang and Rajagopal Reddy Manda Klipsch School of Electrical and Computer Engineering, New Mexico State University, USA {hhuang, rgopal}@nmsu.edu
ABSTRACT Search plays an important role in ubiquitous computing. In this paper, we investigate the expected cost and latency of three unstructured search schemes: broadcast, TTL-based, and random-walk-based search. We build a unified analytical model for unstructured search schemes. We demonstrate, through simulation, that the different search schemes exhibit very different cost and latency tradeoffs, leaving large gaps in the performance landscape. We propose randomized mixing schemes to bridge such performance gaps and bring out a new Pareto frontier that offers more diverse performance choices for applications. Keywords: performance modeling, randomized mixing, search methods.
1 INTRODUCTION
Search for information in a network has many important applications in ubiquitous computing, such as route discovery in on-demand routing protocols for wireless ad hoc networks [1], event query in sensor nets [2][3], information lookup in peer-to-peer networks [4], etc. Unstructured search refers to a type of search where no a priori knowledge about the search target is available. Unstructured search is applicable in cases where the network is highly dynamic and maintaining an infrastructure for data lookup is too costly [4]. Unstructured search can be, and sometimes is, implemented by a broadcast search. However, a broadcast search is very costly and does not scale. A broadcast search is particularly hard to justify when the target is replicated for robustness and latency reduction, as is common in today's distributed applications. As noted in [4], the fundamental problem of broadcast search lies in the lack of granularity of the search action: either no action, or a very costly one (broadcast). To reduce cost and provide finer-grained search actions than broadcast, a variety of other unstructured search schemes have been proposed. Here we focus on two types of such schemes appearing in recent literature: iterative broadcast (TTL-based) schemes and random-walk-based (RW-based) schemes. In a TTL-based scheme [5][6], a series of broadcasts with increasing scopes is carried out. The scope of a broadcast is determined by the TTL (Time to Live) value carried by a query packet, which limits the number of hops the packet can travel. A TTL-based scheme offers finer-grained search actions than a simple broadcast by varying its scope, and promises to reduce cost by trying actions with smaller cost first, in the hope of finding the
target without a high-cost broadcast. However, a TTL-based scheme can be wasteful due to the overlapped coverage of successive broadcasts, and generally incurs larger search latency than a simple broadcast. In a RW-based search, a query packet carries out a random walk in the network, which continues until the target is found [2][3]. A RW-based scheme has the finest-grained search action possible, i.e., visiting a single node, but it can cause large latency. Variations on the basic random walk are possible to reduce latency, e.g., using multiple walkers. There are two main performance metrics for a search scheme [6]: expected cost and expected latency. Expected cost is defined as the expected total number of hops traveled by the query packets generated by a particular search scheme. Expected latency is defined as the expected time duration, in units of hops (i.e., one hop takes one unit of time), between the initiation of the search and the discovery of the target. We do not include the time for the result to travel from the target back to the originator, because it has nothing to do with the merits of a search scheme. Previous work most closely related to ours includes the following. TTL-based search on a line is first treated in [5], where an optimization problem is formulated and the competitive ratio to the optimal offline algorithm is obtained. In [6], a dynamic programming formalism for TTL-based schemes is developed, and a randomized strategy for selecting TTL values is shown to achieve the minimum worst-case cost when the target distribution is unknown. This randomized strategy is pursued further in [7] to develop schemes that achieve the lowest cost competitive ratio under a constraint on the delay competitive ratio. An analytical model for TTL-based search using the generating function of the degree
distribution is described in [15]. Random walk is shown to be an effective search method in sensor nets in [2], and its behavior is examined in [3][12]. Unstructured search in peer-to-peer networks is treated in [4], focusing on target replication strategy. Search in graphs with a power-law degree distribution is treated in [14]. Hybrid search schemes combining broadcast and random walk are discussed in [12]. The contributions of this paper are as follows. Although there is much previous work on unstructured search schemes, there is no systematic exploration of the performance landscape of all unstructured search schemes in one place, delineating the feasible regions of performance tradeoffs. This paper aims to make progress in this direction. We build a unified analytical model for unstructured search schemes, which is parameterized by the granularities of the search sequences. We demonstrate through simulation that different unstructured search schemes exhibit very different cost and latency tradeoffs, leaving large gaps in the performance landscape. We propose randomized mixing schemes to bridge such performance gaps and bring out a new Pareto frontier that offers more diverse performance choices for applications. The paper is organized as follows. In Section 2, we build a unified model for the three unstructured search schemes. In Sections 3-5, we deal with broadcast, TTL-based, and RW-based search schemes, respectively. We introduce mixed schemes in Section 6, and conclude in Section 7.

2 A UNIFIED PERFORMANCE MODEL FOR UNSTRUCTURED SEARCH SCHEMES

We consider the problem of searching for a target in a network of N nodes. A target is replicated in m copies; locating any one of the replicas makes the search successful. We consider a search scheme consisting of a sequence of actions A = [A1, A2, ..., Al], where A1 is the first search action, A2 the second, and so forth, and Al is the terminating action in which the target is found.
For an unstructured search scheme, there is no outside clue about the next search action except the history of previous search actions. In a broadcast search, there is a single action: broadcast. In a TTL-based scheme, each action is a limited broadcast with a particular TTL value. In a RW-based scheme, each action is a step of the walk. As we can see, the actions of different search schemes have different granularities, which has performance implications. We write Ci for the cost of performing search action Ai, Di for the average latency caused by Ai, and Fi for the probability that Ai fails, with the convention F0 = 1, Fl = 0. The cost of the first search action is always paid outright, but that of a later action, Ci, is paid only if the previous action failed, with the probability
of F_{i-1}. The search continues until success (i = l). So the expected cost of a search scheme can be written as

E[C] = \sum_{i=1}^{l} F_{i-1} C_i    (1)

It is easy to show that the above can be rewritten as

E[C] = \sum_{i=1}^{l} \left( \sum_{j=1}^{i} C_j \right) (F_{i-1} - F_i)    (2)
The above expression can be recognized as a standard formula for computing an expected value. In each term of the summation, (F_{i-1} - F_i) is the probability that the search sequence does not succeed by the (i-1)th action but succeeds at the ith action, and the inner sum in parentheses is the cost of a search sequence terminating at the ith action. We can use a similar approach to write the average latency of a search scheme, with one cautionary note: the latency incurred by Ai generally depends on whether the action is successful. Consider, for example, a search action with TTL set to 10. If the target is within the TTL scope, say 7 hops away, the latency is 7. But if the target is more than 10 hops away, the search action fails and incurs a latency of 10 regardless of where the target is located. We write D''_i for the latency if the target is within the scope of the ith action, i.e., if the action is successful, and D'_i for the latency if the target is out of scope, i.e., if the action is unsuccessful, which is fixed for a particular action. So, with probability (F_{i-1} - F_i), i.e., when the search sequence does not succeed by the (i-1)th action but succeeds at the ith action, the search latency is D'_1 + D'_2 + ... + D'_{i-1} + D''_i
Thus the expected latency can be written as

E[D] = \sum_{i=1}^{l} \left( \sum_{j=1}^{i-1} D'_j + D''_i \right) (F_{i-1} - F_i)    (3)
Rearranging terms, we have

E[D] = E[D_{BCast}] + E[D']    (4)

where

E[D_{BCast}] \equiv \sum_{i=1}^{l} D''_i (F_{i-1} - F_i)    (5)

E[D'] \equiv \sum_{i=1}^{l} \left( \sum_{j=1}^{i-1} D'_j \right) (F_{i-1} - F_i) = \sum_{i=1}^{l-1} F_i D'_i    (6)
In the above, E[D_{BCast}] expresses the expected latency of a search that expands in scope until successful, without incurring the latency of failed actions. This part of the latency is the same as the latency of a broadcast search; it is independent of the particular search scheme used and represents the minimum latency of any search scheme. E[D'] collects the latency incurred by failed actions, and is dependent on the particular search scheme in question. The above formalism is general, without regard to any particular search scheme. Before proceeding further, we list our assumptions.

A1 The cost of a search action (a single step) in a random walk is 1.
A2 The cost of a search action that requires broadcasting to n nodes is n, i.e., Ci = Ni in equations (1) and (2).
A3 The m target replicas are independently, identically distributed among the nodes.

The above assumptions are admittedly idealistic, but they are used here to focus on the issues intrinsic to the merits of a particular search scheme and to exclude external factors such as network conditions, implementation efficiency, etc. Assumption A1 holds only if the links are lossless, which is not true in practice; however, introducing loss requires specifying a loss probability, an extraneous detail outside the scope of our discussion. Similarly, A2 holds only if the implementation of broadcast is perfect, again more idealism than realism. Broadcast is known to cause redundancy and inefficiency, and methods have been devised to mitigate this [8]. Again, such details fall outside the scope of the present discussion, and a similar approximation is used in [4]. Assumption A3 implies that the probability of success depends only on the number of nodes visited, and has nothing to do with the identities of the visited nodes. In other words, the failure probability Fi is solely a function of the number of nodes visited so far, Ni, i.e.,

F_i = F(N_i)    (7)
The particular form of F depends on the distribution of the replicas. Now we are ready for the following proposition.

Proposition 2.1 A: The expected cost is parameterized by, and thus depends only on, the search sequence [N1, N2, ..., Nl], regardless of the network topology. B: The expected latency depends on both the search sequence and the network topology.

Proof: The validity of statement A is apparent by combining equations (1) and (7). The validity of statement B derives from the fact that different latencies are incurred in different network topologies to cover the same number of nodes. For example, it incurs just one-hop latency to cover all
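To make the formalism concrete, the following sketch evaluates equations (1)-(6) on a small hypothetical action sequence and checks that (1) agrees with (2) and that (3) agrees with the decomposition (4)-(6). All numeric values are made up for illustration:

```python
# Expected cost and latency of a generic search sequence, eqs (1)-(6).
# F[i] is the probability that the first i actions all fail, with the
# conventions F[0] = 1 and F[l] = 0 from the text.
l = 4
C  = [None, 3.0, 5.0, 8.0, 20.0]   # C[i]:  cost of action i
Dp = [None, 2.0, 3.0, 4.0, 6.0]    # D'[i]: latency if action i fails
Ds = [None, 1.5, 2.5, 3.5, 5.0]    # D''[i]: latency if action i succeeds
F  = [1.0, 0.7, 0.4, 0.1, 0.0]

EC1 = sum(F[i - 1] * C[i] for i in range(1, l + 1))                        # eq (1)
EC2 = sum(sum(C[1:i + 1]) * (F[i - 1] - F[i]) for i in range(1, l + 1))    # eq (2)

ED3 = sum((sum(Dp[1:i]) + Ds[i]) * (F[i - 1] - F[i]) for i in range(1, l + 1))  # eq (3)
ED_bcast = sum(Ds[i] * (F[i - 1] - F[i]) for i in range(1, l + 1))         # eq (5)
ED_fail  = sum(F[i] * Dp[i] for i in range(1, l))                          # eq (6)

assert abs(EC1 - EC2) < 1e-9                   # (1) and (2) agree
assert abs(ED3 - (ED_bcast + ED_fail)) < 1e-9  # (3) matches decomposition (4)
```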
nodes in the network if the topology is a complete graph, whereas it incurs n/2-hop latency if the topology is a ring of n nodes. Q.E.D.

In the following, we provide bounds on the minimum expected cost and latency, and identify search schemes that achieve these bounds. First, we define an idealized random walk.

Definition 2.1 An idealized random walk is one that visits a distinct node in every step.

Proposition 2.2 The minimum expected cost among all unstructured search schemes is

C_{min} = \sum_{j=1}^{N} F(j-1)    (8)
where F(j-1) is defined in the sense of (7). This minimum expected cost can be achieved by an idealized random walk.

Proof: First, we show that for an arbitrary search scheme, the expected cost is at least C_min. Suppose the scheme's search action sequence is [A1, A2, ..., Al], with the corresponding node coverage sequence [N1, N2, ..., Nl = N]. It is clear that C_i >= \Delta N_i = N_i - N_{i-1}, since it takes at least a cost of \Delta N_i to cover \Delta N_i nodes. So we have

E[C] = \sum_{i=1}^{l} F_{i-1} C_i \ge \sum_{i=1}^{l} F_{i-1} \Delta N_i = \sum_{i=1}^{l} F(N_{i-1}) \Delta N_i

The right-hand side of the inequality can be recognized as the discrete integration of F with step sizes \Delta N_i on the N-axis, over the support [1, 2, ..., N] (since N_1 + \Delta N_2 + ... + \Delta N_l = N). Since F is a non-increasing function, we have

\sum_{i=1}^{l} F(N_{i-1}) \Delta N_i \ge \sum_{i=1}^{l} \big( F(N_{i-1}) + F(N_{i-1}+1) + ... + F(N_{i-1} + \Delta N_i - 1) \big) = \sum_{j=1}^{N} F(j-1)

Second, an idealized random walk is a sequential visit of distinct nodes. Without loss of generality, let the sequence be j = 1, 2, ..., N; it thus incurs exactly the minimum cost of \sum_{j=1}^{N} F(j-1). Q.E.D.
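As a numeric illustration, the sketch below evaluates the bound (8) for the uniform-replica failure probability F(j) = (1 - j/N)^m used later in the paper; N and m are illustrative choices:

```python
# Minimum expected cost, eq (8), under uniformly distributed replicas,
# F(j) = (1 - j/N)^m; an idealized random walk (one new node per step)
# achieves exactly this cost.
N, m = 1000, 4

def F(j):
    # Failure probability after j distinct nodes have been covered.
    return (1.0 - j / N) ** m

C_min = sum(F(j - 1) for j in range(1, N + 1))   # eq (8)

# With a single replica (m = 1) the sum telescopes to (N + 1)/2 expected
# visits; additional replicas can only lower the minimum cost.
C_one = sum(1.0 - (j - 1) / N for j in range(1, N + 1))
assert abs(C_one - (N + 1) / 2) < 1e-6
assert C_min < C_one
```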
The next proposition is obvious, and we state it without proof.

Proposition 2.3 The minimum expected latency is the minimum expected distance in hops between the originator and any one of the target replicas, which can be achieved by a broadcast search.
In the next three sections, we derive cost and latency models for the three classes of unstructured search schemes: broadcast, TTL-based, and RW-based. To compute the expected cost and latency, one needs a specific probability distribution for the replicas in the network. In the following, we assume replicas are distributed uniformly in the network, which is a common case, e.g., in DHT-based data lookup in peer-to-peer networks [4]. Thus, given m replicas, the failure probability of a search action that covers Ni of the N nodes in the network is

F_i = \left( 1 - \frac{N_i}{N} \right)^m    (9)

3 BROADCAST SEARCH

A broadcast search consists of a single action, i.e., broadcast. Its cost is fixed at E[C_{BCast}] = N; the expected latency, however, is not so trivial. To make the analysis tractable, we assume the network lives in a d-dimensional space, with the originator at the center. According to A3, the probability that a target resides in a particular region is proportional to the volume of that region. The probability that a target (at random distance x) is less than r hops away can then be computed as

P_r \equiv P(x \le r) = \left( \frac{r}{R} \right)^d

where R is the radius of the network in hops. The probability that the minimum distance between the originator and the m target replicas is no larger than r hops is

P_r^m \equiv P(x \le r) = 1 - \left[ 1 - \left( \frac{r}{R} \right)^d \right]^m

The expected latency of a broadcast search is the expected minimum distance between the originator and the targets (the minimum-latency result of Section 2), and can be calculated as

E[D_{BCast}] = \int_0^R r \, dP_r^m = -\int_0^R r \, d\left[ 1 - \left( \frac{r}{R} \right)^d \right]^m = mR \, \frac{\Gamma(m)\,\Gamma(\frac{1}{d}+1)}{\Gamma(m+\frac{1}{d}+1)}    (10)

where \Gamma(\cdot) is the gamma function. For d = 2, which is our main focus here, we have N = \alpha \pi R^2, with \alpha being the node density. The node density, defined as the number of nodes per square hop, depends on the nodal degree and the network topology in question. Expressions for \alpha can be obtained only in some special cases; for example, it is straightforward to show that \alpha = 1 for a square grid with nodal degree four, \alpha = 2/\sqrt{3} for a hexagonal grid with nodal degree six, etc. Since the density only introduces a scalar factor, we will, without loss of generality, use the value of one in the following. Thus, in a 2-D space, we have R = (N/\pi)^{1/2}, and

E[D_{BCast}] = m\sqrt{N} \, \frac{\Gamma(m)}{2\,\Gamma(m+\frac{3}{2})}    (11)

In passing, we note that for large m [9],

\Gamma(m) \cong \sqrt{2\pi}\, m^{m-\frac{1}{2}} e^{-m} \left( 1 + O\left(\frac{1}{m}\right) \right)

We have, for arbitrary dimension and large m,

E[D_{BCast}] \cong R\, m^{-\left(\frac{1}{2}+\frac{1}{d}\right)} \Gamma\left(\frac{1}{d}+1\right)    (12)
4 TTL-BASED SEARCH SCHEMES
A TTL-based search consists of a sequence of broadcasts with increasing TTL values, the last action being a network-wide broadcast. To motivate our analysis, we consider the first two actions in a TTL-based search, A1 and A2, with increasing TTL values, covering N1 and N2 nodes and incurring costs C1 and C2, respectively. Note that the node coverage of A2 includes that of A1. We consider two options: a) a sequence of search actions [A1, A2], or b) the single action A2. Clearly, the two options have the same probability of success at the conclusion of the search. However, their costs differ: C_a = C_1 + F_1 C_2 for option a, and C_b = C_2 for option b. Option a is preferable to b only if C_1 + F_1 C_2 < C_2. Rearranging terms, we have the following lemma, which was similarly stated in [6] but is derived here with a simpler argument.

Lemma 4.1 The cost of a search action A2 can be reduced by using a sequence of search actions [A1, A2] if the following inequality holds:

F_1 < 1 - \frac{C_1}{C_2}    (13)
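A quick numeric check of Lemma 4.1 under the uniform-replica model of eq (9), illustrating that splitting a broadcast into two actions pays off only when m > 1; the function name and parameter values are our own illustrative choices:

```python
# Lemma 4.1 check: splitting a broadcast A2 (covering N2 nodes) into the
# sequence [A1, A2] pays off only when F1 < 1 - C1/C2, which under eq (9)
# requires m > 1.
def split_pays_off(N, N1, N2, m):
    F1 = (1.0 - N1 / N) ** m         # failure probability of A1, eq (9)
    return N1 + F1 * N2 < N2         # cost of [A1, A2] vs. cost of A2 alone

N, N1, N2 = 1000, 100, 1000
assert not split_pays_off(N, N1, N2, m=1)   # m = 1: splitting never helps
assert split_pays_off(N, N1, N2, m=4)       # m > 1: the split can reduce cost
```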
Specializing to our model, under assumptions A1-A3 we have C_1 = N_1 and C_2 = N_2, and the inequality becomes

F_1 = \left(1 - \frac{N_1}{N}\right)^m < 1 - \frac{N_1}{N_2}

This can hold only if m > 1, since N is the total number of nodes in the network and cannot be less than N_2. So we have:

Corollary 4.1 If m = 1, the cost of a single search action A2 is no more than that of a sequence of search actions [A1, A2].

We generalize Corollary 4.1 in the following proposition.

Proposition 4.1 Under assumptions A1 and A3, if the number of replicas is one, the cost of a broadcast search is no more than that of a TTL-based search.

Proof: We use Corollary 4.1 recursively. Consider a TTL-based search scheme consisting of a search sequence [A1, A2, ..., Al]. Apply Corollary 4.1 to the subsequence [A1, A2], which can be replaced with the single action A2, resulting in a new search sequence [A2, A3, ..., Al] whose cost is no more than that of the original sequence. Applying Corollary 4.1 again to the new sequence, and so forth, we eventually reduce the original sequence to the single action Al, i.e., a broadcast, with a cost no more than that of the original sequence. Q.E.D.

The above proposition implies that a TTL-based search is competitive with a broadcast search only if m > 1. We now proceed to calculate the expected cost and latency of a TTL-based scheme. In our model, a TTL-based search scheme consists of a sequence of actions A1, A2, ..., Al, with C_i = N_i and C_l = N, where N_i is the number of nodes covered in the ith round. Using (1), we calculate the expected cost as

E[C_{TTL}] = \sum_{i=1}^{l} F_{i-1} C_i = \sum_{i=1}^{l} N_i \left(1 - \frac{N_{i-1}}{N}\right)^m = N \sum_{i=1}^{l} p_i f_{i-1}^m    (14)

where p_i \equiv N_i / N and f_i \equiv 1 - p_i.

Clearly, the expected cost depends on the vector {p_i} (or its complement {f_i}), where p_i is the proportion of nodes covered by the ith search action. We choose {p_i}, or equivalently {f_i}, to minimize the expected cost. Since f_i and p_i are probabilities, they are constrained to the interval [0, 1]; thus we have a constrained optimization problem. According to the Kuhn-Tucker conditions for constrained optimization, the f_i's that minimize the expected cost fall into two cases: a) located at the constraint boundary, i.e., f_i = 0 or 1, or b) located inside the interval and subject to the condition

\frac{\partial E[C_{TTL}]}{\partial f_i} = -f_{i-1}^m + m\, p_{i+1} f_i^{m-1} = 0    (15)

Equation (15) provides a recursive relationship that determines f_{i+2} given f_{i+1} and f_i (note that p_i = 1 - f_i). The optimal solution is found by testing the values of the expected cost at the f_i's given by the recursive equation and at the boundary f_i = 0 (since f_i <= 1, and the f_i's are strictly decreasing after each search action). An exact solution of the above set of nonlinear equations is difficult. However, we can try different f_1's (note that f_0 = 1) and determine a family of sequences. This family provides different cost and latency tradeoffs, and includes the sequence with minimum cost. We calculate the expected latency using (4) and (6):

E[D_{TTL}] = E[D_{BCast}] + \sum_{i=1}^{l-1} F_i D'_i = E[D_{BCast}] + \sum_{i=1}^{l-1} h_i \left(1 - \frac{N_i}{N}\right)^m = E[D_{BCast}] + \beta \sum_{i=1}^{l-1} p_i^{1/d} f_i^m

where h_i is the TTL value (hop count) of the ith action, which can be expressed as h_i = \beta\, p_i^{1/d}, with \beta a constant determined by the network topology. For 2-D space, which is our interest here, we have (noting that the node density is 1) \pi h_i^2 = p_i N, thus \beta = \sqrt{N/\pi}, and therefore

E[D_{TTL}] = E[D_{BCast}] + \sum_{i=1}^{l-1} \sqrt{\frac{N_i}{\pi}}\, f_i^m    (16)
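As an illustration of eq (14), the sketch below evaluates the expected cost of a simple TTL schedule whose coverage doubles each round, starting from a fraction p1. The doubling schedule is our own illustrative choice, not the optimized sequence implied by eq (15):

```python
# Expected cost of a TTL-based scheme via eq (14): N * sum_i p_i * f_{i-1}^m.
def expected_ttl_cost(N, m, p1):
    cost, prev_fail = 0.0, 1.0        # prev_fail holds f_{i-1}^m, with f_0 = 1
    frac, covered = p1, 0.0
    while covered < 1.0:
        frac = min(frac, 1.0)
        cost += N * frac * prev_fail  # pay C_i = N_i only if earlier rounds failed
        covered = frac
        prev_fail = (1.0 - covered) ** m
        frac = 2.0 * covered          # double the broadcast scope each round
    return cost

N, m = 1000, 4
c_small = expected_ttl_cost(N, m, 0.05)
c_large = expected_ttl_cost(N, m, 0.8)
assert c_small < c_large                   # cost stays small for small p1
assert expected_ttl_cost(N, m, 1.0) == N   # p1 = 1 degenerates to a broadcast
```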
Simulation results are shown in Figure 1 for a network with N = 1000 and m = 2, 4, ..., 10. The network topology is generated randomly with an average degree of five. The expected cost and latency as functions of p1 (the proportion of nodes covered by the first action) are plotted in the figure. One can see that the expected cost is small in the region of small p1, decreasing very gradually to a minimum, and then rising quickly to become large at large p1; see Figure 1 (a). The implication is that the expected cost is insensitive to the selection of p1 as long as p1 is small. This turns out to be quite useful in practical implementations because there is no need to painstakingly search for
the optimal p1; any choice is fine as long as it is a small number.

[Figure 1: Expected cost E[C] (a) and latency E[D] (b) versus p1 for TTL-based schemes.]

The expected latency shows the opposite trend: it increases quickly to a maximum in the small-p1 regime, then decreases monotonically and eventually saturates at large p1; see Figure 1 (b). In particular, the maximum latency occurs roughly around the point where the minimum cost occurs, indicating that reduced expected cost comes at the expense of increased latency.

5 RW-BASED SEARCH SCHEMES

In a RW-based search, a query packet visits nodes sequentially and terminates when the target is found. There are a variety of ways to carry out a RW-based search. One way is to employ a single random walker (SRW). Such a search, if ideal, incurs minimal cost according to Proposition 2.2, but it also causes large latency. To lower latency, multiple random walkers (MRW) can be released simultaneously, but at a large increase in cost. A fundamental problem with MRW is that there is no communication between walkers once they are released: even if one walker finds the target, the other walkers continue until they individually find the target, incurring large cost. One way to rectify this is to have walkers terminate probabilistically at each additional step; however, this leaves open the possibility that all walkers terminate before the target is found. Our solution is one persistent walker that never terminates, assisted by k non-persistent walkers that each survive with probability q at each step. The persistent walker guarantees that the search always succeeds, while the non-persistent walkers speculatively wander to reduce latency. In the following, we discuss the expected cost and latency of SRW and then of MRW.

We model a SRW search scheme as a sequence of actions A1, A2, ..., Al, with C_i = 1, D_i = 1, and l >= N. Let s_i denote the number of distinct nodes visited in the first i steps. Then the probability of failing to find any of the m replicas of the target is (since the search for each replica has to fail)

F_i = \left(1 - \frac{s_i}{N}\right)^m = f_i^m    (17)

where

f_i = 1 - \frac{s_i}{N}    (18)

We calculate the expected cost using (1):

E[C_{SRW}] = \sum_{i=1}^{l} F_{i-1} C_i = \sum_{i=0}^{l-1} f_i^m    (19)

We compute the expected latency as

E[D_{SRW}] = \sum_{i=1}^{l} F_{i-1} D_i = \sum_{i=0}^{l-1} f_i^m    (20)

The expected latency can be computed the same way as the cost because each step incurs a fixed latency of one unit, which is not true of the other search schemes. A distinguishing fact for SRW is thus that the expected cost equals the expected latency. For other search schemes, the expected cost is generally larger than the expected latency: for an action covering N_i nodes, the cost is paid outright as N_i, but the latency can be smaller than N_i if the target is found before every node is covered. This inequality does not apply to SRW because each step is elementary, incurring unit cost and unit latency. It remains to determine the values of s_i. There is no exact expression for s_i, though asymptotic expressions exist [10]. One such expression says

s_i \to (1 - P_R)\, i \quad \text{as } i \to \infty

where P_R is the probability that the walker returns to the originator. But such an asymptotic expression is not useful here: using a RW-based search makes little sense if i gets as large as N, let alone infinity, because one might as well use broadcast at the same cost but much lower latency. We think that, in a practical implementation, one would not use a pure random walk simply for the sake of its theoretical purity, because it is patently inefficient, i.e., it revisits previously visited nodes repeatedly. Instead, some optimization will be applied, such as using a flag to indicate previous visits. Therefore, in a practical implementation, the behavior of a random walker approaches that of an idealized walker, i.e., s_i -> i, to the extent that the optimization succeeds. We now turn to MRW. In MRW, in addition to one persistent random walker that terminates only when the target is found, k independent, identical, non-persistent random walkers are also released, each having probability 1 - q of terminating at each step. Since the random walkers are independent, we calculate the expected cost of MRW as
k=1 k=2 k=3 k=4 k=5
700
650
600
550 E[C]
si → (1 − PR )i as i → ∞
500
450
400
350
300 0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
q
(a) 340
330
⎡ l −1 ⎤ E[CMRW ] = k ⎢ Pi fi m ⎥ + ⎣⎢ i =0 ⎦⎥
∑f
i
m
k=1 k=2 k=3 k=4 k=5
320
(21)
i =0
310 E[D]
∑
l −1
where Fi is given by (17), and Pi is the probability that a walker is still alive after ith step, and is given by Pi =
(22)
1 − q l +1
The expected latency is determined by the minimum of those among multiple walkers, whose population at ith step is Ki = 1 + kPi
(23)
The search continues only if all Ki walkers fail to find any of m replicas, so the expected latency is l −1
∑f
290
280
q i − q l +1
E[ DMRW ] =
300
i
mKi
(24)
i =0
By varying k and q, we can construct a family of RW-based schemes. Simulation results are shown in Figure 2 for a network with N = 1000, random topology and average degree being five, and with one persistent walker plus one to five non-persistent walkers. To approximate practical, optimized implementation, previously visited nodes are flagged,
UbiCC Journal - Volume 3
270 0.5
0.55
0.6
0.65
0.7
0.75 q
0.8
0.85
0.9
0.95
1
(b) Figure 2: Expected cost E[C] (a) and latency E[D] (b) versus q for RW-based scheme
From the figure, one can see that large performance variation occurs only where q is close to 1. Roughly, once q falls below 0.9, the expected cost and latency quickly converge to those achieved by a single persistent walker. Figure 2 also shows the tradeoff between expected cost and latency: latency can be reduced, at the expense of increased cost, by either increasing the number of nonpersistent walkers (k) or increasing the survival probability (q). An important point, which will be elaborated in the next section, is that the cost-latency tradeoff here exhibits a very different character from that of TTL-based schemes; compare Figures 1 and 2.
6    MIXED SEARCH SCHEMES
In this section, we examine more closely the performance landscape of different unstructured search schemes, and introduce mixed schemes to provide more diverse cost-latency tradeoffs. In the following, we simplify notation by writing C_A, D_A for E[C_A], E[D_A] of a particular search scheme A. We start by laying down some preparations.

Definition 6.1 The feasible region of a search scheme in the performance space consists of the union of expected cost and latency pairs, each realizable using an instance of the search scheme.

Definition 6.2 A search scheme A dominates a search scheme B if C_A ≤ C_B and D_A ≤ D_B.

Definition 6.3 A search scheme is Pareto-optimal [11] if it is dominated by no other search scheme; the set of all Pareto-optimal search schemes forms the Pareto frontier.

Definition 6.4 A mixed scheme between two search schemes A and B is a randomized scheme that selects A with a certain probability p and B with probability 1 − p.

Proposition 6.1 The feasible region of a mixed scheme is convex.

Proof: Consider two schemes A and B in the feasible region, with expected cost and latency C_A, D_A and C_B, D_B, respectively. For any p in [0, 1], a mixed scheme with expected cost and latency pC_A + (1 − p)C_B, pD_A + (1 − p)D_B is realizable. Therefore the feasible region is convex. Q.E.D.

Equipped with the above background, let us examine the cost-latency tradeoffs of the three types of unstructured search schemes discussed previously. Simulation results for a network of 1000 nodes, with random topology and average degree five, are shown in Figure 3. Broadcast constitutes a single point in the performance region, incurring minimum latency but the largest cost. TTL-based schemes form a family of schemes parameterized by p_1, the proportion of nodes covered by the first action.
TTL-based schemes achieve a slightly larger latency than broadcast but can incur significantly lower cost, especially for large m (the number of replicas) and optimized p_1; see Figure 3(c). RW-based schemes form a family of schemes parameterized by k (= 1-5), the initial number of walkers, and q, the survival probability. In the figure, individual curves correspond to particular values of k, with larger-k curves lying higher. RW-based schemes can achieve the lowest cost, especially for small m (see Figure 3(a)), but they incur the largest latency. A remarkable fact about Figure 3 is that the different search schemes exhibit very different cost-latency tradeoffs, leaving conspicuous gaps between them.
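Definition 6.4 and Proposition 6.1 can be illustrated with a short simulation sketch. The (cost, latency) points below are invented for illustration and are not the paper's measurements; `mixed_point` is our own helper name.

```python
# Sketch of Definition 6.4 / Proposition 6.1: a mixed scheme that runs A
# with probability p and B otherwise realizes, in expectation, the convex
# combination of their (cost, latency) points. All numbers are invented.
import random

def mixed_point(point_a, point_b, p, trials=200000, seed=1):
    """Empirical (E[C], E[D]) of the randomized mixture of two schemes."""
    rng = random.Random(seed)
    tot_c = tot_d = 0.0
    for _ in range(trials):
        c, d = point_a if rng.random() < p else point_b
        tot_c += c
        tot_d += d
    return tot_c / trials, tot_d / trials

ttl = (900.0, 40.0)   # latency-preferred: high cost, low latency (invented)
rw = (300.0, 250.0)   # cost-preferred: low cost, high latency (invented)
c, d = mixed_point(ttl, rw, p=0.5)   # close to (600, 145), the midpoint
```

Sweeping p from 0 to 1 traces out the whole segment between the two pure points, which is exactly the convexity claim of Proposition 6.1.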
(c) Figure 3: Expected cost E[C] versus latency E[D] for broadcast (BCast), TTL-based, and RW-based search schemes, shown with m=2 (a), m=5 (b) and m=10 (c). Dotted lines indicate Pareto frontiers of the mixed scheme.
In the following, we focus on TTL-based and RW-based schemes, omitting broadcast because of its deficiency. We call a TTL-based scheme latency-preferred, since its expected latency is low and insensitive to the parameter (p_1), being largely limited by the E[D_BCast] term in (16); however, its expected cost can vary widely with the selection of the parameter. Conversely, we call an RW-based scheme cost-preferred, since it exhibits the opposite behavior: its expected cost is low and relatively insensitive to the choice of parameters (k, q), but its expected latency varies widely with their selection. A further explanation of this difference in cost-latency characteristics, based on the analytical model, appears in the appendix.

TTL-based and RW-based schemes are modeled after broadcast and the idealized random walk, which incur minimum latency and minimum cost, respectively. The fact that TTL-based schemes are latency-preferred and RW-based schemes are cost-preferred is therefore not surprising in retrospect. Due to the peculiar performance characteristics of each type of scheme, any particular type may not satisfy the requirements of an application. Applications with varying cost-latency requirements are best served if a more diverse Pareto frontier is accessible. This can be accomplished by mixing TTL-based and RW-based schemes. The dotted lines in Figure 3 indicate the Pareto frontiers of the mixed scheme, and are constructed based on the following proposition.

Proposition 6.2 The Pareto frontier of mixing two families of search schemes A and B consists of parts of the Pareto frontiers of A and B, namely L_A and L_B, and the Pareto frontier of the linear combinations of points on L_A and L_B.

Proof: We prove that any point P in the feasible region of the mixed scheme is dominated by the Pareto frontier proposed in the proposition. P can come from only three sources: the feasible region of a pure A scheme, that of a pure B scheme, or a linear combination (mixing) of schemes A and B. If P is from a pure A region, P is dominated by some point on L_A. If P is from a pure B region, P is dominated by some point on L_B.
If P is from a linear combination of a point P1 of an A scheme and a point P2 of a B scheme, then P is dominated by a linear combination of points P_A and P_B, where P_A and P_B lie on L_A and L_B and dominate P1 and P2, respectively; refer to Figure 4. Q.E.D.

We provide a simple example to demonstrate the value of a mixed scheme. Suppose an application has the requirements E[C] < C_0 and E[D] < D_0. These requirements are satisfied only by a mixed scheme and not by either of the pure schemes; refer to Figure 4. A cautionary note before we conclude: a mixed scheme achieves the cost-latency point only in the expected sense, and the variance can be large. However, application quality-of-service requirements are often expressed in terms of expected performance metrics, and mixed schemes are useful for achieving performance tradeoffs otherwise inaccessible to pure schemes.
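The construction in Proposition 6.2 can be sketched in code: collect the pure points of both families together with sampled convex combinations, then keep the non-dominated pairs. The helper names (`dominates`, `pareto`) and all numbers are our own illustration, not the paper's data.

```python
# Sketch of Proposition 6.2: build the mixed scheme's Pareto frontier from
# the pure points of both families plus sampled convex combinations, then
# keep the non-dominated (cost, latency) pairs.

def dominates(a, b):
    """Scheme a dominates scheme b: no worse in both cost and latency."""
    return a[0] <= b[0] and a[1] <= b[1]

def pareto(points):
    """Keep the points dominated by no distinct point."""
    return [p for p in points
            if not any(dominates(q, p) and q != p for q in points)]

family_a = [(900, 40), (700, 60), (650, 90)]     # latency-preferred (invented)
family_b = [(300, 250), (350, 200), (500, 180)]  # cost-preferred (invented)
# The "mixing" step: convex combinations on a small grid of p values.
mixes = [(p * a[0] + (1 - p) * b[0], p * a[1] + (1 - p) * b[1])
         for a in family_a for b in family_b
         for p in (0.25, 0.5, 0.75)]
frontier = pareto(family_a + family_b + mixes)
```

As the proposition states, the resulting frontier retains the extreme pure points of each family and fills the gap between them with mixed points.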
7
CONCLUSION
Unstructured search has high potential for large-scale, highly dynamic networks, and it provides a rich field for research endeavors. Broadcast, TTL-based, and RW-based schemes represent typical instances of unstructured search. However, large gaps exist in the performance landscape between these pure schemes. Mixed schemes can help to bridge the performance gaps. This is important because different applications require different performance tradeoffs, which a mixed scheme can provide where a pure scheme cannot.
Figure 4: Pareto frontier of a mixed scheme can satisfy application requirement (shaded region) otherwise not satisfied by a pure scheme.
8
APPENDIX
We provide a more detailed explanation, based on the analytical models developed earlier, of the cost-latency tradeoffs of TTL-based and RW-based schemes, which can be quantified by the cost-latency slope, defined in the obvious way as dC/dD. For a TTL-based scheme, using (14) and (16), we have

dC_TTL/dD_TTL = (dC_TTL/dp_1) / (dD_TTL/dp_1) = [ −(N + C′_2 + ⋯) f_1^m + ⋯ ] / [ √(N/(π p_1)) (p_1/f_1 − 1/(2m)) + D′_2 + ⋯ ]

where in the last expression we separate out the leading term and the rest of the terms in dC_TTL/dp_1 and dD_TTL/dp_1, respectively. Exact evaluation of dC_TTL/dD_TTL is difficult, but we can gain some insight by examining just the ratio of the leading terms, since the individual terms decay rapidly as O(f_i^m), with significant jumps in TTL value (and thus in f_i) for successive search actions. The ratio of the leading
terms (LT) is

(dC_TTL/dD_TTL)_LT = O( √N / (m f_1^m) )
which explains the large cost-latency slope of TTL-based schemes in Figure 3 (because N is large). We proceed in a similar manner with the RW-based schemes, the difference being that we now have two parameters (k, q), and that f_i decreases slowly without jumps (each decrement being at most 1/N). Thus a leading-term approximation does not provide much insight here. We write the full expressions below, using (21) and (24).
(∂C_RW/∂D_RW)|_q = (∂C_RW/∂k) / (∂D_RW/∂k) = [ Σ_{i=0}^{l−1} P_i f_i^m ] / [ m Σ_{i=0}^{l−1} P_i f_i^{mK_i} ln f_i ]

(∂C_RW/∂D_RW)|_k = (∂C_RW/∂q) / (∂D_RW/∂q) = [ k Σ_{i=0}^{l−1} (dP_i/dq) f_i^m ] / [ m k Σ_{i=0}^{l−1} (dP_i/dq) f_i^{mK_i} ln f_i ]
From the above, we can explain the smaller cost-latency slope of RW-based schemes by the absence of the √N term present in the TTL-based expression. Further, the m factor in the denominator accounts for the fact that the slope becomes smaller with larger m, as shown in Figure 3.
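The k-derivatives appearing in the slope expressions above can be checked numerically against equations (21) and (24). The sketch below is our own sanity check with made-up f_i, not part of the paper.

```python
# Numerical check of the analytic k-derivatives of C_RW and D_RW against
# finite differences of equations (21) and (24). The f_i are invented.
import math

def P(i, q, l):                       # Eq. (22)
    return (q**i - q**(l + 1)) / (1 - q**(l + 1))

def C(f, k, q, m):                    # Eq. (21)
    l = len(f)
    return (k * sum(P(i, q, l) * f[i]**m for i in range(l))
            + sum(fi**m for fi in f))

def D(f, k, q, m):                    # Eq. (24), with K_i = 1 + k P_i
    l = len(f)
    return sum(f[i]**(m * (1 + k * P(i, q, l))) for i in range(l))

f = [1 - i / 500 for i in range(500)]
k, q, m, h = 2.0, 0.9, 2, 1e-4
l = len(f)
# Analytic derivatives, matching the numerators/denominators above:
dC_dk = sum(P(i, q, l) * f[i]**m for i in range(l))
dD_dk = m * sum(P(i, q, l) * f[i]**(m * (1 + k * P(i, q, l))) * math.log(f[i])
                for i in range(l))
# Finite-difference counterparts:
fd_C = (C(f, k + h, q, m) - C(f, k, q, m)) / h
fd_D = (D(f, k + h, q, m) - D(f, k, q, m)) / h
```

The agreement of `dC_dk` with `fd_C` and of `dD_dk` with `fd_D` confirms the term-by-term form of the slope expressions.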
9    REFERENCES

[1] E. M. Royer and C.-K. Toh: A Review of Current Routing Protocols for Ad Hoc Mobile Wireless Networks, IEEE Personal Communications, Vol. 6, No. 2, pp. 46-55 (1999).
[2] D. Braginsky and D. Estrin: Rumor Routing Algorithm for Sensor Networks, in Proc. International Conference on Distributed Computing Systems (2002).
[3] S. Shakkottai: Asymptotics of Query Strategies over a Sensor Network, in Proc. IEEE INFOCOM (2004).
[4] E. Cohen and S. Shenker: Replication Strategies in Unstructured Peer-to-Peer Networks, in Proc. ACM SIGCOMM (2002).
[5] Y. Baryshnikov, E. Coffman, P. Jelenkovic, P. Momcilovic, and D. Rubenstein: Flood Search under the California Split Rule, Operations Research Letters, Vol. 32, No. 3, pp. 199-206 (2004).
[6] N. Chang and M. Liu: Revisiting the TTL-based Controlled Flooding Search: Optimality and Randomization, in Proc. ACM MobiCom (2004).
[7] N. Chang and M. Liu: Controlled Flooding Search with Delay Constraints, in Proc. IEEE INFOCOM (2006).
[8] S. Y. Ni, Y. C. Tseng, Y. S. Chen, and J. P. Sheu: The Broadcast Storm Problem in a Mobile Ad Hoc Network, in Proc. ACM MobiCom (1999).
[9] D. Zwillinger: CRC Standard Mathematical Tables and Formulae, Chapman & Hall/CRC (2003).
[10] B. H. Hughes: Random Walks and Random Environments, Volume 1: Random Walks, Clarendon Press, Oxford (1995).
[11] M. J. Osborne and A. Rubinstein: A Course in Game Theory, MIT Press (1994).
[12] C. Gkantsidis, M. Mihail, and A. Saberi: Random Walks in Peer-to-Peer Networks, in Proc. IEEE INFOCOM (2004).
[13] C. Gkantsidis, M. Mihail, and A. Saberi: Hybrid Search Schemes for Unstructured Peer-to-Peer Networks, in Proc. IEEE INFOCOM (2005).
[14] L. A. Adamic, R. M. Lukose, B. Huberman, and A. R. Puniyani: Search in Power-Law Networks, Physical Review E, Vol. 64, 046135 (2001).
[15] R. Gaeta, G. Balbo, S. Bruell, M. Gribaudo, and M. Sereno: A Simple Analytical Framework to Analyze Search Strategies in Large-Scale Peer-to-Peer Networks, Performance Evaluation, Vol. 62, Issues 1-4 (2005).
Impact of Query Correlation on Web Searching Ash Mohammad Abbas Department of Computer Engineering Zakir Husain College of Engineering and Technology Aligarh Muslim University, Aligarh - 202002, India.
Abstract— Correlation among queries is an important factor to analyze, as it may affect the results delivered by a search engine. In this paper, we analyze correlation among queries and how it affects the information retrieved from the Web. We analyze two types of queries: (i) queries with embedded semantics, and (ii) queries without any semantics. In our analysis, we consider parameters such as search latencies and search relevance. We focus on two major search portals that are mainly used by end users. Further, we discuss a unified criterion for comparing the performance of search engines.

Index Terms— Query correlation, search portals, Web information retrieval, unified criterion for comparison, earned points.
I. INTRODUCTION

The Internet, which was originally aimed at communicating research activities among a few universities in the United States, has now become a basic need of life for people who can read and write throughout the world. This has become possible only due to the proliferation of the World Wide Web (WWW), now simply called the Web. The Web has become the largest source of information in all walks of life. Users from different domains extract information that fits their needs. The term Web information retrieval1 is used for extracting information from the Web. Although Web information retrieval has its roots in traditional database systems [4], [5], the retrieval of information from the Web is more complex than information retrieval from a traditional database. This is due to subtle differences in their respective underlying databases2. In a traditional database, the data is often organized, limited, and static. In contrast, the Webbase is unorganized, unlimited, and often dynamic: every second, a large number of updates are carried out in the Webbase. Moreover, as opposed to a traditional database, which is controlled by a specific operating system and whose data is located either at a central location or at least at a few known locations, the Webbase is not controlled by any specific operating system, and its data may reside neither at a central site nor at a few known locations. Further, the Webbase can be thought of as a collection of a large number of traditional databases of various organizations. The expectations of a user searching information

1 The terms Web surfing, Web searching, Web information retrieval, and Web mining are often used in the same context. However, they differ depending upon the methodologies involved, the intensity of seeking information, and the intentions of the users who extract information from the Web.
2 Let us use the term Webbase for the collection of data in case of the Web, in order to differentiate it from the traditional database.
on the Web are much higher than those of a user simply retrieving some information from a traditional database. This makes the task of extracting information from the Web challenging [1]. Since Web searching is an important activity and the results so obtained may affect decisions and directions for individuals as well as organizations, it is of utmost importance to analyze the parameters and constituents involved in it. Researchers have analyzed many different issues pertaining to Web searching, including index quality [2], user-effort measures [3], Web page reputation [6], and user-perceived quality [7]. In this paper, we try to answer the following question: What happens when a user fires correlated queries to a search engine one by one? Specifically, we wish to evaluate the effect of correlation among the queries submitted to a search engine (or a search portal).

The rest of this paper is organized as follows. In section II, we briefly review the methodologies used in popular search engines. In section III, we describe query correlation. Section IV contains results and discussion. In section V, we describe a criterion for the comparison of search engines. Finally, section VI is for conclusion and future work.

II. A REVIEW OF METHODOLOGIES USED IN SEARCH ENGINES
First we discuss a general strategy employed for retrieving information from the Web, and then we review some of the search portals.

A. A General Strategy for Searching

A general strategy for searching information on the Web is shown in Fig. 1. Broadly, a search engine consists of the following components: User Interface, Query Dispatcher, Cache3, Server Farm, and Web Base. The way these components interact with one another depends upon the strategy employed in a particular search engine; we describe here a broad view. An end user fires a query using an interface, the User Interface, which provides a form to the user. The user fills the form with a set of keywords to be searched. The query goes to the Query Dispatcher which, after performing some refinements, sends it to the Cache. If the query obtained after

3 We use the word Cache to mean the Search Engine Cache, i.e., the storage space where results matching previously fired queries or words are kept for future use.
Fig. 1. A general strategy for information retrieval from the Web.
refinement4 matches a query in the Cache, the results are immediately sent by the Query Dispatcher to the User Interface and hence to the user. Otherwise, the Query Dispatcher sends the query to one of the servers in the Server Farm, which are busy building a Web Base for the search engine. The server so contacted, after due consultation of the Web Base, sends the results to the Cache so that the Cache may store them for future reference, if any. The Cache sends them to the Query Dispatcher. Finally, through the User Interface, the response is returned to the end user. In what follows, we briefly review the strategies employed by different search portals.
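The dispatch-with-cache flow just described can be sketched as follows. The class and function names are our own illustration, not an actual search-engine API; the stop-word list is a common English example, and the tiny "web base" is a stub.

```python
# Minimal sketch of the User Interface -> Query Dispatcher -> Cache ->
# Server Farm flow described above. All names here are illustrative.

STOP_WORDS = {"a", "an", "the", "is", "am", "are",
              "will", "shall", "of", "in", "for"}

def refine(query):
    """Refinement step: drop stop words so they do not affect the results."""
    return tuple(w for w in query.lower().split() if w not in STOP_WORDS)

class QueryDispatcher:
    def __init__(self, server_farm):
        self.cache = {}                 # refined query -> results
        self.server_farm = server_farm

    def search(self, query):
        key = refine(query)
        if key in self.cache:           # hit: answer straight from the Cache
            return self.cache[key]
        results = self.server_farm(key)  # miss: ask a server in the farm
        self.cache[key] = results        # store for future reference
        return results

# Stub "server farm" that searches a tiny in-memory web base:
web_base = {("ad", "hoc", "routing"): ["doc1", "doc7"]}
dispatcher = QueryDispatcher(lambda key: web_base.get(key, []))
first = dispatcher.search("the ad hoc routing")   # goes to the server farm
second = dispatcher.search("ad hoc routing")      # served from the cache
```

Both calls return the same results because refinement strips the stop word "the", so the second query hits the cached entry of the first.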
B. Review of Strategies of Search Portals

The major search portals or search engines5 which end users generally use for searching are GoogleTM and YahooTM. Let us briefly review the methodologies behind the respective search engines6 of these search portals. Google is based on the PageRank scheme described in [8]. It is somewhat similar to the scheme proposed by Kleinberg in [9], which is based on hub and authority weights and focuses on the citations of a given page. To understand Google's strategy, one has to first understand the HITS (Hyperlink-Induced Topic Search) algorithm proposed by Kleinberg; readers are directed to [9] for HITS and to [8] for PageRank. Yahoo, on the other hand, employs an ontology-based search engine. An ontology is a formal term used to mean a hierarchical structure of related terms (or keywords), where the relationships among the keywords are governed by a set of rules. As a result, an ontology-based search engine such as Yahoo may search other related terms that are part of the ontology of the given term. Further, an ontology-based search engine may not search words that are not part of its ontology, and it can modify its ontology with time. Going one step further, an ontology-based search engine may also prune from the result set, before presenting it to end users, results that are not part of the ontology of the given term.

We now describe an important aspect pertaining to information retrieval from the Web. The results delivered by a search engine may depend on how the queries are formulated and on what relation a given query has with previously fired queries, if any. We wish to study the effect of correlation among the queries submitted to a search engine.

III. QUERY CORRELATION

4 By refinement of a query, we mean that the given query is transformed in such a way that words and forms that are not so important are eliminated, so that they do not affect the results.
5 A search engine is part of a search portal. A search portal provides many other facilities or services, such as Advanced Search, News, etc.
6 The respective products are trademarks of their organizations.
The searched results may differ depending upon whether a search engine treats a set of words as an ordered set or an unordered set. In what follows, we consider each of them.

A. Permutations

Searched results delivered by a search engine may depend upon the order of the words appearing in a given query7. If we take the order of words into account, the same set of words may form different queries for different orderings. The different orderings of the set of words of the given query are called permutations. The formal definition of the permutations of a given query is as follows.

Definition 1: Let the query Q = {w_i | 1 ≤ i ≤ m}, Q ≠ φ, be a set of words excluding the stop words of a natural language. Let P = {x_j | 1 ≤ j ≤ m} be a set of words excluding stop words. If P is such that for every x_j ∈ P there is some w_i ∈ Q with w_i = x_j for some j not necessarily equal to i, and for every w_i ∈ Q there is some x_j ∈ P such that w_i = x_j, where j may not be equal to i, then P is called a permutation of Q.

In the above definition, stop words are language dependent. For example, in the English language, the set of stop words, S, is often taken as S = {a, an, the, is, am, are, will, shall, of, in, for}.
7 The term ’query’ means a set of words that is given to a search engine to search for the information available on the Web.
Note that if there are m words (excluding the stop words) in the given query, the number of permutations is m!. The permutations are concerned with a single query. By submitting different permutations of the given query to a search engine, one may evaluate how the search engine behaves for different orderings of the same set of words. However, one would also like to know how the given search engine behaves when an end user fires different queries that may or may not be related. Specifically, one would be interested in the behavior of a given search engine when the queries are related. In what follows, we discuss what is meant by the correlation among different queries.

B. Correlation

An important aspect that may affect the results of Web searching is how different queries are related. Two queries are said to be correlated if there are common words between them. A formal definition of correlation among queries is as follows.

Definition 2: Let Q1 and Q2 be queries given to a search engine such that Q1 and Q2 are sets of words of a natural language and Q1, Q2 ≠ φ. Q1 and Q2 are said to be correlated if and only if there exists a set C = Q1 ∩ Q2, C ≠ φ.

One may use the above definition to define k-correlation between any two queries. Formally, it can be stated as a corollary of Definition 2.

Corollary 1: Two queries are said to be k-correlated if and only if |C| = k, where |·| denotes the cardinality.

For two queries that are correlated, we define a parameter called the Correlation Factor8 as follows:

Correlation Factor = |Q1 ∩ Q2| / |Q1 ∪ Q2|    (1)

This is based on the fact that |Q1 ∪ Q2| = |Q1| + |Q2| − |Q1 ∩ Q2|. Note that 0 ≤ Correlation Factor ≤ 1. For two uncorrelated queries the Correlation Factor is 0. Further, one can see from Definition 1 that for the permutations of the same query, the Correlation Factor is 1. Similarly, one may define the Correlation Factor for a cluster of queries. Let the number of queries be O. The cardinality of the union of the given cluster of queries is given by the following equation:

|∪_{o=1}^{O} Q_o| = Σ_i |Q_i| − Σ_{i<j} |Q_i ∩ Q_j| + Σ_{i<j<k} |Q_i ∩ Q_j ∩ Q_k| − ⋯ + (−1)^{O−1} |Q_1 ∩ Q_2 ∩ ⋯ ∩ Q_O|    (2)

Using (2), one may define the Correlation Factor of a cluster of queries as follows:

Correlation Factor = |∩_{o=1}^{O} Q_o| / |∪_{o=1}^{O} Q_o|    (3)

A high correlation factor means that the queries in the cluster are highly correlated, and vice versa. In what follows, we discuss results pertaining to query correlation.

8 This correlation factor is nothing but Jaccard's Coefficient, which is often used as a measure of similarity.

Fig. 2. Latency versus page number for permutation P1.
Fig. 3. Latency versus page number for permutation P2.
Fig. 4. Latency versus page number for permutation P3.
Fig. 5. Latency versus page number for permutation P4.
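Equations (1)-(3) translate directly into code. In this sketch (our own illustration, not the paper's implementation), queries are reduced to sets of non-stop words; the helper names are assumptions, and the sample queries are taken from Table III.

```python
# Equations (1)-(3) in code: pairwise and cluster Correlation Factors over
# queries reduced to sets of non-stop words.
from functools import reduce

STOP_WORDS = {"a", "an", "the", "is", "am", "are",
              "will", "shall", "of", "in", "for"}

def words(query):
    """Reduce a query string to its set of non-stop words."""
    return {w for w in query.lower().split() if w not in STOP_WORDS}

def correlation_factor(q1, q2):
    """Eq. (1): |Q1 ∩ Q2| / |Q1 ∪ Q2| (Jaccard's coefficient)."""
    a, b = words(q1), words(q2)
    return len(a & b) / len(a | b)

def cluster_correlation_factor(queries):
    """Eq. (3): |∩ Q_o| / |∪ Q_o| for a cluster of O queries."""
    sets = [words(q) for q in queries]
    return len(reduce(set.__and__, sets)) / len(reduce(set.__or__, sets))

# The E1 pair of Table III shares one word ("disjoint") out of five total:
cf = correlation_factor("node disjoint multipath", "edge disjoint multicast")
# Permutations of the same query always give a Correlation Factor of 1.
```

Note the distinction from k-correlation: the E1 pair above is 1-correlated (|C| = 1), while its Correlation Factor is the normalized ratio 1/5.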
TABLE I
SEARCH LATENCIES, QUERY SPACE, AND THE NUMBER OF RELEVANT RESULTS FOR DIFFERENT PERMUTATIONS OF THE QUERY Ash Mohammad Abbas, FOR GOOGLE. EACH CELL GIVES LATENCY (NUMBER OF RELEVANT RESULTS); THE QUERY SPACE IS 300000 FOR ALL ENTRIES.

Permutation |   p1    |   p2    |   p3    |   p4    |   p5    |   p6    |   p7    |   p8    |   p9    |  p10
1           | 0.22(8) | 0.15(5) | 0.04(1) | 0.08(0) | 0.33(2) | 0.29(0) | 0.15(0) | 0.13(3) | 0.16(0) | 0.17(0)
2           | 0.51(3) | 0.15(2) | 0.22(2) | 0.19(1) | 0.13(1) | 0.12(0) | 0.10(2) | 0.27(0) | 0.16(0) | 0.15(0)
3           | 0.30(6) | 0.08(4) | 0.18(1) | 0.20(3) | 0.14(2) | 0.25(1) | 0.13(1) | 0.21(0) | 0.14(0) | 0.21(0)
4           | 0.60(3) | 0.07(0) | 0.35(2) | 0.11(1) | 0.13(0) | 0.15(0) | 0.23(2) | 0.13(0) | 0.28(1) | 0.26(1)
5           | 0.38(3) | 0.09(2) | 0.39(1) | 0.14(2) | 0.17(1) | 0.15(0) | 0.14(0) | 0.16(1) | 0.15(1) | 0.13(1)
6           | 0.36(5) | 0.15(4) | 0.10(1) | 0.12(3) | 0.18(0) | 0.17(2) | 0.15(1) | 0.13(2) | 0.20(2) | 0.15(0)
TABLE II
SEARCH LATENCIES, QUERY SPACE, AND THE NUMBER OF RELEVANT RESULTS FOR DIFFERENT PERMUTATIONS OF THE QUERY Ash Mohammad Abbas, FOR YAHOO. EACH CELL GIVES LATENCY / QUERY SPACE / NUMBER OF RELEVANT RESULTS.

Permutation 1:
p1 0.15/26100/10, p2 0.15/26400/4, p3 0.27/26400/1, p4 0.24/27000/0, p5 0.25/27000/0, p6 0.23/26900/1, p7 0.34/26900/0, p8 0.21/26900/0, p9 0.27/25900/0, p10 0.30/25900/0
Permutation 2:
p1 0.18/26900/4, p2 0.13/27000/6, p3 0.20/27000/1, p4 0.15/26900/1, p5 0.19/25800/0, p6 0.10/26900/1, p7 0.15/26900/1, p8 0.09/26800/1, p9 0.12/26800/0, p10 0.13/26800/0
Permutation 3:
p1 0.12/26900/10, p2 0.11/27100/3, p3 0.15/26900/1, p4 0.14/26900/2, p5 0.11/26500/0, p6 0.10/26800/0, p7 0.11/26800/0, p8 0.12/26500/0, p9 0.09/26500/0, p10 0.13/26700/0
Permutation 4:
p1 0.03/27000/7, p2 0.10/26400/4, p3 0.14/26400/0, p4 0.13/26700/2, p5 0.12/27000/1, p6 0.20/26700/0, p7 0.10/26400/0, p8 0.19/26900/1, p9 0.12/26800/0, p10 0.17/26800/1
Permutation 5:
p1 0.12/26400/8, p2 0.12/26800/5, p3 0.20/26800/1, p4 0.08/26800/1, p5 0.13/26800/0, p6 0.10/26700/0, p7 0.12/26700/0, p8 0.09/26800/0, p9 0.13/26700/0, p10 0.20/26200/1
Permutation 6:
p1 0.16/27100/10, p2 0.10/26700/5, p3 0.16/27100/0, p4 0.12/26700/0, p5 0.13/26600/0, p6 0.11/27000/1, p7 0.10/26600/0, p8 0.11/26900/0, p9 0.12/26500/0, p10 0.15/26500/0
Fig. 6. Latency versus page number for permutation P5.
IV. RESULTS AND DISCUSSION

The search portals that we have evaluated are Google and Yahoo. We have chosen them because they represent the search portals that the majority of end users use in their day-to-day searching. One more reason for choosing them for performance evaluation is that they represent different
Fig. 7. Latency versus page number for permutation P6.
classes of search engines. As mentioned earlier, Yahoo is based on an ontology while Google is based on page ranks. Therefore, by selecting them, one may evaluate two distinct classes of search engines. The search environment is as follows. The client from which the queries were fired was a Pentium III machine. The machine was part of a 512 Kbps local area network. The operating system was Windows XP. In what follows, we discuss the behavior of the search engines for different permutations of a query.

A. Query Permutations

To see how a search engine behaves for different permutations of a query, we consider the following query:

Ash Mohammad Abbas

The different permutations of this query are

1  Ash Mohammad Abbas
2  Ash Abbas Mohammad
3  Abbas Ash Mohammad
4  Abbas Mohammad Ash
5  Mohammad Ash Abbas
6  Mohammad Abbas Ash

We have assigned a number to each permutation to differentiate one from another. We wish to analyze the search results on the basis of search time, number of relevant results, and query space. The query space is the cardinality of all results returned by a given search engine in response to a given query. Note that the search time is defined as the actual time taken by the search engine to deliver the searched results. Ideally, it does not depend upon the speeds of the hardware, software, and network components from which the queries are fired, because it is the time taken by the search-engine server. Relevant results are those which the user intends to find. For example, the user intends to search for information about Ash Mohammad Abbas9; therefore, all those results that contain Ash Mohammad Abbas are relevant for the given query.

In what follows, we discuss the results obtained for the different permutations of the given query. Let the given query be Ash Mohammad Abbas. For all permutations, all those results that contain Ash Mohammad Abbas are counted as relevant results. Since both Google and Yahoo deliver their results page-wise, we list all the parameters mentioned in the previous paragraph page-wise. We go up to 10 pages for both search engines, as the results beyond that are rarely significant.

Table I shows the search latencies, query space, and the number of relevant results for different permutations of the given query. The search portal is Google. Our observations are as follows. For all permutations, the query space remains the same, and it does not vary along the pages of results. The time to search the first page of results in response to the given query is the largest for all permutations. The first page of results contains the most relevant results.

9 We have intentionally taken the query Ash Mohammad Abbas. We wish to search for different permutations of a query and to study the effect of those permutations on the query space and on the number of relevant results. Relevance is partly related to the intentions of an end user. Since we already know what the relevant results for the chosen query are, it is easier to decide which of them have been returned by a search engine. The reader may take any other query, if he/she wishes; in that case, he/she has to decide which results are relevant to that query, and this will partly depend upon what he/she intended to search.

Fig. 8. Latency versus correlation for queries with embedded semantics.
Fig. 9. Latency versus correlation for random queries.
Fig. 10. Query Space versus correlation for queries with embedded semantics.
Fig. 11. Query Space versus correlation for random queries.
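The m! orderings discussed above can be enumerated with the standard library; this small sketch (our own, for illustration) reproduces the six permutations of the experimental query.

```python
# Enumerate the m! orderings of the three-word experimental query
# (m = 3, so 3! = 6 permutations).
from itertools import permutations

query = ("Ash", "Mohammad", "Abbas")
perms = [" ".join(p) for p in permutations(query)]
```

Each of the six strings in `perms` corresponds to one of the numbered permutations submitted to Google and Yahoo in Tables I and II.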
TABLE III
QUERIES WITH EMBEDDED SEMANTICS.

S. No. | Query No. | Query                                     | Correlation
E1     | Q1        | node disjoint multipath                   | 1
       | Q2        | edge disjoint multicast                   |
E2     | Q1        | node disjoint multipath routing           | 2
       | Q2        | edge disjoint multicast routing           |
E3     | Q1        | node disjoint multipath routing           | 3
       | Q2        | edge disjoint multipath routing           |
E4     | Q1        | node disjoint multipath routing ad hoc    | 4
       | Q2        | wireless node disjoint multipath routing  |
TABLE IV
QUERIES WITHOUT EMBEDDED SEMANTICS (RANDOM QUERIES).

S. No. | Query No. | Query                                        | Correlation
R1     | Q1        | adhoc node ergonomics                        | 1
       | Q2        | quadratic power node                         |
R2     | Q1        | computer node constellations parity          | 2
       | Q2        | hiring parity node biased                    |
R3     | Q1        | wireless node parity common mitigate         | 3
       | Q2        | mitigate node shallow rough parity           |
R4     | Q1        | few node parity mitigate common correlation  | 4
       | Q2        | shallow mitigate node parity common stanza   |
TABLE V
SEARCH TIME AND QUERY SPACE FOR QUERIES WITH EMBEDDED SEMANTICS.

S. No. | Query No. | Google Time | Google Query Space | Yahoo Time | Yahoo Query Space
E1     | Q1        | 0.27        | 43100              | 0.37       | 925
       | Q2        | 0.23        | 63800              | 0.28       | 1920
E2     | Q1        | 0.48        | 37700              | 0.40       | 794
       | Q2        | 0.32        | 53600              | 0.32       | 1660
E3     | Q1        | 0.48        | 37700              | 0.40       | 794
       | Q2        | 0.24        | 21100              | 0.34       | 245
E4     | Q1        | 0.31        | 23500              | 0.64       | 79
       | Q2        | 0.33        | 25600              | 0.44       | 518
TABLE VI
SEARCH TIME AND QUERY SPACE FOR RANDOM QUERIES.

S. No. | Query No. | Google Time | Google Query Space | Yahoo Time | Yahoo Query Space
R1     | Q1        | 0.44        | 28500              | 0.57       | 25
       | Q2        | 0.46        | 476000             | 0.28       | 58200
R2     | Q1        | 0.46        | 34300              | 0.55       | 164
       | Q2        | 0.42        | 25000              | 0.35       | 90
R3     | Q1        | 0.47        | 25000              | 0.40       | 233
       | Q2        | 0.33        | 754                | 0.68       | 31
R4     | Q1        | 0.34        | 20000              | 0.58       | 71
       | Q2        | 1.02        | 374                | 0.64       | 23
Table II shows the same set of parameters for different permutations of the given query for the search portal Yahoo. From the table, we observe that, in most of the cases, the first page contains the largest number of relevant results. For permutation 2 (i.e., Ash Abbas Mohammad), the second page contains the largest number of relevant results. As opposed to Google, the query space does not remain the same; rather, it varies with the pages of the searched results. The query space in this case is smaller than that of Google. The time to search the first page of results is not necessarily the largest among the pages considered; more precisely, it is larger for the pages where there is no relevant result. Further, the time taken by Yahoo is less than that of Google.
UbiCC Journal - Volume 3
Let us discuss the reasons for the above mentioned observations. Consider the question why the query space in case of Google is larger than that of Yahoo. We have pointed out that Google is based on page ranks. For a given query (or a set of words), it ranks the pages. It delivers all the ranked pages that contain the words of the given query. On the other hand, Yahoo is an ontology based search engine. As mentioned earlier, it will search only that part of its Webbase that constitutes the ontology of the given query. This is the
reason why the query space in case of Google is larger than that of Yahoo.
Let us answer the question why the query space changes in case of Yahoo and why it remains constant in case of Google. Note that an ontology may change with time and with the order of words in the given query. For every page of results, Yahoo estimates the ontology of the given permutation of the query before delivering the results to the end user. Therefore, the query space for different permutations of the given query is different, and it changes with the pages of the searched results10. However, page ranks change neither with pages nor with the order of words. The page ranks will only change when new links or documents that are relevant to the given query are added to the Web. Since neither a new link nor a new document was added to the Web during the evaluation of the permutations of the query, the query space does not change in case of Google.
In order to compare the performance of Google and Yahoo, the latencies versus page numbers for different permutations of the query are shown in Figures 2 through 7. Let us consider the question why the search time in case of Google is larger than that of Yahoo. Note that Google ranks the results before delivering them to end users while Yahoo does not. The ranking of pages takes time. This is the reason why the search time taken by Google is larger than that of Yahoo. In what follows, we discuss how a search engine behaves for correlated queries.

10 This observed behavior may also be due to the use of a randomized algorithm. To understand the behavior of randomized algorithms, readers are referred to any text on randomized algorithms such as [10].

B. Query Correlation
We have formulated k-correlated queries as shown in Table III. Since all the words contained in a query are related11, we call them queries with embedded semantics. On the other hand, we have another set of k-correlated queries, shown in Table IV; the words contained in these queries are random and are not related semantically.
We wish to evaluate the performance of a search engine for k-correlated queries. For that, we evaluate the search time and the query space of a search engine for the first page of results. Since both Google and Yahoo deliver 10 results per page, looking at the first page of results means that we are evaluating the 10 top-most results of these search engines. Note that we do not consider the number of relevant results, because relevancy in this case would be query dependent; since there is no single query, an evaluation of relevancy would not be very useful. Table V shows the search time and the query space for k-correlated queries with embedded semantics (see Table III). The second query, Q2, is fired after the first query, Q1. On the other hand, Table VI shows the search time and the query space for k-correlated queries whose words may not be related (see Table IV).

11 More precisely, all words in these queries are from ad hoc wireless networks, an area that the authors of this paper like to work in.

TABLE VII LATENCY MATRIX, L, FOR DIFFERENT PERMUTATIONS.
(Rows P = 1-6, columns p1-p10; each entry is 0, 1, or 1/2. Of the 60 entries, 40 are '0', 17 are '1', and 3 are '1/2'.)

TABLE VIII QUERY SPACE MATRIX, S, FOR DIFFERENT PERMUTATIONS.
P   p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
1   1  1  1  1  1  1  1  1  1  1
2   1  1  1  1  1  1  1  1  1  1
3   1  1  1  1  1  1  1  1  1  1
4   1  1  1  1  1  1  1  1  1  1
5   1  1  1  1  1  1  1  1  1  1
6   1  1  1  1  1  1  1  1  1  1

TABLE IX RELEVANCE MATRIX FOR DIFFERENT PERMUTATIONS FOR GOOGLE.
P   p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
1   8  5  1  0  2  0  0  3  0  0
2   3  2  2  1  1  0  2  0  0  0
3   6  4  1  3  2  1  1  0  0  0
4   3  0  2  1  0  0  2  0  1  1
5   3  2  1  2  1  0  0  1  1  1
6   5  4  1  3  0  2  1  2  2  0

TABLE X RELEVANCE MATRIX FOR DIFFERENT PERMUTATIONS FOR YAHOO.
P   p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
1   10 4  1  0  0  1  0  0  0  0
2   4  6  1  1  0  1  1  1  0  0
3   10 3  1  2  0  0  0  0  0  0
4   7  4  0  2  1  0  0  1  0  1
5   8  5  1  1  0  0  0  0  0  1
6   10 5  0  0  0  1  0  0  0  0

TABLE XI RELEVANCE FOR DIFFERENT PERMUTATIONS.
P       1   2   3   4   5   6   Total
Google  19  11  18  10  12  20  90
Yahoo   16  15  16  16  16  16  95

TABLE XII EARNED POINTS (EP) FOR DIFFERENT PERMUTATIONS.
        Google                          Yahoo
P       Latency  Query Space  EP        Latency  Query Space  EP
1       14.5     19           33.5      3        0            3
2       3        11           14        14       0            14
3       4        18           22        13       0            13
4       1        10           11        9        0            9
5       3        12           15        10       0            10
6       2.5      20           22.5      16       0            16
Total   28       90           118       65       0            65
In order to compare the performance of Yahoo and Google, the latencies versus correlation for queries with embedded semantics are shown in Figure 8, and those for randomized queries are shown in Figure 9. Similarly, the query space for queries with embedded semantics is shown in Figure 10, and that for randomized queries is shown in Figure 11. The query space of Yahoo is much less than that of Google, for the reasons discussed in the previous subsection. Other important observations are as follows. In case of k-correlated queries with embedded semantics, the time to search for Q2 is generally less than that for Q1. This is due to the fact that, since the queries are correlated, some of the words of Q2 have already been searched while searching for Q1. The query space is increased when the given query has a word that is more frequently found in Web pages (e.g. in R1:Q2, the word quadratic, which is frequently used in Engineering, Science, Maths, Arts, etc.). The query space is decreased when the query includes a word which is rarely used (e.g. mitigate, included in R3,R4:Q1,Q2, and shallow, included in R3,R4:Q2). The search time is larger in case of randomized queries as compared to queries with embedded semantics. The reason for this observation is as follows: in case of queries with embedded semantics, the words of a given query are related and are found in Web pages that are not too far from one another, either from the point of view of page rank, as in Google, or from the point of view of ontology, as in Yahoo.
One cannot infer anything in general about the search times of Google and Yahoo, as they depend upon the query; more precisely, they depend upon which strategy takes more time, the page ranking in Google or the estimation of the ontology in Yahoo. However, from Table V and Table VI, one can infer the following. Google is better in the sense that its query space is much larger than that of Yahoo. However, Yahoo takes less time as compared to Google for different permutations of the same query. For k-correlated queries with embedded semantics, Google takes less time to search for the first query as compared to Yahoo. This also applies to randomized queries, with some exceptions; in the exceptional cases, Google takes much more time as compared to Yahoo. We have mentioned previously that this depends upon the given query as well as the strategy employed in the search engine. In what follows, we describe a unified criterion for comparing the search engines considered in this paper.
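The shared-word reading of k-correlation used for the query sets above can be sketched as follows. Note that the counting rule and the sample query pair are assumptions for illustration; the paper does not state the definition formally:

```python
def correlation(q1: str, q2: str) -> int:
    """Number of distinct words shared by two queries."""
    return len(set(q1.split()) & set(q2.split()))

# Hypothetical pair in the spirit of Table III (embedded semantics):
print(correlation("node disjoint multipath routing",
                  "edge disjoint multipath routing"))
```

Under this reading the pair is 3-correlated, sharing "disjoint", "multipath" and "routing".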
V. A UNIFIED CRITERION FOR COMPARISON
TABLE XIII CEP FOR DIFFERENT PERMUTATIONS FOR GOOGLE.
P      Latency Contribution  Query Space Contribution
1      123.834               5700000
2      61.262                3300000
3      116.534               5400000
4      35.918                3000000
5      73.458                3600000
6      119.370               6000000
Total  530.376               27000000

TABLE XIV CEP FOR DIFFERENT PERMUTATIONS FOR YAHOO.
P      Latency Contribution  Query Space Contribution
1      101.385               419900
2      107.821               404100
3      131.558               431000
4      308.197               428700
5      130.833               425000
6      121.591               431500
Total  901.385               2540200

Let us denote Google by a superscript '1' and Yahoo by a superscript '0'12. Let L = [l_ij] be a matrix, where l_ij is defined as follows:

l_ij = 1 if latency^1_ij < latency^0_ij; 1/2 if latency^1_ij = latency^0_ij; 0 otherwise.    (4)

Similarly, let S = [s_ij] be a matrix, where s_ij is defined as follows:

s_ij = 1 if space^1_ij > space^0_ij; 1/2 if space^1_ij = space^0_ij; 0 otherwise.    (5)

In the matrices defined above, a '1' means that at that place Google is the winner, and a '1/2' represents a tie between Google and Yahoo. We now define a parameter that we call Earned Points (EP), as follows:

EP^k = sum_{i=1}^{pages} relevant^k_i (L^k_i + S^k_i)    (6)

12 This is simply a representation. One may consider the reverse representation; even then, there will be no effect on the criterion.
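A minimal sketch of how (4)-(6) combine, with toy page data rather than the paper's measurements: each page awards engine 1 a latency point when it is faster, a query-space point when its space is larger, 1/2 on ties, and EP weights the points by relevance:

```python
def point(a, b, better):
    """1 if engine 1 wins, 1/2 on a tie, 0 otherwise; 'better' encodes
    the winning direction (min for latency, max for query space)."""
    if a == b:
        return 0.5
    return 1.0 if better(a, b) == a else 0.0

def earned_points(relevant, lat1, lat0, space1, space0):
    """EP for engine 1 over a list of pages, per eq. (6)."""
    total = 0.0
    for r, d1, d0, q1, q0 in zip(relevant, lat1, lat0, space1, space0):
        l = point(d1, d0, min)   # eq. (4): smaller latency wins
        s = point(q1, q0, max)   # eq. (5): larger query space wins
        total += r * (l + s)
    return total

# Toy data for three pages (not the paper's measurements):
print(earned_points(relevant=[8, 5, 1],
                    lat1=[0.27, 0.23, 0.48], lat0=[0.37, 0.28, 0.40],
                    space1=[43100, 63800, 37700], space0=[925, 1920, 794]))
```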
where the superscript k ∈ {0, 1} denotes the search engine. Table VII shows the latency matrix, L, for different permutations of the query as in Table I and Table II, and has been constructed using both of them. In the latency matrix, there are 40 '0's, 17 '1's, and 3 '1/2's. We observe from the latency matrix that Yahoo is the winner as far as latencies are concerned, as there are 40 '0's out of 60 entries in total. On the other hand, Table VIII shows the query space matrix, S, for different permutations of the same query, constructed using the tables mentioned in the preceding paragraph. One can see that, as far as query space is concerned, Google is the sole winner; in fact, the query space of Google is much larger than that of Yahoo. The relevance matrix for Google is shown in Table IX and that for Yahoo in Table X. The total relevance for the first ten pages is shown in Table XI for both Google and Yahoo. It is seen from Table XI that the total relevance for Google is 90 and that for Yahoo is 95. The average relevance per
TABLE XV CONTRIBUTION DUE TO QUERY SPACE IN CEP FOR DIFFERENT SETS OF WEIGHTS.
Weights               Google       Yahoo
wl = 1, wq = 10^-6    27.00        2.54
wl = 1, wq = 10^-5    270.00       25.40
wl = 1, wq = 10^-4    2700.00      254.02
wl = 1, wq = 10^-3    27000.00     2540.20
wl = 1, wq = 10^-2    270000.00    25402.00
wl = 1, wq = 10^-1    2700000.00   254020.00
wl = 1, wq = 1        27000000     2540200
TABLE XVI CCEP FOR DIFFERENT SETS OF COMPARABLE WEIGHTS.
Weights               Google     Yahoo
wl = 0.9, wq = 0.1    486.3384   811.2465
wl = 0.8, wq = 0.2    442.3008   721.1080
wl = 0.7, wq = 0.3    398.2632   630.9695
wl = 0.6, wq = 0.4    354.2256   540.8310
wl = 0.5, wq = 0.5    310.1880   450.6925
wl = 0.4, wq = 0.6    266.1504   360.5540
wl = 0.3, wq = 0.7    222.1128   270.4155
wl = 0.2, wq = 0.8    178.0752   180.2770
wl = 0.1, wq = 0.9    134.0376   90.1385
permutation and per page for Google is 1.5 and that for Yahoo is 1.583. Therefore, as far as average relevance is concerned, Yahoo is the winner.
Table XII shows the number of earned points for both Google and Yahoo for the different permutations of the query mentioned earlier. We observe that the number of earned points for Google is 118 and that for Yahoo is 65. The number of earned points of Google is far greater than that of Yahoo. The reason behind this is that the query space of Yahoo is always less than that of Google, and hence it does not contribute to Yahoo's earned points. A closer look at the definition of EP reveals that, while defining the parameter EP in (6) together with (4) and (5), we have assumed that a search engine either possesses a constituent parameter (latency or query space) or does not possess that parameter at all. The contribution of one parameter is lost due to the fact that the effective contribution of the other parameter, by which the given parameter is multiplied, is zero. Note that our goal behind the introduction of (6) was to rank the given set of search engines. We call this type of ranking of search engines Lossy Constituent Ranking (LCR). We therefore feel that there should be a method of comparison between a given set of search engines that is lossless in nature. For that purpose, we define another parameter that we call Contributed Earned Points (CEP). The definition of CEP is as follows:

CEP^k = sum_{i=1}^{pages} relevant^k_i (1/d^k_i + q^k_i)    (7)

where the superscript k ∈ {0, 1} denotes the search engine, d denotes the actual latency, and q denotes the actual query space. The reason behind having the inverse of the actual latency in (7) is that the better search engine is the one that takes less time.
Table XIII shows the contributions of latency and query space in CEP for Google. Similarly, Table XIV shows the same for Yahoo. We observe that the contribution of latency for Google is 530.376 and that for Yahoo is 901.385. However, the contribution of query space for Google is 27000000 and that for Yahoo is 2540200. In other words, the contribution of query space for Google is approximately 11 times that for Yahoo. Adding these contributions results in a larger CEP for Google as compared to Yahoo. The CEP defined using (7) has a problem that we call the dominating constituent problem (DCP): the larger parameter suppresses the smaller parameter. Note that the definition of CEP in (7) assumes equal weights for latency and query space. On the other hand, one may be interested in assigning different weights to the constituents of CEP depending upon their importance. Let us rewrite (7) to incorporate weights. Let wl and wq be the weights assigned to latency and query space, respectively. Then (7) can be written as follows:

CEP^k = sum_{i=1}^{pages} relevant^k_i (wl (1/d^k_i) + wq q^k_i)    (8)
The weights should be chosen carefully. For example, the weights wl = 1, wq = 10^-6 will add 27 to the contribution in CEP due to query space for Google and 2.54 for Yahoo. On the other hand, the set of weights wl = 1, wq = 10^-5 adds 270 for Google and 25.4 for Yahoo. Table XV shows the contribution of query space in CEP for different sets of weights. Note that wl is fixed to 1 for all sets, and only wq is varied. As wq is increased beyond 10^-5, the contribution of query space starts dominating over the contribution of latency. The set of weights wl = 1, wq = 10^-5 indicates that one can ignore the contribution of query space in comparison to the contribution of latencies, provided that one is more interested in comparing search engines with respect to latency. In that case, an approximate expression for CEP can be written as follows:

CEP^k ≈ sum_{i=1}^{pages} relevant^k_i (1/d^k_i)    (9)
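The weight-scaling discussion can be checked numerically from the query-space totals of Tables XIII-XIV (27000000 for Google, 2540200 for Yahoo); the helper below is only an illustration of how wq scales the aggregate query-space term of (8):

```python
GOOGLE_SPACE_TOTAL = 27_000_000   # total query-space contribution, Table XIII
YAHOO_SPACE_TOTAL = 2_540_200     # total query-space contribution, Table XIV

def space_contribution(total, w_q):
    """Query-space term of the weighted CEP in (8), aggregated over pages."""
    return total * w_q

for w_q in (1e-6, 1e-5):
    print(round(space_contribution(GOOGLE_SPACE_TOTAL, w_q), 2),
          round(space_contribution(YAHOO_SPACE_TOTAL, w_q), 2))
```

This reproduces the 27 / 2.54 and 270 / 25.4 pairs quoted in the text and listed in Table XV.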
Alternatively, one may consider an approach that is a combination of the definition of EP in (6) (together with (4) and (5)) and that of CEP in (7). In it, we use the definition of the matrix S, which converts the contribution of query space into binaries13. The modified definition is as follows:

CCEP^k = sum_{i=1}^{pages} relevant^k_i (1/d^k_i + S^k_i)    (10)

where S^k_i is in accordance with the definition of S given by (5). The acronym CCEP stands for Combined Contributory Earned Points. If one wishes to incorporate weights, then the definition of CCEP becomes as follows:
CCEP^k = sum_{i=1}^{pages} relevant^k_i (wl (1/d^k_i) + wq S^k_i)    (11)

13 We mean that the matrix S says either there is a contribution of the query space of a search engine, provided that its query space is larger than that of the other one, or there is no contribution of query space at all, otherwise.
In the definition of CCEP given by (11), the weights can be comparable, and the dominant constituent problem mentioned earlier can be mitigated for comparable weights. We define comparable weights as follows.
Definition 3: A set of weights W = {wi | wi ≥ 0} is said to have comparable weights if and only if sum_i wi = 1 and the condition 1/9 ≤ wi/wj ≤ 9 is satisfied for all wi, wj ∈ W.
Table XVI shows the values of CCEP for different sets of comparable weights. We observe that the rate of decrease of CCEP for Yahoo is larger than that for Google. For example, for wl = 0.9, wq = 0.1, the CCEP for Google is 486.3384 and that for Yahoo is 811.2465; for wl = 0.8, wq = 0.2, the CCEP for Google is 442.3008 and that for Yahoo is 721.1080. In other words, the rate of decrease in CCEP for Google is 9.05% and that for Yahoo is 11.11%. The reason is that in the query space matrix, S (see Table VIII), all entries are '1'; that is, the query space of Google is always larger than that of Yahoo. Therefore, in case of Yahoo, the contribution due to query space is always 0, irrespective of the weight assigned to it. However, in case of Google, the contribution due to query space is nonzero and increases with an increase in the weight assigned to it. Moreover, for the set of weights W = {wl = 0.5, wq = 0.5}, the values of CCEP are 310.1880 and 450.6925 for Google and Yahoo, respectively. It means that if one wishes to assign equal weights to latency and query space, then Yahoo is the winner in terms of the parameter CCEP. In case of CCEP, the effect of the dominating constituent problem is smaller than in case of CEP; in other words, the effect of large values of query space is fairly small in case of CCEP as compared to that in case of CEP. This is with reference to our remark that with the use of CCEP the dominating constituent problem is mitigated.
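At the level of per-engine totals, (11) reduces to wl times the latency contribution plus wq times the relevance-weighted S contribution (90 for Google, where every entry of S is 1, and 0 for Yahoo). A sketch that reproduces the first row of Table XVI from the totals of Tables XI, XIII and XIV; the aggregate form is a simplification that assumes the weights factor out of the per-page sum:

```python
def ccep_from_totals(w_l, w_q, latency_total, s_total):
    """Aggregate form of eq. (11) over all pages and permutations."""
    return w_l * latency_total + w_q * s_total

def comparable(weights):
    """Definition 3: weights sum to 1 and pairwise ratios stay in [1/9, 9]."""
    return (abs(sum(weights) - 1) < 1e-9 and
            all(1 / 9 <= wi / wj <= 9 for wi in weights for wj in weights))

google = ccep_from_totals(0.9, 0.1, 530.376, 90)   # ~486.3384 (Table XVI)
yahoo = ccep_from_totals(0.9, 0.1, 901.385, 0)     # ~811.2465 (Table XVI)
print(round(google, 4), round(yahoo, 4), comparable((0.9, 0.1)))
```

The same computation with wl = wq = 0.5 yields 310.1880 and 450.6925, matching the equal-weights row of Table XVI.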
VI. CONCLUSIONS
In this paper, we analyzed the impact of correlation among queries on search results for two representative search portals, namely Google and Yahoo. The major accomplishments of the paper are as follows. We analyzed the search time, the query space and the number of relevant results per page for different permutations of the same query. We observed that these parameters vary with the pages of the searched results and are different for different permutations of the given query. We analyzed the impact of k-correlation among two subsequent queries given to a search engine; in particular, we analyzed the search time and the query space. We observed that:
– The search time is less in case of queries with embedded semantics as compared to randomized queries without any semantic consideration.
– In case of randomized queries, the query space is increased when the given query includes a word that is frequently found on the Web, and vice versa.
Further, we considered a unified criterion for comparison between the search engines. Our criterion is based upon the concept of earned points. An end-user may assign different weights to the different constituents of the criterion (latency and query space). Our observations are as follows. The performance of Yahoo is better in terms of the latencies; however, Google performs better in terms of query space. We discussed the dominant constituent problem, and we showed that this problem can be mitigated using the concept of contributory earned points if the weights assigned to the constituents are comparable. If both constituents are assigned equal weights, we found that Yahoo is the winner. However, the performance of a search engine may depend upon the criterion itself, and only one criterion may not be sufficient for an exact analysis of the performance. Further investigations and improvements in this direction form our future work.

REFERENCES
[1] S. Malhotra, "Beyond Google", CyberMedia Magazine on Data Quest, vol. 23, no. 24, p. 12, December 2005.
[2] M.R. Henzinger, A. Heydon, M. Mitzenmacher, M. Najork, "Measuring Index Quality Using Random Walks on the Web", Proceedings of the 8th International World Wide Web Conference, pp. 213-225, May 1999.
[3] M.C. Tang, Y. Sun, "Evaluation of Web-Based Search Engines Using User-Effort Measures", Library and Information Science Research Electronic Journal, vol. 13, issue 2, 2003, http://libres.curtin.edu.au/libres13n2/tang.htm.
[4] C.W. Cleverdon, J. Mills, E.M. Keen, An Inquiry in Testing of Information Retrieval Systems, Cranfield, U.K., 1966.
[5] J. Gwizdka, M. Chignell, "Towards Information Retrieval Measures for Evaluation of Web Search Engines", http://www.imedia.mie.utoronto.ca/people/jacek/pubs/webIR eval1 99.pdf, 1999.
[6] D. Rafiei, A.O. Mendelzon, "What is This Page Known For: Computing Web Page Reputations", Elsevier Journal on Computer Networks, vol. 33, pp. 823-835, 2000.
[7] N. Bhatti, A. Bouch, A. Kuchinsky, "Integrating User-Perceived Quality into Web Server Design", Elsevier Journal on Computer Networks, vol. 33, pp. 1-16, 2000.
[8] S. Brin, L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", http://www-db.stanford.edu/pub/papers/google.pdf, 2000.
[9] J. Kleinberg, "Authoritative Sources in a Hyperlinked Environment", Proceedings of the 9th ACM/SIAM Symposium on Discrete Algorithms, 1998.
[10] R. Motwani, P. Raghavan, Randomized Algorithms, Cambridge University Press, August 1995.
ARTIFICIAL IMMUNE SYSTEMS FOR ILLNESSES DIAGNOSTIC Hiba Khelil, Abdelkader Benyettou SIMPA Laboratory – University of Sciences and Technology of Oran, PB 1505 M’naouer, 31000 Oran, Algeria
[email protected],
[email protected]
ABSTRACT Lately, a lot of new illnesses are frequently observed in our societies, and some of them could be avoided by regular visits to the doctor. Cancer is one of these illnesses, which patients often discover only when it is too late. In this work we propose an artificial Cancer diagnostic system which can classify whether or not patients are affected by Cancer; for this goal we have developed an artificial immune system for Cancer diagnostic. The artificial immune system is one of the newest approaches used in several domains, such as pattern recognition, robotics, intrusion detection and illness diagnostic; several methods have been proposed, such as negative selection, clonal selection and the artificial immune network (AINet). In this paper we present the natural immune system, develop four versions of the Artificial Immune Recognition System (AIRS), and then present results for Cancer diagnostic, with some criticisms of and remarks on these methods. Keywords: Antigen, Antibody, B memory cells, Artificial Recognition Ball (ARB), Artificial Immune Recognition System (AIRS), Cancer diagnostic.
1 INTRODUCTION
Pattern recognition is a very vast domain of artificial intelligence, in which we find face, fingerprint, speech and handwriting recognition, and other patterns no less important than the ones mentioned; for this goal several approaches have been developed, such as neural networks, evolutionary algorithms, genetic algorithms and others under exploitation. The artificial immune system is a new approach used in different domains, such as pattern recognition [1] [2] [3] [4] [5], intrusion detection in Internet networks [6], robotics [7], machine learning [8] and various other applications. The artificial immune functions are inspired from the natural immune system: the cells responsible for the immune response are simulated to give an artificial approach adapted to the application domain and the main problem. The present work is an application of the Artificial Immune Recognition System (AIRS) to Cancer diagnostic. AIRS is a method for pattern recognition (classification) inspired from the biological immune system, proposed by A. Watkins in 2001 [9] in his Master's thesis at Mississippi State University; it was improved in 2004 by A. Watkins, J. Timmis and L. Boggess [10], where the authors optimized the number of B cells generated. The method is characterized by distributed training, demonstrated by A. Watkins in his PhD thesis at the University of Kent in 2005 [11]. In this paper we begin with a short definition of the natural immune system and the types of immune response. The second part is a representation of an artificial simulation of the immune system, giving a description of the training algorithms. As a training prototype, we present a preview of the Cancer databases and the results of applying the artificial immune system to Cancer diagnostic. Finally, some criticisms are given to show the limits of and differences between the methods, together with some perspectives.

2 NATURAL IMMUNE SYSTEM
The biological immune system constitutes a weapon against intruders into a given body; several cells contribute to eliminating such an intruder, named the antigen, and these cells participate in a 'biological immune response'. We distinguish two types of natural immune response, one innate and the other acquired, explained in the following points:

2.1 Innate Immunity
It is an elementary immunity in which a very reduced number of antigens is handled; we find this type of immunity in newborns not yet vaccinated. A non-adaptive immunity over a long time can lead to infections and death, because the body is not well armed against the antigens of the environment [12].

2.2 Acquired Immunity
It is an immunity endowed with a memory, also named the secondary response, triggered after the
apparition of the same antigen in the same immune system for the second time or more, where it generates the development of B memory cells for the type of antigen already met (memorized) in the system. This response is faster than the innate one [12] and causes an increase in the temperature of the body, which can be explained by the fighting of B cells against antigens. The primary immune response is slower, but it keeps information about the passage of antigens through the system; this memorization phenomenon is what interests us for artificial pattern recognition. It is according to this principle that the artificial immune recognition system is developed, and it is the main subject of this paper.

3 THE ARTIFICIAL IMMUNE SYSTEM
The natural immune system is very complicated to simulate artificially, but A. B. Watkins succeeded in simulating its most important functions for pattern recognition. The main factors entering into the artificial immune system are antigens, antibodies and B memory cells. We present in the next section the training algorithms that put these factors to work.

4 THE AIRS ALGORITHM
The present algorithm is inspired from A. B. Watkins' thesis [9] [13] [14], which presents the artificial immune recognition system intended for pattern recognition, named AIRS. First, the antigens represent the training data used in the training program in order to generate the antibodies (B cells) to be used in the test step (classification). There are four steps in the artificial immune training algorithm, as follows:

4.1 Initialization Step
In this step, all characteristic vectors of the antigens are normalized, and the affinity threshold is calculated by (1):

affinity_threshold = ( sum_{i=1}^{n-1} sum_{j=i+1}^{n} affinity(ag_i, ag_j) ) / ( n(n-1)/2 )    (1)

Noting here that affinity is the Euclidean distance between two antigens, and n is the number of antigens (the cardinality of the training data). To begin, we must initialize the B memory cell set (MC) and the ARB population by choosing arbitrary examples from the training data.

4.2 B Cells Identification and ARBs Generation
Once initialization is finished, this step is executed for each antigen from the training data. First,
the memory cell mc_match which has the least value of affinity (i.e., maximizes the stimulation) with this antigen is chosen from MC, noting that:

stimulation(ag, mc) = 1 - affinity(ag, mc)    (2)

This cell will be conserved for a long time by cloning and generates new ARBs; these ARBs are added to the old ARB set. The clone number is calculated by formula (3):

clone_number = hyper_clonal_rate * clonal_rate * stimulation(mc_match, ag)    (3)

Every clone is mutated according to a small algorithm described in [9], which consists in altering the cells' characteristic vectors.

4.3 Competition for Resources and Development of a Candidate Memory Cell
In this step, each ARB's information is completed by calculating its resource allocation as a function of its stimulation, as follows:

resource = stimulation(ag, ARB(antibody)) * clonal_rate    (4)

and the average stimulation for each ARB class is also calculated. This step can kill off some ARBs which are poorly stimulated. Afterwards, we clone and mutate the subset of ARBs according to their stimulation level. While the average stimulation value s_i of each ARB class is less than a given stimulation threshold, the third step is repeated:

s_i = ( sum_{j=1}^{|AB_i|} ab_j.stim ) / |AB_i| ,  ab_j ∈ AB_i    (5)

s_i ≥ affinity_threshold    (6)
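The quantities in (1)-(4) can be sketched directly. Euclidean affinity follows the text; the clone rates (30 and 20) are the values given later in Table 1; the antigen vectors are toy data:

```python
import math

def affinity(a, b):
    return math.dist(a, b)          # Euclidean distance between two cells

def affinity_threshold(antigens):
    """Mean pairwise affinity over the training antigens, eq. (1)."""
    n = len(antigens)
    total = sum(affinity(antigens[i], antigens[j])
                for i in range(n - 1) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

def stimulation(ag, mc):
    return 1 - affinity(ag, mc)     # eq. (2)

def clone_number(hyper_clonal_rate, clonal_rate, stim):
    return int(hyper_clonal_rate * clonal_rate * stim)   # eq. (3)

def resource(stim, clonal_rate):
    return stim * clonal_rate       # eq. (4)

ags = [[0.0, 0.0], [0.6, 0.0], [0.0, 0.8]]
print(affinity_threshold(ags))      # mean of distances 0.6, 0.8 and 1.0
print(clone_number(30, 20, 0.8))    # Table 1 rates with stimulation 0.8
```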
4.4 Memory Cell Introduction
Select the ARB of the same class as the antigen with the highest affinity. If the affinity of this ARB with the antigenic pattern is better than that of mc_match, then add the candidate cell mc_candidate to the memory cell set. Additionally, if the affinity of mc_match and mc_candidate is below the affinity threshold, then remove mc_match from the memory set. We then repeat the second step until all antigens have been treated. After the end of the training phase, the test is executed using the memory cells generated in the training step, in order to classify the new antigenic patterns. The criterion of classification is to attribute the new antigen to the most appropriate class using KMeans or KNN (K Nearest Neighbors); in this paper we present classification results using the KMeans algorithm.
5 THE AIRS2 ALGORITHM
The changes made to the AIRS algorithm are small, but they offer simplicity of implementation, data reduction and a reduced processing time. The AIRS2 training steps are the same as those of AIRS, with some changes, presented as follows:
1- It is not necessary to initialize the ARB set.
2- It is not necessary to mutate the ARBs' class feature, because in AIRS2 we are interested only in cells of the same class as the antigen.
3- Resources are only allocated to ARBs of the same class as the antigen, in proportion to the ARB's stimulation level in reaction to the antigen.
4- The training stopping criterion no longer takes into account the stimulation value of the ARBs in all classes, but only accounts for the stimulation value of the ARBs of the same class as the antigen.

6 AIRS AND AIRS2 ALGORITHMS USING MERGING FACTOR
In this section we present another modification of AIRS and AIRS2. This modification carries on the last training step (memory cell introduction), mainly on the cell introduction criterion. The condition was the following:

CandStim ← Stimulation(ag, mc_candidate)
MatchStim ← Stimulation(ag, mc_match)
CellAff ← affinity(mc_candidate, mc_match)
if (CandStim > MatchStim)
    MC ← MC ∪ {mc_candidate}
    if (CellAff < AT * ATS)
        MC ← MC − {mc_match}    (7)

This source explains the conditions to add mc_candidate to, and delete mc_match from, the memory cell set. The modification is carried in the following condition:

if (CellAff < AT * ATS + factor)    (8)
Noting that factor is calculated by:

factor = AT * ATS * dampener * log(np)    (9)

with ATS and dampener two parameters between 0 and 1, and np the number of training programs executed in parallel (the number of classes). This change to the merging scheme relaxes the criterion for memory cell removal in the affinity-based merging scheme by a small logarithmic fraction. This modification is used in both algorithms (AIRS and AIRS2); all the algorithms are applied to Cancer diagnostics, and the results are presented in the next sections.
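The relaxed criterion can be sketched as below; the natural logarithm is an assumption, since the base of log in (9) is not stated:

```python
import math

def strict_limit(AT, ATS):
    return AT * ATS                                   # condition (7)

def relaxed_limit(AT, ATS, dampener, np):
    factor = AT * ATS * dampener * math.log(np)       # eq. (9)
    return AT * ATS + factor                          # condition (8)

# With more than one class (np > 1), the removal limit is relaxed:
print(relaxed_limit(0.5, 0.5, 0.8, np=2) > strict_limit(0.5, 0.5))
```

With a single class (np = 1), log(np) vanishes and the relaxed condition falls back to the original one.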
7 RESULTS
To determine the relative performance of AIRSs algorithms, it was necessary to test it on data base; so we have chosen three Cancer data bases from hospitable academic center of Wisconsin: Brest Cancer Wisconsin (BCW), Wisconsin Prognostic Breast Cancer (WPBC) and Wisconsin Diagnostic Breast Cancer (WDBC). The description of this data bases are given as following: - Brest Cancer Wisconsin (BCW): This data base was obtained from hospitable academic center of Wisconsin in 1991, which describe the cancerous symptoms and classify them into two classes: ‘Malignant’ or ‘Benin’. The distribution of patients is given as following: (Malignant, 214) (Benin, 458). - Wisconsin Prognostic Breast Cancer (WPBC): This data base is conceived by the same hospitable academic in 1995 but it gives more details then BCW giving nucleus of cell observations. Basing of its characteristics, patients are classify into two classes: ‘Recur’ and ‘NonRecur’, where its distribution as following: (Recur, 47) (NonRecur, 151). - Wisconsin Diagnostic Breast Cancer (WDBC): This data base is conceived also by the same hospitable academic in 1995, it has the same attribute then WPBC, but it classify its patients into two classes: ‘Malignant’ and ‘Benin’, where its distribution as following: (Malignant, 212) (Benin, 357). All training data are antigens, represented by characteristic vectors; also for antibodies have the same characteristic vector size as antigens. The ARB is represented as structure having antibody characteristic vector, his stimulation with antigen and the resources that allowed. 7.1 Software and Hardware Resources In order to apply algorithms we have used the C++ language in Linux Mandriva 2006 environment, every machine is endowed of 512Mo memory space and 3.0 Ghz processor frequency. All training programs have the same number of antigens, the same number of initial memory cells and ARBs also. 
Training is likewise an iterative process; we fixed 50 iterations for each training program of every class.
7.2 Results and Classification Accuracy
To run the programs, the most important training parameters must be fixed: hyper_clonal_rate, clonal_rate and mutation_rate. These parameters are used in the training steps as criteria to limit the number of clones, to calculate an ARB's resources, and in the mutation procedure. Their values are given in Table 1:
Table 1: Training parameters.

Parameter           Type               Value
Hyper_clonal_rate   Integer value      30
Clonal_rate         Integer value      20
Mutation_rate       Real value [0,1]   0.1
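The mutation_rate drives the stochastic variation of cloned ARBs during training. As a rough, hypothetical sketch (the exact AIRS mutation operator is not reproduced here), each element of an ARB's normalized feature vector can be replaced with probability mutation_rate:

```python
import random

def mutate(vector, mutation_rate, rng=random.Random(0)):
    """Return a mutated copy of an ARB feature vector: each element is
    replaced with probability mutation_rate by a random value in [0, 1]
    (features are assumed normalized to that range -- an assumption)."""
    return [rng.random() if rng.random() < mutation_rate else v for v in vector]

arb = [0.2, 0.5, 0.9, 0.1]
clone = mutate(arb, mutation_rate=0.1)
print(len(clone))  # → 4, same dimensionality as the parent ARB
```

With mutation_rate = 0 the clone is identical to its parent; higher rates explore the antigen space more aggressively at the cost of losing affinity.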
After 50 iterations, the B memory cells generated by the training program of each class are used in the classification (test) step. For classification we take the shortest distance between the new antigen and the gravity centers of all the memory-cell sets, and assign the antigen to the class of the nearest center (as in K-Means). Using this principle, the classification accuracies are given in Table 2:
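The nearest-gravity-center rule described above can be sketched as follows (the function names and the toy two-feature memory sets are invented for illustration; the paper uses the full characteristic vectors and, elsewhere, notes that Hamming distance could replace the Euclidean one):

```python
import math

def centroid(cells):
    """Gravity center of a set of memory-cell feature vectors."""
    n = len(cells)
    return [sum(c[i] for c in cells) / n for i in range(len(cells[0]))]

def classify(antigen, memory_sets):
    """Assign the antigen to the class whose memory-cell centroid is
    nearest in Euclidean distance (the K-Means-style rule in the text)."""
    best_label, best_dist = None, float("inf")
    for label, cells in memory_sets.items():
        d = math.dist(antigen, centroid(cells))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

memory = {
    "Malignant": [[0.9, 0.8], [0.8, 0.9]],
    "Benign": [[0.1, 0.2], [0.2, 0.1]],
}
print(classify([0.85, 0.9], memory))  # → Malignant
```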
Table 2: Classification accuracies
Table 3: Average classification accuracies
We can observe that AIRS generally gives the best results; it generates more B cells, which increases the recognition chance. AIRS2 and AIRS2 using the factor are refinements of AIRS, but they generate fewer B cells than the original algorithm, which is why these methods do not give better results in cancer diagnosis except in a few cases; the evolution of the B cells is given in the next section. AIRS2 and AIRS2 using the factor execute faster than AIRS and AIRS using the factor; this can be explained by their use of only the B cells of the same class as the antigen, which reduces the amount of processing and therefore the running time.
7.3 B Cells Evolution
Although we gave the same chance to each training program (quantity of antigens introduced, initial memory cells and initial ARBs), the B cells generated are not necessarily the same in each method; the previous tables give the B cells generated by each one. We can observe that fewer cells are generated by AIRS2 and AIRS2 using the factor than by AIRS and AIRS using the factor, as mentioned before, although all B-cell sets were initialized to the same size. The following figures show the evolution of the B cells as a function of the iterations, for the best rate of the four methods on each data base:
Figure 1: Evolution of B cells in BCW (AIRS)

Figure 2: Evolution of B cells in WPBC (AIRS2 using factor)

Figure 3: Evolution of B cells in WDBC (AIRS2)

From the figures we observe that the B cells evolve faster in AIRS than in AIRS2 and AIRS2 using the factor, because more cells are deleted when the condition given in equation (8) is changed (using the factor).

8 DISCUSSION OF RESULTS

The results of the experiments can be found in Tables 2 and 3. Comparing the methods used, we observe that AIRS generally gives the best results and also generates more B cells than the others. In all experiments we used the Euclidean distance; the Hamming distance or another metric could be used instead. AIRS and AIRS using the factor converge slowly to the B cells best adapted for cancer diagnosis, unlike AIRS2 and AIRS2 using the factor, which execute more quickly but do not give the best memory cells.

9 CONCLUSIONS

In this paper we have presented cancer-diagnosis results for the AIRS immuno-computing algorithms and provided directions for interpreting these results. We are interested in immuno-computing because it is one of the newest directions in bio-inspired machine learning; we focused on AIRS, AIRS2, AIRS using factor and AIRS2 using factor (2001-2005), which can also be used for classification (illness diagnosis). We suggest that AIRS is a mature classifier that delivers reasonable results and can safely be used for real-world classification tasks. The presented results are good but should be improved with optimization algorithms. In future work we intend to give more importance to the parameter values and to propose a new method for finding their best values, in order to increase the performance of these algorithms.
HOT METHOD PREDICTION USING SUPPORT VECTOR MACHINES
Sandra Johnson, Dr. S. Valli
Department of Computer Science and Engineering, Anna University, Chennai - 600 025, India.
[email protected] ,
[email protected]
ABSTRACT
Runtime hot method detection, an important dynamic compiler optimization parameter, has challenged researchers to explore and refine techniques that address the expensive profiling overhead incurred in the process. Although the recent trend has been toward applying machine learning heuristics to compiler optimization, their role in the identification and prediction of hot methods has been ignored. The aim of this work is to develop a model using a machine learning algorithm, the Support Vector Machine (SVM), to identify and predict the hot methods in a given program, to which the best set of optimizations could then be applied. When trained with ten static program features, the derived model predicts hot methods with an appreciable 62.57% accuracy.
Keywords: Machine Learning, Support Vector Machines, Hot Methods, Virtual Machines.
1 INTRODUCTION
Optimizers depend on profile information to identify the hot methods of program segments. The major inadequacy associated with dynamic optimization techniques is the high cost of accurate data profiling via program instrumentation. The major challenge is how to minimize the overhead, which includes profile collection, optimization strategy selection and re-optimization. While there is a significant amount of work on cost-effective and performance-efficient machine learning (ML) techniques to tune individual optimization heuristics, relatively little work has been done on identifying and predicting frequently executed program hot spots with machine learning algorithms so as to target the best set of optimizations. In this study we develop a machine learning based predictive model using the Support Vector Machine (SVM) classifier. Ten features have been derived from the chosen domain knowledge for training and testing the classifier. The training data set is collected from the SPEC CPU2000 INT and UTDSP benchmark programs. The SVM classifier is trained offline with the training data set and is then used to predict the hot methods of programs on which it has not been trained. The system is evaluated for hot method prediction accuracy. This paper is structured as follows. Section 2 discusses related work. Section 3 gives a brief overview of Support Vector Machines. Section 4 describes our approach and Section 5 the evaluation methodology. Section 6 presents the
results of the evaluation. Section 7 proposes future work and concludes the paper.

2 RELATED WORK
Machine learning techniques are currently used to automate the construction of well-designed individual optimization heuristics. In addition, the search is on for automatic detection of program segments for targeted optimization. While, to the best of our knowledge, no previous work has used ML for predicting program hot spots, this section reviews the research papers which use ML for compiler optimization heuristics. In a recent review of the challenges confronting dynamic compiler optimizers, Arnold et al. [1] give a detailed survey of adaptive optimizations used in the virtual machine environment. They conclude that feedback-directed optimization techniques are not well used in production systems. Shun Long et al. [3] have used the Instance-based learning algorithm to identify the best transformations for each program. For each optimized program, a database stores the transformations selected, the program features and the resulting speedup. The aim is to apply appropriate transformations when a new program is encountered. Cavazos et al. [4] have applied an offline ML technique to decide whether or not to inline a method. Their adaptive system uses online profile data to identify "hot methods", and method calls in the hot methods are inlined using the ML heuristics.
Cavazos et al. [5, 12] have also used supervised learning to decide which optimization algorithm to use for register allocation: either graph coloring or linear scan. They used three categories of method-level features for the ML heuristics, i.e., features of the edges of a control flow graph, features related to live intervals and, finally, statistical features about the size of a method. Cavazos et al. [11] report that the best compiler optimization is method-dependent rather than program-dependent. Their paper describes how a logistic regression-based machine learning technique, trained using only static features of a method, is used to automatically derive a simple predictive model that selects the best set of optimizations for individual methods within a dynamic compiler. They take into consideration the structure of a particular method within a program to develop a sequence of optimization phases. The automatically constructed regression model is shown to outperform hand-tuned models. To identify basic blocks for instruction scheduling, Cavazos et al. [20] have used supervised learning. Monsifrot et al. [2] have used a decision tree learning algorithm to identify loops for unrolling. Most of this work [4, 5, 11, 12, 20] is implemented and evaluated using the Jikes RVM. The authors of [8, 19] have used genetic programming to choose an effective priority function which prioritizes the various compiler options available. They chose hyper-block formation, register allocation and data pre-fetching for evaluating their optimizations. Agakov et al. [9] have applied machine learning to speed up search-based iterative optimization. The statistical technique of Principal Component Analysis (PCA) is used in their work for appropriate program feature selection. The program features collected offline from a set of training programs are used for learning by the nearest neighbor algorithm.
Features are then extracted for a new program and are processed by the PCA before they are classified using the nearest neighbor algorithm. This reduces the search space to a few good transformations for the new program from the various available source-level transformations. However, this model can be applied only to whole programs. The authors of [10] present a machine learning-based model to predict the performance of a modified program using static source code features and features like the execution frequencies of basic blocks, which are extracted from the collected profile data. As proposed in their earlier approach [9], the authors used the PCA to reduce the feature set. A linear regression model and an artificial neural network model are used to build the prediction model, which is shown to work better than non-feature-based predictors. In their work, Fursin et al. [14] have used
machine learning to identify the best procedure clone for the current run of a program. M. Stephenson et al. [18] have used two machine learning algorithms, the nearest neighbor (NN) and Support Vector Machines (SVMs), to predict the loop unroll factor. None of these approaches aims at prediction at the method level. However, machine learning has been widely used in work on branch prediction [21, 22, 23, 24].

3 SUPPORT VECTOR MACHINES
The SVM [15, 16] maps training data (xi, yi), i = 1, ..., n, where each instance is a set of feature values xi ∈ Rn and a class label yi ∈ {+1, -1}, into a higher-dimensional feature space φ(x) and defines a separating hyperplane there. Since the SVM is a binary classifier, only two classes of data can be separated. Fig. 1 shows a linear SVM hyperplane separating two classes. The linear separation in the feature space is done using the dot product φ(x)·φ(y). Positive definite kernel functions k(x, y) correspond to feature-space dot products and are therefore used in the training algorithm instead of the dot product, as in Eq. (1):

k(x, y) = (φ(x) · φ(y))    (1)

The decision function given by the SVM is shown in Eq. (2):

f(x) = Σ(i=1 to n) vi · k(x, xi) + b    (2)

where b is a bias parameter, xi is a training example and vi is the solution of a quadratic optimization problem. The margin of separation extending from the hyperplane gives the solution of the quadratic optimization problem.

Figure 1: Optimal hyperplane and margin of separation in the feature space
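Eqs. (1) and (2) can be illustrated with a small sketch using the radial basis function kernel (the kernel family later used by the authors via libSVM); the support vectors, coefficients and bias below are toy values for illustration, not a trained model:

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2), a common positive definite kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def decision(x, support_vectors, coeffs, b, gamma=1.0):
    """Eq. (2): f(x) = sum_i v_i * k(x, x_i) + b; sign(f) gives the class."""
    return sum(v * rbf_kernel(x, xi, gamma)
               for v, xi in zip(coeffs, support_vectors)) + b

# Toy model: one positive and one negative support vector, zero bias.
svs = [[1.0, 1.0], [-1.0, -1.0]]
vs = [1.0, -1.0]
print(decision([0.9, 1.1], svs, vs, b=0.0) > 0)  # → True (classified +1)
```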
4 HOT METHOD PREDICTION

This section briefly describes how machine learning can be used to develop a model that predicts hot methods within a program. A discussion of the approach is followed by the scheme of the SVM-based strategy adopted in this study.

4.1 The approach
Static features of each method in a program are collected by offline program analysis. Each set of method-level features forms a feature vector, which is labeled either hot or cold based on classification by a prior execution of the program. The training data set thus generated is used to train the SVM-based predictive model. Next, a test data set is created by offline program analysis of a newly encountered program, and the trained model is used to predict whether each method of the new program is hot or cold. An offline analysis of the Low Level Virtual Machine (LLVM) [6] bytecode representation of the programs provides the training as well as the test data sets. The system architecture of the SVM-based hot method predictive model is shown in Fig. 2; it closely resembles the architecture proposed by C. L. Huang et al. [26]. Fig. 3 outlines the strategy for building the predictive model.

Figure 2: System architecture of the SVM-based hot method predictive model

1. Create the training data set.
   a. Collect method-level features.
      i. Calculate the specified features for every method in an LLVM bytecode module.
      ii. Store the feature set in a vector.
   b. Label each method.
      i. Instrument each method in the program with a counter variable [25].
      ii. Execute the program and collect the execution frequency of each method.
      iii. Using the profile information, label each method as either hot or cold.
      iv. Write the label and its corresponding feature vector for every method to a file.
   c. Repeat steps (a) and (b) for as many programs as are required for training.
2. Train the predictive model.
   a. The feature data set is used to train the SVM-based model.
   b. The predictive model is generated as output.
3. Create the test data set.
   a. Collect method-level features.
      i. Calculate the specified features for every method in a new program.
      ii. Store the feature set in a vector.
      iii. Assign the label '0' to each feature vector in a file.
4. Predict the label, hot or cold, for the test data generated in step 3 using the predictive model derived in step 2.

Figure 3: System outline

4.2 Extracting program features
The 'C' programs used for training are converted into LLVM bytecode using the LLVM front-end. Every bytecode file is organized into a single module. Each module contains methods which are either user-defined or pre-defined. Only static features of the user-defined methods are extracted from the bytecode module, for the simple reason that they can easily be collected by offline program analysis. Table 1 lists the 10 static features that are used to train the classifier. Each feature value of a method is calculated in relation to the identical feature value extracted from the entire bytecode module. The collection of all the feature values for a method constitutes the feature vector xi. This feature vector xi is stored for subsequent labeling. Each feature vector xi is then labeled yi and classified as either hot
(+1) or cold (-1) based on an arbitrary threshold scheme described in the next section.

Table 1: Static features for identifying hot methods.
1. Number of loops in a method.
2. Average loop depth of all the loops in the method.
3. Number of top-level loops in a method.
4. Number of bytecode-level instructions in the method.
5. Number of Call instructions in a method.
6. Number of Load instructions in a method.
7. Number of Store instructions in a method.
8. Number of Branch instructions in the method.
9. Number of Basic Blocks in the method.
10. Number of call sites for each method.
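The text states that each per-method feature value is computed "in relation to" the identical feature of the whole module, without spelling out the normalization. The sketch below therefore assumes division by the module-wide per-method average (an assumption, chosen because it yields values near 1.0 like those in the sample vectors shown later):

```python
def relative_features(method_counts, module_methods):
    """For each feature, divide the method's raw count by the average
    count per method over the whole module. NOTE: this normalization is
    an assumption; the paper only says values are relative to the module."""
    n = len(module_methods)
    features = []
    for i, count in enumerate(method_counts):
        module_avg = sum(m[i] for m in module_methods) / n
        features.append(count / module_avg if module_avg else 0.0)
    return features

# Toy module of three methods with [loop count, instruction count] each.
module = [[2, 10], [0, 30], [1, 20]]
print(relative_features([2, 10], module))  # → [2.0, 0.5]
```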
4.3 Extracting method execution frequencies
Hot methods are frequently executed program segments. To identify hot and cold methods within a training program, profile information is gathered during execution. The training bytecode modules are instrumented with a counter variable in each user-defined method. The instrumented bytecode module is then executed and the execution frequency of each method is collected. Using this profile information, the top 'N' most frequently executed methods are classified as hot; the system keeps the value 'N' as the "hot method threshold". In this scheme of classification, each feature vector (xi) is labeled yi = +1 for hot methods and yi = -1 for cold methods. The feature vector (xi), along with its label (yi), is then written into a training data set file; the training data of the different training programs accumulate in the same file. This file is used as input to train the predictive model.

+1 1:1 2:1 3:1 4:0.880046 5:2.51046 6:0.875912 7:0.634249 8:1.23119 9:1.59314 10:29
-1 1:0 2:0 3:0 4:1.16702 5:1.25523 6:1.0219 7:3.38266 8:1.50479 9:1.83824 10:2
+1 1:2 2:2 3:2 4:1.47312 5:0.83682 6:1.89781 7:1.47992 8:2.59918 9:2.81863 10:3

Figure 4: Sample training data set

The general format of a feature vector is yi 1:xi1 2:xi2 3:xi3 ... j:xij, where 1, 2, 3, ..., j are the feature numbers and xi1, xi2, ..., xij are their corresponding feature values. Fig. 4 shows a sample of three feature vectors from the training data set collected for the user-defined methods found in a SPEC benchmark program. The first feature vector in Fig. 4 is a hot method and is labeled +1. The values of the ten features are listed serially; for example, '1' is the value of feature 1 and '29' that of feature 10. The value '1' of feature 1 indicates the relative number of loops found in the method. With a "hot method threshold" of 50%, 4 out of the 8 most frequently executed methods in a program are designated as hot methods. The first element in each vector is the label yi (+1 or -1); each subsequent element gives the feature number followed by its value.

4.4 Creating the test data set
When a new program is encountered, the test data set is collected in the same way as the training data set, except that the label is specified as zero.

0 1:1 2:1 3:1 4:1.13098 5:2.91262 6:2.05479 7:1.09091 8:1.34875 9:1.55172 10:34
0 1:0 2:0 3:0 4:0.552341 5:0.970874 6:1.14155 7:0.363636 8:0.385356 9:0.862069 10:4
0 1:1 2:1 3:1 4:1.26249 5:0 6:2.51142 7:2.90909 8:1.15607 9:1.2069 10:40

Figure 5: Sample test data set

4.5 Training and prediction using SVM
Using the training data set file as input, the SVM is trained with default parameters (C-SVM, C=1, radial basis function kernel). Once trained, the predictive model is generated as output. The derived model is used to predict the label for each feature vector in the test data set file. Training and prediction are done offline. Subsequently, the new program used for creating the test data set is instrumented; executing this instrumented program reveals its most frequently executed methods. The prediction accuracy of the system is evaluated by comparing the predicted output with the actual profile values.
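The labeling and file layout of Sections 4.2-4.4 can be sketched as follows; label_hot_methods and to_libsvm_line are illustrative helpers, not part of the authors' tool chain, and the frequency values are invented:

```python
def to_libsvm_line(label, features):
    """Format one feature vector in the 'label 1:v1 2:v2 ...' layout
    shown in Figs. 4 and 5 (libSVM's input format)."""
    return " ".join([str(label)] + [f"{i}:{v:g}" for i, v in enumerate(features, start=1)])

def label_hot_methods(freqs, threshold_pct):
    """Mark the top threshold_pct% most frequently executed methods
    hot (+1) and the rest cold (-1)."""
    n_hot = max(1, round(len(freqs) * threshold_pct / 100))
    hot = set(sorted(freqs, key=freqs.get, reverse=True)[:n_hot])
    return {m: (1 if m in hot else -1) for m in freqs}

freqs = {"main": 1, "solve": 900, "util": 40, "dump": 3}  # profiled call counts
labels = label_hot_methods(freqs, 50)
print(to_libsvm_line(labels["solve"], [1, 1, 1, 0.88]))  # → "1 1:1 2:1 3:1 4:0.88"
```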
5 EVALUATION
5.1 Method
Prediction accuracy is defined as the ratio of events correctly predicted to all events encountered. Prediction accuracy is of two types: hot method prediction accuracy and total prediction accuracy. Hot method prediction accuracy is the ratio of correct hot method predictions to the actual number of hot methods in a program, whereas total prediction accuracy is the ratio of correct predictions (either hot or cold) to the total number of methods in a program. Hot method prediction accuracy is evaluated at three hot method threshold levels: 50%, 40% and 30%. The leave-one-out cross-validation method is used to evaluate the system. This is a standard machine learning technique in which 'n' benchmark programs are used iteratively for evaluation: one of the 'n' programs is used for testing, and the remaining 'n-1' programs are used for training the model. This is repeated for all 'n' programs in the benchmark suite.
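The two accuracy metrics defined above can be expressed directly (the method names and labels below are invented for illustration):

```python
def hot_method_accuracy(predicted, actual):
    """Ratio of correctly predicted hot methods to the actual hot methods."""
    hot = [m for m, lbl in actual.items() if lbl == 1]
    correct = sum(1 for m in hot if predicted[m] == 1)
    return correct / len(hot)

def total_accuracy(predicted, actual):
    """Ratio of correct predictions (hot or cold) to all methods."""
    correct = sum(1 for m in actual if predicted[m] == actual[m])
    return correct / len(actual)

actual = {"a": 1, "b": 1, "c": -1, "d": -1}      # profiled ground truth
predicted = {"a": 1, "b": -1, "c": -1, "d": -1}  # SVM output
print(hot_method_accuracy(predicted, actual))  # → 0.5
print(total_accuracy(predicted, actual))       # → 0.75
```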
5.2 Benchmarks
Two benchmark suites, SPEC CPU2000 INT [17] and UTDSP [13], have been used for training and prediction. UTDSP is a C benchmark suite, and SPEC CPU2000 INT contains C and C++ benchmarks. Evaluation of the system is based only on the C programs of either benchmark. The model trained on the 'n-1' benchmark programs in the suite is used to predict the hot methods in the left-out benchmark program.

5.3 Tools and platform
The system is implemented in the Low Level Virtual Machine (LLVM) version 1.6 [6]. LLVM is an open-source compiler infrastructure that supports compile-time, link-time, run-time and idle-time optimizations. The results are evaluated on an Intel(R) Pentium(R) D at 2.80 GHz with 480 MB of RAM running Fedora Core 4. The system uses the libSVM tool [7], a simple library for Support Vector Machines written in C.

Figure 6: Hot method prediction accuracy on the SPEC CPU2000 INT benchmark (prediction accuracy per program at the 50%, 40% and 30% hot method thresholds)

6 RESULTS

Fig. 6 shows the prediction accuracy of the trained model on the SPEC CPU2000 INT benchmark programs at three different hot method thresholds: 50%, 40% and 30%. The hot method prediction accuracy for all C programs on the benchmark is found to vary from 0% to 100%, with averages of 57.86%, 51.43% and 39.14% for the three hot method thresholds respectively. This averages to 49.48% on the SPEC CPU2000 INT benchmark suite. Similarly, on the UTDSP benchmark suite, in a 0% to 100% range, the hot method prediction accuracy averages for the three thresholds are 84%, 81% and 62% respectively, averaging 76%. Overall, the new system obtains 62.57% hot method prediction accuracy.

Figure 7: Total prediction accuracy on the SPEC CPU2000 INT benchmark (total prediction accuracy per program at the three hot method thresholds)

The total method prediction accuracy on the SPEC CPU2000 INT and UTDSP benchmark suites is shown in Figs. 7 and 9. The total method prediction accuracy for all C programs on SPEC CPU2000 INT varies from 36% to 100%, with averages of 68.43%, 71.14% and 71.14% for the three hot method thresholds respectively, averaging 70.24%. The average prediction accuracies obtained on the UTDSP benchmark suite are 69%, 71% and 58% for the 50%, 40% and 30% hot method thresholds respectively, averaging 66%. Overall, the system predicts both hot and cold methods in a program with 68.15% accuracy.

7 CONCLUSION AND FUTURE WORK

Optimizers depend on profile information to identify the hot methods of program segments. The major inadequacy associated with dynamic optimization techniques is the high cost of accurate data profiling via program instrumentation. In this work, a method has been worked out to identify hot methods in a program using a machine learning algorithm, the SVM. According to our study, with a set of ten static features used to train the system, the derived model predicts all methods within a program with 68.15% accuracy and hot methods with 62.57% accuracy. However, hot method prediction is of greater value because optimizations will be more effective in these methods. Future work in this area aims at improving the prediction accuracy of the system by identifying more effective static and dynamic features of a program. The system can further be extended to a dynamic hot method prediction system for use by dynamic optimizers. Applying this approach, the prediction accuracy of other machine learning algorithms can be evaluated to build additional models.
Figure 8: Hot method prediction accuracy on the UTDSP benchmark (prediction accuracy per program at the 50%, 40% and 30% hot method thresholds)

Figure 9: Total prediction accuracy on the UTDSP benchmark (total prediction accuracy per program at the 50%, 40% and 30% hot method thresholds)
8 REFERENCES
[1] Matthew Arnold, Stephen Fink, David Grove, Michael Hind, and Peter F. Sweeney: A Survey of Adaptive Optimization in Virtual Machines, Proceedings of the IEEE, pp. 449-466, February 2005.
[2] A. Monsifrot, F. Bodin, and R. Quiniou: A machine learning approach to automatic production of compiler heuristics, In Proceedings of the International Conference on Artificial Intelligence: Methodology, Systems, Applications, LNCS 2443, pp. 41-50, 2002.
[3] S. Long and M. O'Boyle: Adaptive java optimization using instance-based learning, In ACM International Conference on Supercomputing (ICS'04), pp. 237-246, June 2004.
[4] John Cavazos and Michael F.P. O'Boyle: Automatic Tuning of Inlining Heuristics, 11th International Workshop on Compilers for Parallel Computers (CPC 2006), January 2006.
[5] John Cavazos, J. Eliot B. Moss, and Michael F.P. O'Boyle: Hybrid Optimizations: Which Optimization Algorithm to Use?, 15th International Conference on Compiler Construction (CC 2006), 2006.
[6] C. Lattner and V. Adve: LLVM: A compilation framework for lifelong program analysis & transformation, In Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO'04), March 2004.
[7] Chih-Chung Chang and Chih-Jen Lin: LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[8] M. Stephenson, S. Amarasinghe, M. Martin, and U. M. O'Reilly: Meta optimization: Improving compiler heuristics with machine learning, In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'03), pp. 77-90, June 2003.
[9] F. Agakov, E. Bonilla, J. Cavazos, G. Fursin, B. Franke, M.F.P. O'Boyle, M. Toussant, J. Thomson, C. Williams: Using machine learning to focus iterative optimization, In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pp. 295-305, 2006.
[10] Christophe Dubach, John Cavazos, Björn Franke, Grigori Fursin, Michael O'Boyle and Olivier Temam: Fast compiler optimization evaluation via code-features based performance predictor, In Proceedings of the ACM International Conference on Computing Frontiers, May 2007.
[11] John Cavazos, Michael O'Boyle: Method-Specific Dynamic Compilation using Logistic Regression, ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Portland, Oregon, October 22-26, 2006.
[12] John Cavazos: Automatically Constructing Compiler Optimization Heuristics using Supervised Learning, Ph.D. thesis, Dept. of Computer Science, University of Massachusetts, 2004.
[13] C. Lee: UTDSP benchmark suite, http://www.eecg.toronto.edu/~corinna/DSP/infrastructure/UTDSP.html, 1998.
[14] G. Fursin, C. Miranda, S. Pop, A. Cohen, and O. Temam: Practical run-time adaptation with procedure cloning to enable continuous collective compilation, In Proceedings of the 5th GCC Developer's Summit, Ottawa, Canada, July 2007.
[15] Vapnik, V.N.: The support vector method of function estimation, In Generalization in Neural Network and Machine Learning, Springer-Verlag, pp. 239-268, 1999.
[16] S. Kotsiantis: Supervised Machine Learning: A Review of Classification Techniques, Informatica Journal 31, pp. 249-268, 2007.
[17] The Standard Performance Evaluation Corporation, http://www.specbench.org.
[18] M. Stephenson and S.P. Amarasinghe: Predicting unroll factors using supervised classification, In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pp. 123-134, 2005.
[19] M. W. Stephenson: Automating the Construction of Compiler Heuristics Using Machine Learning, PhD thesis, MIT, USA, 2006. Available at www.cag.csail.mit.edu/~mstephen/stephenson_phdthesis.pdf.
[20] J. Cavazos and J. Moss: Inducing heuristics to decide whether to schedule, In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2004.
[21] B. Calder, D. Grunwald, Michael Jones, D. Lindsay, J. Martin, M. Mozer, and B. Zorn: Evidence-Based Static Branch Prediction Using Machine Learning, ACM Transactions on Programming Languages and Systems (ToPLaS), Vol. 19, 1997.
[22] Daniel A. Jiménez, Calvin Lin: Neural methods for dynamic branch prediction, ACM Transactions on Computer Systems (TOCS), Vol. 20, No. 4, pp. 369-397, November 2002.
[23] Jeremy Singer, Gavin Brown and Ian Watson: Branch Prediction with Bayesian Networks, In Proceedings of the First Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation, pp. 96-112, January 2007.
[24] Culpepper B., Gondre M.: SVMs for Improved Branch Prediction, University of California, Davis, USA, ECS201A Technical Report, 2005.
[25] Youfeng Wu, J. R. Larus: Static branch frequency and program profile analysis, In Proceedings of the 27th Annual International Symposium on Microarchitecture (MICRO-27), pp. 1-11, 1994.
[26] C.-L. Huang and C.-J. Wang: A GA-based feature selection and parameters optimization for support vector machines, Expert Systems with Applications, Vol. 31, Issue 2, pp. 231-240, 2006.
Computing and Communication Journal UbiCC Journal - VolumeUbiquitous 3
7
81
PROPAGATION MODEL FOR HIGHWAY IN MOBILE COMMUNICATION SYSTEM
K. Ayyappan, Department of Electronics and Communication Engineering, Rajiv Gandhi College of Engineering and Technology, Pondicherry, India.
*P. Dananjayan, Department of Electronics and Communication Engineering, Pondicherry Engineering College, Pondicherry - 605014, India.
*[email protected] (*corresponding author)
ABSTRACT
Knowledge of radio propagation is essential for the design, deployment and management of any wireless network. Propagation is heavily site specific and can vary significantly depending on terrain, frequency of operation, velocity of the mobile terminal, interference sources and other dynamic factors. Accurate characterization of the radio channel through key parameters and a mathematical model is important for predicting signal coverage, achievable data rates, and the performance of alternative signaling and reception schemes. Path loss models for macro cells, such as the Hata-Okumura, COST 231 and ECC 33 models, are analyzed and their parameters compared. The received signal strength is calculated with respect to distance, and the model that minimizes the number of handoffs and avoids the ping-pong effect is determined. This paper proposes a propagation model for the highway environment between Pondicherry and Villupuram, which are 40 kilometers apart. A comparative study with real-time measurements obtained from Bharat Sanchar Nigam Limited (BSNL), a GSM-based wireless network operator for Pondicherry, India, has been carried out. Keywords: Handoff, Path loss, Received signal strength, ping pong, cellular mobile.
1   INTRODUCTION
Propagation models have traditionally focused on predicting the received signal strength at a given distance from the transmitter, as well as the variability of the signal strength in close spatial proximity to a particular location. Propagation models that predict the signal strength for an arbitrary transmitter-receiver (T-R) separation distance are useful in estimating the radio coverage area of a transmitter. Conversely, propagation models that characterize the rapid fluctuations of the received signal strength over very short travel distances are called small-scale or fading models. Propagation models are useful for predicting signal attenuation or path loss. This path loss information may be used as a controlling factor for system performance or coverage so as to achieve good reception [1]. The common approaches to propagation modeling include physical models and empirical models. In this paper, only empirical models are considered. Empirical models use measurement data
to model a path loss equation. To conceive these models, a correlation was found between the received signal strength and other parameters, such as antenna heights and terrain profiles, through the use of extensive measurements and statistical analysis. Radio transmission in a mobile communication system often takes place over irregular terrain. The terrain profile of a particular area needs to be taken into account when estimating the path loss. The terrain profile may vary from a simple curved earth profile to a highly curved mountainous profile. A number of propagation models are available to predict path loss over irregular terrain. While all these models aim to predict the signal strength at a particular receiving point or over a specific area (called a sector), the methods vary widely in their approach, complexity and accuracy. Most of these models are based on a systematic interpretation of measurement data obtained in the service area. In cellular mobile communication systems, handoff takes place due to movement of the mobile unit and unfavorable conditions inside an individual cell or between a number of
adjacent cells [2, 3]. Handoff should provide seamless service to active users while data transfer is in progress, so unnecessary handoffs must be avoided. Hard handoff suffers from the 'ping pong' effect, a result of frequent handoffs when mobile users are near the boundaries of adjacent cells. The parameters measured to determine handoff are usually the received signal strength, the signal-to-noise ratio and the bit error rate. A suitable path loss model can increase the connection reliability; hence, the choice of path loss model plays an important role in the performance of handoffs. In this paper, different path loss models for macro cells, such as the Hata-Okumura model, the COST 231 model and the ECC 33 model, are analyzed and their parameters compared. A propagation model for the highway is proposed by modifying the COST 231 and Hata-Okumura suburban models; it is applied to the Pondicherry - Villupuram highway and its predictions are compared with experimental values. The work is organized as follows. Section 2 describes the path loss models. Section 3 deals with the received signal strength for the different path loss models. Section 4 discusses the models and evaluates the results. Section 5 concludes with the performance of the various path loss models.

2   PATH LOSS MODELS
Path loss is the reduction in power of an electromagnetic wave as it propagates through space. It is a major component in the analysis and design of the link budget of a communication system. It depends on frequency, antenna height, receive terminal location relative to obstacles and reflectors, and link distance, among many other factors. Macro cells are generally large, providing a coverage range in kilometers, and are used for outdoor communication. Several empirical path loss models have been determined for macro cells. Among numerous propagation models, the following are the most significant ones, providing the foundation of mobile communication services. The empirical models are
i. Hata-Okumura model
ii. COST 231 model
iii. ECC 33 model
These prediction models are based on extensive experimental data and statistical analysis, which enable us to compute the received signal level in a given propagation medium [5, 6]. The usage and accuracy of these prediction models depends on the propagation environment.
2.1 Free Space Propagation Model
In radio wave propagation models, the free space model predicts that received power decays as a function of the T-R separation distance, at a rate of 20 dB/decade. The path loss for the free space model, when antenna gains are included, is given by

PL(dB) = -Gt - Gr + 32.44 + 20 log(d) + 20 log(f)    (1)

where Gt is the transmitted antenna gain in dB, Gr is the received antenna gain in dB, d is the T-R separation distance in kilometers and f is the frequency in MHz.
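As a numerical illustration of eq (1), the sketch below (the function name is ours) confirms the 20 dB/decade decay of the free-space model:

```python
import math

def free_space_pl(d_km, f_mhz, gt_db=0.0, gr_db=0.0):
    """Free-space path loss of eq (1), in dB.

    d_km: T-R separation in kilometers, f_mhz: frequency in MHz,
    gt_db / gr_db: transmit / receive antenna gains in dB.
    """
    return -gt_db - gr_db + 32.44 + 20 * math.log10(d_km) + 20 * math.log10(f_mhz)

# A decade of distance adds exactly 20 dB of loss.
pl_1km = free_space_pl(1.0, 900.0)
pl_10km = free_space_pl(10.0, 900.0)
```

At 900 MHz the free-space loss at 1 km is roughly 91.5 dB, far below the measured macro-cell losses of Table 2, as expected for a line-of-sight bound.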
2.2 The Hata-Okumura Model
The Hata-Okumura model is an empirical formula for the graphical path loss data provided by Yoshihisa Okumura, and is valid from 150 MHz to 1500 MHz. The Hata model is a set of equations based on measurements and extrapolations from the curves derived by Okumura. Hata presented the urban area propagation loss as a standard formula, along with additional correction factors for application in other situations such as suburban and rural areas. The computation time is short and only four parameters are required in the Hata model [7]. However, the model neglects the terrain profile between transmitter and receiver, i.e. hills or other obstacles between transmitter and receiver are not considered. This is because both Hata and Okumura made the assumption that the transmitter would normally be located on hills. The path loss in dB for the urban environment is given by

PL(dB) = A + B log(d)    (2)

where d is the distance in kilometers and A represents a fixed loss that depends on the frequency of the signal. These parameters are given by the empirical formulas

A = 69.55 + 26.16 log(f) - 13.82 log(hb) - a(hm)
B = 44.9 - 6.55 log(hb)

where f is the frequency in MHz, hb is the height of the base station antenna in meters, hm is the mobile antenna height in meters and a(hm) is a correction factor in dB. For effective mobile antenna height, a(hm) is given by

a(hm) = (1.1 log(f) - 0.7) hm - (1.56 log(f) - 0.8)

The path loss model for the highway is given, without the noise factor, by

PL(dB) = PL(dB)urban - 2 [log(f/28)]^2 - 5.4    (3)

and, with the noise factor, by

PL(dB) = PL(dB)urban - 2 [log(f/28)]^2    (4)
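As a cross-check, the Hata-Okumura equations can be evaluated at the parameters used later in the paper (f = 900 MHz, hb = 35 m, hm = 1.5 m); this sketch (function names are ours) reproduces the Hata-Okumura column of Table 2 to within rounding:

```python
import math

def hata_urban(d_km, f_mhz, hb_m, hm_m):
    """Hata-Okumura urban path loss in dB: eq (2) with the A, B and a(hm) factors."""
    a_hm = (1.1 * math.log10(f_mhz) - 0.7) * hm_m - (1.56 * math.log10(f_mhz) - 0.8)
    A = 69.55 + 26.16 * math.log10(f_mhz) - 13.82 * math.log10(hb_m) - a_hm
    B = 44.9 - 6.55 * math.log10(hb_m)
    return A + B * math.log10(d_km)

def hata_highway(d_km, f_mhz, hb_m, hm_m, noise=False):
    """Highway variant: eq (3) without the noise factor, eq (4) with it."""
    pl = hata_urban(d_km, f_mhz, hb_m, hm_m) - 2 * math.log10(f_mhz / 28.0) ** 2
    return pl if noise else pl - 5.4
```

With these parameters, hata_highway(1.0, 900.0, 35.0, 1.5) evaluates to about 115.5 dB, matching the 1 km entry of Table 2.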
2.3 COST-231 Hata Model
To extend the Hata-Okumura model to personal communication system (PCS) applications operating at 1800 to 2000 MHz, the European Co-operative for Scientific and Technical Research (COST) came up with the COST-231 model. This model is derived from the Hata model and depends upon four parameters for the prediction of propagation loss: frequency, height of the receiving antenna, height of the base station, and distance between the base station and the receiving antenna [8]. The path loss in an urban area is given by

PL(dB) = 46.33 + 33.9 log(f) - 13.82 log(hb) - a(hm) + [44.9 - 6.55 log(hb)] log(d)    (5)

where a(hm) = (1.1 log(f) - 0.7) hm - (1.56 log(f) - 0.8). The path loss calculation for the highway is similar to the Hata-Okumura model.
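A corresponding sketch for eq (5) (our naming; the 46.33 dB constant is kept exactly as printed in the text) with the same highway correction, which reproduces the COST 231 column of Table 2:

```python
import math

def cost231_urban(d_km, f_mhz, hb_m, hm_m):
    """COST-231 Hata urban path loss in dB, eq (5)."""
    a_hm = (1.1 * math.log10(f_mhz) - 0.7) * hm_m - (1.56 * math.log10(f_mhz) - 0.8)
    return (46.33 + 33.9 * math.log10(f_mhz) - 13.82 * math.log10(hb_m) - a_hm
            + (44.9 - 6.55 * math.log10(hb_m)) * math.log10(d_km))

def cost231_highway(d_km, f_mhz, hb_m, hm_m):
    # Same suburban/highway correction as the Hata-Okumura model (without noise factor).
    return cost231_urban(d_km, f_mhz, hb_m, hm_m) - 2 * math.log10(f_mhz / 28.0) ** 2 - 5.4
```

At f = 900 MHz, hb = 35 m, hm = 1.5 m, cost231_highway(1.0, ...) gives about 115.2 dB, the 1 km entry of Table 2.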
2.4 ECC-33 Model
The ECC 33 path loss model, developed by the Electronic Communication Committee (ECC), is extrapolated from the original measurements by Okumura, with its assumptions modified so that it more closely represents a fixed wireless access (FWA) system. The path loss model is defined as

PL(dB) = Afs + Abm - Gt - Gr    (6)

where Afs is the free space attenuation, Abm is the basic median path loss, Gt is the BS height gain factor and Gr is the received antenna height gain factor. They are individually defined as

Afs = 92.4 + 20 log(d) + 20 log(f)
Abm = 20.41 + 9.83 log(d) + 7.894 log(f) + 9.56 [log(f)]^2
Gt = log(hb/200) {13.958 + 5.8 [log(d)]^2}

and, for medium city environments,

Gr = [42.57 + 13.7 log(f)] [log(hm) - 0.585]

where f is the frequency in GHz, d is the distance between base station and mobile in km, hb is the BS antenna height in meters and hm is the mobile antenna height in meters.

3   RECEIVED SIGNAL STRENGTH

In mobile communication, received signal strength is a measurement of the power present in a received radio signal. The signal strength between the base station and the mobile must be greater than a threshold value to maintain signal quality at the receiver [9]. At the same time, the signal must not be so strong as to create co-channel interference with channels in another cell using the same frequency band. The handoff decision is based on the received signal strength from the current base station and from neighbouring base stations. The signal gets weaker as the mobile moves away from the base station and stronger as it gets closer. The received signal strength for the various path loss models (Hata-Okumura, COST 231 and ECC-33) is calculated as

Pr = Pt + Gt + Gr - PL - A    (7)

where Pr is the received signal strength in dBm, Pt is the transmitted power in dBm, Gt is the transmitted antenna gain in dB, Gr is the received antenna gain in dB, PL is the total path loss in dB and A is the connector and cable loss in dB.

4   PERFORMANCE ANALYSIS

The performance analysis is based on the calculation of the received signal strength and the path loss between the base station and the mobile from the propagation model. The GSM-based cellular network specification obtained from Bharat Sanchar Nigam Limited (BSNL), India, shown in Table 1, is used for evaluating the performance of the path loss models.

Table 1: Simulation parameters
Parameters                          Values
Base station transmitter power      43 dBm
Mobile transmitter power            30 dBm
Base station antenna height         35 m
Mobile antenna height               1.5 m
Transmitter antenna gain            17.5 dB
Threshold level for mobile          -102 dBm
Threshold level for base station    -110 dBm
Frequency                           900 MHz
Connector loss                      2 dB
Cable loss                          1.5 dB
Duplexer loss                       1.5 dB
Maximum uplink loss                 161.5 dB
Maximum downlink loss               161.8 dB
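Eq (7) is simple arithmetic once a path loss value is chosen. A minimal sketch follows (our naming; we assume that A lumps together the connector, cable and duplexer losses of Table 1, 5 dB in total, and that the mobile antenna gain is 0 dB, since those choices reproduce the Table 3 values):

```python
def received_power(pl_db, pt_dbm=43.0, gt_db=17.5, gr_db=0.0, a_db=5.0):
    """Eq (7): Pr = Pt + Gt + Gr - PL - A, with Table 1 values as defaults.
    a_db = connector (2) + cable (1.5) + duplexer (1.5) losses; the exact
    composition of A is our inference from the published numbers."""
    return pt_dbm + gt_db + gr_db - pl_db - a_db

# Hata-Okumura highway path loss at 1 km is 115.5 dB (Table 2):
pr_1km = received_power(115.5)
```

This yields -60.0 dBm at 1 km, comfortably above the -102 dBm mobile threshold of Table 1.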
4.1 Path Losses for Various Models
The path losses for the various models are calculated using eqs (2), (5) and (6) for the Pondicherry-Villupuram highway shown in Fig.1, which is connected via Villianur, Kandamangalam, Madagadipet and Valavanur. The circles shown in this figure are the base station transceivers (BTS), identified by BTS number. The calculated path loss values are given in Table 2, and the comparison of the various models is given in Fig.2. The maximum allowable uplink loss for the Nortel S8K base station transceiver is 161.5 dB and the maximum allowable downlink loss is 161.8 dB.

Fig.1 Pondicherry-Villupuram highway map

Table 2: Path loss for various models
Distance (km)   COST 231 (dB)   HATA OKUMURA (dB)   ECC 33 (dB)
0.5             104.7           105.1               130.7
1               115.2           115.5               139.3
1.5             121.3           121.7               144.7
2               125.6           126.0               148.6
2.5             129.0           129.4               151.8
3               131.7           132.1               154.5
3.5             134.1           134.5               156.8
4               136.1           136.5               158.8

Fig.2 Comparison of path loss (path loss in dB versus distance in km)

The number of handoffs per call is related to the cell size: the smaller the cell size, the larger the number of handoffs, so the path loss model that covers the maximum distance will minimize the number of handoffs. The path loss calculated using the Hata-Okumura and COST 231 models stays below the threshold value up to 19 km, while the ECC 33 model exceeds the threshold value at 5 km.
4.2 Received Signal Strength for Various Models

Table 3: BS to MS received power
Distance (km)   COST 231 (dBm)   HATA OKUMURA (dBm)   ECC 33 (dBm)
0.5             -49.2            -49.6                -75.2
1               -59.7            -60.0                -83.8
1.5             -65.9            -66.1                -89.2
2               -70.1            -70.5                -93.2
2.5             -73.5            -73.9                -96.3
3               -76.2            -76.6                -99.0
3.5             -78.6            -79.0                -101.3
4               -80.6            -81.0                -103.3

Fig.3 Received signal strength for suburban models (received signal strength in dBm versus distance in km)

The received signal strengths for the COST 231, Hata-Okumura and ECC 33 models, calculated using eq (7) for the suburban area, are shown in Table 3, and the comparison is shown in Fig.3. The received signal strength using the ECC 33 model is -103.3 dBm at four kilometers, which falls below the mobile receiver threshold level of -102 dBm. The received signal strengths using the COST 231 and Hata-Okumura models remain above the sensitivity threshold of the mobile, so these two models are preferred for maximum coverage area and for reducing the number of handoffs.
4.3 Highway propagation model
Fig.4 Received signal strength for highway models (received signal strength in dBm versus distance in km)

The general area around the highway can be considered suburban because of its location. The path loss calculation is a major concern on highways, both with and without the noise level. In this paper the suburban model is modified for the highway with a small correction factor that depends on the location. The RSS value for the highway without the noise factor was calculated using the suburban model, while the highway with the noise factor was calculated using an additional correction factor of 5.4 dB on top of the suburban model. Here, up to 2.5 km the received signal strength was calculated using the suburban model, and beyond 2.5 km it was calculated with the additional noise factor of 5.4 dB.
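The piecewise rule just described can be written down directly. A sketch (our naming; the lumped 5 dB connector/cable/duplexer loss is our assumption) that reproduces the Hata-Okumura column of Table 4 on both sides of the 2.5 km breakpoint:

```python
import math

def highway_rss(d_km, pt_dbm=43.0, gt_db=17.5, loss_db=5.0,
                f_mhz=900.0, hb_m=35.0, hm_m=1.5, break_km=2.5):
    """Received signal strength (dBm) for the piecewise highway model of Sec. 4.3:
    the suburban Hata-Okumura model up to break_km, plus the 5.4 dB noise
    factor beyond it. Defaults follow Table 1."""
    a_hm = (1.1 * math.log10(f_mhz) - 0.7) * hm_m - (1.56 * math.log10(f_mhz) - 0.8)
    pl_urban = (69.55 + 26.16 * math.log10(f_mhz) - 13.82 * math.log10(hb_m) - a_hm
                + (44.9 - 6.55 * math.log10(hb_m)) * math.log10(d_km))
    pl = pl_urban - 2 * math.log10(f_mhz / 28.0) ** 2 - 5.4   # suburban model
    if d_km > break_km:
        pl += 5.4                                             # add the noise factor
    return pt_dbm + gt_db - pl - loss_db
```

The 5.4 dB jump at the breakpoint is visible in Table 4: the model values step from about -73.5 dBm at 2.5 km to about -81.6 dBm at 3 km.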
Table 4: BS to MS received power
Distance (km)   COST 231 (dBm)   HATA OKUMURA (dBm)   Experimental value (dBm)
0.5             -49.2            -49.6                -50
1               -59.7            -60.0                -57
1.5             -65.9            -66.1                -65
2               -70.1            -70.5                -68
2.5             -73.5            -73.9                -74
3               -81.6            -82.0                -82
3.5             -84.0            -84.4                -84
4               -86.0            -86.4                -86

Fig.5 Adjacent cell RSSI
Agilent Technologies provides a drive test solution for GSM networks. The Agilent E6474A drive test system was used to measure the signal received by a mobile on the Pondicherry-Villupuram highway, as shown in Table 4. The received signal strengths for the COST 231 and Hata-Okumura models were calculated and compared with the experimental values, as shown in Fig.4. The modified COST 231 and Hata-Okumura suburban models for the highway match the experimental values. High RSS handover margins can result in poor reception and dropped calls, while very low values of the handover margin can produce ping-pong effects as the mobile switches frequently between cells. The received signal strengths of the existing cell broadcast control channel (BCCH-74) and the adjacent cells (BCCH-76, 66, 68, 69 and 78) are shown in Fig.5. For BCCH-74, the corresponding base station identification code (BSIC) is 41 and the received signal strength is -72 dBm. Consecutively, the mobile user receives adjacent inter- and intra-cell signal strengths from BSIC 44, 46 and 41. The optimum handover decision is taken from the drive test adjacent cell received signal strength indicator (RSSI), shown in Fig.5, to improve signal reception, reduce the number of dropped calls, and mitigate the ping-pong effect.
5   CONCLUSION
In this paper, different path loss models for macro cells were studied. The calculated path loss was compared across the existing models: the Hata-Okumura model, the COST 231 model and the ECC 33 model. The received signal strength from the base stations was calculated, and the calculated values were compared with the values observed on the highway between Pondicherry and Villupuram. The results show that the modified suburban model for the highway, based on the Hata-Okumura and COST 231 models, is closest to the observed received signal strength and is predicted to be a suitable model for highway received signal strength calculation.
REFERENCES
[1] Armoogum V., Soyjaudah K.M.S., Fogarty T. and Mohamudally N., "Comparative Study of Path Loss using Existing Models for Digital Television Broadcasting for Summer Season in the North of Mauritius", Proceedings of the Third Advanced IEEE International Conference on Telecommunication, Mauritius, Vol. 4, pp 34-38, May 2007.
[2] Tomar G.S. and Verma S., "Analysis of handoff initiation using different path loss models in mobile communication system", Proceedings of the IEEE International Conference on Wireless and Optical Communications Networks, Bangalore, India, Vol. 4, May 2006.
[3] Ken-Ichi Itoh, Soichi Watanabe, Jen-Shew Shih and Takuro Sato, "Performance of Handoff Algorithm Based on Distance and RSSI Measurements", IEEE Transactions on Vehicular Technology, Vol. 51, No. 6, pp 1460-1468, November 2002.
[4] Abhayawardhana V.S., Wassell I.J., Crosby D., Sellars M.P. and Brown M.G., "Comparison of empirical propagation path loss models for fixed wireless access systems", Proceedings of the IEEE Conference on Vehicular Technology, Stockholm, Sweden, Vol. 1, pp 73-77, June 2005.
[5] Maitham Al-Safwani and Asrar U.H. Sheikh, "Signal Strength Measurement at VHF in the Eastern Region of Saudi Arabia", The Arabian Journal for Science and Engineering, Vol. 28, No. 2C, pp 3-18, December 2003.
[6] S. Hemani and M. Oussalah, "Mobile Location System Using Netmonitor and MapPoint Server", Proceedings of the Sixth Annual Postgraduate Symposium on the Convergence of Telecommunication, Networking and Broadcasting, PGNet, pp 17-22, 2006.
[7] T.S. Rappaport, "Wireless Communications", Pearson Education, 2003.
[8] William C.Y. Lee, "Mobile Cellular Telecommunications", McGraw Hill International Editions, 1995.
[9] Ahmed H. Zahran, Ben Liang and Aladdin Saleh, "Signal threshold adaptation for vertical handoff in heterogeneous wireless networks", Mobile Networks and Applications, Vol. 11, No. 4, pp 625-640, August 2006.
A Hardware Implementation of the Winograd Fourier Transform Algorithm for Cryptography

1 G.A. Sathishkumar and 2 Dr. K. Boopathy Bagan
1 Assistant Professor, Department of Electronics and Communication Engineering, Sri Venkateswara College of Engineering, Sriperumbudur 602108, Tamil Nadu, India
2 Professor, Department of Electronics, Madras Institute of Technology, Anna University Chromepet Campus, Chennai 600044, Tamil Nadu, India
1 [email protected], 2 [email protected]
ABSTRACT
This paper presents a hardware implementation of efficient algorithms that use a mathematical framework based on the Winograd Fourier Transform Algorithm and the CRT, yielding a set of formulations that simplify cyclic convolution (CC) computations. In particular, this work focuses on the arithmetic complexity of multiplication, where each product represents a CC computational operation. The proposed algorithms are compared against existing FFT-based algorithms and are shown to exhibit an advantage in computational efficiency. The design is most useful when dealing with large integers, as required by many modern cryptographic systems. The Winograd Fourier Transform Algorithm (WFTA) is a technique that combines Rader's index mapping and Winograd's short convolution modules for prime factors into a composite-N Fourier transform structure with fewer multipliers (O(N)). While theoretically interesting, WFTAs are complicated and different for every length. The algorithm can be implemented on modern processors with few hardware multipliers and hence is very useful in practice today.
Keywords: Discrete Fourier Transform, Fast Fourier Transform, Winograd's Theorem, Chinese Remainder Theorem.

INTRODUCTION
Many popular crypto-systems like the RSA encryption scheme [12], the Diffie-Hellman (DH) key agreement scheme [13], and the Digital Signature Algorithm (DSA) [14] are based on long integer modular exponentiation. A major difference between the RSA scheme and cryptosystems based on the discrete logarithm problem is that the modulus used in the RSA encryption scheme is the product of two prime numbers. This allows utilizing the Chinese Remainder Theorem (CRT) to speed up the private key operations. From a mathematical point of view, the usage of the CRT for RSA decryption is well known. However, for a hardware implementation, a special multiplier architecture is necessary to meet the requirements for efficient CRT-based decryption. This paper presents the basic algorithmic and architectural concepts of the WFTA crypto chip, and describes how they were combined to provide optimum performance. The major design goal with the
WFTA was the maximization of performance on several levels, including the implemented hardware algorithms. In digital signal processing, the design of fast and computationally efficient algorithms has been a major focus of research activity. The objective, in most cases, is the design of algorithms, and their respective implementations, that perform the required computations in the least amount of time. To achieve this goal, parallel processing has also received a lot of attention in the research community [1]. This document is organized as follows. First, the mathematical foundations needed for the study of algorithms to compute the DFT, and the FFT algorithm, are summarized. Second, the relationship between the Winograd Fourier Transform and Rader's algorithm is established. Third, the algorithm development for the basic problem of multiplication, using the conceptual framework developed in the two previous sections, is explained. The
section also presents several signal flow diagrams that may be implemented in diverse architectures by means of very large scale integration (VLSI) or the very high-speed integrated circuits hardware description language (VHDL). Conclusions, contributions, and future development of the present work are then summarized.

1. DISCRETE FOURIER TRANSFORM (DFT)

1.1 DEFINITION
The discrete Fourier transform (DFT) is a powerful reversible mapping transform for discrete data sequences, with mathematical properties analogous to those of the Fourier transform; it transforms a function from the time domain to the frequency domain. For a length-N input vector x, the DFT is a length-N vector F, with N elements:
F_j = ∑_{k=0}^{N-1} x_k exp(-2πijk/N),  j = 0, 1, ..., N-1    (1)

A simple description of these equations is that the complex numbers Fj represent the amplitude and phase of the different sinusoidal components of the input "signal" xk. The DFT computes the Fj from the xk, while the IDFT recovers the xk as a sum of sinusoidal components Fj exp(2πijk/N) / N with frequency (j/N) cycles per sample. By writing the equations in this form, we make extensive use of Euler's formula to express sinusoids in terms of complex exponentials, which are much easier to manipulate. The number of multiplication and addition operations required by the Discrete Fourier Transform (DFT) is of order N^2, as there are N output points to calculate, each of which requires N arithmetic operations. To be exact, the input, if complex, contains two components (real and imaginary), and so does every exponential term; this quadruples the computational complexity, so the number of real multiplications is 4N^2. Hence, for real inputs the required number of multiplications is 2N^2. A fast Fourier transform (FFT) [2] is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. FFTs are of great importance to a wide variety of applications, from digital signal processing and the solution of partial differential equations to algorithms for quick multiplication of large integers. By far the most common FFT is the Cooley-Tukey algorithm. This is a divide-and-conquer algorithm that recursively breaks down a DFT of any composite size N = N1N2
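The O(N^2) cost of direct evaluation is easy to see in code. A minimal sketch of eq (1) and its inverse (function names are ours):

```python
import cmath

def dft(x):
    """Direct evaluation of eq (1): N outputs, each a sum over N inputs -> O(N^2)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / N) for k in range(N))
            for j in range(N)]

def idft(F):
    """Inverse: x_k recovered as the sum of sinusoids F_j exp(2*pi*i*j*k/N) / N."""
    N = len(F)
    return [sum(F[j] * cmath.exp(2j * cmath.pi * j * k / N) for j in range(N)) / N
            for k in range(N)]
```

The two nested sums make the quadratic operation count explicit; an FFT reorganizes exactly this computation to O(N log N).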
into many smaller DFTs of sizes N1 and N2, along with O(N) multiplications by complex roots of unity, traditionally called twiddle factors. The most well-known use of the Cooley-Tukey algorithm is to divide the transform into two pieces of size N/2 at each step, and it is therefore limited to power-of-two sizes, but any factorization can be used in general (as was known to both Gauss and Cooley/Tukey). These are called the radix-2 and mixed-radix cases, respectively (and other variants, such as the split-radix FFT, have their own names as well). Although the basic idea is recursive, most traditional implementations rearrange the algorithm to avoid explicit recursion. In addition, because the Cooley-Tukey algorithm breaks the DFT into smaller DFTs, it can be combined arbitrarily with any other algorithm for the DFT.

1.2 Multiplication of large integers
The fastest known algorithms [1, 8, 10] for the multiplication of very large integers use the polynomial multiplication method. Integers can be treated as the value of a polynomial evaluated at the number base, with the coefficients of the polynomial corresponding to the digits in that base. After polynomial multiplication, a relatively low-complexity carry-propagation step completes the multiplication.
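The polynomial multiplication method can be sketched in a few lines (our naming; a production implementation would compute the coefficient convolution with an FFT rather than the schoolbook double loop shown here):

```python
def polymul_int(a, b, base=10):
    """Multiply integers by treating their base-`base` digits as polynomial
    coefficients, convolving them, then doing one carry-propagation pass."""
    da = [int(ch) for ch in str(a)][::-1]   # least-significant digit first
    db = [int(ch) for ch in str(b)][::-1]
    coeffs = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):              # schoolbook polynomial product
        for j, y in enumerate(db):
            coeffs[i + j] += x * y
    result, carry = 0, 0
    for k, c in enumerate(coeffs):          # evaluate at the base with carries
        carry += c
        result += (carry % base) * base ** k
        carry //= base
    return result + carry * base ** len(coeffs)
```

The carry pass is O(n) in the number of digits, so the overall cost is dominated by the convolution, which is where the FFT (or the WFTA) pays off for very large operands.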
2. WINOGRAD FOURIER TRANSFORM ALGORITHM (WFTA)

The Winograd Fourier Transform Algorithm (WFTA) [7] is a technique that combines Rader's index mapping and Winograd's short convolution algorithm for prime factors into a composite-N Fourier transform structure with fewer multipliers. The Winograd algorithm factorizes z^N - 1 into cyclotomic polynomials; these often have coefficients of 1, 0, or -1, and therefore require few (if any) multiplications, so Winograd's method can be used to obtain minimal-multiplication FFTs and is often used to find efficient algorithms for small factors. Indeed, Winograd showed that the DFT can be computed with only O(N) irrational multiplications, leading to a proven achievable lower bound on the number of multiplications for power-of-two sizes; unfortunately, this comes at the cost of many more additions, a tradeoff no longer favorable on modern processors with hardware multipliers. In particular, Winograd also makes use of the
PFA (prime-factor algorithm) as well as an algorithm by Rader for FFTs of prime sizes.

2.1 RADER'S ALGORITHM
Rader's algorithm (1968) [7, 10] is a fast Fourier transform (FFT) algorithm that computes the discrete Fourier transform (DFT) of prime sizes by re-expressing the DFT as a cyclic convolution. Since Rader's algorithm only depends upon the periodicity of the DFT kernel, it is directly applicable to any other transform (of prime order) with a similar property, such as a number-theoretic transform or the discrete Hartley transform. The algorithm can be modified to gain a factor-of-two savings for the case of DFTs of real data, using a slightly modified reindexing/permutation to obtain two half-size cyclic convolutions of real data; an alternative adaptation for DFTs of real data, using the discrete Hartley transform, was described by Johnson. Winograd extended Rader's algorithm to include prime-power DFT sizes p^m, and today Rader's algorithm is sometimes described as a special case of Winograd's FFT algorithm, also called the multiplicative Fourier transform algorithm, which applies to an even larger class of sizes.

2.2 Algorithm
The Rader algorithm computes the DFT

X(k) = ∑_{n=0}^{N-1} x(n) W_N^{nk},  k, n ∈ Z_N; ord(W_N) = N    (2)

which is defined for prime length N. We first compute the DC component with

X(0) = ∑_{n=0}^{N-1} x(n)    (3)

Because N = p is a prime, it is known that there is a primitive element, a generator g, that generates all the elements of n and k in the field Z_N. We substitute n with g^n mod N and k with g^k mod N, and get the following index transform:

X[g^k mod N] - x(0) = ∑_{n=0}^{N-2} x[g^n mod N] W_N^{g^{(n+k) mod (N-1)}}    (4)

for k = 0, ..., N-2. We notice that the right side of the above equation is a cyclic convolution, i.e.,

[x[g^0 mod N], x[g^1 mod N], ..., x[g^{N-2} mod N]] ⊕ [W_N, W_N^g, ..., W_N^{g^{N-2} mod (N-1)}]    (5)

Thus, an N-point DFT is converted into an (N-1)-point cyclic convolution.
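The index transform of eqs (2)-(5) can be exercised directly. A sketch (our naming; the (N-1)-point convolution is evaluated directly here, whereas in practice it would itself be computed with FFTs or Winograd modules):

```python
import cmath

def naive_dft(x):
    # Direct O(N^2) DFT, used as a reference.
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def primitive_root(p):
    # Smallest generator g of the multiplicative group of Z_p (p prime).
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(p - 1)}) == p - 1:
            return g

def rader_dft(x):
    """Prime-length DFT via eqs (2)-(5): permute input and kernel by powers
    of the generator g, then perform one (N-1)-point cyclic convolution."""
    N = len(x)
    g = primitive_root(N)
    a = [x[pow(g, n, N)] for n in range(N - 1)]                # permuted input
    w = [cmath.exp(-2j * cmath.pi * pow(g, m, N) / N) for m in range(N - 1)]
    X = [0j] * N
    X[0] = sum(x)                                              # eq (3), DC term
    for k in range(N - 1):                                     # eq (4)
        s = sum(a[n] * w[(n + k) % (N - 1)] for n in range(N - 1))
        X[pow(g, k, N)] = x[0] + s
    return X
```

For N = 7 the outputs agree with the direct DFT to machine precision, confirming that the permuted sum of eq (4) reproduces X at the indices g^k mod N.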
3. WINOGRAD'S SMALL CONVOLUTION ALGORITHM

This algorithm performs the convolution with the minimum number of multiplications and additions, and thus the computational complexity of the process is greatly reduced. Cyclic convolution is also known as circular convolution. Let h = {h0, h1, ..., h(n-1)} be the filter coefficients and x = {x0, x1, ..., x(n-1)} be the data sequence. The cyclic convolution can be expressed as

s(p) = h ⊕ x = h(p) x(p) mod (p^n - 1)    (6)
The cyclic convolution can be computed as a linear convolution reduced modulo p^n - 1. Alternatively, the cyclic convolution can be computed using the CRT with m(p) = p^n - 1, which is much simpler. Thus, Winograd's minimum-multiply DFTs are useful only for small N. They are very important for prime-factor algorithms, which generally use Winograd modules to implement the short-length DFTs [10]. The theory and derivation of these algorithms is quite elegant but requires substantial background in number theory and abstract algebra. Fortunately, for the practitioner, the short algorithms one is likely to need have already been derived and can simply be looked up without mastering the details of their derivation.

3.1 Algorithm
1. Choose a polynomial m(p) with degree higher than the degree of h(p)x(p) and factor it into k+1 relatively prime polynomials with real coefficients, i.e.,

m(p) = m^(0)(p) m^(1)(p) ... m^(k)(p)    (7)

2. Let M^(i)(p) = m(p) / m^(i)(p) and use the Chinese Remainder Theorem (CRT) algorithm to get N^(i)(p).

3. Compute

h^(i)(p) = h(p) mod m^(i)(p)    (8)
x^(i)(p) = x(p) mod m^(i)(p)    (9)

for i = 0, 1, 2, ..., k.

4. Compute

s^(i)(p) = h^(i)(p) x^(i)(p) mod m^(i)(p)    (10)

for i = 0, 1, ..., k.

5. Compute s(p) using the equation

s(p) = ∑_{i=0}^{k} s^(i)(p) N^(i)(p) M^(i)(p) mod m(p)    (11)
The computational complexity in case of WFTA is of the order of N, O(N). It has been found that the number of multipliers required
Figure 1 Simulation result for N=2
by WFTA is always less than 2N, which drastically reduces the hardware needed for implementing a DFT block. 4. REALIZATION OF WFTA IN VERILOG HDL The behavioral simulation and synthesis of WFTA for N=2 and 5 can be viewed in the following descriptions. We focus on the Verilog HDL [6] used for our simulation and synthesis. It is shown in Fig.1, 2, 3, 4,5and 6. 4.1.1 SYNTHESIS RESULTS Synthesis of WFTA is being implemented using XILINX ISE 9.1i tool. After design entry and optional simulation, we run synthesis. During this step, VHDL, Verilog, or mixed language designs become net list files that are accepted as input to the implementation step. Figure 2 Simulation result for N=5 4.1.2 IMPLEMENTATION After synthesis, we run design implementation, which converts the logical design into a physical file format that can be downloaded to the selected target device. From
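As a worked instance of the five CRT steps in Section 3.1 (an illustration, not taken from the paper), the smallest cyclic convolution, n = 2 with m(p) = p² − 1 = (p − 1)(p + 1), reduces to just two general multiplications:

```python
def winograd_cyclic_conv2(h, x):
    # Winograd 2-point cyclic convolution via the CRT with
    # m(p) = p^2 - 1 = (p - 1)(p + 1):
    # Steps 3-4 evaluate the residues at p = 1 and p = -1,
    # Step 5 reconstructs s(p) from them.
    h0, h1 = h
    x0, x1 = x
    u = (h0 + h1) * (x0 + x1)   # s(p) mod (p - 1): product of residues at p = 1
    v = (h0 - h1) * (x0 - x1)   # s(p) mod (p + 1): product of residues at p = -1
    # CRT reconstruction; only the 2 multiplications above are general,
    # versus 4 for the direct cyclic convolution
    return [(u + v) / 2, (u - v) / 2]
```

Expanding (u ± v)/2 recovers s0 = h0·x0 + h1·x1 and s1 = h0·x1 + h1·x0, i.e. the direct 2-point cyclic convolution.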
Figure 3 Schematic of DFT N=2
Figure 4 Schematic of WDFT N=2
Figure 5 Schematic of DFT N=5
Figure 6 Schematic of WDFT N=5
N | Multipliers required in DFT | Multipliers required in WFTA
2 | 8  | 0
3 | 8  | 2
5 | 50 | 5
Table 1 Comparison of DFT and WFTA

4.1.3 COMPARISON OF THE OUTPUT VALUES OF DFT AND WDFT FOR N=2, 3 AND 5
The outputs of the normal DFT and the WDFT are obtained using MATLAB software, as coefficients A0, A1, A3, A4 and A5. The comparison is shown in Fig. 7.
Figure 7 Comparisons of normal DFT and WDFT

4.2 CONCLUSION AND FUTURE ENHANCEMENTS
Most present-day cryptography algorithms require complex multiplications, and so many multipliers are required. With the use of Winograd's Fourier transform algorithm, only the least number of multipliers is required. The future scope of the work is to reduce the number of multipliers further, so that the algorithm occupies less space and consumes less power. Thus, Winograd's Fourier transform algorithm for N = 2, 3 and 5 has been realized, and the WFTA has been proved accurate by comparing the values obtained using the WFTA with those of the DFT using MATLAB. Behavioral simulation of the WFTA was done using the XILINX ISE Simulator, and the synthesis was implemented in the XILINX ISE tool. A full RTL schematic is obtained with logic gate-level modeling. Finally, the circuit was implemented using an FPGA Spartan 3 kit. The WFTA for transform lengths equal to powers of prime numbers, i.e. N = 2^m, 3^m and 5^m, can be obtained by iteratively using the same skeleton that we have used. Even though this project drastically reduces the number of multipliers required to compute the DFT, it does not eliminate their use. Hence, a new algorithm can be devised, based on the theories of the WFTA, to implement the DFT with no multipliers and reduced adders.

5. REFERENCES
[1] H. Krishna, B. Krishna, K. Y. Lin and J. D. Sun: Computational Number Theory and Digital Signal Processing, CRC Press, Boca Raton, Florida, pp. 471-485, (1994).
[2] C. S. Burrus and P. W. Eschenbacher: "An in-place, in-order prime factor FFT algorithm", IEEE Trans. Acoust. Speech Signal Processing, vol. ASSP-29, no. 4, pp. 806-817, (1981).
[3] A. M. Despain: "Very fast Fourier transform algorithms hardware for implementation", IEEE Trans. Computers, vol. C-28, no. 5, pp. 333-341, May (1979).
[4] Keshab Parhi: VLSI Digital Signal Processing Systems: Design and Implementation, Wiley, pp. 237-244, (1999).
[5] M. D. Macleod and N. L. Bragg: "A fast hardware implementation of the Winograd Fourier transform algorithm", Electron. Lett., vol. 19, pp. 363-365, May (1983).
[6] Uwe Meyer-Baese: Digital Signal Processing with Field Programmable Gate Arrays, Springer-Verlag, pp. 273-276, (2007).
[7] S. Winograd: "On computing the discrete Fourier transform", Math. Comp., vol. 32, no. 141, pp. 175-199, Jan. (1978).
[8] J. McClellan and C. Rader: Number Theory in Digital Signal Processing, Englewood Cliffs, NJ: Prentice Hall, pp. 79-85, (1979).
[9] S. Winograd: Arithmetic Complexity of Computations, Society for Industrial and Applied Mathematics, (1980).
[10] M. Heideman: Multiplicative Complexity, Convolution, and the DFT, Springer-Verlag, New York, (1988).
[11] J. Cooley: "Some applications of computational complexity theory to digital signal processing", 1981 Joint Automatic Contr. Conf., University of Virginia, June 17-19, (1981).
[12] R. L. Rivest, A. Shamir and L. Adleman: "A method for obtaining digital signatures and public key cryptosystems", Communications of the ACM, 21(2), pp. 120-126, February (1978).
[13] W. Diffie and M. E. Hellman: "New directions in cryptography", IEEE Transactions on Information Theory, IT-22(6), pp. 644-654, November (1976).
[14] National Institute of Standards and Technology (NIST): FIPS Publication 186: Digital Signature Standard, Gaithersburg, MD, USA, May (1994).

Sathishkumar.G.A obtained his M.E from PSG College of Technology, Coimbatore, India. He is currently pursuing a PhD from Anna University, Chennai, and is a faculty member in the Electronics and Communication Dept. of Sri Venkateswara College of Engineering, Sriperumbudur. His research interests are VLSI signal processing algorithms, image processing and network security.

Dr. K. Boopathy Bagan completed his doctoral degree at IIT Madras. He is presently working as a professor in the ECE Dept. of Anna University, MIT Chrompet campus, Chennai. His areas of interest include VLSI, image processing, signal processing and network security.
A MULTIAGENT CONCEPTUALIZATION FOR SUPPLY-CHAIN MANAGEMENT Vivek Kumar, Amit Kumar Goel, Prof. S. Srinivasan Department of Computer Science & Engineering, Gurgaon Institute of Technology and Management, India
[email protected],
[email protected] ABSTRACT In Global world there is a huge network consisting by different companies for their suppliers, warehouses, distribution centers, retailers, with the help of these entities any organization acquired raw material , transformed , and delivered finished goods. The major challenges for Industrial organization are to reduce product development time, improve quality, and reduce cost of production. This is done only when the relationship among various organization/industrial houses is good, This is not only be done by the change of Industrial process/ method but the latest electronic tools controlled by computer software is required to establish in this competitive world. Software agents consist of one or many responsibility of supply chain, and each agent can interact with others without human intervention in planning and execution of their responsibilities. This paper present solution for the construction, architecture, coordination and designing of agents. This paper integrates bilateral negotiation, Order monitoring system and Production Planning and Scheduling multiagent system. KeyWords: Agent, Supply chain, Multiagent, Multiagent System Architecture for supply chain management 1. INTRODUCTION To improve the efficiency of supply chain management it is mandatory to take intelligent, tactical, strategic and good operational decision at each end of chain. Under strategic decision the agent will take the decision about suppliers, warehouses, production units, transportation system etc. The tactical decision takes place to meet actual demand. The agent on operational level is responsible to execute whole plan. To do all things in smooth way the coordination among agents is must otherwise if the material do not arrive on time the production will stop, if finished good has been ready and warehouses are not empty then it will create a great confusion. 
The ability to manage all levels of the supply chain system [1], with coordination and accurate and timely dissemination of information, is the enterprise goal.

2. AGENT
In software we can define an agent as an entity which consists of various attributes defining a particular domain. Example: an agent dealing with warehousing holds its local attributes as well as the details to be coordinated with other entities (agents). So agents emulate the mental process or simulate rational behavior. A multi-agent system is a loosely coupled network of problem-solver entities that work together to find answers to problems that are beyond the individual capabilities or knowledge of each entity. The first issue is how
the different activities of the supply chain can be distributed among agents. A typical example of a multiagent system is that of a coffee maker and a toast maker. Suppose a person wants the toast ready as the coffee is ready; this means coordination between the coffee maker and the toast maker is essential. Otherwise many situations may arise, such as the coffee being ready but the toast coming only some time later, or the toast being ready while the coffee is not yet prepared.
1. Agents are problem solvers.
2. Agents are pro-active.
3. Agents are goal-oriented.
4. Agents are context-aware.
5. Agents are autonomous.

2.1 Requirement / Logistics agent
These agents coordinate all activities of the plant and find the demands of the various sections. They hold the data of day-to-day production and find how much material has been consumed in a day, depending on the working hours of a machine. They categorize each component in different tables and coordinate with other agents such as the demand agent. The intelligent part of the agent is to find the efficiency of a machine, minimizing cost and increasing throughput. It can also record feedback on the finished goods and suggest appropriate changes if required.
2.2 Demand agent
This agent coordinates with other agents such as the requirement/logistics agent. The main objective of this agent is to fulfill the requirements of the various sections of the company/customer. The intelligent part of this agent is to acquire orders from various vendors and compare them on the basis of quality, price, availability, etc. In case any demand increases or decreases, the vendor will automatically be informed.

2.3 Transport agent
This agent is responsible for the availability of transport and the dispatching of finished goods to particular destinations. It manages all the transportation routes.

2.4 Financial agent
This agent is responsible for making money available for purchasing any material. It coordinates with other agents, analyzes the cost and ensures that the money has been paid to the party within a definite time.

2.5 Scheduling agent
This agent is responsible for scheduling and rescheduling activities in the factory, exploring hypothetical "what-if" scenarios for potential new orders, and generating schedules that are sent to the dispatching agent for execution. It assigns resources and start times to the activities of new orders in a way that is feasible while at the same time optimizing certain criteria, such as minimizing work in progress or tardiness. It can generate a schedule from scratch or repair an existing schedule that has violated some constraints. In anticipation of domain uncertainties such as machine breakdowns or material unavailability, the agent may reduce the precision of a schedule by increasing the degrees of freedom in the schedule for the dispatcher to work with. For example, it may "temporally pad" a schedule by increasing an activity's duration, or "resource pad" an operation by either providing a choice of more than one resource or increasing the capacity required so that more is available.

3. MIDDLE AGENTS

3.1 Facilitators
Agents to which other agents surrender their autonomy in exchange for the facilitator's services. Facilitators can coordinate agents' activities and can satisfy requests on behalf of their subordinated agents.

Fig 2: Architecture of Multiagent (Protégé-2000, mediator agent, agent training system (ATS), behavior database, data mining module and XML application data)
3.2 Mediators
Agents that exploit encoded knowledge to create services for a higher level of applications.

3.3 Brokers
Agents that receive requests and perform actions using services from other agents in conjunction with their own resources.

3.4 Helpline/Yellow pages
Agents that assist service requesters to find service-provider agents based on advertised capabilities.

3.5 Agent interaction
Interaction is one of the important features of an agent [2]. In other words, agents recurrently interact to share information and to perform tasks to achieve their goals. Researchers investigating agent communication languages mention three key elements to achieve multiagent interaction [3][4][5]:
• A common agent communication language and protocol
• A common format for the content of communication
• A shared ontology

4. AGENT COMMUNICATION LANGUAGE
There are two main approaches to designing an agent communication language [6]. The first approach is procedural and the second one is declarative. Procedural communication is based on executable content, while declarative communication is based on definitions, assumptions and declarative statements. One of the more popular declarative agent languages is KQML [8].

5. MULTIAGENT SYSTEM ARCHITECTURE FOR SUPPLY CHAIN MANAGEMENT
Our framework provides a GUI application that enables the design of multiagent systems with Protégé-2000 [7], as well as single agents or multiagent communities, using common drag-and-drop operations.
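A declarative, KQML-style exchange between two of the supply-chain agents described above can be sketched as follows. The performative names (`ask-one`, `tell`) follow KQML conventions, but the agent classes and the directory-based routing are illustrative assumptions, not a standard API:

```python
class Agent:
    # Minimal agent with a mailbox; registers itself in a shared directory
    def __init__(self, name, directory):
        self.name = name
        self.inbox = []
        directory[name] = self

    def send(self, directory, receiver, performative, content):
        # Declarative messaging: the meaning is carried by the
        # performative and content, not by executable code
        directory[receiver].inbox.append(
            {"performative": performative, "sender": self.name,
             "receiver": receiver, "content": content})

directory = {}
demand = Agent("demand-agent", directory)
logistics = Agent("logistics-agent", directory)

# The demand agent asks the logistics agent for a stock level,
# and the logistics agent replies without human intervention
demand.send(directory, "logistics-agent", "ask-one",
            "(stock-level steel-sheet)")
query = logistics.inbox[0]
logistics.send(directory, query["sender"], "tell",
               "(stock-level steel-sheet 120)")
```

A real deployment would replace the in-process directory with network transport, but the performative/sender/receiver/content structure is the same.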
Fig 1: Architecture of Multiagent Supply Chain Management System (retailer agent, logistics, warehouses, purchase, plants, operation/resource management/scheduling, raw material and supplier)
6. FORMULATION OF BEHAVIOR TYPES
Behavior depends on the generic templates and the workflow, i.e. receiving and sending messages, executing the stored application and deriving the necessary decisions using an inference engine. There are four types of workflow terminals.

6.1 Add-on terminals
For the addition of predefined functions.

6.2 Execute terminals
Execute the particular reasoning terminal.

6.3 Agent types
After the formulation of a behavior type we get a new agent type, to be used later in multiagent system development, i.e.
Agent Type = Agent + Behavior
A new agent can be created from an existing one, which serves as a template for creating agent instances during the design of a multiagent system architecture.

6.4 Receiving terminals
For the filtration of received information.

6.5 Sending terminals
For the composition of information before sending it on.

6.6 Database for agents
This unit acts as a storage facility to ensure interoperability between all system components. In this system the database stores ontologies, behaviors, agent types and the historical data to be mined. This unit can be designed using RMI.

6.7 Agent training system (ATS)
This system gathers information from the data mining procedure, takes the decision and sends this decision to the newly created agent.

6.8 Data mining system
This system holds the implementation of the data mining algorithms executed by the data mining procedures, which give a new decision model that is again loaded into an agent via the ATS. It is also responsible for embedding specific knowledge into agents. The data mining module receives information from the XML document and executes the suitable data mining functions designed by the application developer. The models are represented in the Predictive Model Markup Language [8], a data mining standard defined by the DMG (Data Mining Group) [9], which provides the agent platform with versatility and compatibility with other software. Major data mining software packages are Oracle, SAS, SPSS, MineIT, etc.

7. CONCLUSIONS
Information-technology-based solution frameworks offer a way to more effectively integrate decision-making by enabling better knowledge sharing and facilitating more transparent economic transactions. The multi-agent system paradigm promises to be a valuable software engineering abstraction for the development of computer systems. In addition, the wide adoption of the Internet as an open environment and the increasing popularity of machine-independent programming languages, such as Java, make the widespread adoption of multi-agent technology a feasible goal.

REFERENCES
1. Zhou, L.; Xu, X.; Huang, T. and Deng, S.: Enterprise Interoperability: New Challenges and Approaches.
2. Nwana, H. S.: Software Agents: An Overview, The Knowledge Engineering Review, October/November 1996, Vol. 11, No. 3, PP. 205-244.
3. Data Mining Group: Predictive Model Markup Language Specifications (PMML), ver. 2.0, available at: http://www.dmg.org
4. Bradshaw, J. M.; Dutfield, S.; Benoit, P. and Woolley, J. D.: KAoS: Toward An Industrial-Strength Open Agent Architecture, Software Agents, Bradshaw, J. M. (Ed.), Menlo Park, Calif., AAAI Press, 1997, PP. 375-418.
5. Russell, S. J. and Norvig, P.: Artificial Intelligence: A Modern Approach, Prentice Hall, Englewood Cliffs, N.J., 1995.
6. Genesereth, M.: An Agent-based Framework for Interoperability, Software Agents, Bradshaw, J. M. (Ed.), Menlo Park, Calif., AAAI Press, 1997, PP. 317-345.
7. Noy, N. F.; Sintek, M.; Decker, S.; Crubezy, M.; Fergerson, R. W. and Musen, M. A.: Creating Semantic Web Contents with Protégé-2000, IEEE Intelligent Systems, Vol. 16, No. 2, 2001, PP. 60-71.
8. Finin, T.; Labrou, Y. and Mayfield, J.: KQML as an Agent Communication Language, Software Agents, Bradshaw, J. M. (Ed.), Menlo Park, Calif., AAAI Press, 1997, PP. 291-316.
9. Huhns, M. N. and Singh, M. P.: Agents and Multiagent Systems: Themes, Approaches, and Challenges, Readings in Agents, Huhns, M. N. and Singh, M. P. (Eds.), San Francisco, Calif., Morgan Kaufmann Publishers, 1998, PP. 1-23.
SCALABLE ENERGY EFFICIENT AD-HOC ON DEMAND DISTANCE VECTOR (SEE-AODV) ROUTING PROTOCOL IN WIRELESS MESH NETWORKS Sikander Singh Research Scholar, Department of Computer Science & Engineering, Punjab Engineering College (PEC), Deemed University, Sector-12, Chandigarh-160012 (India)
[email protected], Dr. Trilok Chand Aseri Sr. Lecturer, Department of Computer Science & Engineering, Punjab Engineering College (PEC), Deemed University, Sector-12, Chandigarh-160012 (India)
[email protected]
ABSTRACT
A new routing protocol called Scalable Energy Efficient Ad-hoc on Demand Distance Vector (SEE-AODV), which has good scalability properties and is more energy efficient than the existing Ad-hoc on Demand Distance Vector (AODV) routing protocol in wireless mesh networks, is proposed in this paper. Wireless mesh networks (WMNs) consist of mesh routers and mesh clients, where mesh routers have minimal mobility and form the backbone of WMNs. They provide network access for both mesh and conventional clients. Two techniques, called Clustering and Blocking-Expanding Ring Search, have been applied to the existing AODV routing protocol to improve its scalability and energy efficiency. Results show that the performance of the SEE-AODV routing protocol is better than that of the existing AODV routing protocol in wireless mesh networks. To show the efficiency of the proposed routing protocol, simulations have been done using Network Simulator-2 (NS-2).

Keywords: Wireless mesh networks, Ad-hoc network, Routing, Distance vector.

1 INTRODUCTION
Wireless mesh network (WMN) [1] technologies have been actively researched and developed as key solutions to improve the performance and services of wireless personal area networks (WPANs), wireless local area networks (WLANs) and wireless metropolitan area networks (WMANs) for a variety of applications, such as voice, data and video. Compared with mobile ad hoc networks (MANETs), wireless sensor networks (WSNs) and infrastructure-based mobile cellular networks, WMNs are (i) quasi-static in network topology and architecture, (ii) not resource constrained at mesh routers and (iii) easy and flexible to deploy. These technological advantages are especially appealing to the emerging market requirements on future wireless networks and services, such as flexible network architecture, easy deployment and self-configuration, low installation and maintenance costs, and interoperability with the existing WPAN, WLAN and WMAN networks. Potential applications of WMNs include broadband home networking, community and neighborhood networking, enterprise networking, building automation and so on. These wide ranges of
applications have different technical requirements and challenges in the design and deployment of mesh networking architectures, algorithms and protocols. The objective of this work is to develop routing protocols for wireless mesh networks and to analyze their performance by realizing different environments. The analysis has been done theoretically and through simulations using NS-2 (Network Simulator-2). The objectives of the work are:
1. To simulate the proposed routing protocol, Scalable Energy Efficient Ad-hoc on Demand Distance Vector (SEE-AODV), for wireless mesh networks.
2. To evaluate the routing protocols based on various parameters.
3. To compare the proposed protocol with the existing protocol.

2 RELATED WORK
Wireless mesh networks have recently gained a lot of popularity due to their rapid deployment and instant communication capabilities. These networks comprise somewhat static multi-radio Mesh Routers [2], which essentially provide connectivity
between the mobile single-radio Mesh Clients. Special routing protocols are employed which facilitate routing between the Mesh Routers, as well as between the Mesh Routers and the Mobile Clients. AODV is a well-known routing protocol that can discover routes on the fly in a mobile environment. However, as the protocol was actually developed for single-radio nodes, it frequently lacks the ability to exploit the potential offered by the Mesh Routers. There are hundreds of proposed routing protocols [3]; many of them have been standardized by the IETF and have been in use for many years. Some of those protocols have proven themselves in the Internet and are expected to continue to perform well for many years to come. In the ad-hoc networking arena, several classes of routing protocols have been proposed and carefully analyzed. The WMN companies [4] are using a variety of routing protocols to satisfy their needs. Furthermore, the proposed routing protocol takes advantage of the fixed nodes in WMNs. In this paper some enhancements are made to improve the existing AODV protocol so that it works well in wireless mesh networks, with good scalability and energy efficiency.

3 SCALABLE ENERGY EFFICIENT AD-HOC ON DEMAND DISTANCE VECTOR ROUTING PROTOCOL (SEE-AODV)
In this paper, to develop the SEE-AODV routing protocol, two techniques called Clustering and Blocking Expanding Ring Search have been applied to improve the performance of the existing AODV routing protocol in wireless mesh networks. The performance of wireless mesh networks is highly dependent on the routing protocol. AODV is a popular routing protocol for wireless networks. AODV is well suited for wireless mesh networks in that it has low processing and memory overhead and low network utilization. Additionally, AODV provides loop freedom for all routes through the use of sequence numbers.

3.1 Design Goals
The design goal of the SEE-AODV routing protocol is to improve the scaling potential of AODV and to make it energy efficient. The main features of AODV-Clustering include:
(i) Gradualness
The protocol first works by the AODV method, then gradually changes to a clustering route protocol. There are several considerations behind this: first, there is no central control node in a mesh network; second, this method also allows AODV nodes to coexist with AODV-Clustering nodes; third, it reduces the overhead caused by frequent changes of cluster.
(ii) Coexistence with AODV
AODV is a widely accepted routing protocol, and several implementations of AODV have been reported. One of the important principles in designing AODV-Clustering is coexistence with AODV: nodes which implement the AODV protocol can work collaboratively, in the same network, with nodes that implement the proposed protocol. To achieve this, all AODV route control packets are kept and some new control packets are added as needed. In fact, AODV-Clustering lets all nodes that have not joined a cluster work by the AODV method.
(iii) Route Discovery Mechanism
In AODV-Clustering there are two route discovery mechanisms for the nodes which join the cluster. One is Blocking-ERS, which can increase the efficiency of route discovery; the other is the traditional RREQ-flooding route discovery mechanism, which extends from AODV. Normally Blocking-ERS is used first; if a suitable route cannot be found before timeout, the traditional route discovery mechanism is then used.
(iv) Local Route Repair
To reduce the number of lost data packets when a route breaks due to mobility, AODV-Clustering lets the node upstream of the broken link perform a local repair instead of issuing the RERR. If the local repair is successful, a RREP will be returned either by the destination or by a node with a valid route to the destination. In the event that a RREP is not returned, the local repair is unsuccessful and a RERR message is sent to the source node.

3.2 AODV-Clustering Routing Scheme
(i) Route Discovery
At the beginning the status of all nodes is "unassigned". The source node broadcasts [5] a RREQ message to find a route; the destination node, or an intermediate node that has a fresh route to the destination, replies with a RREP message to the source. Along the way the RREP passes, one or several nodes are selected as cluster heads (CH) using a defined rule. A CH node will broadcast a CH message to its neighbours, and the neighbour nodes which receive this broadcast will act differently according to their roles.
(a) A node whose status is "unassigned" will issue a Join Cluster Request to the broadcasting CH and can become an ordinary cluster member after receiving the acknowledgment from this CH. In AODV-Clustering, a CH node can reach all its members in one hop, so the protocol's architecture is a one-level hierarchy.
(b) A node which is an ordinary cluster member will judge whether the broadcast message sender is its original CH; if yes, no action is needed; otherwise, it sends a Join Cluster Request to the CH and becomes a gateway node after receiving the acknowledgement. Being a gateway, it will send a Gateway Message to all CH nodes to which it connects directly, letting them put its address in their Gateway Tables.
(c) A node which is a gateway will check in its Gateway Node Table whether there is an entry for this broadcasting CH node; if yes, no action is needed; otherwise, it sends a Join Cluster Request to it and puts its address into the Gateway Node Table after receiving the acknowledgement. Gateway nodes have two tables: one is the Gateway Node Table, which contains the addresses of the cluster heads it connects to directly; the other is the Joint Gateway Table, which contains the addresses of the CH nodes which can be reached in 2 hops, as well as the nodes which help to reach these CHs, which are called Joint Gateways.
(ii) Route Maintenance
AODV-Clustering extends many features from AODV for route maintenance; for example, it uses "hello" messages to confirm the existence of neighbours and RERR messages to inform the nodes affected by a link breakage. In addition, AODV-Clustering adds some cluster-related maintenance operations, such as joining a cluster, leaving a cluster and changing status.
(a) Joining Cluster
A node whose status is "unassigned" will join the cluster after receiving the CH Broadcasting Message. To reduce overhead, no periodic CH broadcasting is used. Instead, an "on demand" method is used for new nodes to join a cluster: when a node whose status is "unassigned" broadcasts a RREQ to its neighbours, a specific mark is set in the RREQ; the mark informs neighbouring CHs to let it join the cluster. In this way, a newly arrived unassigned node has a chance to join a nearby cluster when it has data to send.
(b) Leaving Cluster
If an ordinary cluster member finds its CH node unreachable, it will change its status to unassigned. This means the node leaves the cluster. A CH node should change its status to unassigned when it has no cluster members.
(c) Changing Cluster
In AODV-Clustering, the change of cluster does not occur immediately. When a node leaves the cluster, its status becomes "unassigned" and it works by the AODV method until a new CH node appears nearby and it has a chance to join a cluster.

3.3 Route Discovery Approaches in Wireless Mesh Networks
In wireless mesh networks, nodes cooperating for the successful delivery of a packet form a communication channel consisting of a source, a destination and possibly a number of intermediate nodes, with a fixed base station. In this paper some inefficient elements have been found in the well-known reactive protocol AODV, and a new approach for rebroadcasting in Expanding Ring Search is proposed. This leads to the Blocking-ERS scheme, as we call it, which demonstrates an improvement in energy efficiency at the expense of route discovery time, in comparison to the conventional route search method.

3.3.1 Blocking-ERS
An alternative ERS scheme has been proposed to support reactive protocols such as DSR (Dynamic Source Routing) and AODV; it is called Blocking Expanding Ring Search. Blocking-ERS integrates, instead of TTL sequences, a newly adopted control packet, the stop instruction, and a hop number (H), to reduce the energy consumption during the route discovery stage. The basic route discovery structure of Blocking-ERS is similar to that of the conventional TTL sequence-based ERS. One of the differences from TTL sequence-based ERS is that Blocking-ERS does not resume its route search procedure from the source node every time a rebroadcast is required. The rebroadcast can be initialized by any appropriate intermediate node. An intermediate node that performs a rebroadcast on behalf of the source node acts as a relay or agent node. Fig. 1 shows an example of the Blocking-ERS approach, in which the rebroadcasts are initialized by and begin from a relay node M in rebroadcast round 2, another relay node N in round 3, and so on. In Fig. 1 the source node broadcasts a RREQ including a hop number (H) with an initial value of 1. Suppose that a neighbour M receives the RREQ with H=1 and the first ring is formed. If no route node is found, that is, no node has the requested route information to the destination node, the nodes in the first ring rebroadcast the RREQ with an increased hop number; for example, a RREQ with H=2 is rebroadcast in this case. The ring is expanded once again, just like the normal expanding ring search in AODV, except with an extended waiting time. The waiting time can be defined as

Waiting Time = 2 × Hop Number

The nodes in Blocking-ERS receiving RREQs need to wait for a period of 2H, i.e. 2 × their hop-number unit times, before they decide to rebroadcast, where the 'unit time' is the amount of time taken for a packet to be delivered from one node to a one-hop neighbouring node.
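The waiting rule above can be sketched in a few lines; the schedule helper is an illustrative assumption (it ignores suppression by an arriving RREP or stop instruction, which is what actually blocks later rings):

```python
def waiting_time(hop_number):
    # Blocking-ERS rule: Waiting Time = 2 x Hop Number, where one
    # unit time is a one-hop packet delivery
    return 2 * hop_number

def rebroadcast_schedule(max_hops):
    # Unit time at which each ring would rebroadcast if nothing
    # suppresses it: ring h receives the RREQ at time h, then waits 2h
    return {h: h + waiting_time(h) for h in range(1, max_hops + 1)}
```

The 2H wait gives a stop instruction, travelling back over H hops and out again, time to reach a ring before that ring wastes a rebroadcast.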
Figure 1: Blocking-ERS

(A) Energy Consumption
Energy consumption during the transmission of RREQs can be saved by using the Blocking-ERS scheme. Let the amount of energy consumed by each node for one broadcast be the same, denoted by UnitEnergy, and assume that each action of broadcasting a RREP, RREQ or 'stop instruction' consumes the same amount of 1 UnitEnergy. The saving can then be shown by the difference in energy consumption between the conventional TTL sequence-based ERS and the Blocking-ERS scheme.

(i) One route case: - First consider only the energy consumption along the route from the source to the route node. The energy consumption for the TTL-based ERS and for the Blocking-ERS can be described by Eq. (1) and Eq. (2) respectively, where Hr is the hop number of the route node:

E_TTL-ERS = H_r + Σ_{i=1}^{H_r} i  (UnitEnergy)    (1)

E_Blocking-ERS = 3 H_r  (UnitEnergy)    (2)
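Under the stated unit-energy assumption, Eqs. (1) and (2) can be evaluated directly; the break-even point at Hr = 3 matches the crossing of the two curves in Fig. 2:

```python
# One-route energy consumption (in UnitEnergy), per Eqs. (1) and (2).

def e_ttl_ers(hr: int) -> int:
    # Eq. (1): H_r + sum_{i=1}^{H_r} i
    return hr + sum(range(1, hr + 1))

def e_blocking_ers(hr: int) -> int:
    # Eq. (2): 3 * H_r
    return 3 * hr

for hr in range(1, 6):
    print(hr, e_ttl_ers(hr), e_blocking_ers(hr))
# The two schemes cost the same at H_r = 3 (9 vs 9); beyond ring 3 the
# Blocking-ERS consumes less, consistent with Fig. 2.
```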
The difference in the amount of energy consumption is more visible from Fig. 2, where the amounts of energy consumed by the two ERS approaches are plotted against the number of rings. The Blocking-ERS curve falls below the TTL-based ERS curve after ring 3. As is clear from Fig. 2, the difference in energy consumption between the two mechanisms becomes larger as the distance between the source and the route node increases.

Figure 2: Comparison of energy consumption for one route (Energy is measured in Joule, and Distance in number of hops)

(ii) General case: - Now consider the general case. For the Blocking-ERS, the energy consumption during the route discovery process can be considered as the total energy consumption in three stages: (a) searching for the route node, (b) returning the RREP and (c) sending the 'stop instruction'. For the conventional TTL-based ERS, the energy consumption during route discovery includes that of two stages: (a) searching for the route node and (b) returning the RREP. The energy consumed for stage (b) is Hr UnitEnergy for both routing schemes, and Hr UnitEnergy is consumed for the Blocking-ERS stage (c). In stage (a), the energy consumption differs between the two methods. Each ring contains a different number of nodes that rebroadcast to form the next ring. Let n_i be the number of nodes in ring i and let the hop number of the route node be Hr.

In the Blocking-ERS, the energy consumed in each ring is as below:

Ring i      Energy Consumed
0           1
1           n_1
...         ...
Hr-1        n_{Hr-1}

In the TTL-based ERS, the energy consumed in each ring is as follows:

Ring i      Energy Consumed
0           1
1           1 + n_1
...         ...
Hr-1        1 + n_1 + n_2 + · · · + n_{Hr-1}

Therefore, the total energy consumption by the Blocking-ERS is given by Eq. (3):

E_Blocking-ERS = 2(1 + Σ_{i=1}^{Hr} n_i) + E_RREP  (UnitEnergy)    (3)

Similarly, the total energy consumption by the conventional TTL sequence-based ERS is given by Eq. (4):

E_TTL-ERS = H_r + Σ_{i=1}^{Hr} Σ_{j=1}^{i} n_j + E_RREP  (UnitEnergy)    (4)

The difference between E_Blocking-ERS and E_TTL-ERS is given by Eq. (5):

E_Saved = H_r − 2 + Σ_{i=1}^{Hr−1} ((Σ_{j=1}^{i} n_j) − 2 n_i)  (UnitEnergy)    (5)
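Eqs. (3) and (4) can be checked numerically. The ring populations and the E_RREP value below are illustrative assumptions, not data from the paper:

```python
# General-case route-discovery energy (in UnitEnergy), per Eqs. (3) and (4).
# n[i-1] is the number of nodes in ring i (assumed values); e_rrep is the
# common cost of returning the RREP.

def e_blocking_ers(n, e_rrep):
    # Eq. (3): 2 * (1 + sum_i n_i) + E_RREP
    return 2 * (1 + sum(n)) + e_rrep

def e_ttl_ers(n, e_rrep):
    # Eq. (4): H_r + sum_i (sum_{j<=i} n_j) + E_RREP
    hr = len(n)
    prefix, total = 0, 0
    for ni in n:
        prefix += ni      # running prefix sum n_1 + ... + n_i
        total += prefix
    return hr + total + e_rrep

n = [10, 10, 10, 10]   # assumed ring populations, so H_r = 4
e_rrep = 4             # returning the RREP costs H_r UnitEnergy
print(e_ttl_ers(n, e_rrep) - e_blocking_ers(n, e_rrep))  # 22 UnitEnergy saved
```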
Clearly, when n_i = 1 for i = 1, · · · , Hr, each ring contains a single node and the above formulas represent the energy consumption for a single route. This indicates that the energy saving achieved by the Blocking-ERS for a single route is the minimum amount of energy saving.

(B) Time Delay
Consider the time delay for the route discovery period, during which the RREQ is broadcast for the first time and transmitted from the source node to the route node, possibly via flooding. That is the total
time taken from when the source node broadcasts the first RREQ until a route node is found and the source node receives the RREP from the route node. Let the UnitTime be the one-hop transmission time, i.e. the time taken for a RREQ to travel from a broadcasting node to one of its neighbour nodes. In the case of the TTL sequence-based ERS, suppose H = 3, that is, the route node is 3 hops distant from the source node. The total time includes the time when TTL = 1, 2 and 3; the final TTL number equals the hop number of the route node. This gives the following formula for the total time delay of the TTL sequence-based ERS:

T_TTL-ERS = 2 Σ_{i=1}^{Hr} i  (UnitTime)    (6)
Now consider the time delay in the Blocking-ERS. The total time includes the time for three stages: (a) searching for the route node, (b) returning the RREP and (c) broadcasting the 'stop instruction'. For stage (a), the time consists of the broadcasting time and the waiting time. The broadcasting time for a 1-hop distance is 1 UnitTime; the waiting time depends on the hop number of the node. Suppose again that the route node is 3 hops distant from the source node and each node waits for 2H UnitTime before rebroadcasting. At ring 1 the node waits for 2 × 1 = 2 UnitTime, at ring 2 for 2 × 2 = 4 UnitTime, and at ring 3 for 2 × 3 = 6 UnitTime, so the total waiting time for stage (a) is 2 + 4 + 6 = 12, and the total time for stage (a) is 12 + Hr = 12 + 3 = 15. The times for stages (b) and (c) are Hr each, which gives 2Hr = 2 × 3 = 6. Therefore, the total time for route discovery and flooding control is 15 + 6 = 21 UnitTime. The general formula is presented below, where Hr represents the hop number of the route node.
T_Blocking-ERS = 3 H_r + 2 Σ_{i=1}^{Hr} i  (UnitTime)    (7)
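The 3-hop walkthrough above can be reproduced by evaluating Eqs. (6) and (7) directly (times in UnitTime):

```python
# Route-discovery time delay, per Eq. (7) for Blocking-ERS and Eq. (6)
# for the TTL sequence-based ERS.

def t_blocking_ers(hr: int) -> int:
    # Eq. (7): 3*H_r (broadcasting, RREP, stop instruction)
    # plus the accumulated waiting time 2 * sum_{i=1}^{H_r} i
    return 3 * hr + 2 * sum(range(1, hr + 1))

def t_ttl_ers(hr: int) -> int:
    # Eq. (6): 2 * sum_{i=1}^{H_r} i
    return 2 * sum(range(1, hr + 1))

hr = 3  # worked example: route node 3 hops from the source
print(t_blocking_ers(hr))                  # 21, as in the text
print(t_blocking_ers(hr) - t_ttl_ers(hr))  # 9 = 3*H_r extra delay
```

The constant gap of 3Hr is the price the Blocking-ERS pays in delay for its energy saving.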
Compare this to the TTL sequence-based ERS as given in Eq. (8):

T_TTL-ERS = 2 Σ_{i=1}^{Hr} i  (UnitTime)    (8)
It is clear that the difference between the two is 3Hr, three times the hop number of the route node, depending only on the distance between the source node and the route node. The time delay of the two approaches is compared in Fig. 3. As illustrated, the Blocking-ERS takes slightly more time than the conventional TTL sequence-based ERS for the route discovery process.

4
PERFORMANCE EVALUATION
To evaluate the performance of the SEE-AODV routing protocol (AODV-Clustering with the Blocking-ERS technique), simulation is done using NS-2 [6]. The performance of the SEE-AODV routing protocol is evaluated by comparing it with the existing AODV routing protocol under the same conditions. Three performance metrics are evaluated:

(i) Packet Delivery Fraction: - The fraction of the data packets generated by the sources that are delivered successfully to the destination. This evaluates the ability of the protocol to discover routes.

(ii) Routing Load: - The ratio of control packet overhead to data packet overhead, measured by the number of route control packets sent per data packet. The transmission at each hop along the route is counted as one transmission in the calculation.

(iii) Average Route Acquisition Latency: - The average delay between the sending of a route request packet by a source for discovering a route to a destination and the receipt of the first corresponding route reply. If a fresh route already exists, 0 is used in calculating the latency.
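The three metrics can be computed from simulation counters as sketched below. All counter names and values are hypothetical, not taken from the paper's NS-2 runs; routing load here divides by delivered data packets, one common convention:

```python
# Hypothetical counters from a single simulation run (illustrative only).
data_sent       = 1000   # data packets generated by the sources
data_delivered  = 940    # data packets received at the destinations
control_packets = 470    # RREQ/RREP/RERR/Hello (+ cluster messages), one per hop
latencies       = [0.0, 12.5, 8.0]  # ms per route request; 0 if a fresh route existed

packet_delivery_fraction = data_delivered / data_sent
routing_load = control_packets / data_delivered
avg_route_acquisition_latency = sum(latencies) / len(latencies)

print(packet_delivery_fraction)        # 0.94
print(routing_load)                    # 0.5
print(avg_route_acquisition_latency)   # ~6.83 ms
```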
Figure 3: Comparison of the time delay (Time is measured in millisec, and Distance in number of hops)
Figure 4: AODV, SEE-AODV packet delivery fraction

As shown in Fig. 4, the packet delivery fraction obtained using SEE-AODV is almost identical to that obtained using AODV when node numbers are small. However, with larger numbers of nodes (i.e., more than 200), SEE-AODV performed better. This suggests that SEE-AODV is highly effective in discovering and maintaining routes for the delivery of data packets.
Figure 5: AODV, SEE-AODV Routing Load

Fig. 5 shows the routing load comparison of the two protocols. Routing load is measured by the number of route control packets sent per data packet. For AODV, route control packets include RREQ, RREP, RERR and Hello messages; for SEE-AODV, cluster-related messages such as the CH Broadcast Message, Join Cluster Request, Join Cluster ACK, and Gateway message are also included. It is clear from Fig. 5 that the routing load of SEE-AODV is significantly lower than that of AODV when there are large numbers of nodes in the network. SEE-AODV gradually becomes a hierarchical protocol, and hierarchical routing greatly increases the scalability of routing in wireless networks by increasing the robustness of routes.
Figure 6: AODV, SEE-AODV Average Route Acquisition Latency

Fig. 6 shows the Average Route Acquisition Latency of the two protocols. SEE-AODV performed better than AODV in more "stressful" situations (i.e., larger numbers of nodes, heavier load); this is largely attributable to the Blocking-ERS technique of AODV-Clustering and the reduction of RREQ flooding.

5 CONCLUSION AND FUTURE WORK

In this paper the Scalable Energy Efficient AODV (SEE-AODV) routing protocol has been introduced to solve the scalability problem of AODV by applying clustering, and to make it energy efficient by using the Blocking-ERS technique in wireless mesh networks. Its performance is studied by simulations based on NS-2. The results show that the SEE-AODV protocol achieves better scalability than the existing AODV while keeping its merits. The analysis demonstrates a substantial improvement in energy consumption achieved by the Blocking-ERS, at the marginal cost of a slightly longer route discovery time. Some limitations remain in the SEE-AODV routing protocol: the route discovery technique it uses takes slightly more time than the existing conventional one, so further study is needed to remove this drawback.

6 REFERENCES

[1] K. N. Ramachandran: On the Design and Implementation of Infrastructure Mesh Networks, IEEE Wksp. Wireless Mesh Networks, Calcutta, pp. 78-95 (Sept 2005).
[2] R. Draves, J. Padhye, and B. Zill: Routing in Multi-Radio, Multi-Hop Wireless Mesh Networks, Mobile Communication, pp. 114-28 (Sept 2004).
[3] R. Draves, J. Padhye, and B. Zill: Comparison of routing metrics for static multi-hop wireless networks, Proc. of SIGCOMM'04, Portland, OR, pp. 12-24 (Aug 2004).
[4] I. F. Akyildiz and X. Wang: A Survey on Wireless Mesh Networks, IEEE Commun. Mag., vol. 43, no. 9, pp. S23-S30 (Sept 2005).
[5] V. Li, H. S. Park, and H. Oh: A Cluster-Label-Based Mechanism for Backbone on Mobile Ad Hoc Networks, The 4th Wired/Wireless Internet Communications (WWIC 2006), pp. 26-36 (May 2006).
[6] http://www.isi.edu/nsnam/ns/nsdocumentation.