Analysis Of Costs Of Recovery Of Message Logging

  • November 2019
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Analysis Of Costs Of Recovery Of Message Logging as PDF for free.

More details

  • Words: 2,659
  • Pages: 28
International Institute of Professional Studies

Research Paper Submission

on

Analysis of Costs of Recovery of Message Logging Protocols

Date: 15th Oct,’01 Submitted to: Ms. Yamini Karmarkar Submitted by: Megha Jain IM-98-29 M.M.S. (5Yrs.) VII Semester

ACKNOWLEDGEMENTS

I am grateful to Ms. Yamini Karmarkar, who gave me an opportunity to present this research paper; and under whose able guidance and experience, this research paper was possible. I am indebted to the Library resources and Librarians of International Institute of Professional Studies who provided me with the relevant journals and literatures used in this research. My thankfulness is also extended to my peers and colleagues for their support and encouragement. Megha Jain IM-98-29

Abstract – Past research in message logging has focussed on studying the relative overhead imposed by pessimistic, optimistic and causal protocols during failure free executions. Applications face a complex trade-off when choosing a message logging protocol for fault tolerance. In this empirical research, an evaluation of the performance of these protocols during recovery is presented.

1. AIM To analyze the effect of T[chk], T[acq], T[rollfwd], T[replay] and T[rollbck] on T[rec]; where: T[chk] – Time required to restore the state of the failed process from its latest checkpoint. T[acq] – Time required to retrieve determinants and messages logged during failure-free execution.

T[rollfwd] – Time required to roll-forward the execution of process to its precrashed state. T[replay] – Time required to replay messages to the recovering process from the acquired logs. T[rollbck] – Time required to roll-back orphan processes. T[rec] – Recovery time. 2. INTRODUCTION Message-logging protocols are popular techniques for building systems that can tolerate process crash failures. These protocols are built on the assumption that the state of a process is determined by its initial state and by the sequence of messages it delivers. In principle, a crashed process can be recovered by: • Restoring the process to its initial state. • Rolling it forward by replaying to it messages in the same order in which they were delivered before the crash.

In practice, message-logging protocols limit the extent of roll-forward by having each process periodically save its local state in a checkpoint. The delivery order of each message is recorded in a tuple, called the message’s determinant. If the determinants of all the messages delivered by a crashed process are available during recovery, then the process can be restored to a state consistent with the state of all operational processes. An operational process whose state is inconsistent with the recovered state of a crashed process is called an orphan process. All message-logging protocols guarantee that upon recovery, no process is an orphan, but differ in the way they enforce this consistency condition. On this basis they can be classified into: • Pessimistic protocols: They require that a process, before sending a message, synchronously log on stable storage the determinants and the content of all messages delivered so far. Thus, these protocols never create orphan processes. There are two pessimistic protocols:

1) Receiver-based Pessimistic: A process, before sending a message, logs to stable storage both the determinants and the contents of the messages delivered so far. 2) Sender-based Pessimistic: The receiver logs synchronously to stable storage only the determinant of every message it delivers, while the contents of the message are stored in a volatile send log kept by the message’s sender. • Optimistic protocols: They allow processes to communicate even if the determinants on which they depend are not yet logged on stable storage. They only require them to eventually reach stable storage. However, if any of the determinants are lost when a process crashes, orphans may be created. To reach a consistent global state, these processes must be identified and rolled back. • Causal protocols: They combine some of the positive aspects of pessimistic and optimistic protocols. They never create orphans, yet they do not write determinants to stable storage synchronously. Rather, the determinants are logged on to volatile memory. Here, if the state of an

operational process p causally depends on the delivery of a message m, then, p has a copy of m’s determinant in its volatile memory. This property is sufficient to restore a crashed process in a state consistent with the state of all operational processes. Applications face a complex trade-off when choosing a message logging protocol for fault tolerance. On the one hand, optimistic protocols can provide fast failure-free execution and good performance during recovery, but are complex to implement and can create orphan processes. On the other hand, orphan-free protocols, like pessimistic and causal, either risk being slow during recovery, or incur a substantial overhead during failure free execution. This trade-off needs to be addressed. 3. LITERATURE REVIEW • L. Alvisi and K. Marzullo, “Tradeoffs in Implementing Optimal Message Logging Protocols”, Principles of Distributed Computing, pp. 58-

67, June 1996. • L. Alvisi and K. Marzullo, “Message Logging: Pessimistic, Optimistic, Causal and Optimal”, IEEE Trans. Software Eng., vol. 24, no.2, pp. 149159, Feb. 1998. • O.P. Damani and V.K. Garg, “How to Recover Efficiently and Asynchronously when Optimism Fails”, Distributed Computing Systems, pp. 108-115, 1996. • S. Rao, L. Alvisi and H.M. Vin, “Egida: An Extensible Toolkit for LowOverhead Fault-Tolerance”, IEEE Fault-Tolerant Computing Symp. FTCS29, Madison, Wis., pp. 48-55, June 1999. • S. Rao, L. Alvisi and H.M. Vin, “Cost of Recovery in Message Logging Protocols”, IEEE Trans. on Knowledge and Data Eng., vol. 12, no. 2, March 2000. • F.B. Schneider, “Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial”, Computing Surveys, vol. 22, no.3, pp. 299319, Sep. 1990. • E.N. Elnozahy, “On the Relevance of Communication Costs of Rollback Recovery Protocols”, Principles of Distributed Computing, pp. 74-79, Aug. 1995.

4. HYPOTHESIS On the basis of the literature review and from the understanding of the concepts, I have formulated the following hypothesis: “There is a high correlation between the T[rec] and the T[rollfwd] in case of receiver-based pessimistic, sender-based pessimistic and causal protocols; and between T[rec] and T[replay] in case of optimistic protocols.” And also, “The best message logging protocol is one with the lowest T[rec].”

4.1 The Null Hypothesis Ho: “There is no significant dependence of T[rec] on T[rollfwd] and T[replay].” 4.2 The Alternate Hypothesis “There is a significant dependence of T[rec] on T[rollfwd] and T[replay].” 5. METHODOLOGY The cost of recovery in message logging protocols is measured using Egida, an object-oriented toolkit for synthesizing rollback recovery protocols. The five long-running compute-intensive applications – bt, cg, lu, sp, and mg – from the NPB2.3 benchmark suite developed by NASA’s Numerical

Aerodynamic Simulation program. This information is obtained from the research conducted by S. Rao, L. Alvisi and H.M. Vin. 5.1 Assumption It is assumed that processes can fail by crashing, but the hardware is reliable. Also, other overheads on a system are not considered. 5.2 The Variables For pessimistic and causal protocols, the recovery time (denoted by T[rec]) for a process consists of: 1) T[chk], the time to restore the state of the failed process from its latest checkpoint. 2) T[acq], the time to retrieve determinants and messages logged during failure-free execution. 3) T[rollfwd], the time to roll-forward the execution of the process to its

precrashed state. For optimistic protocols, on the other hand, in addition to T[chk] and T[acq], the recovery time T[rec] consists of: 1) T[replay], the time to replay messages to the recovering process from the acquired logs. 2) T[rollbck], the time required to rollback orphans. Note: T[acq] is protocol dependent – for pessimistic and optimistic protocols, it is the time to read logs from the file server, while, for causal protocols, it is the time to collect messages and determinants from the logs of the remaining processes. In the case of multiple failures, the values of T[chk], T[acq], T[rollfwd], T[replay] and T[rollbck] are taken to be the processes that take the longest to recover. 5.3 The Method For all protocols, the value of T[rec] is determined by the three parameters: -

1) The number of concurrent failures, f. For optimistic protocols, multiple failures may cause a process to rollback multiple times. For sender-based pessimistic and causal protocols, multiple failures may complicate the task of retrieving messages and determinants from other processes. 2) The number of processes, n. For causal and sender-based optimistic protocols, n may affect T[acq] because it may change the set of processes from which a recovering process collects its logs. For optimistic protocols, n may affect T[rollbck] because it may change the number of orphans. 3) The time, t since the latest checkpoint occurrence at which a failure is induced. For all protocols, this parameter affects the amount of lost computation that has to be recovered and the size of the logs that have to be acquired by the recovering process. Note: To compare fairly the recovery performance of the four logging protocols for a given application, t should be the same for all protocols. Once a t is chosen, the iterative nature of applications to compute the

number of iterations i necessary to run each application is used. 5.4 Communication Patterns of the Benchmark Applications

5.5 The Model for Empirical Analysis 1) For receiver-based pessimistic protocols: T[rec] = T[chk] + T[acq] + T[rollfwd] 2) For sender-based pessimistic protocols:

T[rec] = T[chk] + T[acq] + T[rollfwd] 3) For optimistic protocols: T[rec] = T[chk] + T[acq] + T[replay] + T[rollbck] 4) For causal protocols: T[rec] = T[chk] + T[acq] + T[rollfwd] Note: To measure the values of these protocols, the experiments were repeated for each protocol until the variance for T[chk], T[acq], T[rollfwd], T[replay], T[rollbck] and T[rec] was within 1% of their average value. These figures are taken from the work of S. Rao et all. 6. DATA COLLECTION The data was taken from IEEE Transactions on Knowledge and Data Engineering, Volume 2, No. 2, March/April 2000; from the research of Sriram Rao, Lorenzo Alvisi and Harrick M. Vin.

7. FINDINGS The findings from different statistical analysis on the data set: 7.1 Correlation coefficients See annexes. 7.2 Regression Analysis and Standard Error See annexes. 7.3 Measures of Central Tendency The mean is used as the measure of central tendency. The means of different data sets: -

1) 2) 3) 4)

Receiver-based Pessimistic: Mean T[rec] = 168.39 sec. Sender-based Pessimistic: Mean T[rec] = 187.50 sec. Optimistic: Mean T[rec] = 176.60 sec. Causal: Mean T[rec] = 197.90 sec. 7.4 Standard Deviation

The values for standard deviation (S.D.) for the data: 1) Receiver-based Pessimistic: S.D. = 10.55 sec. 2) Sender-based Pessimistic: S.D. = 13.51 sec. 3) Optimistic: S.D. = 9.90 sec. 4) Causal: S.D. = 20.37 sec. 7.5 Confidence Interval The confidence interval is calculated at 5% significance level for: 1) Receiver-based Pessimistic: 5.34 2) Sender-based Pessimistic: 6.84

3) Optimistic: 5.01 4) Causal: 10.31 7.6 T-test Statistics Two-tailed t-test is used. 1) Receiver-based Pessimistic: Table value of t = 2.179 (at 5% level of significance and 12 degree of freedom). Calculated value of t = 0.10755 2) Sender-based Pessimistic: Table value of t = 2.179 (at 5% level of significance and 12 degree of freedom). Calculated value of t = 0.02822 3) Optimistic: Table value of t = 2.201 (at 5% level of significance and 11 degree of freedom). Calculated value of t = 0.0000038

4) Causal: Table value of t = 2.179 (at 5% level of significance and 12 degree of freedom). Calculated value of t = 0.00267

7.7 R- square Statistics The R-square statistics obtained for the data set: 1) Receiver-based Pessimistic: 0.907557 2) Sender-based Pessimistic: 0.69997 3) Optimistic: 0.62042 4) Causal: 0.55734 7.8 F-test Statistics

F-test applied on the data set: 1) Receiver-based Pessimistic: 0.62777 2) Sender-based Pessimistic: 0.91958 3) Optimistic: 0.48144 4) Causal: 0.20094 8. RESULTS 8.1 Correlation Coefficients The correlation coefficients lie between 0.3 – 1, which implies the correlation ranges from high to near perfect. This suggests that there is near perfect correlation between T[rec] and T[rollfwd]; and high correlation between T[rec] and T[replay] in case of all the message logging protocols.

8.2 Regression Analysis and Standard Error Using the regression analysis, we can find the value of the dependent variable, T[rec], given the values of the independent variables, T[chk], T[acq], T[rollfwd], T[replay] and T[rollbck]. For this calculation, the values of standard errors must be taken into account. The linear regression model fits this calculation.

8.3 Measures of Central Tendency (Mean) Using this feature of the descriptive statistical tools, the average value of the cost of recovery (in terms of time in seconds), T[rec], was found out to be minimum for Receiver-based message logging protocols (168.39 sec). This suggests that (not considering other overheads) the best method is Receiverbased message logging protocols.

8.4 Confidence Interval The mean, standard deviation and the level of significance (here 5%) were used to calculate the confidence interval. In our samples of data, we are 95% confident that the mean is in the confidence interval i.e.: 1) Receiver-based Pessimistic: 168.39 + 5.34 sec 2) Sender-based Pessimistic: 187.5 + 6.84 sec 3) Optimistic: 176.6 + 5.01 sec 4) Causal: 197.9 + 10.31 sec It gives the upper and the lower limit within which the value can lie. 8.5 Significance of t-test The t-test shows whether or not the hypothesis holds. If the calculated tvalue < table t-value, the hypothesis is accepted. Else, it is rejected and the alternate hypothesis is accepted.

1) Receiver-based Pessimistic: 0.10755 < 2.179, the hypothesis “there is a significant dependence of T[rec] on T[rollfwd]”; is accepted. 2) Sender-based Pessimistic: 0.02822 < 2.179, the hypothesis “there is a significant dependence of T[rec] on T[rollfwd]”; is accepted. 3) Optimistic: 0.0000038 < 2.201, the hypothesis “there is a significant dependence of T[rec] on T[replay]”; is accepted. 4) Causal: 0.00267 < 2.179, the hypothesis “there is a significant dependence of T[rec] on T[rollfwd]”; is accepted. 8.6 Significance of R-square test It measures the total percentage variation in dependent variable, T[rec], due to variations in independent variables, T[rollfwd] and T[replay]. Here: 1) Receiver-based Pessimistic: 0.907557 implies 90.76% 2) Sender-based Pessimistic: 0.69997 implies 70%

3) Optimistic: 0.62042 implies 62.04% 4) Causal: 0.55734 implies 55.73% The closer the value of R-square is to 100%, the more is the probability that we have included the most important variables in our analysis. 8.7 Significance of F-statistics It tests the significance of R-square test and the overall validity of the model. If calculated F-value <= table F-value, the null hypothesis is accepted. Else, it is rejected. For the model to be acceptable, the null hypothesis should be rejected. 1) Receiver-based Pessimistic: 0.62777 < 4.18; the null hypothesis is rejected. 2) Sender-based Pessimistic: 0.91958 < 4.18; the null hypothesis is rejected. 3) Optimistic: 0.48144 < 4.10; the null hypothesis is rejected.

4) Causal: 0.20094 < 4.18; the null hypothesis is rejected. Thus, the null hypothesis is rejected and the overall model is held significant. 9. CONCLUSION There is a high correlation between the cost of recovery of message logging protocols, T[rec], and the time required to roll-forward, T[rollfwd], and the time required to replay the message, T[replay]. Also, the cost of recovery in Receiver-based Pessimistic Protocols is the minimum. 10. AREAS OF FURTHER RESEACH Further research can be done on the implementation of a relatively new concept of Hybrid Protocols suggested by S. Rao et all.

The objective of hybrid protocols is to maintain the failure-free performance of sender-based protocols while approaching the performance of receiverbased protocols during recovery. In hybrid protocols, messages are logged both at the sender and at the receiver. The sender synchronously logs messages in its volatile memory; the receiver asynchronously logs messages to stable storage. Since logging on stable storage is asynchronous, performance during failure-free executions is virtually identical to that of sender-based protocols. However, recovery is much faster. Any missing message is either available in the volatile memory of other operational processes or it has to be regenerated during recovery if the sender has failed. In either case, no correct process ever becomes an orphan and the recovering process can roll forward using the messages on stable storage while in parallel acquires the missing messages.

Related Documents