STEP 3 ERRORS 3
Error propagation and generation
It has been noted that a number is to be represented by a finite number of digits, and hence often by an approximation. It is to be expected that the result of any arithmetic procedure (any algorithm) involving a set of numbers will have an implicit error relating to the error of the original numbers. One says that the initial errors propagate through the computation. In addition, errors may be generated at each step in an algorithm, and one speaks of the total cumulative error at any step as the accumulated error. Since one aims to produce results within some chosen limit of error, it is useful to consider error propagation. Roughly speaking, based on experience, the propagated error depends on the mathematical algorithm chosen, whereas the generated error is more sensitive to the actual ordering of the computational steps. It is possible to be more precise, as described below.

Absolute error
The absolute error is the absolute difference between the exact number x and the approximate number X, i.e.,

    |e| = |x - X|.

A number correct to n decimal places has

    |x - X| ≤ 0.5 × 10^-n.

We expect that the absolute error involved in any approximate number is no more than five units at the first neglected digit.

Relative error
This error is the ratio of the absolute error to the absolute exact number, i.e.,

    r = |x - X| / |x|.

(Note that an upper bound follows from the triangle inequality: since |x| ≥ |X| - |e|, we have

    r = |e| / |x| ≤ |e| / (|X| - |e|),

so that the relative error can be bounded using only the known approximation X.) A decimal number correct to n significant digits has

    r ≤ 5 × 10^-n.

Error propagation
Consider two numbers

    x1 = X1 + e1,  x2 = X2 + e2,

where X1, X2 are approximations with errors e1, e2. Under the operations of addition or subtraction,

    x1 ± x2 = (X1 ± X2) + (e1 ± e2),  so that  |(x1 ± x2) - (X1 ± X2)| ≤ |e1| + |e2|.

The magnitude of the propagated error is therefore not more than the sum of the initial absolute errors; of course, it may be zero. Under the operation of multiplication:

    x1 x2 = (X1 + e1)(X2 + e2) ≈ X1 X2 + X2 e1 + X1 e2,

so that the relative error of the product satisfies

    |x1 x2 - X1 X2| / |x1 x2| ≈ |e1/x1 + e2/x2| ≤ r1 + r2.

The maximum relative error propagated is approximately the sum of the initial relative errors. The same result is obtained when the operation is division.

Error generation
Often (for example, in a computer) an operation ∘ is approximated by an operation ∘*, say, while x and y are represented by approximations x* and y*. Indeed, one has

    x ∘ y - x* ∘* y* = (x ∘ y - x* ∘ y*) + (x* ∘ y* - x* ∘* y*),

the first bracket being the propagated error and the second the generated error,

so that the accumulated error does not exceed the sum of the propagated and generated errors. Examples may be found in Step 4.

Example
Evaluate (as accurately as possible) the expressions:
1. 3.45 + 4.87 - 5.16
2. 3.55 × 2.73
There are two methods which the student may consider. The first is to invoke the concepts of absolute and relative error as defined above. Thus, the result for 1. is 3.16 ± 0.015, since the maximum absolute error is 0.005 + 0.005 + 0.005 = 0.015. We conclude that the answer is 3 (to 1S), for the number certainly lies between 3.145 and 3.175. In 2., the product 9.6915 is subject to the maximum relative error

    0.005/3.55 + 0.005/2.73,

whence the maximum (absolute) error ≈ (2.73 + 3.55) × 0.005 ≈ 0.03, so that the answer is 9.7. A second approach is interval arithmetic. Thus, the approximate number 3.45 represents a number in the interval (3.445, 3.455), etc. Consequently, the result for 1. lies in the interval bounded below by

    3.445 + 4.865 - 5.165 = 3.145

and above by

    3.455 + 4.875 - 5.155 = 3.175.

Similarly, in 2., the result lies in the interval bounded below by

    3.545 × 2.725 = 9.660125

and above by

    3.555 × 2.735 = 9.722925,

whence once again the approximate numbers 3 and 9.7 correctly represent the respective results to 1. and 2.
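In Python, this interval bookkeeping can be mechanized. The following is a minimal sketch of my own (the helper names are not from the book), assuming every two-decimal number stands for an interval of half-width 0.005:

    # Interval arithmetic sketch: an approximate 2D number x stands
    # for the interval (x - 0.005, x + 0.005).
    def interval(x, half_width=0.005):
        return (x - half_width, x + half_width)

    def add(a, b):
        return (a[0] + b[0], a[1] + b[1])

    def sub(a, b):
        return (a[0] - b[1], a[1] - b[0])

    def mul(a, b):
        products = [p * q for p in a for q in b]
        return (min(products), max(products))

    # Example 1: 3.45 + 4.87 - 5.16
    r1 = sub(add(interval(3.45), interval(4.87)), interval(5.16))
    # Example 2: 3.55 x 2.73
    r2 = mul(interval(3.55), interval(2.73))
    print(r1)  # (3.145, 3.175)
    print(r2)  # (9.660125, 9.722925)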

Checkpoint
1. What distinguishes propagated and generated errors?
2. How is the propagated error determined for the operations of addition (subtraction) and multiplication (division)?

EXERCISES
Evaluate the following operations as accurately as possible, assuming all values correct to the number of digits given:
1. 8.24 + 5.33.
2. 124.53 - 124.52.
3. 4.27 × 3.13.
4. 9.48 × 0.513 - 6.72.
5. 0.25 × 2.84 / 0.64.
6. 1.73 - 2.16 + 0.08 + 1.00 - 2.23 - 0.97 + 3.02.

STEP 4 ERRORS 4
Floating point arithmetic
In STEP 2, floating point representation was introduced as a convenient way of dealing with large or small numbers. Since most scientific computations involve such numbers, many students will be familiar with floating point arithmetic and will appreciate the way in which it facilitates calculations involving multiplication or division. In order to investigate the implications of finite number representation, one must examine the way in which arithmetic is carried out with floating point numbers. The following specifications apply to most computers which round, and are easily adapted to those which chop. For the sake of simplicity in the examples, we will use a three-digit decimal mantissa (most computers use binary representation, and the mantissa is commonly normalized to lie in the range [1/2, 1)). Note that up to six digits are used for intermediate results, but the final result of each operation is a normalized three-digit decimal floating point number.

Addition and subtraction
Mantissae are added or subtracted (after shifting the mantissa and increasing the exponent of the smaller number, if necessary, to make the exponents agree); the final normalized result is obtained by rounding (after shifting the mantissa and adjusting the exponent, if necessary). Thus:

    3.12 × 10^1 + 4.26 × 10^1 = 7.38 × 10^1,

    2.77 × 10^2 + 7.55 × 10^2 = 10.32 × 10^2 ≈ 1.03 × 10^3,

    6.18 × 10^1 + 1.84 × 10^-1 = 6.18 × 10^1 + 0.0184 × 10^1 = 6.1984 × 10^1 ≈ 6.20 × 10^1,
    3.65 × 10^-1 - 2.78 × 10^-1 = 0.87 × 10^-1 ≈ 8.70 × 10^-2.
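Such rounding rules are easy to experiment with. Below is a small Python helper of my own (not the book's pseudo-code) that rounds any result to a three-digit decimal mantissa:

    import math

    # Round x to a 3-significant-digit decimal mantissa, as in the
    # examples above.
    def fl3(x):
        if x == 0:
            return 0.0
        e = math.floor(math.log10(abs(x)))   # decimal exponent
        m = round(x / 10**e, 2)              # keep 3 significant digits
        return m * 10**e

    print(fl3(2.77e2 + 7.55e2))   # ~1030.0, i.e. 1.03 x 10^3
    print(fl3(6.18e1 + 1.84e-1))  # ~62.0,   i.e. 6.20 x 10^1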

Multiplication
The exponents are added and the mantissae are multiplied; the final result is obtained by rounding (after shifting the mantissa right and increasing the exponent by 1, if necessary). Thus:

    (4.27 × 10^1) × (3.68 × 10^1) = 15.7136 × 10^2 ≈ 1.57 × 10^3,
    (2.73 × 10^2) × (-3.64 × 10^-2) = -9.9372 × 10^0 ≈ -9.94 × 10^0.

Division
The exponents are subtracted and the mantissae are divided; the final result is obtained by rounding (after shifting the mantissa left and reducing the exponent by 1, if necessary). Thus:

    (5.43 × 10^1) / (4.55 × 10^2) = 1.19340... × 10^-1 ≈ 1.19 × 10^-1,
    (-2.75 × 10^2) / (9.87 × 10^-2) = -0.278622... × 10^4 ≈ -2.79 × 10^3.

Expressions
The order of evaluation is determined in a standard way and the result of each operation is a normalized floating point number. Thus:

    (6.18 × 10^1 + 1.84 × 10^-1) / ((4.27 × 10^1) × (3.68 × 10^1))
    ≈ (6.20 × 10^1) / (1.57 × 10^3) = 3.94904... × 10^-2 ≈ 3.95 × 10^-2.

Generated error
Note that all the above examples (except the subtraction and the first addition) involve generated errors which are relatively large due to the small number of digits in the mantissae. Thus the generated error in 2.77 × 10^2 + 7.55 × 10^2 = 10.32 × 10^2 ≈ 1.03 × 10^3 is 0.002 × 10^3. Since the propagated error in this example may be as large as 0.01 × 10^2 (assuming the operands are correct to 3S), one can use the result given in Error propagation to deduce that the accumulated error cannot exceed 0.002 × 10^3 + 0.01 × 10^2 = 0.003 × 10^3.

Consequences
The peculiarities of floating point arithmetic lead to some unexpected and unfortunate consequences, including the following:
1. Addition or subtraction of a small (but nonzero) number may have no effect; for example,

    5.18 × 10^2 + 4.37 × 10^-1 = 5.18 × 10^2 + 0.00437 × 10^2 = 5.18437 × 10^2 ≈ 5.18 × 10^2,

whence the additive identity is not unique.
2. Frequently, the result of a × (1/a) is not 1; for example, if a = 3.00 × 10^0, then 1/a is 3.33 × 10^-1 and

    a × (1/a) = 9.99 × 10^-1,

whence the multiplicative inverse may not exist.
3. The result of (a + b) + c is not always the same as the result of a + (b + c); for example, if a = 6.31 × 10^1, b = 4.24 × 10^0, c = 2.47 × 10^-1, then

    (a + b) + c = (6.31 × 10^1 + 0.424 × 10^1) + 2.47 × 10^-1 ≈ 6.73 × 10^1 + 0.0247 × 10^1 ≈ 6.75 × 10^1,

whereas

    a + (b + c) = 6.31 × 10^1 + (4.24 × 10^0 + 0.247 × 10^0) ≈ 6.31 × 10^1 + 0.449 × 10^1 ≈ 6.76 × 10^1,

whence the associative law for addition does not always apply.

Examples involving adding many numbers of varying size indicate that adding in order of increasing magnitude is preferable to adding in the reverse order.
4. Subtracting a number from another nearly equal number may result in loss of significance or cancellation errors. In order to illustrate this loss of accuracy, suppose that we evaluate f(x) = 1 - cos x for x = 0.05, using three-digit, decimal, normalized, floating point arithmetic with rounding. Then

    1 - cos(0.05) = 1 - 0.99875 ≈ 1.00 × 10^0 - 0.999 × 10^0 = 1.00 × 10^-3.

Although the value of 1 is exact and cos(0.05) is correct to 3S when expressed as a three-digit floating point number, their computed difference is correct to only 1S! (The two zeros after the decimal point in 1.00 × 10^-3 pad the number.) The approximation 0.999 ≈ cos(0.05) has a relative error of about 2.5 × 10^-4. By comparison, the relative error of the computed value 1.00 × 10^-3 of 1 - cos(0.05) is about 0.2, i.e., it is much larger. Thus, subtraction of two nearly equal numbers should be avoided whenever possible. In the case of f(x) = 1 - cos x, one can avoid the loss of significant digits by writing

    f(x) = 1 - cos x = 2 sin^2(x/2).

This last formula is more suitable for calculations when x is close to 0.
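To see the cancellation numerically, one can combine the fl3 helper sketched above (my own construction, not the book's) with both formulas:

    import math

    # fl3 as defined in the sketch above: round to a 3-digit mantissa.
    x = 0.05
    naive = fl3(fl3(1.0) - fl3(math.cos(x)))     # 1.00 - 0.999 = 0.001
    stable = fl3(2 * fl3(math.sin(x / 2))**2)    # 2 sin^2(x/2) -> 0.00125
    exact = 1 - math.cos(x)                      # 0.00124974...
    print(naive, stable, exact)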

It can be verified that the more accurate approximation 1.25 × 10^-3 is obtained for 1 - cos(0.05) when three-digit floating point arithmetic is used.

Checkpoint
1. Why is it sometimes necessary to shift the mantissa and adjust the exponent of a floating point number?
2. Does floating point arithmetic obey the usual laws of arithmetic?
3. Why should the subtraction of two nearly equal numbers be avoided?

EXERCISES
1. Evaluate the following expressions, using three-digit decimal normalized floating point arithmetic with rounding:
a) 6.19 × 10^2 + 5.82 × 10^2,
b) 6.19 × 10^2 + 3.61 × 10^1,
c) 6.19 × 10^2 - 5.82 × 10^2,
d) 6.19 × 10^2 - 3.61 × 10^1,
e) (3.60 × 10^3) × (1.01 × 10^-1),
f) (-7.50 × 10^-1) × (-4.44 × 10^1),
g) (6.45 × 10^2) / (5.16 × 10^-1),
h) (-2.86 × 10^-2) / (3.29 × 10^3).
2. Estimate the accumulated errors in the results of Exercise 1, assuming that all values are correct to 3S.
3. Evaluate the following expressions, using four-digit decimal normalized floating point arithmetic with rounding, then recalculate them, carrying all decimal places, and estimate the propagated error.

a) Given a = 6.842 × 10^-1, b = 5.685 × 10^1, c = 5.641 × 10^1, find a(b - c) and ab - ac.
b) Given a = 9.812 × 10^1, b = 4.631 × 10^-1, c = 8.340 × 10^-1, find (a + b) + c and a + (b + c).
4. Use four-digit decimal normalized floating point arithmetic with rounding to calculate f(x) = tan x - sin x for x = 0.1. Since

    tan x - sin x = tan x (1 - cos x) = tan x · 2 sin^2(x/2),

f(x) may be written as f(x) = 2 tan x sin^2(x/2). Repeat the calculation using this alternative expression. Which of the two values is more accurate?

STEP 5 ERRORS 5
Approximation to functions
An important procedure in Analysis is to represent a given function as an infinite series of terms involving simpler or otherwise more appropriate functions. Thus, if f is the given function, it may be represented as the series expansion

    f(x) = a0 φ0(x) + a1 φ1(x) + a2 φ2(x) + ...

involving the set of functions (φi). Mathematicians have spent a lot of effort in discussing the convergence of series, i.e., on defining conditions under which the partial sum

    Sn(x) = a0 φ0(x) + a1 φ1(x) + ... + an φn(x)

approximates the function value f(x) ever more closely as n increases. In Numerical Analysis, we are primarily concerned with such convergent series; computation of the sequence of partial sums is an approximation process in which the truncation error may be made as small as we please by taking a sufficient number of terms into account.
1. Taylor series
The most important expansion to represent a function is the Taylor series. If f is suitably smooth in the neighbourhood of some chosen point x0,

    f(x0 + h) = f(x0) + h f'(x0) + (h^2/2!) f''(x0) + ... + (h^n/n!) f^(n)(x0) + Rn,

where h = x - x0 denotes the displacement from x0 to the point x in the neighbourhood, and the remainder term is

    Rn = (h^(n+1)/(n+1)!) f^(n+1)(ξ)

for some point ξ between x0 and x. (This is known as the Lagrange form of the remainder; its derivation may be found in Section 8.7 of Thomas and Finney (1992), cited in the Bibliography.) Note that the point ξ in this expression for Rn may be written as

    ξ = x0 + θh,  0 < θ < 1.

The Taylor expansion converges for x within some range including the point x0, a range which lies within the neighbourhood of x0 mentioned above. Within this range of convergence, the truncation error due to discarding terms after the x^n term (equal to the value of Rn at the point x) can be made smaller in magnitude than any positive constant by choosing n sufficiently large. In other words, by using Rn to decide how many terms are needed, one may evaluate a function at any point in the range of convergence as accurately as the accumulation of round-off error permits. From the point of view of the numerical analyst, it is most important that the convergence be fast enough. For example, if we consider f(x) = sin x, we have

    f^(n)(x) = sin(x + nπ/2),

and the expansion (about x0 = 0) for n = 2k - 1 is given by

    sin x = x - x^3/3! + x^5/5! - ... + (-1)^(k-1) x^(2k-1)/(2k-1)! + R_{2k-1},

with

    R_{2k-1} = (-1)^k (x^(2k+1)/(2k+1)!) cos ξ.

Note that this expansion has only odd-powered terms; although the polynomial approximation is of degree (2k - 1), it has only k terms. Moreover, the absence of even-powered terms means that the same polynomial approximation is obtained with n = 2k, and hence R_{2k-1} = R_{2k}; the remainder term R_{2k-1} given above is actually the expression for R_{2k}. Since |cos ξ| ≤ 1, then

    |R_{2k-1}| ≤ |x|^(2k+1)/(2k+1)!;

if 5D accuracy is required, it follows that we need only take k = 2 at x = 0.1, and k = 4 at x = 1 (since 9! = 362 880).
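This bound is easily turned into a small program. A Python sketch of my own that finds the smallest k meeting a 5D tolerance:

    import math

    def terms_needed(x, tol=0.5e-5):
        # Smallest k with |x|^(2k+1)/(2k+1)! <= tol, per the bound above.
        k = 1
        while abs(x)**(2 * k + 1) / math.factorial(2 * k + 1) > tol:
            k += 1
        return k

    print(terms_needed(0.1))  # 2
    print(terms_needed(1.0))  # 4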

On the other hand, the expansion for the natural (base e) logarithm,

    ln(1 + x) = x - x^2/2 + x^3/3 - ... + (-1)^(n-1) x^n/n + Rn,

where

    Rn = (-1)^n x^(n+1) / ((n + 1)(1 + ξ)^(n+1)),

is less suitable: although only n = 4 terms are needed to give 5D accuracy at x = 0.1, n = 13 is required for 5D accuracy at x = 0.5, and n = 19 gives just 1D accuracy at x = 1! Moreover, we remark that the Taylor series is not only used extensively to represent functions numerically, but also to analyse the errors involved in various algorithms (for example, STEPs 8, 9, 10, 30 and 31).

Polynomial approximation

The Taylor series provides a simple method of polynomial approximation (of chosen degree n),

    f(x) ≈ f(x0) + (x - x0) f'(x0) + ((x - x0)^2/2!) f''(x0) + ... + ((x - x0)^n/n!) f^(n)(x0),

which is basic to the later discussion of various elementary numerical procedures. Because f is often complicated, one may prefer to execute operations such as differentiation and integration on a polynomial approximation. Interpolation formulae in STEP 22 and STEP 23 may also be used to construct polynomial approximations.

2. Other series expansions
There are many other series expansions, such as the Fourier series (in terms of sines and cosines), or those involving various orthogonal functions (Legendre polynomials, Chebyshev polynomials, Bessel functions, etc.). From the numerical standpoint, truncated Fourier series and Chebyshev polynomial series have proven to be most useful. Fourier series are appropriate in treating functions with natural periodicities, while Chebyshev series provide the most rapid convergence of all known polynomial-based approximations. Occasionally, it is possible to represent a function adequately (from the numerical standpoint) by truncating a series which does not converge in the mathematical sense. For example, solutions are sometimes obtained in the form of asymptotic series with leading terms which provide sufficiently accurate numerical results.

While we confine our attention in this book to truncated Taylor series, the interested reader should be aware that such alternative expansions exist (see, for example, Burden and Faires (1993)).
3. Recursive procedures
While a truncated series with few terms may be a practical way to compute values of a function, a number of arithmetic operations is involved, and it may be preferable to employ available recursive procedures which reduce the arithmetic required. For example, the values of the polynomial

    P(x) = an x^n + a_{n-1} x^(n-1) + ... + a1 x + a0

and its derivative

    P'(x) = n an x^(n-1) + (n - 1) a_{n-1} x^(n-2) + ... + a1

for x = t may be generated recursively by Horner's scheme:

    bn = an,  bk = ak + t b_{k+1}  (k = n - 1, n - 2, ..., 0),
    cn = bn,  ck = bk + t c_{k+1}  (k = n - 1, n - 2, ..., 1).

Thus, for successive values of k, one obtains P(t) = b0 and P'(t) = c1.
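A Python sketch of nested multiplication (my own; the book's pseudo-code is not reproduced here):

    def horner(coeffs, t):
        """coeffs = [a_n, a_{n-1}, ..., a_0]; returns (P(t), P'(t))
        by nested multiplication."""
        p = 0.0    # running value of P
        dp = 0.0   # running value of P'
        for a in coeffs:
            dp = dp * t + p    # derivative update uses previous p
            p = p * t + a
        return p, dp

    # P(x) = x^3 - 2x^2 + 2x + 3 at x = 3.1 (cf. Exercise 5 below)
    print(horner([1, -2, 2, 3], 3.1))  # approximately (19.771, 18.43)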

The technique just described is also known as nested multiplication. (Perhaps the student may want to suggest a recursive procedure for higher derivatives of P.) Finally, it should be noted that it is common to generate members of a set of orthogonal functions recursively.

Checkpoint
1. How do numerical analysts use the remainder term Rn in Taylor series?
2. Why is the rate of convergence so important from the numerical standpoint?
3. From the numerical standpoint, is it essential for a series representation to converge in the mathematical sense?

EXERCISES
1. Find the Taylor series expansions about x = 0 for each of the functions:
a) cos x;
b) 1/(1 - x);
c) e^x.
Determine also for each series a general remainder term.
2. For each of the functions in Exercise 1, evaluate f(0.5), using a calculator and the first four terms of the Taylor expansions.
3. Use the remainder term found in Exercise 1(c) to find the value of n required in the Taylor series for f(x) = e^x about x = 0 to give 5D accuracy for all x between 0 and 1.
4. Truncate the Taylor series found in Exercise 1(c) to give linear, quadratic, and cubic polynomial approximations for f(x) = e^x in the neighbourhood of x = 0. Use the remainder term to estimate (to the nearest 0.1) the range over which each polynomial approximation yields results correct to 2D.
5. Evaluate P(3.1) and P'(3.1), where P(x) = x^3 - 2x^2 + 2x + 3, using the technique of nested multiplication.
6. Evaluate P(2.6) and P'(2.6), where P(x) = 2x^4 - x^3 + 3x^2 + 5, using nested multiplication. Check your values using a calculator.

STEP 6 NONLINEAR EQUATIONS 1
Non-linear algebraic and transcendental equations
The first non-linear equation encountered in algebra courses is usually the quadratic equation

    a x^2 + b x + c = 0,

and all students will be familiar with the formula for its roots:

    x = (-b ± √(b^2 - 4ac)) / (2a).

The formula for the roots of a general cubic is somewhat more complicated, and that for a general quartic usually takes several pages to describe! We are spared further effort by a theorem which states that there is no such formula for general polynomials of degree higher than four. Accordingly, except in special cases (for example, when factorization is easy), we prefer in practice to use a numerical method to solve polynomial equations of degree higher than two. Another class of nonlinear equations consists of those which involve transcendental functions, such as e^x, ln x, or sin x. Useful analytic solutions of such equations are rare, so that we are usually forced to use numerical methods.
1. A transcendental equation
We shall use a simple mathematical problem to show that transcendental equations do arise quite naturally. Consider the height of a liquid in a cylindrical tank of radius r and horizontal axis, when the tank is a quarter full (see Figure 2). Denote the height of the liquid by h (DB in the diagram). The condition to be satisfied is that the area of the segment ABC should be 1/4 of the area of the circle. If θ is the angle subtended at the centre O, this task reduces to

    (r^2/2)(θ - sin θ) = (1/4) π r^2,

where (r^2/2) θ is the area of the sector OAB and (r^2/2) sin θ that of the corresponding triangle. When we have solved the transcendental equation

    θ - sin θ = π/2,

we obtain h from

    h = r(1 - cos(θ/2)).

FIGURE 2. Cylindrical tank (cross-section).

2. Locating roots
Let it be required to find some or all of the roots of the nonlinear equation f(x) = 0. Before we use a numerical method (STEP 7, STEP 8, STEP 9 and STEP 10), we should have some idea about the number, nature and approximate location of the roots. The usual approach involves the construction of graphs and perhaps a table of values of the function f, in order to confirm the information obtained from the graph. We will now illustrate this approach by a few examples.
a) If we do not have a calculator or computer available to immediately plot the graph of f(x) = sin x - x + 0.5, we can separate f into two parts, sketch two curves on a single set of axes, and find out whether they intersect. Thus we sketch y = sin x and y = x - 0.5. Since |sin x| ≤ 1, we are only interested in the interval -0.5 ≤ x ≤ 1.5 (outside which |x - 0.5| > 1). Thus we deduce from Fig. 3 that the equation has only one real root, near x = 1.5, and tabulate as follows:

    f(1.49) = sin 1.49 - 1.49 + 0.5 ≈ 0.0067,
    f(1.50) = sin 1.50 - 1.50 + 0.5 ≈ -0.0025.
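Such tabulation is easily automated. A Python sketch of my own that scans an interval for sign changes of f:

    import math

    def sign_changes(f, a, b, steps):
        """Tabulate f on [a, b] and return subintervals where f changes sign."""
        brackets = []
        xs = [a + i * (b - a) / steps for i in range(steps + 1)]
        for x0, x1 in zip(xs, xs[1:]):
            if f(x0) * f(x1) < 0:
                brackets.append((x0, x1))
        return brackets

    f = lambda x: math.sin(x) - x + 0.5
    print(sign_changes(f, -0.5, 1.5, 20))  # approximately [(1.4, 1.5)]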

We now know that the root lies between 1.49 and 1.50, and we can use a numerical method to obtain a more accurate answer, as is discussed in later Steps.
b) Again, we sketch two curves; for the equation e^(-x/5) = x(x - 2)(x - 3) these are

    y = e^(-x/5)  and  y = x(x - 2)(x - 3).

In order to sketch the second curve, we use the three obvious zeros at x = 0, 2, and 3, as well as the knowledge that x(x - 2)(x - 3) is negative for x < 0 and 2 < x < 3, but positive and increasing steadily for x > 3. We deduce from the graph (Fig. 4) that there are three real roots, near x = 0.2, 1.8, and 3.1, and tabulate as follows (with f(x) = e^(-x/5) - x(x - 2)(x - 3)):

We conclude that the roots lie between 0.15 and 0.2, 1.6 and 1.8, and 3.1 and 3.2, respectively. Note that the values in the table were calculated to an accuracy of at least 5S. For example, working to 5S accuracy, we have f(0.15) = 0.97045 - 0.79088 = 0.17957, which is then rounded to 0.1796. Thus the entry in the table for f(0.15) is 0.1796 and not 0.1795, as one might expect from calculating 0.9704 - 0.7909.

Checkpoint
1. Why are numerical methods used in solving nonlinear equations?
2. How does a transcendental equation differ from an algebraic equation?
3. What kind of information is used when sketching curves for the location of roots?

EXERCISES
1. Locate the roots of the equation x + cos x = 0.
2. Use curve sketching to roughly locate all the roots of the equations:
a) x + 2 cos x = 0.
b) x + e^x = 0.
c) x(x - 1) - e^x = 0.
d) x(x - 1) - sin x = 0.

STEP 7 NONLINEAR EQUATIONS 2

The bisection method
The bisection method, suitable for implementation on a computer (cf. the pseudo-code), allows one to find the roots of the equation f(x) = 0, based on the following theorem:
Theorem: If f is continuous for x between a and b, and if f(a) and f(b) have opposite signs, then there exists at least one real root of f(x) = 0 between a and b.
1. Procedure
Suppose that a continuous function f is negative at x = a and positive at x = b, so that there is at least one real root between a and b. (As a rule, a and b may be found from a graph of f.) If we calculate f((a + b)/2), which is the function value at the point of bisection of the interval a < x < b, there are three possibilities:
1. f((a + b)/2) = 0, in which case (a + b)/2 is the root;
2. f((a + b)/2) < 0, in which case the root lies between (a + b)/2 and b;
3. f((a + b)/2) > 0, in which case the root lies between a and (a + b)/2.
Presuming there is just one root, in Case 1 the process is terminated. In either Case 2 or Case 3, the process of bisection of the interval containing the root can be repeated until the root is obtained to the desired accuracy. In Figure 5, the successive points of bisection are denoted by x1, x2, and x3.

2. Effectiveness
The bisection method is almost certain to give a root, provided the conditions of the above theorem hold; it can only fail if the accumulated error in the calculation of f at a bisection point gives it a small negative value when actually it should have a small positive value (or vice versa), in which case the interval subsequently chosen would be wrong. This can be overcome by working to sufficient accuracy, and this almost-assured convergence is not true of many other methods of finding roots. One drawback of the bisection method is that it applies only to roots of f about which f(x) changes sign. In particular, double roots can be overlooked; one should be careful to examine f(x) in any range where it is small, so that repeated roots about which f(x) does not change sign are otherwise evaluated (for example, see Steps 9 and 10). Of course, such a close examination also avoids another nearby root being overlooked. Finally, note that bisection is rather slow; after n iterations the interval containing the root is of length (b - a)/2^n. However, provided values of f can be generated readily, as when a computer is used, the rather large number of iterations which can be involved in the application of bisection is of relatively little consequence.

3. Example
Solve 3x e^x = 1 to three decimal places by the bisection method. Consider f(x) = 3x - e^(-x), which changes sign in the interval 0.25 < x < 0.27; one tabulates (working to 4D) as follows:

    f(0.25) = 0.7500 - 0.7788 = -0.0288,
    f(0.27) = 0.8100 - 0.7634 = 0.0466.

(Ascertain graphically that there is just one root!) Denote the lower and upper endpoints of the interval bracketing the root at the n-th iteration by an and bn, respectively (with a1 = 0.25 and b1 = 0.27). Then the approximation to the root at the n-th iteration is given by xn = (an + bn)/2. Since the root is either in [an, xn] or [xn, bn] and both intervals are of length (bn - an)/2, we see that xn will be accurate to three decimal places when (bn - an)/2 < 5 × 10^-4. Proceeding to bisection:
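The bisection loop itself might be sketched in Python as follows (my own version of the book's pseudo-code):

    import math

    def bisect(f, a, b, tol):
        """Bisection: assumes f(a) and f(b) have opposite signs."""
        while (b - a) / 2 >= tol:
            x = (a + b) / 2
            if f(a) * f(x) > 0:
                a = x   # root lies in [x, b]
            else:
                b = x   # root lies in [a, x] (or x is the root)
        return (a + b) / 2

    f = lambda x: 3 * x - math.exp(-x)
    print(bisect(f, 0.25, 0.27, 5e-4))  # ~0.258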

(Note that the values in the table are displayed to only 4D.) Hence the root accurate to three decimal places is 0.258.

Checkpoint
1. When may the bisection method be used to find a root of the equation f(x) = 0?
2. What are the three possible choices after a bisection value is calculated?
3. What is the maximum error after n iterations of the bisection method?

EXERCISES
1. Use the bisection method to find the root of the equation x + cos x = 0 correct to two decimal places (2D).
2. Use the bisection method to find to 3D the positive root of the equation x - 0.2 sin x - 0.5 = 0.
3. Each equation in Exercises 2(a)-2(c) of Step 6 has only one root.

For each equation, use the bisection method to find the root correct to 2D.

STEP 8 NONLINEAR EQUATIONS 3
Method of false position
As mentioned in the Prologue, the method of false position dates back to the ancient Egyptians. It remains an effective alternative to the bisection method for solving the equation f(x) = 0 for a real root between a and b, given that f(x) is continuous and f(a) and f(b) have opposite signs. The algorithm is suitable for automatic computation (pseudo-code).
1. Procedure
The curve y = f(x) is not generally a straight line. However, one may join the points (a, f(a)) and (b, f(b)) by the straight line

    y = f(a) + ((f(b) - f(a))/(b - a)) (x - a).

This straight line cuts the x-axis at (X, 0), where

    0 = f(a) + ((f(b) - f(a))/(b - a)) (X - a),

so that

    X = a - f(a) (b - a)/(f(b) - f(a)) = (a f(b) - b f(a))/(f(b) - f(a)).

Suppose that f(a) is negative and f(b) is positive. As in the bisection method, there are three possibilities:
1. f(X) = 0, in which case X is the root;
2. f(X) < 0, in which case the root lies between X and b;
3. f(X) > 0, in which case the root lies between X and a.
Again, in Case 1 the process is terminated; in either Case 2 or Case 3, the process can be repeated until the root is obtained to the desired accuracy. In Fig. 6, the successive points where the straight lines cut the axis are denoted by x1, x2, x3.
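A Python sketch of the procedure (my own; the stopping test on |f(X)| is an assumption, since the book's pseudo-code is not reproduced here):

    import math

    def false_position(f, a, b, tol=5e-7, max_iter=100):
        """Method of false position: keeps the root bracketed in [a, b]."""
        fa, fb = f(a), f(b)
        for _ in range(max_iter):
            x = (a * fb - b * fa) / (fb - fa)   # straight-line x-intercept
            fx = f(x)
            if abs(fx) < tol:
                return x
            if fa * fx < 0:
                b, fb = x, fx   # root between a and x
            else:
                a, fa = x, fx   # root between x and b
        return x

    f = lambda x: 3 * x - math.exp(-x)
    print(false_position(f, 0.25, 0.27))  # ~0.257628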

2. Effectiveness and the secant method
Like the bisection method, the method of false position has almost assured convergence, and it may converge to a root faster. However, it may happen that most or all of the calculated values of X are on the same side of the root, in which case convergence may be slow (Fig. 7). This is avoided in the secant method, which resembles the method of false position except that no attempt is made to ensure that the root is enclosed; starting with two approximations to the root (x0, x1), further approximations x2, x3, ... are computed from

    x_{n+1} = xn - f(xn)(xn - x_{n-1})/(f(xn) - f(x_{n-1})).

Convergence is no longer assured, but the process is simpler (the sign of f(x_{n+1}) is not tested) and often converges faster.

With respect to the speed of convergence of the secant method, writing en = ξ - xn for the error at the n-th step (ξ denoting the root), one has at the (n+1)-th step:

    e_{n+1} = ξ - x_{n+1} = ξ - xn + f(xn)(xn - x_{n-1})/(f(xn) - f(x_{n-1})).

Hence, expanding f(xn) and f(x_{n-1}) in terms of the Taylor series about ξ,

    e_{n+1} ≈ -(f''(ξ)/(2 f'(ξ))) en e_{n-1},

where we have used the fact that f(ξ) = 0. Thus we see that e_{n+1} is proportional to en e_{n-1}, which may be expressed in mathematical notation as

    |e_{n+1}| ≈ C |en| |e_{n-1}|.

We seek k such that |e_{n+1}| ∝ |en|^k; then also |en| ∝ |e_{n-1}|^k, so that |en|^k ∝ |en| |en|^(1/k), whence k = 1 + 1/k, i.e., k^2 - k - 1 = 0 and k = (1 + √5)/2 ≈ 1.618. Hence the speed of convergence is faster than linear (k = 1), but slower than quadratic (k = 2). This rate of convergence is sometimes referred to as superlinear convergence.

3. Example
Use the method of false position to solve 3x e^x = 1 to 4D, with f(x) = 3x - e^(-x) and the bracketing interval (0.25, 0.27) used previously.

Applying the formula with a = 0.25 and b = 0.27 yields x1 = 0.257637. Then

    f(x1) = f(0.257637) = 3 × 0.257637 - 0.772875 = 0.772912 - 0.772875 = 0.000036.

The student may verify that doing one more iteration of the method of false position yields an estimate x2 = 0.257628, for which the function value is less than 5 × 10^-6. Since x1 and x2 agree to 4D, we conclude that the root is 0.2576, correct to 4D.

Checkpoint
1. When may the method of false position be used to find a root of the equation f(x) = 0?
2. On what geometric construction is the method of false position based?

EXERCISES

1. Use the method of false position to find the smallest root of the equation f(x) = 2 sin x + x - 2 = 0, stopping when
2. Compare the results obtained when you use a) the bisection method, b) the method of false position, and c) the secant method with starting values 0.7 and 0.9 to solve the equation 3 sin x = x + 1/x.
3. Use the method of false position to find the root of the equation
4. Each equation in Exercises 2(a)-2(c) of Step 6 has only one root. Use the method of false position to find each root, stopping when

STEP 9 NONLINEAR EQUATIONS 4
The method of simple iteration

The method of simple iteration involves writing the equation f(x) = 0 in a form x = φ(x), suitable for the construction of a sequence of approximations to some root in a repetitive fashion.
1. Procedure
The iteration procedure is as follows: In some way, we obtain a rough approximation x0 of the desired root, which may then be substituted into the right-hand side to give a new approximation, x1 = φ(x0). The new approximation is again substituted into the right-hand side to give a further approximation x2 = φ(x1), etc., until (hopefully) a sufficiently accurate approximation to the root is obtained. This repetitive process, based on x_{n+1} = φ(xn), is called simple iteration; provided that |x_{n+1} - xn| decreases as n increases, the process tends to ξ = φ(ξ), where ξ denotes the root.
2. Example
Use simple iteration to find the root of the equation 3x e^x = 1 to an accuracy of 4D. One first writes

    x = e^(-x)/3 = φ(x).

Assuming x0 = 1, successive iterations yield

    x1 = 0.12263, x2 = 0.29486, x3 = 0.24821, x4 = 0.26007,
    x5 = 0.25700, x6 = 0.25779, x7 = 0.25759, x8 = 0.25764.

Thus, we see that after eight iterations the root is 0.2576 to 4D. A graphical interpretation of the first three iterations is shown in Fig. 8.
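The whole iteration is only a few lines of Python (a sketch of my own, stopping when successive iterates agree to within a tolerance):

    import math

    def fixed_point(phi, x0, tol=5e-5, max_iter=50):
        """Simple iteration x_{n+1} = phi(x_n)."""
        x = x0
        for _ in range(max_iter):
            x_new = phi(x)
            if abs(x_new - x) < tol:
                return x_new
            x = x_new
        return x

    phi = lambda x: math.exp(-x) / 3
    print(fixed_point(phi, 1.0))  # ~0.25764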

3. Convergence
Whether or not an iteration procedure converges, and how quickly, depends on the choice of the function φ(x) as well as the starting value x0. For example, the equation x^2 = 3 has two real roots ±√3. It can be given the form

    x = 3/x = φ(x),

which suggests the iteration

    x_{n+1} = 3/xn.

However, if the starting value x0 = 1 is used, the successive iterations yield x1 = 3, x2 = 1, x3 = 3, x4 = 1, ...; the sequence merely oscillates between 1 and 3 and never converges. We can examine the convergence of the iteration process

x_{n+1} = φ(xn) to ξ = φ(ξ) with the aid of the Taylor series

    φ(xk) = φ(ξ) + (xk - ξ) φ'(ξk),

where ξk is a point between the root and the approximation xk. We have

    x_{k+1} - ξ = φ(xk) - φ(ξ) = (xk - ξ) φ'(ξk),  k = 0, 1, ..., n.

Multiplying the n + 1 rows together and cancelling the common factors x1 - ξ, x2 - ξ, ..., xn - ξ leaves

    x_{n+1} - ξ = (x0 - ξ) φ'(ξ0) φ'(ξ1) ... φ'(ξn),

whence

    |x_{n+1} - ξ| ≤ K^(n+1) |x0 - ξ|,  K = max |φ'|,

so that the absolute error |ξ - x_{n+1}| can be made as small as we please by sufficient iteration if |φ'| < 1 in the neighbourhood of the root. Note that φ(x) = 3/x has derivative φ'(x) = -3/x^2, with |φ'(x)| > 1 for |x| < √3.

EXERCISES
1. Assuming x0 = 1, show by simple iteration that one root of the equation 2x - 1 - 2 sin x = 0 is 1.4973.
2. Use simple iteration to find (to 4D) the root of the equation x + cos x = 0.

STEP 10 NONLINEAR EQUATIONS 5
The Newton-Raphson iterative method
The Newton-Raphson method is suitable for implementation on a computer (pseudo-code). It is a process for the determination of a real root of an equation f(x) = 0, given just one point close to the desired root. It can be viewed as a limiting case of the secant method (Step 8) or as a special case of simple iteration (Step 9).
1. Procedure
Let x0 denote the known approximate value of the root of f(x) = 0 and h the difference between the true value ξ and the approximate value, i.e.,

    ξ = x0 + h.

The second-degree, terminated Taylor expansion (STEP 5) about x0 is

    f(ξ) = f(x0 + h) = f(x0) + h f'(x0) + (h^2/2!) f''(θ),

where θ lies between x0 and ξ. Ignoring the remainder term and writing f(ξ) = 0,

    0 ≈ f(x0) + h f'(x0),

whence

    h ≈ -f(x0)/f'(x0),

and, consequently,

    x1 = x0 - f(x0)/f'(x0)

should be a better estimate of the root than x0. Even better approximations may be obtained by repetition (iteration) of the process, which then becomes

    x_{n+1} = xn - f(xn)/f'(xn).

Note that if f is a polynomial, you can use the recursive procedure of STEP 5 to compute f(xn) and f'(xn). The geometrical interpretation is that each iteration provides the point at which the tangent at the original point cuts the x-axis (Figure 9). Thus the equation of the tangent at (x0, f(x0)) is

    y - f(x0) = f'(x0)(x - x0),

so that (x1, 0) corresponds to

    -f(x0) = f'(x0)(x1 - x0),

whence x1 = x0 - f(x0)/f'(x0).
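A Python sketch of the iteration (my own; the stopping rule on the step size is an assumption):

    import math

    def newton(f, df, x0, tol=5e-7, max_iter=50):
        """Newton-Raphson: x_{n+1} = x_n - f(x_n)/f'(x_n)."""
        x = x0
        for _ in range(max_iter):
            step = f(x) / df(x)
            x -= step
            if abs(step) < tol:
                return x
        return x

    # Positive root of sin x = x^2 (the example below)
    f = lambda x: math.sin(x) - x * x
    df = lambda x: math.cos(x) - 2 * x
    print(newton(f, df, 0.8))  # ~0.876726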

2. Example
We will use the Newton-Raphson method to find the positive root of the equation sin x = x^2, correct to 3D. It will be convenient to use the method of false position to obtain an initial approximation. Tabulating f(x) = sin x - x^2 yields

    f(0.75) = 0.6816 - 0.5625 = 0.1191,
    f(1.00) = 0.8415 - 1.0000 = -0.1585.

With numbers displayed to 4D, we see that there is a root in the interval 0.75 < x < 1, at approximately

    x0 = (0.75 × (-0.1585) - 1 × 0.1191)/(-0.1585 - 0.1191) ≈ 0.8573.

Next, we will use the Newton-Raphson method with

    f(x) = sin x - x^2

and

    f'(x) = cos x - 2x,

yielding

    f(x0) = f(0.8573) ≈ 0.0211  and  f'(x0) ≈ -1.0601.

Consequently, a better approximation is

    x1 = x0 - f(x0)/f'(x0) ≈ 0.8573 + 0.0211/1.0601 ≈ 0.8772.

Repeating this step, we obtain

    f(x1) ≈ -0.0005  and  f'(x1) ≈ -1.1151,

so that

    x2 = x1 - f(x1)/f'(x1) ≈ 0.8767.

Since f(x2) = 0.0000, we conclude that the root is 0.877 to 3D.

3. Convergence
If we write

φ(x) = x - f(x)/f'(x), the Newton-Raphson iteration expression

    x_{n+1} = xn - f(xn)/f'(xn)

may be rewritten

    x_{n+1} = φ(xn).

We have observed (STEP 9) that, in general, the iteration method converges when |φ'(x)| < 1 near the root. In the case of Newton-Raphson, we have

    φ'(x) = f(x) f''(x)/[f'(x)]^2,

so that the criterion for convergence is

    |f(x) f''(x)| < [f'(x)]^2,

i.e., convergence is not as assured as, say, for the bisection method.
4. Rate of convergence
The second-degree terminated Taylor expansion about xn is

    0 = f(ξ) = f(xn) + (ξ - xn) f'(xn) + ((ξ - xn)^2/2!) f''(θn),

where θn lies between xn and ξ. Since en = ξ - xn is the error at the n-th iteration and f(ξ) = 0, we find

    0 = f(xn) + en f'(xn) + (en^2/2) f''(θn).

But, by the Newton-Raphson formula,

    f(xn) = (xn - x_{n+1}) f'(xn),

whence the error at the (n + 1)-th iteration is

    e_{n+1} = ξ - x_{n+1} = -(f''(θn)/(2 f'(xn))) en^2 ≈ -(f''(ξ)/(2 f'(ξ))) en^2,

provided en is sufficiently small. This result states that the error at the (n + 1)-th iteration is proportional to the square of the error at the n-th iteration; hence, if |f''/(2f')| is of order unity near the root, an answer correct to one decimal place at one iteration should be accurate to two places at the next iteration, four at the next, eight at the next, etc. This quadratic (second-order) convergence outstrips

the rate of convergence of the methods of bisection and false position! In relatively little used computer programs, it may be wise to prefer the methods of bisection or false position, since convergence is virtually assured. However, for hand calculations or for computer routines in constant use, the Newton-Raphson method is usually preferred.
5. The square root
One application of the Newton-Raphson method is in the computation of square roots, since finding √a is equivalent to finding the positive root of x^2 = a, i.e., of

    f(x) = x^2 - a = 0.

Since f'(x) = 2x, we have the Newton-Raphson iteration formula

    x_{n+1} = xn - (xn^2 - a)/(2xn) = (1/2)(xn + a/xn),

a formula known to the ancient Greeks. Thus, if a = 16 and x0 = 5, we find to 3D

    x1 = (5 + 3.2)/2 = 4.1,  x2 = (4.1 + 3.9024)/2 = 4.0012,  x3 = (4.0012 + 3.9988)/2 = 4.0000.
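This special case is worth trying directly; a Python sketch of my own:

    def newton_sqrt(a, x0, n_iter=5):
        """Square root by x_{n+1} = (x_n + a/x_n)/2."""
        x = x0
        for _ in range(n_iter):
            x = (x + a / x) / 2
        return x

    print(newton_sqrt(16, 5))   # 4.0
    print(newton_sqrt(10, 3))   # ~3.16227766 (cf. Exercise 3 below)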

Checkpoint
1. What is the geometrical interpretation of the Newton-Raphson iterative procedure?
2. What is the convergence criterion for the Newton-Raphson method?
3. What major advantage has the Newton-Raphson method over some other methods?

EXERCISES
1. Use the Newton-Raphson method to find to 4S the (positive) root of 3x e^x = 1. Note that tables of natural (Naperian) logarithms are more readily available than tables of the exponential, so that one might prefer to solve the equivalent equation f(x) = ln(3x) + x = ln 3 + ln x + x = 0.
2. Derive the Newton-Raphson iteration formula

    x_{n+1} = xn - (xn^k - a)/(k xn^(k-1))

for finding the k-th root of a.
3. Compute the square root of 10 to 5 significant digits from an initial guess.
4. Use the Newton-Raphson method to find to 4D the root of the equation x + cos x = 0.
5. Use the Newton-Raphson method to find to 4D the root of each equation in Exercises 1-3.

STEP 11 SYSTEMS OF LINEAR EQUATIONS 1

Solution by elimination
Many physical systems can be modelled by a set of linear equations which describe relationships between system variables. In simple cases, there are two or three variables; in complex systems (for example, in a linear model of the economy of a country) there may be several hundred variables. Linear systems also arise in connection with many problems of numerical analysis. Examples of these are the solution of partial differential equations by finite difference methods, statistical regression analysis, and the solution of eigenvalue problems (see, for example, Gerald and Wheatley (1994)). A brief introduction to this last topic is given in STEP 17. Hence there arises a need for rapid and accurate methods for solving systems of linear equations. The student will already be familiar with solving by elimination systems of equations with two or three variables. This Step presents a formal description of the Gauss elimination method for n-variable systems and discusses certain errors which might arise in their solutions. Partial pivoting, a technique which enhances the accuracy of this method, is discussed in the next Step.
1. Notation and definitions
We first consider an example in three variables: a set of three linear equations in the three variables or unknowns x, y, z. During solution of such a system, we determine a set of values for x, y and z which satisfies each of the equations. In other words, if values (X, Y, Z) satisfy all equations simultaneously, then they constitute a solution of the system. Consider now the general system of n equations in n variables:

    a11 x1 + a12 x2 + ... + a1n xn = b1,
    a21 x1 + a22 x2 + ... + a2n xn = b2,
    . . .
    an1 x1 + an2 x2 + ... + ann xn = bn.

Obviously, the dots indicate similar terms in the variables and the remaining (n - 3) equations. In this notation, the variables are denoted by x1, x2, ..., xn; they are sometimes referred to as xi, i = 1, 2, ..., n. The coefficients of the variables may be detached and written in a coefficient matrix:

    A = [aij],  i, j = 1, 2, ..., n.

The notation aij will be used to denote the coefficient of xj in the i-th equation. Note that aij occurs in the i-th row and j-th column of the matrix. The numbers on the right-hand side of the equations are called constants; they may be written in a column vector:

    b = (b1, b2, ..., bn)^T.

The coefficient matrix may be combined with the constant vector to form the augmented matrix:

    [A | b].

As a rule, one works in the elimination method directly with the augmented matrix.
2. The existence of solutions
For any particular system of n linear equations, there may be a single solution (X1, X2, ..., Xn), or no solution, or an infinity of solutions. In the theory of linear algebra, theorems are given and conditions stated which allow one to make a decision regarding the category into which a given system falls. We shall not treat the question of existence of solutions in this book but, for the benefit of students familiar with matrices and determinants, we state the theorem:
Theorem: A linear system of n equations in n variables with coefficient matrix A and non-zero constant vector b has a unique solution if and only if the determinant of A is not zero. If b = 0, the system has the trivial solution x = 0; it has no other solution unless the determinant of A is zero, when it has an infinite number of solutions.
If the determinant of A is non-zero, there exists an n × n matrix, called

the inverse of A (denoted by A^-1), such that the matrix product of A^-1 and A is equal to the n × n identity or unit matrix I. The elements of the identity matrix are 1 on the main diagonal and 0 elsewhere. Its algebraic properties include Ix = x for any n × 1 vector x, and IM = MI = M for any n × n matrix M. For example, the 3 × 3 identity matrix is

    I = | 1 0 0 |
        | 0 1 0 |
        | 0 0 1 |

Multiplication of the equation Ax = b from the left by the inverse matrix A^-1 yields A^-1 A x = A^-1 b, whence the unique solution is x = A^-1 b (since A^-1 A = I and Ix = x). Thus, in principle, a linear system with a unique solution may be solved by first evaluating A^-1 and then A^-1 b. This approach is discussed in more detail in the optional STEP 14. The Gauss elimination method is a more general and efficient direct procedure for solving systems of linear equations.
3. Gauss elimination method
In Gauss elimination, the given system of equations is transformed into an equivalent system which has upper triangular form; this form is readily solved by a process called back-substitution. We shall demonstrate the process by solving the example of Section 1.
Transformation into upper triangular form:

First stage: Eliminate x from equations R2 and R3, using equation R1.
Second stage: Eliminate y from R3', using R2'.
The coefficient matrix of the system is now in upper triangular form.
Solution by back-substitution
The system in upper triangular form is easily solved by obtaining z from R3'', then y from R2'', and finally x from R1''. This procedure is called back-substitution.

4. The transformation operations
During transformation of a system to upper triangular form, one or more of the following elementary operations are used at every step:
1. multiplication of an equation by a constant;
2. subtraction from one equation of some multiple of another equation;
3. interchange of two equations.
Mathematically speaking, it should be clear to the student that performing elementary operations on a system of linear equations leads to equivalent systems with the same solutions. This statement requires proof, which may be found as a theorem in books on linear algebra (cf., for example, Anton (1993)). It forms the basis of all elimination methods for solving systems of linear equations.
5. General treatment of the elimination process
We will now describe the application of the elimination process to a general n × n linear system, written in general notation, which is suitable for implementation on a computer (pseudo-code). Before considering the general system, the process will be described by means of a system of three equations. We begin with the augmented matrix, and display in the column (headed by m) the multipliers required for the transformations.

First stage: Eliminate the coefficients a21 and a31, using row R1 (multipliers m21 = a21/a11 and m31 = a31/a11).
Second stage: Eliminate the coefficient a32, using row R2 (multiplier m32 = a32/a22).
The matrix is now in the form which permits back-substitution; the full system of equations at this stage is equivalent to the original system. Hence follows the solution by back-substitution.
We now display the process for the general n × n system, omitting the primes (') for convenience, and recalling the original augmented matrix [A | b].
First stage: Eliminate the coefficients a21, a31, ..., an1 by calculating the multipliers

    m_{i1} = a_{i1}/a11,  i = 2, 3, ..., n,

and then subtracting m_{i1} × (row R1) from row Ri, i.e.,

    aij becomes aij - m_{i1} a1j,  bi becomes bi - m_{i1} b1.

This leads to the modified augmented system, whose first column is zero below a11.

Second stage: Eliminate the coefficients a32, a42, ..., an2 by calculating the multipliers

    m_{i2} = a_{i2}/a22,  i = 3, 4, ..., n,

and then subtracting m_{i2} × (row R2) from row Ri, whence the second column is zero below a22. We continue to eliminate unknowns, going on to columns 3, 4, ..., so that by the beginning of the k-th stage the first k - 1 columns of the augmented matrix are already in upper triangular form.
k-th stage: Eliminate a_{k+1,k}, a_{k+2,k}, ..., a_{n,k} by calculating the multipliers

    m_{ik} = a_{ik}/akk,  i = k + 1, ..., n,

and then subtracting m_{ik} × (row Rk) from row Ri. At the end of the k-th stage, the first k columns are in upper triangular form. Continuing in this way, we obtain after n - 1 stages an augmented matrix whose coefficient part is in upper triangular form. We now back-substitute: we find xn = bn/ann, and then

    xi = (bi - Σ_{j=i+1..n} aij xj) / aii,  i = n - 1, n - 2, ..., 1.

Notes
1. The diagonal elements akk, used at the k-th stage of the successive elimination, are referred to as pivot elements.
2. In order to proceed from one stage to the next, the pivot elements must be non-zero, since they are used as divisors in the multipliers and in the final solution. If at any stage a pivot element vanishes, rearrange the remaining rows of the matrix in order to obtain a non-zero pivot; if this is not possible, then the system of linear equations has no unique solution.
3. If a pivot element is small compared with the elements in its column which have to be eliminated, the corresponding multipliers used at that stage will be larger than one in magnitude. The use of large multipliers in elimination and back-substitution leads to magnification of round-off errors; this can be avoided by using the partial pivoting described in STEP 12.

4. During the elimination method, an extra check column, containing the sums of the rows, should be computed at each stage (cf. the following example). Its elements are treated in exactly the same way as the equation coefficients. After each stage is completed, the new row sums should equal the new check column elements (within round-off error).
6. A numerical example with check column
Consider the system of equations:

    0.34 x1 - 0.58 x2 + 0.94 x3 = 2.0,
    0.27 x1 + 0.42 x2 + 0.134 x3 = 1.5,
    0.20 x1 - 0.51 x2 + 0.54 x3 = 0.8.

The manipulations leading to the solution are set out in tabular form below. For the purpose of illustration, the calculations have been executed in 3D floating point arithmetic. For example, at the first stage, the multiplier 0.794 arises as follows:

    m21 = 0.27/0.34 = 0.794,

while the value -0.0900 arises from the sequence of operations

    1.50 - 0.794 × 2.00 = 1.50 - 1.59 = -0.0900.

Working with so few significant digits leads to errors in the solution, as is shown below by an examination of the so-called residuals.

Next, we back-substitute. As a check, we sum the original three equations to obtain

    0.81 x1 - 0.67 x2 + 1.61 x3 = 4.3.

Inserting the solution x1 = 0.89, x2 = 2.1, x3 = 3.03 yields

    0.81 × 0.89 - 0.67 × 2.1 + 1.61 × 3.03 ≈ 4.19,

which differs from 4.3 only by an amount attributable to the low-precision (3D) arithmetic.

In order to assess the accuracy of the solution, we insert the solution into the left-hand side of each of the original equations and compare the results with the right-hand side constants; the differences are the residuals. It seems reasonable to conclude that, if the residuals are small, the solution is a good one. As a rule, this is true. However, at times, small residuals do not indicate that a solution is acceptable. This aspect is discussed under the heading ill-conditioning in STEP 12.
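The general procedure of Section 5 can be sketched compactly in Python (my own illustration; no pivoting yet, cf. STEP 12):

    def gauss_solve(A, b):
        """Solve Ax = b by Gauss elimination and back-substitution.
        A is a list of row-lists; A and b are modified in place."""
        n = len(A)
        for k in range(n - 1):                # elimination stages
            for i in range(k + 1, n):
                m = A[i][k] / A[k][k]         # multiplier m_ik
                for j in range(k, n):
                    A[i][j] -= m * A[k][j]
                b[i] -= m * b[k]
        x = [0.0] * n                         # back-substitution
        for i in range(n - 1, -1, -1):
            s = sum(A[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (b[i] - s) / A[i][i]
        return x

    A = [[0.34, -0.58, 0.94], [0.27, 0.42, 0.134], [0.20, -0.51, 0.54]]
    b = [2.0, 1.5, 0.8]
    print(gauss_solve(A, b))  # full-precision solution of the example above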

Checkpoint
1. What types of operations are permissible during the transformation of the augmented matrix?
2. What is the final form of the coefficient matrix before back-substitution begins?
3. What are pivot elements? Why must small pivot elements be avoided if possible?

EXERCISES
Solve the following systems by Gauss elimination:
a) x1 + x2 - x3 = 0,
   2x1 - x2 + x3 = 6,
   3x1 + 2x2 - 4x3 = -4.
b) 5.6x + 3.8y + 1.2z = 1.4,
   3.1x + 7.1y - 4.7z = 5.1,
   1.4x - 3.4y + 8.3z = 2.4.
c) 2x + 6y + 4z = 5,
   6x + 19y + 12z = 6,
   2x + 8y + 14z = 7.

d) 1.3x + 4.6y + 3.1z = -1,
   5.6x + 5.8y + 7.9z = 2,
   4.2x + 3.2y + 4.5z = -3.

STEP 12 SYSTEMS OF LINEAR EQUATIONS 2
Errors and ill-conditioning
For any system of linear equations, the question concerning the errors in a solution obtained by a numerical method is not readily answered. A general discussion of the problems it raises is beyond the scope of this book. However, some sources of errors will be indicated below.
1. Errors in the coefficients and constants
In many practical cases, the coefficients of the variables, and also the constants on the right-hand sides of the equations, are obtained from observations of experiments or from other numerical calculations. They will have errors, and therefore, once the solution of a system has been found, it too will contain errors. In order to show how this kind of error is carried through calculations, we shall solve a simple example in two variables, assuming that the constants have errors at most as large as ±0.01. Consider the system:

    2x + y = 4 ± 0.01,
    x + 2y = 5 ± 0.01.

Solving this system by Gauss elimination and back-substitution yields

    (3/2) y = 3 ± 0.015,

whence (3/2)y lies between 2.985 and 3.015, i.e., y lies between 1.990 and 2.010. From the first equation, we now obtain 2x = 4 ± 0.01 - y, so that x lies between 0.99 and 1.01. If the coefficients and constants of the system were exact, its exact solution would be x = 1, y = 2. However, since the constants are not known exactly, it does not make sense to talk of an exact solution; all one can say is that 0.99 ≤ x ≤ 1.01 and 1.99 ≤ y ≤ 2.01.

In this example, the error in the solution is of the same order as that in the constants. Yet, in general, the errors in the solutions are greater than those in the constants.
2. Round-off errors and numbers of operations
Numerical methods for solving systems of linear equations involve large numbers of arithmetic operations. For example, the Gauss elimination of STEP 11, according to Atkinson (1993), involves (n^3 + 3n^2 - n)/3 multiplications/divisions and (2n^3 + 3n^2 - 5n)/6 additions/subtractions in the case of a system with n unknowns. Since round-off errors are propagated at each step of an algorithm, the growth of round-off errors can be such that, when n is large, a solution differs greatly from the true one.
3. Partial pivoting
In Gauss elimination, the build-up of round-off errors may be reduced by rearranging the equations so that the use of large multipliers in the elimination operations is avoided. The corresponding procedure is known as partial pivoting (or pivotal condensation). The general rule to follow is: at each elimination stage, rearrange the rows of the augmented matrix so that the new pivot element is larger in absolute value than (or equal to) any element beneath it in its column. Use of this rule ensures that the magnitudes of the multipliers employed at each stage are less than or equal to unity. A simple example in three-digit floating point arithmetic follows. In the tabular solution, the pivot elements are in bold print. (Note that the magnitude of all the multipliers is less than unity.)
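A Python sketch of my own, extending the earlier gauss_solve with the row-interchange rule just described:

    def gauss_solve_pivot(A, b):
        """Gauss elimination with partial pivoting, then back-substitution."""
        n = len(A)
        for k in range(n - 1):
            # Choose the row with the largest |a_ik| at or below the diagonal.
            p = max(range(k, n), key=lambda i: abs(A[i][k]))
            if p != k:
                A[k], A[p] = A[p], A[k]
                b[k], b[p] = b[p], b[k]
            for i in range(k + 1, n):
                m = A[i][k] / A[k][k]        # |m| <= 1 by construction
                for j in range(k, n):
                    A[i][j] -= m * A[k][j]
                b[i] -= m * b[k]
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            s = sum(A[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (b[i] - s) / A[i][i]
        return x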

If no pivoting is done, it may be verified that using three-digit floating point arithmetic yields the solution z = 2.93, y = 2.00 and x = 1.30. Since the true solution to 3S is z = 2.93, y = 1.96, and x = 1.36, the solution obtained by partial pivoting is better than the one obtained without pivoting.
4. Ill-conditioning
Certain systems of linear equations are such that their solutions are very sensitive to small changes (and therefore to errors) in their coefficients and constants. We give an example below in which 1% changes in two coefficients change the solution by a factor of 10 or more. Such systems are said to be ill-conditioned. If a system is ill-conditioned, a solution obtained by a numerical method may differ greatly from the exact solution, even though great care is taken to keep round-off and other errors very small. As an example, consider a system of two equations which has the exact solution x = 1, y = 2. Changing the coefficients of the second equation by 1% and the constant of the first equation by 5% yields a system whose exact solution, it is easily verified, is x = 11, y = -18.2. This solution differs greatly from the solution of the first system. Both these systems are said to be ill-conditioned.
If a system is ill-conditioned, then the usual procedure of checking a numerical solution by calculation of the residuals may not be valid. In order to see why this is so, suppose we have an approximation X to the true solution x. The vector of residuals r is then given by

    r = b - AX = A(x - X).

Thus e = x - X satisfies the linear system Ae = r. In general, r will be a vector with small components. However, in an ill-conditioned system, even if the components of r are small so that it is 'close' to 0, the solution of the linear system Ae = r could differ greatly from the solution of the system Ae = 0, namely 0. It then follows that X may be a poor approximation to x despite the residuals in r being small. Obtaining accurate solutions to ill-conditioned linear systems can be difficult, and many tests have been proposed for determining whether or not a system is ill-conditioned. A simple introduction to this topic is given in the optional STEP 16.

Checkpoint
1. Describe the types of error which may affect the solution of a system of linear equations.
2. How can partial pivoting contribute to a reduction of errors?
3. Is it true to say that an ill-conditioned system does not have an exact solution?

EXERCISES
1. Find the range of solutions for the following system, assuming that the maximum errors in the constants are as shown:

2. Solve the following systems by Gauss elimination:
a)
b)
c)
3. Use 4D normalized floating point arithmetic to solve the following system with and without partial pivoting. Compare your answers with the exact answer x = 1.000 × 10^0, y = 5.000 × 10^-1.
4. Show that for a linear system of three unknowns, Gauss elimination requires three divisions, eight multiplications, and eight subtractions for the triangularization, and a further three divisions, three multiplications and three additions/subtractions for the back-substitution.
5. Derive the general formulae in Section 2 for the numbers of required arithmetic operations.
6. Study the ill-conditioning example of Section 4 in the following ways:
a) Plot the lines of the first system on graph paper; then describe ill-conditioning in geometrical terms when only two unknowns are involved.
b) Insert the solution of the first system into the left-hand side of the second system. Does x = 1, y = 2 look like a good solution to the second system? Comment!

c) Insert the solution of the second system into the left-hand side of the first system. Comment!
d) The system

    10x + 7y + 8z + 7w = 32,
    7x + 5y + 6z + 5w = 23,
    8x + 6y + 10z + 9w = 33,
    7x + 5y + 9z + 10w = 31

is an example of ill-conditioning, due to T. S. Wilson. Insert the 'solution' (6.0, -7.2, 2.9, -0.1) into the left-hand side. Would you claim this solution to be a good one? Next insert the solution (1.0, 1.0, 1.0, 1.0). Comment on the dangers of laying claims!

STEP 13 SYSTEMS OF LINEAR EQUATIONS 3
The Gauss-Seidel iterative method
The methods used in the previous Steps for solving systems of linear equations are termed direct methods. If a direct method is used, and round-off and other errors do not arise, an exact solution is reached after a finite number of arithmetic operations. In general, of course, round-off errors do arise; and when large systems are being solved by direct methods, the growing errors can become so large as to render the results obtained quite unacceptable.
1. Iterative methods
Iterative methods provide an alternative approach. Recall that an iterative method starts with an approximate solution and uses it by means of a recurrence formula to provide another approximate solution; by repeated application of the formula, a sequence of

solutions is obtained which (under suitable conditions) converges to the exact solution. Iterative methods have the advantages of simplicity of operation and ease of implementation on computers, and they are relatively insensitive to propagation of errors; they would be used in preference to direct methods for solving linear systems involving several hundred variables, particularly if many of the coefficients were zero. Systems of over 100 000 variables have been successfully solved on computers by iterative methods, whereas systems of 10 000 or more variables are difficult or impossible to solve by direct methods.
2. The Gauss-Seidel method
This text will only present one iterative method for linear equations, due to Gauss and improved by Seidel. We shall use this method to solve the system

    x1 + 0.2 x2 + 0.1 x3 = 1.3,
    0.2 x1 + x2 + 0.1 x3 = 1.3,
    0.2 x1 + 0.1 x2 + x3 = 1.3.

It is suitable for implementation on computers (pseudo-code). The first step is to solve the first equation for x1, the second for x2, and the third for x3, when the system becomes:

    x1 = 1.3 - 0.2 x2 - 0.1 x3,   (1)
    x2 = 1.3 - 0.2 x1 - 0.1 x3,   (2)
    x3 = 1.3 - 0.2 x1 - 0.1 x2.   (3)

An initial solution is now assumed; we shall use x1 = 0, x2 = 0 and x3 = 0. Inserting these values into the right-hand side of Equation (1) yields x1 = 1.3. This value for x1 is used immediately, together with the remainder of the initial solution (i.e., x2 = 0 and x3 = 0), in the right-hand side of Equation (2), and yields x2 = 1.3 - 0.2 × 1.3 - 0 = 1.04. Finally, the values x1 = 1.3 and x2 = 1.04 are inserted into Equation (3) to yield x3 = 1.3 - 0.2 × 1.3 - 0.1 × 1.04 = 0.936. This second approximate solution (1.3, 1.04, 0.936) completes the first iteration. Beginning with this second approximation, we repeat the process to obtain a third approximation, etc. Under certain conditions relating to the coefficients of the system, this sequence will converge to the exact solution. We can set up recurrence relations which show clearly how the iterative process proceeds. Denoting the k-th and (k+1)-th approximations by (x1^(k), x2^(k), x3^(k)) and (x1^(k+1), x2^(k+1), x3^(k+1)), respectively, we find

    x1^(k+1) = 1.3 - 0.2 x2^(k) - 0.1 x3^(k),      (1)'
    x2^(k+1) = 1.3 - 0.2 x1^(k+1) - 0.1 x3^(k),    (2)'
    x3^(k+1) = 1.3 - 0.2 x1^(k+1) - 0.1 x2^(k+1).  (3)'

We begin with the starting vector x^(0) = (x1^(0), x2^(0), x3^(0)), all components of which are 0, and then apply these relations repeatedly in the order (1)', (2)', (3)'. Note that, when we insert values for x1, x2 and x3 into the right-hand sides, we always use the most recent estimates found for each unknown.
3. Convergence
The sequence of solutions produced by the iterative process for the above numerical example is shown in the table:

The student should check that the exact solution for this system is (1, 1, 1). It is seen that the Gauss-Seidel solutions are rapidly approaching these values; in other words, the method is converging. Naturally, in practice, the exact solution is unknown. It is customary to end the iterative procedure as soon as the differences between the x^(k+1) and x^(k) values are suitably small. One stopping rule is to end the iteration when the sum of the magnitudes of these differences,

    Sk = |x1^(k+1) - x1^(k)| + |x2^(k+1) - x2^(k)| + |x3^(k+1) - x3^(k)|,

becomes less than a prescribed small number (usually chosen according to the accuracy of the machine on which the calculations are carried out). The question of convergence with a given system of equations is crucial. As in the above example, the Gauss-Seidel method may quickly lead to a solution very close to the exact one; on the other hand, it may converge too slowly to be of practical use, or it may produce a sequence which diverges from the exact solution. The reader is referred to more advanced texts (such as Conte and de Boor (1980)) for treatments of this question.
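A Python sketch of my own of the Gauss-Seidel iteration, applied to the example system above and using the stopping quantity Sk:

    def gauss_seidel(A, b, tol=1e-6, max_iter=100):
        """Gauss-Seidel iteration for Ax = b."""
        n = len(A)
        x = [0.0] * n
        for _ in range(max_iter):
            s_k = 0.0
            for i in range(n):
                total = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x_new = (b[i] - total) / A[i][i]
                s_k += abs(x_new - x[i])
                x[i] = x_new            # most recent estimate used at once
            if s_k < tol:
                break
        return x

    A = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.1], [0.2, 0.1, 1.0]]
    b = [1.3, 1.3, 1.3]
    print(gauss_seidel(A, b))  # converges to [1.0, 1.0, 1.0]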

method, so that, as far as possible, each leading-diagonal coefficient is larger (in absolute value) than any other in its row. Checkpoint 1. What is the essential difference between a direct method and an iterative method? 2. Give some advantages of the use of iterative methods compared with that of direct methods. 3. How can you improve the chance of success with the GaussSeidel method? EXERCISES 1. For the example treated above, compute the value of S3, the quantity used in the suggested stopping rule after the third iteration. 2. Use the Gauss- Seidel method to solve the following systems to 5D accuracy (remembering to rearrange the equations if appropriate). Compute the value of Sk (to 6D) after each iteration. a)

x - y + z = -7,
20x + 3y - 2z = 51,
2x + 8y + 4z = 25.
Remember to rearrange! Compute the value of Sk to 5D after each iteration.
b)

10x - y = 1,
-x + 10y - z = 1,
-y + 10z - w = 1,
-z + 10w = 1.
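Since the text remarks that the method is well suited to computer implementation, the following minimal Python sketch may be helpful. It is a sketch under stated assumptions, not the text's own program: the stopping quantity Sk is taken here to be the largest change in any component during a sweep, which should be treated as an assumption about the quantity defined above, and the illustrative data are the system of Exercise 2b.

```python
def gauss_seidel(a, b, tol=1e-6, max_iter=100):
    """Solve a x = b by Gauss-Seidel iteration, starting from the zero vector.

    s_k is taken as the largest componentwise change in a sweep; iteration
    stops when it falls below the prescribed tolerance.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        s_k = 0.0
        for i in range(n):
            # Use the most recent estimate of every unknown (Gauss-Seidel).
            total = sum(a[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (b[i] - total) / a[i][i]
            s_k = max(s_k, abs(new_xi - x[i]))
            x[i] = new_xi
        if s_k < tol:
            break
    return x

# The system of Exercise 2b above (already diagonally dominant):
a = [[10, -1, 0, 0], [-1, 10, -1, 0], [0, -1, 10, -1], [0, 0, -1, 10]]
b = [1, 1, 1, 1]
print(gauss_seidel(a, b))
```

Note that the loop follows the rearrangement advice above: convergence is helped by the dominant leading diagonal.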

STEP 14
SYSTEMS OF LINEAR EQUATIONS 4* Matrix inversion*
The general system of linear equations in n variables (STEP 11, Section 1) can be written in matrix form Ax = b, where a vector x is sought which satisfies this equation. We will now use the inverse matrix A-1 to find this vector.
1. The inverse matrix
In STEP 11, we observed that, if the determinant of A is non-zero, A has an inverse matrix A-1, and the solution of the linear system can be written as x = A-1b; i.e., the solution of the system of linear equations can be obtained by first finding the inverse of the coefficient matrix A and then forming the product A-1b. However, as a rule, this approach is not adopted in practice. The problem of finding the inverse matrix is itself a numerical task which generally requires for its solution many more operations (and therefore involves more round-off and other errors) than any of the methods described in the preceding Steps. However, if the inverse matrix is required for some additional reason, it is sensible to compute the inverse first. For example, the inverse may contain theoretical or statistical information, or be of use in some other formula or calculation.
2. Method for inverting a matrix
There are many numerical methods for finding the inverse of a matrix. We shall describe one which uses the Gauss elimination and back-substitution procedures. It is simple to apply and computationally efficient. We shall illustrate the method by application to a 2 × 2 matrix and a 3 × 3 matrix; it should then be clear to the reader how the method may be extended for use with n × n matrices. As the 2 × 2 example, consider

and seek the inverse matrix

such that

. This is equivalent to solving the two systems

Thus:

a. Form the augmented matrix

b. Apply elementary row operations to the augmented matrix such that A is transformed into an upper triangular matrix

(Step 11,

Section 5):

c. Solve the two systems:

using back-substitution. Note how the systems have been constructed, using

and columns of the transformed identity matrix. From the first system, 3v1 = -2, v1 = -2/3, and 2u1 + v1 = 1, whence 2u1 = 1 + 2/3, u1 = 5/6. From the second system, 3v2 = 1, v2 = 1/3, and 2u2 + v2 = 0, whence 2u2 = -1/3, u2 = -1/6. Thus the required inverse matrix is:

d. Check: AA-1 should be equal to I. Multiplication yields:

so that A-1 is correct. In this simple example, one can work with fractions, so that no round-off errors occurred and the resulting inverse matrix is exact. In general, during hand calculations, the final result should be checked by computing AA-1, which should be approximately equal to the identity matrix I. As the 3 × 3 example, consider

In order to demonstrate the effects of errors, compute A-1 to 3S. The results of the calculations in tabular form are:

As an example of back-substitution, the reduced upper triangular system, taken with the second column of the transformed identity matrix, yields the second column of A-1. Thus:

yields w2 = -20.0, v2 = 38.3 and u2 = -34.0, determined in this order. Check by multiplication that AA-1 is

, which is approximately equal to I. The noticeable inaccuracy is due to carrying out the calculation of the elements of A-1 to 3S only.
3. Solution of linear systems using the inverse matrix
As previously noted, the unique solution of a linear system Ax = b is x = A-1b, provided the coefficient matrix A has an inverse A-1. We shall illustrate this by using the inverse A-1 obtained in Section 2 to compute the solution of the linear system:

The coefficient matrix is

. We can use A-1, calculated in the preceding section, as follows:

. Thus, we arrive at the solution x = -13, y = 16 and z = -2.5 to 2S. We check the solution by adding the three equations:

Inserting this solution in the left-hand side yields, to 2S:

Checkpoint

1. In the method for finding the inverse of A, what is the final form of A after the elementary row operations have been carried out?
2. Is the solution of the system Mx = d given by x = dM-1 or by x = M-1d (or neither)?
3. Give a condition for a matrix not to have an inverse.
EXERCISES
1. Find the inverses of the following matrices, using elimination and back-substitution:
a.
b.
c.
2. Solve the systems of equations (each for two right-hand side vectors):
a.
b.
c.
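As a machine check for exercises of this kind, here is a minimal Python sketch of the method of Section 2: Gauss elimination (without pivoting, for simplicity) applied to the augmented matrix [A | I], followed by back-substitution for each column of the inverse. The 2 × 2 test matrix is inferred from the arithmetic of the worked example (3v1 = -2, etc.), so treat it as an assumption rather than a quotation.

```python
def invert(a):
    """Invert matrix a by Gauss elimination and back-substitution on [a | I]."""
    n = len(a)
    # Build the augmented matrix [a | I].
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    # Forward elimination: reduce the left block to upper triangular form.
    for k in range(n):
        for i in range(k + 1, n):
            m = aug[i][k] / aug[k][k]
            for j in range(k, 2 * n):
                aug[i][j] -= m * aug[k][j]
    # Back-substitution, one column of the inverse at a time.
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n - 1, -1, -1):
            s = sum(aug[i][j] * inv[j][col] for j in range(i + 1, n))
            inv[i][col] = (aug[i][n + col] - s) / aug[i][i]
    return inv

a = [[2.0, 1.0], [4.0, 5.0]]  # inferred from the worked 2 x 2 example
print(invert(a))              # approximately [[5/6, -1/6], [-2/3, 1/3]]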

STEP 15
SYSTEMS OF LINEAR EQUATIONS 5* Use of LU decomposition*
We have shown in STEP 11 how to solve a linear system Ax = b by Gauss elimination, applied to the augmented matrix

. In the preceding Step, we have extended the elimination process to calculate the inverse A-1 of the coefficient matrix A, assuming that it exists. Another general approach to solving Ax = b is known as the method of LU decomposition, which provides new insights into matrix algebra and has many theoretical and practical uses. It yields efficient computer algorithms for handling practical problems. The symbols L and U denote a lower triangular matrix and an upper triangular matrix, respectively. Examples of lower triangular matrices are

Note that in such a matrix all elements above the leading diagonal are zero. Examples of upper triangular matrices are:

where all elements below the leading diagonal are zero. The product of L1 and U1 is

1. Procedure
Suppose we have to solve a linear system Ax = b and that we can express the coefficient matrix A in the form of the so-called LU decomposition A = LU. Then we may solve the linear system as follows:
Stage 1: Write Ax = LUx = b.
Stage 2: Set y = Ux, so that Ax = Ly = b. Use forward substitution with Ly = b to find y1, y2, ..., yn in that order, i.e., assume that the augmented matrix for the system Ly = b is:

Then forward substitution yields y1 = b1/l11 and, subsequently,
yi = (bi - li1y1 - li2y2 - ... - li,i-1yi-1)/lii, for i = 2, 3, ..., n.

Note that the value of yi depends on the values y1, y2, ..., yi-1, which have already been calculated.
Stage 3: Finally, use back-substitution with Ux = y to find xn, ..., x1 in that order.
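A minimal Python sketch of Stages 2 and 3, assuming the factors L and U and the right-hand side b are supplied as lists of lists and a list:

```python
def solve_lu(lower, upper, b):
    """Solve LUx = b: forward substitution for Ly = b, then back-substitution for Ux = y."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        # Stage 2: y_i uses the already computed y_1, ..., y_(i-1).
        s = sum(lower[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / lower[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Stage 3: back-substitution proceeds from x_n up to x_1.
        s = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / upper[i][i]
    return x
```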

Later on, we shall outline a general method for finding LU decompositions of square matrices. The following example demonstrates this method, involving the matrix A = L1U1 above. If we wish to solve Ax = b for a number of different vectors b, then this method is more efficient than Gauss elimination: once we have found an LU decomposition of A, we need only forward and backward substitute to solve the system for any b.
Example
We shall solve the system

Stage 1: An LU decomposition of the system is

Stage 2: Set y = U1 x and then solve the system L1 y = b, i.e.,

Using forward substitution, we obtain:

Stage 3: Solve

Back-substitution yields:

Thus, the solution of Ax = b is:

which you may check, using the original equations. We turn now to the problem of finding an LU decomposition of a given square matrix A.
Realizing an LU decomposition
For an LU decomposition of a given n × n matrix A, we seek a lower triangular matrix L and an upper triangular matrix U (both of order n × n) such that A = LU. The matrix U may be taken to be the upper triangular matrix resulting from Gauss elimination without partial pivoting (Sections 3 and 5 of STEP 11), and the matrix L may be taken to be the lower triangular matrix which has diagonal elements 1 and which has as its (i, k) element the multiplier mik. This multiplier is calculated at the k-th stage of Gauss elimination and is required to transform the current value of aik into 0. In the notation of STEP 11, these multipliers were given by mik = aik/akk, i = k+1, k+2, ..., n.
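A minimal Python sketch of this construction: Gauss elimination without pivoting, with each multiplier mik recorded as the (i, k) element of L, whose diagonal elements are 1.

```python
def lu_decompose(a):
    """Return L and U with A = LU, via Gauss elimination without pivoting."""
    n = len(a)
    u = [row[:] for row in a]  # will be reduced to upper triangular form
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]  # multiplier m_ik = a_ik / a_kk
            l[i][k] = m
            for j in range(k, n):
                u[i][j] -= m * u[k][j]
    return l, u
```

Combined with the solve_lu sketch above, this gives the complete Stage 1 to Stage 3 procedure for any right-hand side b.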

An example will help to clarify this procedure. From STEP 11, we recall that Gauss elimination, applied to the system

yielded the upper triangular matrix:

Also, we saw that in the first stage we calculated the multipliers m21 = a21/a11 = 1/1 = 1 and m31 = a31/a11 = 2/1 = 2, while, in the second stage, we calculated the multiplier m32 = a32/a22 = -3/1 = -3. Thus

It is readily verified that LU equals the coefficient matrix of the original system:

. Another technique which may be used to find an LU decomposition of an n × n matrix is direct decomposition. In order to illustrate this process, let it be required to find an LU decomposition for the 3 × 3 coefficient matrix of the system above. Then the required L and U are of the form

Note that the total number of unknowns in L and U is 12, whereas there are only 9 elements in the 3 × 3 coefficient matrix A. To ensure that L and U are unique, we need to impose 12 - 9 = 3 extra conditions on the elements of these two triangular matrices. (In the general n × n case, n extra conditions are required.) One common choice is to require all the diagonal elements of L to be 1; the resulting method is known as Doolittle's method. Another choice is to make the diagonal elements of U equal to 1; this is Crout's method. Since Doolittle's method would give the same L and U as the elimination approach described above, we shall use Crout's method to illustrate the direct decomposition procedure. We then require that

Multiplication of L and U yields:

It is clear that this construction by Crout's method yields triangular matrices L and U for which A = LU.
Checkpoint
1. What constitutes an LU decomposition of a matrix A?
2. How is a decomposition A = LU used to solve a linear system Ax = b?
3. How may an LU decomposition be obtained by Gauss elimination?
EXERCISES

1. Find an LU decomposition of the matrix

where

2. Solve each of the following systems (Exercises 1 and 3 of STEP 11) by first finding an LU decomposition of the coefficient matrix and then using forward and backward substitutions. a.

b.

STEP 16
SYSTEMS OF LINEAR EQUATIONS 6* Tests for ill-conditioning*
We recall from Section 4 of STEP 12 that the solutions of ill-conditioned systems of linear equations are very sensitive to small changes in their coefficients and constants. We will now show how one may test a system for ill-conditioning.
1. Norms

One of the most common tests for ill-conditioning of a linear system involves the norm of the coefficient matrix. In order to define this quantity, we must first consider the concept of the norm of a vector or matrix, which in some way assesses the size of their elements. Let x and y be vectors. Then a vector norm ||·|| is a real number with the properties:
1. ||x|| ≥ 0, and ||x|| = 0 if and only if x is a vector all components of which are zero;
2. ||cx|| = |c| ||x|| for any real number c;
3. ||x + y|| ≤ ||x|| + ||y|| (triangle inequality).
There are many possible ways to choose a vector norm with these properties. One vector norm which is probably familiar to the student is the Euclidean or 2-norm. Thus, if x is an n × 1 vector, then the 2-norm is denoted and defined by

||x||2 = (x1^2 + x2^2 + ... + xn^2)^(1/2).
As an example, if x is the 5 × 1 vector with the components x1 = 1, x2 = -3, x3 = 4, x4 = -6, x5 = 2, then ||x||2 = (66)^(1/2) ≈ 8.12. Another possible choice of norm, which is more suitable for our purposes here, is the infinity norm, defined by
||x||∞ = max |xi|, the maximum being taken over i = 1, ..., n.
Thus, we find ||x||∞ = 6 for the vector in the previous example. It is easily verified that the infinity norm has the three properties above. For the 2-norm, the first two properties are readily verified; the triangle inequality is somewhat more difficult and requires the use of the so-called Cauchy-Schwarz inequality (for example, cf. Cheney and Kincaid (1994)).
The defining properties of a matrix norm are similar, except that they involve an additional property. Let A and M be matrices. Then a matrix norm ||·|| is a real number with the properties:
1. ||A|| ≥ 0, and ||A|| = 0 if and only if A is a matrix with all elements zero;
2. ||cA|| = |c| ||A|| for any real number c;
3. ||A + M|| ≤ ||A|| + ||M||;
4. ||AM|| ≤ ||A|| ||M||.
As for vector norms, there are many ways of choosing matrix norms with the four properties above, but we will consider only the infinity norm. If A is an n × n matrix, then the infinity norm is defined by
||A||∞ = max (|ai1| + |ai2| + ... + |ain|), the maximum being taken over i = 1, ..., n.

By this definition, this norm is the maximum of the sums obtained by adding the absolute values of the elements in each row, whence it is commonly referred to as the maximum row sum norm. A useful property interrelating the matrix and vector infinity norms is
||Ax||∞ ≤ ||A||∞ ||x||∞,
which follows from Property 4 of the matrix norms above.
2. Ill-conditioning tests
We will now find out whether or not a system is ill-conditioned by using the condition number of the coefficient matrix. If A is an n ×

n matrix and A-1 is its inverse, then the condition number of A is defined by
cond(A) = ||A||∞ ||A-1||∞.

It is bounded below by 1, since ||I|| = 1 and 1 = ||AA-1|| ≤ ||A||∞ ||A-1||∞, where we have used the matrix norm Property 4 given in the preceding section. Large values of the condition number usually indicate ill-conditioning. As a justification for this last statement, we state and prove the theorem:
Theorem: Suppose x satisfies the linear system Ax = b and x* satisfies the perturbed system Ax* = b*. Then
||x - x*||∞/||x||∞ ≤ cond(A) ||b - b*||∞/||b||∞.
Proof: We have A(x - x*) = b - b*, so that x - x* = A-1(b - b*). Since ||A-1(b - b*)||∞ ≤ ||A-1||∞ ||b - b*||∞, we see that ||x - x*||∞ ≤ ||A-1||∞ ||b - b*||∞. However, since b = Ax, we have ||b||∞ ≤ ||A||∞ ||x||∞, or 1/||x||∞ ≤ ||A||∞/||b||∞. It then follows that
||x - x*||∞/||x||∞ ≤ ||A||∞ ||A-1||∞ ||b - b*||∞/||b||∞ = cond(A) ||b - b*||∞/||b||∞,
from which follows the result above.
It is seen from the theorem that even if the difference between b and b* is small, the change in the solution, as measured by the `relative error' ||x - x*||∞/||x||∞, may be large when the condition number is large. It follows that a large condition number is an indication of possible ill-conditioning of a system. A similar theorem for the case when there are small changes to the coefficients of A may be found in more advanced texts such as Atkinson (1993). Such a theorem also shows that a large condition number is an indicator of ill-conditioning.
There arises the question: how large does the condition number have to be for ill-conditioning to become a problem? Roughly speaking, if the condition number is 10^m and the machine in use to solve a linear system has kD accuracy, then the solution of the linear system will be accurate to k - m decimal digits. In Step 12, we had the coefficient matrix

which was associated with an ill-conditioned system. Computing A-1 for this matrix and forming cond(A) shows that the condition number is large.

This suggests that a numerical solution would not be very accurate if only 2D accuracy were used in the calculations. Indeed, if the components of A were rounded to two decimal digits, the two rows of A would be identical. Then the determinant of A would be zero, and it follows from the theorem in STEP 11 that this system would not have a unique solution. We recall that the definition of the condition number requires A-1, the computation of which is expensive. Moreover, even if the inverse were calculated, this approximation might not be very accurate if the system is ill-conditioned. It is therefore common in software packages to estimate the condition number by obtaining an estimate of ||A-1||∞ without explicitly finding A-1.
Checkpoint
1. What are the three properties of a vector norm?
2. What are the four properties of a matrix norm?
3. How is the condition number of a matrix defined, and how is it used as a test for ill-conditioning?
EXERCISES

1. For the 5 × 1 vector with elements x1 = 4, x2 = -6, x3 = -5, x4 = 1, and x5 = -1, calculate the vector norms defined in Section 1.

2. For each of the matrices in Exercise 1 of STEP 14, compute a. the infinity norm, b. the condition number.
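As a computational companion to these exercises, here is a minimal Python sketch of the infinity norms and of the condition number computed directly from its definition. The inverse must be supplied (recall from above that software packages usually estimate ||A-1||∞ instead of forming A-1); the example vector is the one from Exercise 1.

```python
def vector_norm_inf(x):
    """Infinity norm of a vector: the largest absolute component."""
    return max(abs(v) for v in x)

def matrix_norm_inf(a):
    """Infinity norm of a matrix: the maximum row sum of absolute values."""
    return max(sum(abs(v) for v in row) for row in a)

def cond_inf(a, a_inv):
    """Condition number cond(A) = ||A||inf * ||A^-1||inf, with A^-1 supplied."""
    return matrix_norm_inf(a) * matrix_norm_inf(a_inv)

print(vector_norm_inf([4, -6, -5, 1, -1]))  # prints 6, as in Exercise 1
```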

STEP 17
THE EIGEN-VALUE PROBLEM The power method
Let A be an n × n matrix. If there exists a number λ and a non-zero vector x such that
Ax = λx,
then λ is said to be an eigen-value of A, and x the corresponding eigen-vector. The evaluation of eigen-values and eigen-vectors of matrices is a problem that arises in a variety of contexts. Note that if we have an eigen-value λ and an eigen-vector x, then cx, where c is any non-zero real number, is also an eigen-vector, since A(cx) = cAx = cλx = λ(cx).

This shows that an eigen-vector is not unique and may be scaled, if desired (for instance, one might want the sum of the components of the eigen-vector to be unity). Writing Ax = λx as
(A - λI)x = 0,
we conclude from the theorem in STEP 11 that this equation can have a non-zero solution only if the determinant |A - λI| is zero. If we expand this determinant, then we get an n-th degree polynomial in λ, known as the characteristic polynomial of A. Thus, one way to find the eigen-values of A is to obtain its characteristic polynomial and then to find the n zeros of this polynomial (some of them may be complex!). For example, for a 2 × 2 matrix A, the determinant |A - λI| is a quadratic in λ, and the zeros of this quadratic yield the eigen-values. Although the characteristic polynomial is easy to work out in this simple 2 × 2 example, as n increases it becomes more complicated (and, of course, it has a correspondingly higher degree). Moreover, even if we can find the characteristic polynomial, the analytic formulae for the roots of a cubic or quartic are somewhat inconvenient to use, and in any case we must use numerical methods (cf. STEPS 7-10) to find the roots of polynomials of degree n > 4. It is therefore common to use alternative, direct numerical methods to find eigen-values and eigen-vectors of a matrix. If we are only interested in the eigen-value of largest magnitude, then a popular approach to its evaluation is the power method. We shall discuss later on how this method may be modified to find the

eigen-value of smallest magnitude. Methods for finding all the eigen-values are beyond the scope of this book. (One such method, called the QR method, is based on QR factorization.)
1. The power method
Suppose that the n eigen-values of A are λ1, λ2, ..., λn, and that they are ordered in such a way that
|λ1| > |λ2| ≥ ... ≥ |λn|.
Then the power method, which is readily implemented on a computer, can be used to find λ1. We begin with a starting vector w(0) and compute the vectors
w(j) = Aw(j-1) for j = 1, 2, ...,
so that induction yields w(j) = A^j w(0), where A^j is A multiplied by itself j times. Thus w(j) is the product of w(0) and the j-th power of A, which explains why this approach is called the power method. It turns out that at the j-th iteration an approximation to the eigen-vector x associated with λ1 is given by w(j), and an approximation to λ1 is given by
λ1 ≈ wk(j)/wk(j-1),
where wk(j) and wk(j-1) are the k-th components of w(j) and w(j-1), respectively, for any k. Although there are n possible choices for k, it is usual to choose it so that wk(j) is the component of w(j) with the largest magnitude.
2. Example
We will now use the power method to find the largest eigen-value of the matrix

. As a starting vector we take

Since the second component of w(1) has the largest magnitude, we take k = 2, so that the first approximation to λ1 is

Subsequent iterations of the power method yield

These calculations allow us to conclude that the largest eigen-value is about 2.6.
3. Variants
In the preceding example, the reader will have noted that the components of w(j) were growing in size as j increases. Overflow problems would arise if this growth were to continue; hence, in practice, it is usual to use instead the scaled power method. This is identical to the power method except that the vectors w(j) are scaled at each iteration. Thus, suppose w(0) is given and set y(0) = w(0). Then we carry out, for j = 1, 2, ..., the steps:
a. calculate w(j) = Ay(j-1);
b. find p such that |wp(j)| is the largest of |w1(j)|, ..., |wn(j)|, so that the p-th component of w(j) has the largest magnitude;
c. evaluate an approximation to λ1, i.e., λ1 ≈ wk(j)/yk(j-1) for some k;
d. calculate y(j) = w(j)/wp(j).
In Step c, there are n choices for k. As for the unscaled power method, k is usually chosen to be the value for which |wk(j)| has the largest magnitude, i.e., k is taken to be the value p obtained in Step b. Another option is to choose k to be the value of p from the previous iteration, although often this results in the same value. The effect of Step d is to produce a vector y(j) with components of magnitude not more than unity. As an example, we apply the scaled power method to the matrix in the preceding section. We take the value of k in each iteration to be p. The starting vector w(0) is the same as before and y(0) = w(0). Then the first four iterations of the scaled power method yield:

Note that the eigen-value estimates are the same as before, and the w(j) are just multiples of those obtained in the preceding section. We shall now discuss the use of the power method to find the eigen-value λn of smallest magnitude. If A has an inverse, then Ax = λx may be written as
A-1x = (1/λ)x.
It follows that the smallest eigen-value of A may be found by finding the largest eigen-value of A-1 and then taking the reciprocal. Thus, if the unscaled power method were used, we would calculate the vectors w(j) = A-1w(j-1), i.e., successively solve Aw(j) = w(j-1). In general, it is more efficient to solve this linear system than to find the inverse of A (Step 14). (A sketch in code of the scaled power method is given at the end of this Step.)
4. Other aspects
It may be shown that the convergence rate of the power method is linear and that, under suitable conditions, the error in the j-th estimate of λ1 is bounded by C|λ2/λ1|^j, where C is some positive constant. Thus, the bigger the gap between |λ1| and |λ2|, the faster the rate of convergence. Since the

power method is an iterative method, one has to stop at some stage. It is usual to carry on the process until successive estimates of the eigen-value agree to a certain tolerance or, alternatively, until a given maximum number of iterations is exceeded. Difficulties with the power method usually arise when our assumptions about the eigen-values are not valid. For instance, if |λ1| = |λ2|, then the sequence of estimates for λ1 may not converge.

Even if the sequence does converge, one may not be able to get an approximation to the eigen-vector associated with λ1. A short discussion of such difficulties may be found in Conte and de Boor (1980).
Checkpoint
1. How are the eigen-values and eigen-vectors of a matrix defined?
2. What is the power method for finding the eigen-value of largest magnitude?
3. What advantage does the scaled power method have over the power method?
EXERCISES
1. For the 2 × 2 matrix

apply eight iterations of the power method. Find the characteristic polynomial, and hence find the two true eigen-values of the matrix. Verify that the approximations are converging to the eigen-value with the larger magnitude.

2. Apply five iterations of the normal and scaled power methods to the 3 × 3 matrix
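As a companion to these exercises, here is a minimal Python sketch of the scaled power method of Section 3. The 2 × 2 test matrix is an illustrative assumption (its true eigen-values are 3 and 1), not the matrix of the text's worked example.

```python
def scaled_power_method(a, w0, iterations=20):
    """Scaled power method: estimate the eigen-value of largest magnitude."""
    n = len(w0)
    y = w0[:]
    estimate = None
    for _ in range(iterations):
        w = [sum(a[i][j] * y[j] for j in range(n)) for i in range(n)]  # w = A y
        p = max(range(n), key=lambda i: abs(w[i]))  # index of largest component
        estimate = w[p] / y[p]       # Step c with k = p (assumes y[p] != 0)
        y = [wi / w[p] for wi in w]  # Step d: components now have magnitude <= 1
    return estimate, y

a = [[2.0, 1.0], [1.0, 2.0]]   # illustrative matrix, eigen-values 3 and 1
print(scaled_power_method(a, [1.0, 0.0]))  # estimate tends to 3
```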

STEP 18
FINITE DIFFERENCES 1 Tables
Historically speaking, numerical analysts have always been concerned with tables of numbers, and many techniques have been developed for dealing with mathematical functions represented in this way. For example, the value of the function at an untabulated point may be required, so that an interpolation is necessary. It is also possible to estimate the derivative or the definite integral of a tabulated function, using some finite processes to approximate the corresponding (infinitesimal) limiting procedures of calculus. In each case, it has been traditional to use finite differences. Another application of finite differences, which is outside the scope of this book, is the numerical solution of partial differential equations.
1. Tables of values
Many books contain tables of mathematical functions. One of the most comprehensive is the Handbook of Mathematical Functions, edited by Abramowitz and Stegun (see the Bibliography for publication details), which also contains useful information about numerical methods.

Although most tables use constant argument intervals, some functions do change rapidly in value in particular regions of their argument, and hence may best be tabulated using intervals varying according to the local behaviour of the function. Tables with varying argument intervals are more difficult to work with, however, and it is common to adopt uniform argument intervals wherever possible. As a simple example, consider the 6S table of the exponential function over 0.10(0.01)0.18, a notation which specifies the domain 0.10 ≤ x ≤ 0.18, tabulated at intervals of 0.01:

It is extremely important that the interval between successive values is small enough to display the variation of the tabulated function, because usually the value of the function will be needed at some argument between the values specified (for example, from the above table). If the table is constructed in this manner, we can obtain such intermediate values to reasonable accuracy by using a polynomial representation (hopefully, of low degree) of the function f.
2. Finite differences
Since Newton, finite differences have been used extensively. The construction of a table of finite differences for a tabulated function is simple: one obtains first differences by subtracting each value from the succeeding value in the table, second differences by repeating this operation on the first differences, and so on for higher order differences. From the above table of e^x one has the difference table (note the standard layout, with decimal points and leading zeros omitted from the differences):

(In this case, the differences must be multiplied by 10^(-5) for comparison with the function values.)
3. Influence of round-off errors
Consider the difference table given below for f(x) = e^x for x = 0.1(0.05)0.5 to 6S, constructed as in Section 2. As before, differences of increasing order decrease rapidly in magnitude, but the third differences are irregular. This is largely a consequence of round-off errors, as tabulation of the function to 7S and differencing to fourth order illustrates (compare Exercise 3).
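The construction described in Section 2 is easy to mechanize. Here is a minimal Python sketch that builds the columns of a forward difference table; the e^x values are recomputed and rounded here (an assumption standing in for the tabulated values discussed above).

```python
import math

def difference_table(values):
    """Return [f, first differences, second differences, ...]."""
    table = [list(values)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return table

# e^x for x = 0.1(0.05)0.5, rounded to 5 decimals (6S for values near 1).
xs = [0.10 + 0.05 * i for i in range(9)]
fs = [round(math.exp(x), 5) for x in xs]
for column in difference_table(fs):
    print(column)
```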

Although the round-off errors in f should be less than 1/2 in the last significant place, they may accumulate; the greatest error that can be obtained corresponds to:

A rough working criterion for the expected fluctuations (noise level) due to round-off error is shown in the table:

Checkpoint
1. What factors determine the intervals of tabulation of a function?
2. What is the name of the procedure to determine a value of a tabulated function at an intermediate point?
3. What may be the cause of irregularity in the highest order differences in a difference table?
EXERCISES

1. Construct the difference table for the function f(x) = x^3 for x = 0(1)6.
2. Construct difference tables for each of the polynomials:
a. 2x - 1 for x = 0(1)3;
b. 3x^2 + 2x - 4 for x = 0(1)4;
c. 2x^3 + 3x - 3 for x = 0(1)5.
Study your resulting tables carefully; note what happens in the final few columns of each table. Suggest a general result for polynomials of degree n and compare your answer with the theorem in STEP 20.
3. Construct a difference table for the function f(x) = e^x, given to 7D, for x = 0.1(0.05)0.5.

STEP 19
FINITE DIFFERENCES 2 Forward, backward, central difference notations
There are several different notations for the single set of finite differences described in the preceding Step. We introduce each of these three notations in terms of the so-called shift operator, which we will define first.

1. The shift operator E

Let fj = f(xj), j = 0, 1, 2, ..., be a set of values of the function f(x), tabulated at equally spaced arguments xj = x0 + jh. The shift operator E is defined by
Efj = fj+1.
Consequently, E(Efj) = Efj+1 = fj+2, written E^2 fj = fj+2, and so on, i.e.,
E^k fj = fj+k,
where k is any positive integer. Moreover, the last formula can be extended to negative integers, and indeed to all real values of j and k, so that, for example, E^(1/2) fj = fj+1/2 and E^(-1) fj = fj-1.
2. The forward difference operator Δ
If we define the forward difference operator Δ by Δ = E - 1, then
Δfj = (E - 1)fj = fj+1 - fj,

which is the first-order forward difference at xj. Similarly, we find that
Δ²fj = Δ(Δfj) = fj+2 - 2fj+1 + fj
is the second-order forward difference at xj, and so on. The forward difference of order k is
Δ^k fj = Δ(Δ^(k-1) fj),
where k is any positive integer.

3. The backward difference operator ∇
If we define the backward difference operator ∇ by ∇ = 1 - E^(-1), then
∇fj = fj - fj-1,
which is the first-order backward difference at xj. Similarly,
∇²fj = ∇(∇fj) = fj - 2fj-1 + fj-2
is the second-order backward difference at xj, etc. The backward difference of order k is
∇^k fj = ∇(∇^(k-1) fj),
where k is any positive integer. Note that
∇fj = Δfj-1.
4. The central difference operator δ
If we define the central difference operator δ by δ = E^(1/2) - E^(-1/2), then
δfj = fj+1/2 - fj-1/2,
which is the first-order central difference at xj. Similarly,
δ²fj = δ(δfj) = fj+1 - 2fj + fj-1
is the second-order central difference at xj, etc. The central difference of order k is
δ^k fj = δ(δ^(k-1) fj),
where k is any positive integer. Note that
δfj = Δfj-1/2 = ∇fj+1/2.

5. Differences display
The role of the forward, central, and backward differences is displayed by the difference table:

Although forward, central, and backward differences represent precisely the same data:
1. forward differences are useful near the start of a table, since they only involve tabulated function values below xj;
2. central differences are useful away from the ends of a table, where tabulated function values are available both above and below xj;
3. backward differences are useful near the end of a table, since they only involve tabulated function values above xj.
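A small Python sketch may make the bookkeeping concrete; the tabulated values below are e^x at x = 0.10(0.01)0.14 to 6S (an illustrative assumption), and the assertions express the fact that the three notations label the same tabulated numbers.

```python
# f tabulated at x_j = x0 + j*h, here e^x for x = 0.10(0.01)0.14 to 6S.
f = [1.10517, 1.11628, 1.12750, 1.13883, 1.15027]

def forward(f, j):   # first-order forward difference:  Δfj = fj+1 - fj
    return f[j + 1] - f[j]

def backward(f, j):  # first-order backward difference: ∇fj = fj - fj-1
    return f[j] - f[j - 1]

def central2(f, j):  # second-order central difference: δ²fj = fj+1 - 2fj + fj-1
    return f[j + 1] - 2 * f[j] + f[j - 1]

# The same numbers under different labels: Δf0 = ∇f1, and δ²f1 = Δ²f0.
assert forward(f, 0) == backward(f, 1)
assert abs(central2(f, 1) - (forward(f, 1) - forward(f, 0))) < 1e-12
```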

Checkpoint
1. What is the definition of the shift operator?
2. How are the forward, backward, and central difference operators defined?
3. When are the forward, backward, and central difference notations likely to be of special use?
EXERCISES
1. Construct a table of differences for the given polynomial for x = 0(1)4, and use the table to obtain the values of:
a.
b.
c.

2. For the difference table in Section 3 of STEP 18 of f(x) = e^x for x = 0.1(0.05)0.5, determine to six significant digits the quantities (taking x0 = 0.1):
a.
b.
c.
d.
e.

3. Prove the statements:
a.
b.
c.
d.
STEP 20

FINITE DIFFERENCES 3 Polynomials
Since polynomial approximations are used in many areas of Numerical Analysis, it is important to investigate the phenomena of differencing polynomials.
1. Finite differences of a polynomial
Consider the finite differences of an n-th degree polynomial
p(x) = a0 + a1x + a2x^2 + ... + anx^n (an ≠ 0),
tabulated for equidistant points at the tabular interval h.
Theorem: The n-th difference of a polynomial of degree n is a constant, equal to an n! h^n, and higher order differences are zero.

Proof: For any positive integer k, the binomial expansion
(x + h)^k = x^k + k x^(k-1) h + ... + h^k,
applied to each tabulated argument xj, yields
Δ(xj^k) = (xj + h)^k - xj^k = k h xj^(k-1) + (terms of degree lower than k - 1).
Omitting the subscript of x, we find
Δp(x) = an n h x^(n-1) + (terms of degree lower than n - 1),
so that differencing reduces the degree by one and multiplies the leading coefficient by nh; applying the operator n times gives Δ^n p(x) = an n! h^n, a constant, and all differences of order higher than n are zero.

In passing, the student may recall that in the Differential Calculus the increment f(x + h) - f(x) is related to the derivative of f(x) at the point x.
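The theorem is easy to confirm numerically. A minimal Python sketch (anticipating the example in the next section, with f(x) = x^3 and h = 0.1):

```python
def difference_columns(f, xs):
    """Columns of the forward difference table of f tabulated at the points xs."""
    cols = [[f(x) for x in xs]]
    while len(cols[-1]) > 1:
        prev = cols[-1]
        cols.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return cols

# f(x) = x^3 at x = 5.0(0.1)5.5: the third differences should all equal
# an * n! * h^n = 1 * 3! * 0.1^3 = 0.006, and the fourth differences zero.
cols = difference_columns(lambda x: x ** 3, [5.0 + 0.1 * i for i in range(6)])
print(cols[3])  # approximately [0.006, 0.006, 0.006], up to float rounding
print(cols[4])  # approximately [0.0, 0.0]
```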

2. Example
Construct for f(x) = x^3 with x = 5.0(0.1)5.5 the difference table:

Since in this case n = 3, an = 1, h = 0.1, we find
Δ³f = 3! × 1 × (0.1)³ = 0.006.
Note that round-off error noise may occur; for example, consider the tabulation of f(x) = x^3 for 5.0(0.1)5.5, rounded to two decimal places:

3. Approximation of a function by a polynomial
Whenever the higher differences of a table become small (allowing for round-off noise), the function represented may be approximated well by a polynomial. For example, reconsider the difference table to 6D for f(x) = e^x with x = 0.1(0.05)0.5:

Since the third differences do not exceed the estimate for round-off error (cf. the table in STEP 12), we say that the third differences are constant within round-off error, and deduce that a cubic approximation is appropriate for e^x over the range 0.1 < x < 0.5. An example in which polynomial approximation is inappropriate occurs when f(x) = 10^x for x = 0(1)4, as is shown by the next table:

Although the function f(x) = 10^x is `smooth', the large tabular interval (h = 1) produces large higher order finite differences.

It should also be understood that there exist functions that cannot usefully be tabulated at all, at least in certain neighbourhoods; for example, f(x) = sin(1/x) near the origin x = 0. Nevertheless, these are fairly exceptional cases.

Finally, we remark that the approximation of a function by a polynomial is fundamental to the widespread use of finite difference methods.

Checkpoint
1. What may be said about the higher order (exact) differences of a polynomial?
2. What is the effect of round-off error on the higher order differences of a polynomial?
3. When may a function be approximated by a polynomial?

EXERCISES
1. Construct a difference table for the polynomial f(x) = x^4 for x = 0(0.1)1 when
a. the values of f are exact;
b. the values of f have been rounded to 3D.
Compare the fourth difference round-off errors with the estimate ±6.
2. Find the degree of the polynomial which fits the data in the table:
