Question

1 Approved Answer

Posted on Sep 03, 2024

Accuracy in Matlab and Condition Number For most systems of linear equations, we can get an answer with enough accuracy fast enough in a number

Accuracy in Matlab and Condition Number

For most systems of linear equations, we can get an answer with enough accuracy fast enough in a number of different ways, and we don't even bother with trying to measure the accuracy of the solution we get in Matlab. But in certain special circumstances either the linear system we are trying to solve leads to unusually large round-off errors or the we require answers with a very high degree of accuracy. When this occurs, it is helpful to understand the ways that we can measure error, and the options we have in Matlab to solve problems quickly without sacrificing too much accuracy. The source of the problem is the way computers store numbers and do arithmetic, which is known as floating point arithmetic. Floating point arithmetic stores magnitude and the most significant digits of a number and ignores the rest. At some level this is necessary, since many numbers have infinitely many digits in decimal form (either because the digits repeat like in or , or because the number is irrational like or ). Note: the exact number of digits stored actually varies a bit, because the computer stores the digits in binary (of course). Matlab uses 53 bits per floating point number, so it stores almost (but not quite) 16 decimal digits, as . This method is efficient and accurate in most cases, but it can cause problems when we try to do arithmetic that combines very large numbers with very small ones. format short b = pi * 10^20 s = exp(1)* 10^-8

% The "==" tests whether two numbers are equal. The code below returns 1, % indicating Matlab finds that these numbers are equal. bool = b+s==b % Even though Matlab recognizes that s is not zero. bool = s == 0 Errors like the one about are typically referred to as round-off errors. The variable s isn't rounded off normally, but when it's combined with a large number, the result has more signifcant digits than it can store, so matlab retains the largest sixteen or so, which are all digits from b, and all the digits from s are lost. Round off errors can be a big problem when using Matlab to solve linear systems. Example Consider the following problem, solved by applying row operations to transform the augmented matrix to RREF. u = 2^-16; A = [u, 1; 1, 1]; b = [1; 2]; % form the augmented matrix R = [A b] % use the (1,1) entry as the pivot for the first row, and make the (1,2) % entry zero R = [R(1,:); R(2,:)-R(2,1)/R(1,1)*R(1,:)] % now that the matrix is in echelon form, make the second column a pivot % column by zeroing out the (1,2) entry R = [R(1,:)-R(1,2)/R(2,2)*R(2,:);R(2,:)] % scale the rows of the matrix so that the pivots are equal to one R = [1/R(1,1)*R(1,:); 1/R(2,2)*R(2,:)] Since the RREF matrix does not have a pivot in the last column, the system is consistent and since it has a pivot in the other two columns, the solution is unique, and that unique solution is the the third column of the matrix. Exercse 1: Treat as an unknown and apply the formula for the inverse of a matrix from Section 2.2 to compute in terms of . Then find the correct solution using . Write you answer for and for x on the cover sheet for this assignment and enter your solution below and compare it to the solution found using row reduction (which is column 3 of the matrix above) using "==." On the cover sheet, state whether any of the entries of the solution found by Matlab are equal to the actual solution. % enter the value of x here, treating u as a variable

% compare x to the solution found by rref

Partial Pivoting

Round off errors are a risk when doing row reduction; when we automote the RREF algorithm, it becoes impossible to know if the arithmetic being done is combining large and small numbers as it performs row operations. One solution to this problem is to do the row reduction algorithm in a way that minimizes the risk of round off error. This approach is called partial pivoting. Partial pivoting always uses the largest (in absolute value) availble entry in a column as the pivot. For the first column this is always the entry in the first column with the largest absolute value. After swapping rows to put the pivot in the position of the augmented matrix, this entry is used to zero out all the other entries in the first column. In the above example, 1 is (much) larger than u, so even though u is non-zero and could be scaled to be a pivot, the partial pivoting method swaps the rows and uses the 1 as the pivot in the first column. For the remaining columns, the pivot is selected as the enry with the largest absolute value that isn't in a row that already has a pivot. So the pivot for then position is selected by choosing the entry in column two with the largest absolute value in rows through , the pivot for the position is selected from rows through , etc. Note that it's not possible to know these values until the algorithm has selected a pivot for the previous row and eliminated that variable from the rows below.

Exercise 2 Solve the system by swapping the rows of the augmented matrix first, and then applying row operations to reduce to RREF. Compare the answer obtained this way to the answer you found using in the previous problem. On the cover sheet, state whether any of the entries of the solution found by Matlab are equal to the actual solution. u=2^-16; A = [u, 1; 1, 1]; b = [1; 2]; % put your code here

% compare x from Exercise 1 to the solution you found by row operations with partial pivoting

Measuring error Not all rounding errors are a problem, of course. Physicists and engineers have used relatively short decimal approximations for for millenia without issue. In order to know when rounding errors lead to problems we need to be able to distinguish between large and small errors. For numbers this is pretty easy; we just use to measuer the difference between the numbers and . Since our errors are vectors, we will want to do something similar, but first we need to define (mathematically) how to measure the "size" or magnitude of a vector. There are a couple of ways of doing this:

You may recognize the second one; it's the definition of the length of a vector used in 2- and 3-dimensions, based on the Pythagorean theorem. It works perfectly well in any number of dimensions, but because the arithmetic is simpler, the other two definitions are a bit easier to implement and work with. In this project we will use the first definition (which is called the 1-norm or the -norm). Absolute error Using the first definition above, we can measure the absolute error by taking the magnitude of the difference between the vector and it's approximation: . You can do this with the matlab comand "norm(, 1)" since we are using the 1-norm to measure magnitude. Be sure to include the "1" to tell matlab to use the 1-norm. Exercise 3a Measure the error between the solution of the system found using row reduction but no partial pivoting and the solution you found using in Exercise 1. You should copy and paste code used in Exercise 1 and the example immediately above it as needed. Store your answer in the variable aError3. u = 2^-16; A = [u, 1; 1, 1]; b = [1; 2]; % put code here to solve the system using row operations (but not partial % pivoting)

% enter the value of x from Exercise 1 here, using u as a variable % x = [ ; ];

% measure the absolute error between x and the solutions found with row % reduction (but no partial pivoting)

Relative error The absolue size of the error is not usually enough to know whether our calculations are accurate enough. If the answer is large, a larger absolute error is usually acceptable, and if the answer is small, even a small absolute error can lead to problems. So we usually measure errors relative to the size of the approximation with the formula . Exercise 3b Find the relative error of the error in Exercise 3a. Store your answer in the variable rError3. % put code here to find the relative error

% store your answer in the variable rError3 % rError3 = ; Exercise 4 This problem uses a (large) special matrix called a Hilbert matrix, and an "all ones" vector. A = hilb(16); v = ones(16,1); We will manufacture a system of linear equations with coefficient matrix that has solution . b = A*v; R = [A b];

Exercise 4a Use the rref() command to find a solution to this system. Note: the RREF of the matrix will have (many) non-pivot columns. The simplest solution to find will be the one where each of the resulting free variables are set to zero. Store your solution in the variable x. % put your code here

% store your answer in the variable x % x = [ ] Exercise 4b Find the absolute and relative error between the calculated solution x and the actual solution v. Store your answers in variables named aError4 and rError4. Write your relative error as a percentage on the cover sheet for this assignment.

% store your answer in the variables aError and rError % aError4 = % rError4 =

Condition number

For the most part, the problems which result in large relative error depend on the coefficient matrix and not on the choice of the vector b. To help identify which matrices are likely to lead to inaccurate results we use what is known as the condition number of a matrix. The basic idea is that a matrix with a large condition number is likely to result in large relative errors when used in Matlab or other software. In this assignment, we only consider the condition number of square matrices; we may revisit the question of the condition number of a non-square matrix in a future assignment.

The condition number of a square matrix is based on the largest (relative) possible difference in the size of the vector compared to the vector . The difference in size is defined to be the norm of the matrix itself:

It's a little tricky to find the vector x that determines the norm of a matrix, but we can say that for any vector v, the norm is an upper bound. In other words, .

Now, consider a linear system in matrix form: . Taking the magnitude of the vector on both sides, and using this upper bound we note that . Solving for yields

When we solve the system on Matlab, we get an approximate solution r due to rounding errors, with an absolute error of . That means that when we multiply the answer won't be exactly equal to . We will write = . Assuming again that is invertible, we can write this as . Taking the magnitude of both sides and applying the upper bound again we obtain an upper limit on the error in our solution that is the result of measurement error of b:

Note that the upper limit of the absolute error of our solution is now affected by two factors: and the norm of the matrix $A^{-1}$. As noted above, it's more important to study the relative error, and to do so we need to multiple these last two inequalitites:

The product is the condition number of the matrix . When this number is large, then the relative error of our solution can be unacceptable even when and b are nearly equal (which makes small). Since the floating point arithmetic in Matlab is accurate to about 16 significant digits, we can more or less assume that is about . So if the condition number of the matrix is is less than, say , then we can expect a relative error of no more than %. Whether or not that is sufficient accuracy depends on the context of the problem, of course. It's probably fine for building a model airplane, but maybe not for the design of a medical instrument used in robotic surguery. Example The function cond() can be use to show that the condition number of the matrix in Exercise 4 is about . A = hilb(16); condA = cond(A) This means that we can expect a relative error in the solution to of around = 49 = 4900%. Compare this to the actualrelative error you found in Exercise 4b. The type of matrix in Exercie 4 is known as a Hilbert matrix, and the Matlab command hilb(n) will generate a hilbert matrix with rows and columns. When the entries are shown as fractions, the pattern is pretty clear. format rat; A = hilb(3) A = hilb(5) format short;

Exercise 5a Use Matlab to solve the linear system . Store your answer in the variable r. % put your code here

% store your answer in the variable r % r = Exercise 5b Now compute . Store the value of in the variable bDelta. % put your code here

% store your answer in the variable bDelta % bDelta = Exercse 5c Find the condition number of the coefficient matrix for this system, and estimate the accuracy of the answer found in Exercise 5a. Store the condition number in the variable condA, and write the estimated relative accuracy of r as a percentage on the cover sheet for this assignment (you can put your supporting work here or on the cover sheet). Now use the relative error to give an estimate on the absolute error of each entry of the solution found in Exercise 5a, and write this error on the cover sheet for this assignment. % store your answer in the variable condA % condA =

% put any code here used to calculate the estimate of the relative % accuracy Exercise 6 Find the smallest Hilbert matrix that has a condition number greater than or equal to by trial and error. Write the size of the matrix on the cover sheet for this assignment. % put your code here