1.4. Error Analysis

Let \(x\) and \(y\) be two real numbers, and let \(\bar x\) and \(\bar y\) be their machine number representations. We assume that whenever \(\bar x\) and \(\bar y\) are combined using an arithmetic operation \(\odot\), the result \(\bar x\odot\bar y\) is first computed exactly and then rounded to machine precision. It follows from (?) that there exist real numbers \(\delta_1,\delta_2,\delta_3\) with \(|\delta_1|,|\delta_2|,|\delta_3|\leq\varepsilon\), where \(\varepsilon\) is the machine epsilon, such that:

(1)\[\begin{split}\bar x \enspace &=& \enspace x(1+\delta_1) \\ \bar y \enspace &=& \enspace y(1+\delta_2) \\ \overline{\bar x\odot\bar y} \enspace &=& \enspace (\bar x\odot\bar y)(1+\delta_3) = (x(1+\delta_1) \odot y(1+\delta_2))(1+\delta_3)\end{split}\]

Example 1

To illustrate this process, let us use a decimal machine operating with five significant digits in its floating-point number representation, and determine the relative errors in adding, subtracting, multiplying, and dividing the two machine numbers

\[x=0.31426\times 10^3\enspace\enspace\enspace\enspace y=0.92577\times 10^5\]

Using a higher precision accumulator (double in length) for the intermediate results gives

\[\begin{split}x+y \enspace &=& \enspace 0.9289126000\times 10^5 \\ x-y \enspace &=& \enspace -0.9226274000\times 10^5 \\ x*y \enspace &=& \enspace 0.2909324802\times 10^8 \\ x/y \enspace &\approx& \enspace 0.3394579647\times 10^{-2}\end{split}\]

The computer with five significant digits stores these in rounded form as

\[\begin{split}\overline{x+y} \enspace &=& \enspace 0.92891\times 10^5 \\ \overline{x-y} \enspace &=& \enspace -0.92263\times 10^5 \\ \overline{x*y} \enspace &=& \enspace 0.29093\times 10^8 \\ \overline{x/y} \enspace &=& \enspace 0.33946\times 10^{-2}\end{split}\]

The relative errors in these results are \(2.8\times 10^{-6}, 2.8\times 10^{-6}, 8.5\times 10^{-6},\) and \(6.0\times 10^{-6}\), respectively – all less than \(10^{-5}\).
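The arithmetic of Example 1 can be reproduced with a short Python sketch. It uses the standard `decimal` module with a five-digit context as a stand-in for the decimal machine, and `fractions.Fraction` for the exact reference values. The round-half-up rule and the names `FIVE_DIGITS`, `relative_error`, and `results` are assumptions made for illustration; they are not part of the example itself.

```python
import operator
from decimal import Context, Decimal, ROUND_HALF_UP
from fractions import Fraction

# Assumed model of the machine: 5 significant decimal digits, round half up.
FIVE_DIGITS = Context(prec=5, rounding=ROUND_HALF_UP)


def relative_error(stored, exact):
    """Relative error of a stored Decimal against an exact Fraction."""
    return abs((Fraction(stored) - exact) / exact)


x = Decimal("0.31426E3")
y = Decimal("0.92577E5")

results = {}
for name, machine_op, exact_op in [
    ("x+y", FIVE_DIGITS.add, operator.add),
    ("x-y", FIVE_DIGITS.subtract, operator.sub),
    ("x*y", FIVE_DIGITS.multiply, operator.mul),
    ("x/y", FIVE_DIGITS.divide, operator.truediv),
]:
    stored = machine_op(x, y)                   # exact result, then rounded to 5 digits
    exact = exact_op(Fraction(x), Fraction(y))  # exact rational reference value
    results[name] = (stored, float(relative_error(stored, exact)))
    print(f"{name}: stored {stored}, relative error {results[name][1]:.1e}")
```

Each printed relative error should fall below \(10^{-5}\), in line with the values quoted above.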

While the above example gives comparable results for all four operations of addition, subtraction, multiplication, and division, it does not tell us how large the relative error can become in general. Let us analyze each of these operations theoretically.

1.4.1. Addition

Using equation (1), we have

\[\overline{\bar x + \bar y} = (x(1+\delta_1)+y(1+\delta_2))(1+\delta_3)\]

Assume \(x,y>0\) and let \(\delta=\max\{|\delta_1|,|\delta_2|,|\delta_3|\}\). Bounding each \(\delta_i\) by \(\delta\) in the above equation gives

\[\begin{split}\overline{\bar x + \bar y} \enspace &\leq& \enspace (x+y)(1+\delta)^2 \\ \Rightarrow \frac{\overline{\bar x + \bar y} - (x+y)}{x+y} \enspace &\leq& \enspace (1+\delta)^2 - 1 = 2\delta + \delta^2\end{split}\]

Since \(|\delta|\leq \varepsilon\), the relative error in adding two numbers of the same sign is at most \(2\varepsilon+\varepsilon^2\), which is on the order of the machine epsilon. Thus, the addition operation is as accurate as it can be with the available precision.
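The bound can also be checked empirically. The following Python sketch rounds random positive numbers to five significant decimal digits, adds them in the same precision, and compares the worst observed relative error with \(2\varepsilon+\varepsilon^2\). The value \(\varepsilon = 0.5\times 10^{-4}\) is an assumption (the rounding bound for five decimal digits), and the helper names `fl` and `worst_relative_error` are hypothetical.

```python
import random
from decimal import Context, Decimal, ROUND_HALF_EVEN
from fractions import Fraction

PREC = 5
CTX = Context(prec=PREC, rounding=ROUND_HALF_EVEN)
EPS = Fraction(5, 10**PREC)  # assumed machine epsilon: 0.5 * 10**(1 - PREC)


def fl(value):
    """Round an exact rational number to PREC significant decimal digits."""
    return CTX.divide(Decimal(value.numerator), Decimal(value.denominator))


def worst_relative_error(trials=2000, seed=0):
    """Largest relative error observed when adding random positive numbers."""
    rng = random.Random(seed)
    worst = Fraction(0)
    for _ in range(trials):
        x = Fraction(rng.randint(1, 10**12), 10**6)
        y = Fraction(rng.randint(1, 10**12), 10**6)
        computed = CTX.add(fl(x), fl(y))  # round inputs, add exactly, round result
        error = abs(Fraction(computed) - (x + y)) / (x + y)
        worst = max(worst, error)
    return worst


bound = 2 * EPS + EPS**2
worst = worst_relative_error()
print(f"worst observed: {float(worst):.2e}, bound: {float(bound):.2e}")
```

Restricting the samples to positive numbers matters: with operands of opposite sign the sum can be much smaller than either operand, which is the subtraction case.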

1.4.2. Subtraction

We will show that there is no bound on the relative error. Let \(x=a+\theta\) and \(y=a\), so \(x-y=\theta\). Then,

\[\begin{split}\bar x - \bar y \enspace &=& \enspace (a+\theta)(1+\delta_1)-a(1+\delta_2) = \theta + a(\delta_1 - \delta_2) + \theta\delta_1 \\ \Rightarrow \overline{\bar x - \bar y} \enspace &=& \enspace \theta(1+\delta_3) + a(\delta_1 - \delta_2)(1+\delta_3) + \theta\delta_1(1+\delta_3)\end{split}\]

Then the relative error is given by

\[\begin{split}\frac{\overline{\bar x - \bar y} - (x-y)}{x-y} \enspace &=& \enspace \frac{\theta(1+\delta_3) + a(\delta_1 - \delta_2)(1+\delta_3) + \theta\delta_1(1+\delta_3) - \theta}{\theta} \\ &=& \enspace \delta_3 + \delta_1(1+\delta_3) + \frac{a}{\theta}(\delta_1 - \delta_2)(1+\delta_3)\end{split}\]

which becomes unbounded as \(\theta\rightarrow 0\) whenever \(a\neq 0\) and \(\delta_1\neq\delta_2\).

Example 2

Suppose we wish to subtract two real numbers \(x\) and \(y\) whose exact decimal representations are as follows:

\[x = 3.212435, \enspace\enspace\enspace\enspace y = 3.21243499999\]

Assume that the floating-point number system on the computer only allows for \(6\) significant digits. After rounding, \(x\) and \(y\) will have the following values in machine precision:

\[\bar x = 3.21244, \enspace\enspace\enspace\enspace \bar y = 3.21243\]

The difference \(\bar x - \bar y = 10^{-5}\) can be stored exactly in machine precision. However, the actual difference is \(x-y = 10^{-11}\). Thus, the relative error in subtracting \(y\) from \(x\) is:

\[\frac{\overline{\bar x - \bar y} - (x-y)}{x-y} = \frac{10^{-5} - 10^{-11}}{10^{-11}} \approx 10^6\]

Note that the relative error can be made arbitrarily large by appending more digits with value \(9\) to the decimal representation of \(y\).
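This growth can be observed directly. The sketch below repeats Example 2 in Python for increasingly long runs of trailing 9s in \(y\), using the standard `decimal` module to model the rounded values \(\bar x = 3.21244\) and \(\bar y = 3.21243\). The round-half-up rule and the names `SIX_DIGITS` and `subtraction_relative_error` are assumptions made for illustration.

```python
from decimal import Context, ROUND_HALF_UP
from fractions import Fraction

# Assumed model: six significant decimal digits, round half up, which
# reproduces the machine values 3.21244 and 3.21243 used in Example 2.
SIX_DIGITS = Context(prec=6, rounding=ROUND_HALF_UP)


def subtraction_relative_error(x_text, y_text):
    """Relative error of the machine difference against the exact one."""
    x_bar = SIX_DIGITS.create_decimal(x_text)    # rounded representation of x
    y_bar = SIX_DIGITS.create_decimal(y_text)    # rounded representation of y
    computed = SIX_DIGITS.subtract(x_bar, y_bar)
    exact = Fraction(x_text) - Fraction(y_text)  # exact rational difference
    return (Fraction(computed) - exact) / exact


for nines in (5, 8, 11):
    y_text = "3.212434" + "9" * nines
    err = subtraction_relative_error("3.212435", y_text)
    print(f"{nines} trailing nines: relative error about {float(err):.1e}")
```

With five trailing 9s this is the setting of Example 2. Each additional 9 shrinks the exact difference by a factor of ten while the computed difference stays at \(10^{-5}\).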

1.4.3. Multiplication

Using equation (1), we have

\[\overline{\bar x\cdot\bar y} = x\cdot y(1+\delta_1)(1+\delta_2)(1+\delta_3)\]

Assume \(x\cdot y>0\) and let \(\delta=\max\{|\delta_1|,|\delta_2|,|\delta_3|\}\). Bounding each \(\delta_i\) by \(\delta\) in the above equation gives

\[\begin{split}\overline{\bar x\cdot\bar y} \enspace &\leq& \enspace x\cdot y (1+\delta)^3 \\ \Rightarrow \frac{\overline{\bar x\cdot\bar y} - x\cdot y}{x\cdot y} \enspace &\leq& \enspace (1+\delta)^3-1 = 3\delta + 3\delta^2 + \delta^3\end{split}\]

Since \(|\delta|\leq \varepsilon\), the relative error in multiplication is on the order of the machine epsilon. Thus, similar to addition, the multiplication operation is as accurate as possible with the available precision.

1.4.4. Division

Using equation (1), we have

(2)\[\overline{\frac{\bar x}{\bar y}} = \frac{x(1+\delta_1)}{y(1+\delta_2)}(1+\delta_3)\]

The denominator in the above equation can be eliminated by using the following geometric series

\[\frac{1}{1-\theta} = 1+\theta+\theta^2+\theta^3+\ldots\]

where \(|\theta|<1\). Substituting \(\theta=-\delta_2\) in the above series and using the result in equation (2) gives

\[\overline{\frac{\bar x}{\bar y}} = \frac{x}{y}(1+\delta_1)(1+\delta_3)(1-\delta_2+\delta_2^2-\delta_2^3+\delta_2^4-\ldots)\]

Let \(\delta=\max\{|\delta_1|,|\delta_2|,|\delta_3|\}\). Computing the relative error using the above expression gives a result on the order of \(\delta\), which is at most the machine epsilon. Thus, similar to addition and multiplication, division is also as accurate as possible with the available precision.
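One way to make this last step explicit, sketched here under the assumption that \(\delta=\max\{|\delta_1|,|\delta_2|,|\delta_3|\}<1\), is to expand the product and keep only the first-order terms:

\[\frac{\overline{\bar x/\bar y} - x/y}{x/y} = (1+\delta_1)(1+\delta_3)(1-\delta_2+\delta_2^2-\ldots) - 1 = \delta_1 - \delta_2 + \delta_3 + O(\delta^2)\]

Bounding the numerator factors by \(1+\delta\) and the denominator factor from below by \(1-\delta\) then gives

\[\left|\frac{\overline{\bar x/\bar y} - x/y}{x/y}\right| \leq \frac{(1+\delta)^2}{1-\delta} - 1 = \frac{3\delta+\delta^2}{1-\delta} = 3\delta + O(\delta^2)\]

so the leading term matches the \(3\delta\) obtained for multiplication.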