2.4. Matrix Norms and Condition Numbers¶
Similar to vectors, norms can also be defined for (square) matrices. These norms allow us to measure how well-behaved a linear system is, in the sense of admitting stable numerical algorithms. It turns out that matrix norms also allow us to precisely quantify the convergence behavior of many iterative schemes, as we will see in later sections.
2.4.1. Matrix Norms¶
Definition
A matrix norm is a function \(\lVert\cdot\rVert : \mathbb R^{n\times n}\rightarrow \mathbb R\) that satisfies:
1. \(\lVert M\rVert\geq 0\) for all \(M\in\mathbb R^{n\times n}\), with \(\lVert M\rVert = 0\) if and only if \(M=O\) (the zero matrix).
2. \(\lVert \alpha M\rVert = \vert\alpha\vert\cdot\lVert M\rVert\) for all \(\alpha\in\mathbb R\).
3. \(\lVert M+N\rVert\leq \lVert M\rVert + \lVert N\rVert\).
4. \(\lVert M\cdot N\rVert \leq \lVert M\rVert\cdot\lVert N\rVert\).
Property \((4)\) above has a slightly different flavor than the vector norm properties: it has no direct vector analogue, and is often called submultiplicativity. Although more types of matrix norms exist, one common category is that of matrix norms induced by vector norms.
Definition
If \(\lVert\cdot\rVert_\star\) is a valid vector norm, its induced matrix norm is defined as:

\[\lVert M\rVert_\star = \max_{x\neq 0}\frac{\lVert Mx\rVert_\star}{\lVert x\rVert_\star}\]

or equivalently,

\[\lVert M\rVert_\star = \max_{\lVert x\rVert_\star = 1}\lVert Mx\rVert_\star\]
Proof: Choose a vector \(y = \alpha x\), where \(\alpha = 1/\lVert x\rVert_\star\). Then,

\[\lVert y\rVert_\star = \lVert \alpha x\rVert_\star = \vert\alpha\vert\cdot\lVert x\rVert_\star = \frac{\lVert x\rVert_\star}{\lVert x\rVert_\star} = 1\]

Also, from the definition of vector norms, the following relation holds:

\[\lVert My\rVert_\star = \lVert \alpha Mx\rVert_\star = \vert\alpha\vert\cdot\lVert Mx\rVert_\star = \frac{\lVert Mx\rVert_\star}{\lVert x\rVert_\star}\]

Every value of the ratio \(\lVert Mx\rVert_\star/\lVert x\rVert_\star\) is therefore attained as \(\lVert My\rVert_\star\) for some unit vector \(y\), so the two maxima coincide. Thus, the two definitions above are equivalent.
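To make the induced-norm definition concrete, here is a minimal sketch (with an arbitrarily chosen \(2\times 2\) matrix) that samples many random nonzero vectors, evaluates the ratio \(\lVert Mx\rVert_2/\lVert x\rVert_2\), and compares the largest sampled ratio with the exact induced 2-norm reported by np.linalg.norm:

import numpy as np

# An arbitrary example matrix.
M = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# Sample many random (almost surely nonzero) vectors and evaluate ||Mx|| / ||x||.
rng = np.random.default_rng(0)
X = rng.standard_normal((100000, 2))
ratios = np.linalg.norm(X @ M.T, axis=1) / np.linalg.norm(X, axis=1)

# The induced 2-norm is the supremum of these ratios; NumPy computes it exactly
# (it equals the largest singular value of M).
print("largest sampled ratio:", ratios.max())
print("np.linalg.norm(M, 2) :", np.linalg.norm(M, 2))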
Note again that not all valid matrix norms are induced by vector norms. One notable example is the very commonly used Frobenius norm:

\[\lVert M\rVert_F = \sqrt{\sum_{i=1}^n\sum_{j=1}^n m_{ij}^2}\]
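As a quick sanity check of this formula (the matrix is arbitrary), we can evaluate the Frobenius norm directly from its definition and via NumPy's built-in routine:

import numpy as np

M = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# Frobenius norm straight from the definition: sqrt of the sum of squared entries.
print(np.sqrt(np.sum(M**2)))        # 5.477225575051661
print(np.linalg.norm(M, 'fro'))     # same value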
We can easily show that induced norms satisfy properties \((1)\) through \((4)\). Properties \((1)-(3)\) are rather trivial; for example, property \((2)\) follows directly from the corresponding property of the underlying vector norm:

\[\lVert \alpha M\rVert = \max_{x\neq 0}\frac{\lVert \alpha Mx\rVert}{\lVert x\rVert} = \max_{x\neq 0}\vert\alpha\vert\,\frac{\lVert Mx\rVert}{\lVert x\rVert} = \vert\alpha\vert\cdot\lVert M\rVert\]
Property \((4)\) is slightly trickier to show. First, a lemma:
Lemma
If \(\lVert\cdot\rVert\) is a matrix norm induced by a vector norm (denoted by the same symbol), then for any \(A\in\mathbb R^{n\times n}\) and \(y\in\mathbb R^n\):

\[\lVert Ay\rVert\leq\lVert A\rVert\cdot\lVert y\rVert\tag{1}\]
Proof: Since \(\lVert A\rVert = \max_{x\neq 0}\lVert Ax\rVert/\lVert x\rVert\), we have that for an arbitrary \(y\in\mathbb R^n\) (\(y\neq 0\))

\[\frac{\lVert Ay\rVert}{\lVert y\rVert}\leq\max_{x\neq 0}\frac{\lVert Ax\rVert}{\lVert x\rVert} = \lVert A\rVert\quad\Rightarrow\quad\lVert Ay\rVert\leq\lVert A\rVert\cdot\lVert y\rVert\]

This was derived for \(y\neq 0\), but since both sides vanish when \(y=0\), the inequality holds for \(y=0\) as well.
Now property \((4)\) can be easily proved using the above lemma:

\[\lVert M\cdot N\rVert = \max_{x\neq 0}\frac{\lVert MNx\rVert}{\lVert x\rVert}\leq\max_{x\neq 0}\frac{\lVert M\rVert\cdot\lVert Nx\rVert}{\lVert x\rVert} = \lVert M\rVert\cdot\max_{x\neq 0}\frac{\lVert Nx\rVert}{\lVert x\rVert} = \lVert M\rVert\cdot\lVert N\rVert\]
Note that when writing an expression such as (1), the matrix norm \(\lVert A\rVert\) is understood to be the norm induced by the vector norm used in \(\lVert Ax\rVert\) and \(\lVert x\rVert\). Thus,

\[\lVert Ax\rVert_1\leq\lVert A\rVert_1\cdot\lVert x\rVert_1\]

and

\[\lVert Ax\rVert_\infty\leq\lVert A\rVert_\infty\cdot\lVert x\rVert_\infty\]

are both valid, but we cannot mix and match norms, as in the following (generally invalid) statement:

\[\lVert Ax\rVert_1\leq\lVert A\rVert_\infty\cdot\lVert x\rVert_1\]
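In fact, the mixed inequality above can fail outright. Here is a small sketch (matrix and vector chosen for illustration) where the properly matched bound holds but the mixed one does not:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 0.0]])
x = np.array([1.0, 0.0])

lhs        = np.linalg.norm(A @ x, 1)                          # ||Ax||_1 = 2
matched    = np.linalg.norm(A, 1) * np.linalg.norm(x, 1)       # ||A||_1 * ||x||_1 = 2
mismatched = np.linalg.norm(A, np.inf) * np.linalg.norm(x, 1)  # ||A||_inf * ||x||_1 = 1

print(lhs <= matched)     # True:  matched norms obey inequality (1)
print(lhs <= mismatched)  # False: mixing norms breaks the bound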
Although the definition of an induced norm allowed us to prove certain properties, it does not necessarily provide a convenient formula for evaluating the matrix norm. Fortunately, such formulas do exist for the \(L_1\)- and \(L_\infty\)-induced matrix norms; we give them here without proof:

\[\lVert A\rVert_1 = \max_{1\leq j\leq n}\sum_{i=1}^n\vert a_{ij}\vert\quad\text{(maximum absolute column sum)}\]

\[\lVert A\rVert_\infty = \max_{1\leq i\leq n}\sum_{j=1}^n\vert a_{ij}\vert\quad\text{(maximum absolute row sum)}\]
The formula for the \(L_2\) induced matrix norm is more complicated. We will see it when we study eigenvalues.
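As a quick numerical check (the matrix below is arbitrary), we can evaluate both formulas by hand and compare against np.linalg.norm, which uses the same expressions for the 1- and \(\infty\)-norm cases:

import numpy as np

A = np.array([[ 1.0, -7.0,  2.0],
              [ 4.0,  0.0, -3.0],
              [-2.0,  5.0,  6.0]])

col_sums = np.sum(np.abs(A), axis=0)   # absolute column sums
row_sums = np.sum(np.abs(A), axis=1)   # absolute row sums

print(col_sums.max(), np.linalg.norm(A, 1))       # both print 12.0
print(row_sums.max(), np.linalg.norm(A, np.inf))  # both print 13.0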
2.4.2. Condition Numbers¶
When solving a linear system \(Ax=b\), computer algorithms typically provide only an approximation \(x_\textsf{approx}\) to the exact solution \(x_\textsf{exact}\). This is due to factors such as finite precision, round-off errors, or even imperfect solution algorithms. In any case, we have an error (in fact, an error vector) defined as:

\[e = x_\textsf{exact} - x_\textsf{approx}\]
Naturally, we would like to have an understanding of the magnitude of this error (for example, via some appropriate norm \(\lVert e\rVert\)). However, the error cannot be measured directly because the exact solution \(x_\textsf{exact}\) is unknown. One remedy is offered via the residual vector, defined as:

\[r = b - Ax_\textsf{approx}\]
The vector \(r\) is something we can compute in practice, since it involves only the known quantities \(b\), \(A\), and \(x_\textsf{approx}\). Moreover, we have:

\[Ae = A(x_\textsf{exact} - x_\textsf{approx}) = b - Ax_\textsf{approx} = r\quad\Rightarrow\quad e = A^{-1}r\tag{2}\]
Equation (2) links the error with the residual. This allows us to write:

\[\lVert e\rVert = \lVert A^{-1}r\rVert\leq\lVert A^{-1}\rVert\cdot\lVert r\rVert\]
This equation provides a bound for the error, as a function of \(\lVert A^{-1}\rVert\) and the norm of the computable vector \(r\). Note that:
- We can obtain this estimate without knowing the exact solution, but
- We need \(\lVert A^{-1}\rVert\), and generally computing \(\lVert A^{-1}\rVert\) is just as difficult as (if not more difficult than) finding \(x_\textsf{exact}\) itself. However, there are some special cases where an estimate of \(\lVert A^{-1}\rVert\) can be obtained; the sketch below illustrates the bound on a small system.
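Here is a minimal illustration (matrix, exact solution, and the artificial perturbation are all chosen arbitrarily) that fabricates an approximate solution, computes the residual, and verifies \(\lVert e\rVert\leq\lVert A^{-1}\rVert\cdot\lVert r\rVert\) in the \(\infty\)-norm:

import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
x_exact = np.array([1.0, 2.0])
b = A @ x_exact

# Pretend a solver returned a slightly perturbed answer.
x_approx = x_exact + np.array([1e-3, -2e-3])

e = x_exact - x_approx    # error (not computable in practice)
r = b - A @ x_approx      # residual (computable in practice)

bound = np.linalg.norm(np.linalg.inv(A), np.inf) * np.linalg.norm(r, np.inf)
print(np.linalg.norm(e, np.inf), "<=", bound)   # the bound holds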
2.4.3. A Different Source of Error¶
Sometimes, the right-hand side \(b\) of the system \(Ax=b\) has errors that make it deviate from its intended value. For example, if each component of \(b\) was defined as \(b_i=f(x_i)\), where \(f\) is an unknown function, then an error in the measuring device that was supposed to sample \(f(x)\) could lead to erroneous readings \(b_i^\star\ (\neq b_i)\). Thus, measuring inaccuracies can lead to the right-hand side vector \(b\) being misrepresented as \(b^\star\). In this case, instead of the intended solution \(x=A^{-1}b\), we compute the solution to a different system: \(x^\star=A^{-1}b^\star\). How important is the error \(e=x^\star-x\) caused by this misrepresentation of \(b\)? Let \(\delta b=b^\star-b\), \(\delta x=x^\star-x\), \(Ax=b\), and \(Ax^\star=b^\star\). Then, we have:

\[A\,\delta x = Ax^\star - Ax = b^\star - b = \delta b\quad\Rightarrow\quad\delta x = A^{-1}\delta b\]
Taking norms,

\[\lVert\delta x\rVert = \lVert A^{-1}\delta b\rVert\leq\lVert A^{-1}\rVert\cdot\lVert\delta b\rVert\tag{3}\]
Thus, the error in the computed solution \(\delta x\) is proportional to the error in \(b\). An even more relevant question is: how does the relative error \(\lVert \delta x\rVert/\lVert x\rVert = \lVert x^\star-x\rVert/\lVert x\rVert\) compare to the relative error in \(b\), namely \(\lVert\delta b\rVert/\lVert b\rVert\)? The relative error is often more useful to know, since the absolute error \(\lVert\delta b\rVert\) may be impossible to compute exactly (if we do not know the true \(b\)), while its relative magnitude (for example, the accuracy of a measuring device) is often available. For this, we start from \(Ax=b\) and write:

\[\lVert b\rVert = \lVert Ax\rVert\leq\lVert A\rVert\cdot\lVert x\rVert\quad\Rightarrow\quad\frac{1}{\lVert x\rVert}\leq\frac{\lVert A\rVert}{\lVert b\rVert}\tag{4}\]
Multiplying equations (3) and (4) gives

\[\frac{\lVert\delta x\rVert}{\lVert x\rVert}\leq\lVert A\rVert\cdot\lVert A^{-1}\rVert\cdot\frac{\lVert\delta b\rVert}{\lVert b\rVert}\]
Thus, the relative error in \(x\) is bounded by a multiple of the relative error in \(b\). This multiplicative constant \(\kappa (A)=\lVert A\rVert\cdot\lVert A^{-1}\rVert\) is called the condition number of \(A\), and is an important measure of the sensitivity of a linear system \(Ax=b\) to being solved on a computer in the presence of inaccurate values.
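This bound can be observed numerically. The sketch below (with an arbitrarily chosen matrix and perturbation) perturbs \(b\) slightly and checks that the relative change in the solution stays below \(\kappa(A)\) times the relative change in \(b\):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([1.0, 1.0])
b = A @ x

# Perturb the right-hand side by a small amount.
db = 1e-6 * np.array([1.0, -1.0])
x_star = np.linalg.solve(A, b + db)

rel_err_x = np.linalg.norm(x_star - x) / np.linalg.norm(x)
rel_err_b = np.linalg.norm(db) / np.linalg.norm(b)

# The relative error in x is bounded by kappa(A) times the relative error in b.
print(rel_err_x, "<=", np.linalg.cond(A, 2) * rel_err_b)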
2.4.4. Why is all this relevant?¶
Simply put, any \(b\) will carry some small relative error simply because it is represented on a computer only up to machine precision; the relative error in \(b\) will be at least on the order of the machine epsilon due to round-off.
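For reference, NumPy reports the machine epsilon of each floating-point type, i.e., the scale of the relative round-off error we should expect in \(b\):

import numpy as np

# Machine epsilon: the scale of relative round-off error for each precision.
print(np.finfo(np.float32).eps)   # about 1.19e-07
print(np.finfo(np.float64).eps)   # about 2.22e-16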
But how bad can the condition number get? Very bad, at times. For example, the Hilbert matrices \(H_n\in\mathbb R^{n\times n}\) are defined as:

\[[H_n]_{ij} = \frac{1}{i+j-1},\qquad i,j = 1,\ldots,n\]
Considering a specific instance for \(n=5\),

\[H_5 = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5}\\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6}\\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7}\\ \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8}\\ \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} \end{pmatrix},\qquad \kappa(H_5)\approx 4.8\times 10^5\]
Thus, tiny relative errors in \(b\) can be amplified by a factor of nearly half a million when solving \(H_5 x = b\); with round-off at single precision (machine epsilon near \(10^{-7}\)), this amounts to a relative error of up to roughly \(10\%\) in the computed solution!
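A quick computation confirms how rapidly the condition number of \(H_n\) grows; the sketch below builds the Hilbert matrix directly from its definition (the helper name hilbert is just for this example):

import numpy as np

def hilbert(n):
    # Hilbert matrix from its definition: H[i, j] = 1 / (i + j - 1), 1-based.
    i, j = np.indices((n, n))
    return 1.0 / (i + j + 1)      # indices here are 0-based, hence the +1

for n in (3, 5, 8):
    print(n, np.linalg.cond(hilbert(n), 2))
# The condition number grows roughly exponentially with n:
# about 5e2 for n=3, 5e5 for n=5, and 1.5e10 for n=8.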
Another problematic case is that of nearly singular matrices: if a parameter \(\varepsilon\) in the entries of \(A\) is such that \(A\) becomes singular (non-invertible) as \(\varepsilon\rightarrow 0\), then correspondingly \(\kappa(A)\rightarrow\infty\).
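To illustrate, here is a small sketch with a nearly singular \(2\times 2\) matrix (chosen for illustration, not necessarily the example intended above); the condition number blows up as \(\varepsilon\) shrinks:

import numpy as np

for eps in (1e-2, 1e-6, 1e-10):
    A = np.array([[1.0, 1.0],
                  [1.0, 1.0 + eps]])   # singular when eps == 0
    print(eps, np.linalg.cond(A, 2))   # grows roughly like 4 / eps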
So far, we have seen how large the condition number can become. One may also ask: what is the best-case scenario for \(\kappa (A)\)? Before we answer this question, a lemma:
Lemma
For any vector-induced matrix norm, \(\lVert I\rVert = 1\).
Proof: From the definition,

\[\lVert I\rVert = \max_{x\neq 0}\frac{\lVert Ix\rVert}{\lVert x\rVert} = \max_{x\neq 0}\frac{\lVert x\rVert}{\lVert x\rVert} = 1\]
Using property (4) of matrix norms, we have:

\[1 = \lVert I\rVert = \lVert A\cdot A^{-1}\rVert\leq\lVert A\rVert\cdot\lVert A^{-1}\rVert = \kappa (A)\]
Thus, \(\kappa (A)\geq 1\). The “best” conditioned matrices are of the form \(A=c\cdot I\), where \(c\in\mathbb R\) is a non-zero constant; they have \(\kappa (A)=1\). The linalg package of NumPy has a built-in function cond for computing the condition number of any square matrix, as shown below. The second parameter specifies the particular norm to use for evaluating the condition number.
>>> import numpy as np
>>> a = np.array([[1, 0, -1], [0, 1, 0], [1, 0, 1]])
>>> a
array([[ 1,  0, -1],
       [ 0,  1,  0],
       [ 1,  0,  1]])
>>> np.linalg.cond(a,1)
2.0
>>> np.linalg.cond(a,2)
1.4142135623730951
>>> np.linalg.cond(a,np.inf)
2.0
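Consistent with the discussion above, a nonzero scalar multiple of the identity attains the best possible value \(\kappa (A)=1\); a quick check (continuing the same session):

>>> np.linalg.cond(3.0 * np.eye(4), 2)
1.0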