4.5. Overdetermined Systems
So far, we have considered linear systems \(Ax=b\) with the same number of equations and unknowns (i.e., \(A\in\mathbb R^{n\times n}\)). When \(A\in\mathbb R^{m\times n}\) with \(m>n\) (more equations than unknowns), the existence of a true solution is not guaranteed; in this case we look for the “best possible” substitute for a solution. Before analyzing what that means, let us look at how such problems arise.
As an example, suppose that in an experiment we measure the pressure of a gas in a closed container as a function of the temperature. From Physics, we know that the two are linearly related:

\[p=\alpha T+\beta.\]
What are \(\alpha\) and \(\beta\)? Experimentally, the measurements should ideally lie on a straight line \(y=c_1x+c_0\), but they do not, due to measurement error. If we have \(n\) measurement pairs \((x_1,y_1),\ldots,(x_n,y_n)\), we would want:

\[\begin{bmatrix}x_1&1\\x_2&1\\\vdots&\vdots\\x_n&1\end{bmatrix}\begin{bmatrix}c_1\\c_0\end{bmatrix}=\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}.\]
Here, \(A_{n\times2}x_{2\times1}=b_{n\times1}\) is a rectangular system. We cannot hope to find a true solution to this system. Instead, let's try to find an “approximate” solution, such that \(Ax\approx b\). Let's look at the residual of this “interpolation”. The residual of the approximation at each data point is:

\[r_i=y_i-(c_1x_i+c_0),\quad i=1,\ldots,n.\]
If we write the vector of all residuals:

\[r=\begin{bmatrix}r_1\\r_2\\\vdots\\r_n\end{bmatrix}=b-Ax.\]
Although we can’t find an \(x\) such that \(Ax=b\) (thus, \(r=0\)), we can at least try to make \(r\) small. As another example, consider the problem of finding the best parabola \(f(x)=c_2x^2+c_1x+c_0\) that fits measurements \((x_1,y_1),\ldots,(x_n,y_n)\). We would like

\[\begin{bmatrix}x_1^2&x_1&1\\x_2^2&x_2&1\\\vdots&\vdots&\vdots\\x_n^2&x_n&1\end{bmatrix}\begin{bmatrix}c_2\\c_1\\c_0\end{bmatrix}=\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}.\]
Once again, we would like to make \(r=b-Ax\) as small as possible.
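To make the setup concrete, here is a minimal sketch (in NumPy, with hypothetical measurement values) of how the rectangular matrix \(A\) for the parabola fit is assembled:

```python
import numpy as np

# Hypothetical measurement pairs (x_i, y_i).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 9.2, 18.8, 33.1])

# Each row of A is [x_i^2, x_i, 1], matching f(x) = c2*x^2 + c1*x + c0.
# np.vander with 3 columns produces exactly these decreasing powers.
A = np.vander(x, 3)
b = y  # right-hand side of the overdetermined system A c ≈ b
print(A.shape)  # (5, 3): more equations than unknowns
```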
How do we quantify \(r\) being small? \(\Rightarrow\) using a norm! We could ask that \(\lVert r\rVert_1\), \(\lVert r\rVert_2\), or \(\lVert r\rVert_\infty\) be as small as possible. Any of these norms would be an intuitive choice for minimization (the \(1\)- and \(\infty\)-norms especially so). However, we typically use the \(2\)-norm for this purpose, because it is the easiest to work with in this problem.
Definition
The least squares solution of the overdetermined system \(Ax\approx b\) is the vector \(x\) that minimizes \(\lVert r\rVert_2=\lVert b-Ax\rVert_2\).
Define \(Q(x)=Q(x_1,x_2,\ldots,x_n)=\lVert b-Ax\rVert_2^2\) where \(x=(x_1,\ldots,x_n)\) and \(A\in\mathbb R^{m\times n}, b\in\mathbb R^m\) (\(m>n\)). The least squares solution is the set of values \(x_1,\ldots,x_n\) that minimize \(Q(x_1,x_2,\ldots,x_n)\).
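Written out explicitly (a direct expansion of the definition above), the quantity being minimized is the sum of the squared residuals:

\[Q(x)=\lVert b-Ax\rVert_2^2=\sum_{i=1}^m r_i^2=\sum_{i=1}^m\Big(b_i-\sum_{j=1}^n a_{ij}x_j\Big)^2.\]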
If \(x_1,\ldots,x_n\) are those that minimize \(Q\), then:

\[\frac{\partial Q}{\partial x_i}=0,\quad i=1,2,\ldots,n,\]

in order to guarantee a minimum. (Since \(Q\) is a convex quadratic function of \(x\), any point satisfying these conditions is in fact a global minimum.)
Thus,

\[\frac{\partial Q}{\partial x_k}=-2\sum_{i=1}^m a_{ik}\Big(b_i-\sum_{j=1}^n a_{ij}x_j\Big)=-2\,(A^Tr)_k=0,\quad k=1,\ldots,n,\]

or, in matrix form, \(A^Tr=0\).
Since \(r=b-Ax\), we have:

\[A^T(b-Ax)=0\quad\Longleftrightarrow\quad A^TAx=A^Tb.\]
The system above is called the normal equations system; it is a square \(n\times n\) system whose solution is the least squares solution of \(Ax\approx b\).
The normal equations always have a solution, under the simple condition that the columns of \(A\) are linearly independent (which is usually true in practice); in that case \(A^TA\) is symmetric positive definite, so the solution is also unique.
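As an illustration, here is a minimal sketch (in NumPy, with hypothetical synthetic data) of a straight-line fit computed via the normal equations:

```python
import numpy as np

# Hypothetical synthetic data: points near the line y = 2x + 1,
# perturbed by measurement noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.size)

# n-by-2 matrix of the overdetermined system A c ≈ y, rows [x_i, 1],
# so that the unknowns are c = (c_1, c_0).
A = np.column_stack([x, np.ones_like(x)])

# Normal equations: solve the square 2-by-2 system (A^T A) c = A^T y.
c = np.linalg.solve(A.T @ A, A.T @ y)
print("slope c1 =", c[0], "intercept c0 =", c[1])

# The residual r = y - Ac is not zero, but its 2-norm is minimal.
r = y - A @ c
print("||r||_2 =", np.linalg.norm(r))
```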
4.5.1. \(QR\) factorization
While the normal equations can adequately compute the least squares solution, the condition number of \(A^TA\) is the square of that of \(A\) (exactly so when \(A\) is square). An alternative method that does not suffer from this problematic conditioning is the \(QR\) factorization.
Definition
An \(n\times n\) matrix \(Q\) is called orthonormal if and only if

\[Q^TQ=QQ^T=I,\quad\text{i.e., }Q^{-1}=Q^T.\]
Theorem
Let \(A\in\mathbb R^{m\times n}\) (\(m>n\)) have linearly independent columns. Then a decomposition \(A=QR\) exists, such that \(Q\in\mathbb R^{m\times m}\) is orthonormal and \(R\in\mathbb R^{m\times n}\) is upper triangular, i.e.,

\[R=\begin{bmatrix}\hat R\\ 0\end{bmatrix},\]

where \(\hat R\) is an \(n\times n\) upper triangular matrix. Additionally, given that \(A\) has linearly independent columns, all diagonal elements of \(\hat R\) satisfy \(r_{ii}\neq 0\).
Now, let us write

\[Q=\begin{bmatrix}\hat Q&Q^\star\end{bmatrix},\]
where \(\hat Q\in\mathbb R^{m\times n}\) contains the first \(n\) columns of \(Q\) and \(Q^\star\in\mathbb R^{m\times (m-n)}\) contains the last \((m-n)\) columns. Respectively, we write:

\[R=\begin{bmatrix}\hat R\\ 0\end{bmatrix},\]
where \(\hat R\in\mathbb R^{n\times n}\) (and upper triangular) contains the first \(n\) rows of \(R\). \(\hat R\) is also non-singular, because it has linearly independent columns. We can verify the following:

\[A=QR=\begin{bmatrix}\hat Q&Q^\star\end{bmatrix}\begin{bmatrix}\hat R\\ 0\end{bmatrix}=\hat Q\hat R.\]
The factorization \(A=\hat Q\hat R\) is the so-called economy size \(QR\) factorization. Once we have computed \(\hat Q\) and \(\hat R\), we observe that the normal equations can be written as:

\[A^TAx=A^Tb\;\Longrightarrow\;\hat R^T\hat Q^T\hat Q\hat Rx=\hat R^T\hat Q^Tb\;\Longrightarrow\;\hat R^T\hat Rx=\hat R^T\hat Q^Tb\;\Longrightarrow\;\hat Rx=\hat Q^Tb,\]

since \(\hat Q^T\hat Q=I\) (the columns of \(\hat Q\) are orthonormal).
The last step follows because \(\hat R\) (and hence \(\hat R^T\)) is invertible. The benefit of using the \(QR\) factorization is that \(\textsf{cond}(A^TA)=[\textsf{cond}(\hat R)]^2\): solving the triangular system \(\hat Rx=\hat Q^Tb\) involves only \(\textsf{cond}(\hat R)\), so the resulting equation is much better conditioned than the normal equations.
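For comparison, here is the same line fit computed through the economy size \(QR\) factorization; a minimal sketch assuming NumPy, cross-checked against NumPy's built-in least squares routine:

```python
import numpy as np

# Same hypothetical line-fit data as in the normal-equations sketch.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.size)
A = np.column_stack([x, np.ones_like(x)])

# Economy size QR: NumPy's default 'reduced' mode returns Q_hat (n-by-2,
# orthonormal columns) and R_hat (2-by-2, upper triangular).
Q_hat, R_hat = np.linalg.qr(A)
print("Q_hat^T Q_hat = I:", np.allclose(Q_hat.T @ Q_hat, np.eye(2)))

# Solve the small triangular system R_hat c = Q_hat^T y.
# (np.linalg.solve does not exploit triangularity; scipy.linalg.solve_triangular
# would, but plain solve keeps this sketch dependency-free.)
c = np.linalg.solve(R_hat, Q_hat.T @ y)
print("QR solution:", c)

# Cross-check against NumPy's built-in least squares routine.
c_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
print("lstsq solution:", c_ref)
```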