2.3. Norms

Norms are valuable tools for reasoning about the extent and magnitude of error. We introduce some concepts that we will use extensively later on:

Definition

A vector norm is a function from \(\mathbb R^n\) to \(\mathbb R\). If \(x\in\mathbb R^n\), we denote its norm by \(\lVert x\rVert\); the defining properties are:

  1. \(\lVert x\rVert\geq 0\) for all \(x\in\mathbb R^n\), also \(\lVert x\rVert=0\) if and only if \(x=0\).
  2. \(\lVert \alpha x\rVert = \lvert\alpha\rvert\cdot \lVert x\rVert\) for all \(\alpha\in\mathbb R,x\in\mathbb R^n\).
  3. \(\lVert x+y\rVert\leq \lVert x\rVert + \lVert y\rVert\) for all \(x,y\in\mathbb R^n\).

Note that the properties above do not determine a unique “norm” function; in fact, many different valid norms exist. Typically, we will use subscripts (for example, \(\lVert \cdot\rVert_a\)) to denote different types of norms.
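As a quick numerical sanity check (an illustrative sketch only, using NumPy's linalg.norm function discussed further below), the three properties can be verified for the Euclidean norm; np.isclose guards against floating-point round-off in the second check:

>>> import numpy as np
>>> x = np.array([1.0, -2.0, 3.0])
>>> y = np.array([-4.0, 0.0, 5.0])
>>> bool(np.linalg.norm(x) >= 0)          # property 1: non-negativity
True
>>> bool(np.isclose(np.linalg.norm(-2.5 * x), 2.5 * np.linalg.norm(x)))  # property 2
True
>>> bool(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))  # property 3
True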

2.3.1. Why are vector norms needed?

In many applications, such as the solution of a nonlinear equation \(f(x)=0\), the error \(e=x_\textsf{approx}-x_\textsf{exact}\) is a single number. Thus, the absolute value \(\vert e\vert\) gives us a good idea of the “extent” of the error.

When solving a system of linear equations \(Ax=b\), the exact solution \(x_\textsf{exact}\) as well as any approximation \(x_\textsf{approx}\) are vectors, and the error \(e=x_\textsf{approx}-x_\textsf{exact}\) is a vector too. It is not as straightforward to assess the “magnitude” of such a vector-valued error. For example, consider \(e_1,e_2\in\mathbb R^{1000}\), and

\[\begin{split}e_1=\left(\begin{array}{c} 0.1 \\ 0.1 \\ 0.1 \\ \vdots \\ 0.1 \end{array}\right), \enspace\enspace\enspace e_2 = \left(\begin{array}{c} 100 \\ 0 \\ 0 \\ \vdots \\ 0 \end{array}\right)\end{split}\]

Which one is worse? While \(e_1\) has a modest amount of error uniformly distributed over all components, all but one of the components of \(e_2\) are exact, yet that single component has a huge discrepancy! Exactly how we quantify and assess the extent of error is application-dependent. Vector norms are alternative ways of measuring this magnitude, and different norms are appropriate for different tasks.
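To make this concrete, the following NumPy sketch (a minimal illustration, using the two vectors above, with output rounded for display) shows that different norms rank these errors very differently: \(e_1\) and \(e_2\) have the same \(L_1\)-norm, but their \(L_2\)- and \(L_\infty\)-norms differ dramatically:

>>> import numpy as np
>>> e1 = np.full(1000, 0.1)            # 0.1 in every component
>>> e2 = np.zeros(1000); e2[0] = 100   # a single component equal to 100
>>> print(round(np.linalg.norm(e1, 1), 6), round(np.linalg.norm(e2, 1), 6))
100.0 100.0
>>> print(round(np.linalg.norm(e1, 2), 6), round(np.linalg.norm(e2, 2), 6))
3.162278 100.0
>>> print(np.linalg.norm(e1, np.inf), np.linalg.norm(e2, np.inf))
0.1 100.0

The \(L_1\)-norm considers the two errors equally bad, while the \(L_2\)- and max-norms judge \(e_2\) to be far worse.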

2.3.2. Types of Vector Norms

Consider a vector \(v\in\mathbb R^n\). Some norms which satisfy the properties of vector norms are:

  1. The \(L_1\)-norm (or \(1\)-norm):
\[\lVert v\rVert_1 = \sum_{i=1}^n \vert v_i\vert\]
  2. The \(L_2\)-norm (or \(2\)-norm, or Euclidean norm):
\[\lVert v\rVert_2 = \sqrt{\sum_{i=1}^n v_i^2}\]
  3. The infinity norm (or max-norm):
\[\lVert v\rVert_\infty = \max_{1\leq i\leq n}\vert v_i\vert\]
  4. (Less common) The \(L_p\)-norm:
\[\lVert v\rVert_p = \left(\sum_{i=1}^n \vert v_i\vert^p\right)^{1/p}\]

NumPy provides a module called linalg that allows us to directly compute all of these norms for a given vector, as shown below. The second argument of np.linalg.norm specifies which norm we wish to compute (here, the \(L_1\)-norm, \(L_2\)-norm, and \(L_\infty\)-norm, respectively):

>>> import numpy as np
>>> v = np.array([3,1,4,5,2,7])
>>> np.linalg.norm(v,1)
22.0
>>> np.linalg.norm(v,2)
10.198039027185569
>>> np.linalg.norm(v,np.inf)
7.0
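The same function also accepts an arbitrary \(p\) as its second argument for the general \(L_p\)-norm. Continuing the session above, a quick check for \(p=3\) (rounded for display) against the definition:

>>> print(round(np.linalg.norm(v, 3), 4))         # (3**3 + 1 + 4**3 + 5**3 + 2**3 + 7**3)**(1/3)
8.2816
>>> print(round(np.sum(np.abs(v)**3)**(1/3), 4))  # the same value, computed directly from the definition
8.2816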

2.3.3. Equivalence of different norms

Several inequalities can be proved relating the \(L_1\)-, \(L_2\)-, and \(L_\infty\)-norms. We start by showing that \(\lVert x\rVert_\infty\leq \lVert x\rVert_1\).

\[\lVert x\rVert_\infty = \max_{1\leq i\leq n} \vert x_i\vert \leq \sum_{i=1}^n \vert x_i\vert = \lVert x\rVert_1\]

The inequality above follows from the fact that the sum of the absolute values \(\sum_{i=1}^n \vert x_i\vert\) contains the maximum absolute value \(\max_{1\leq i\leq n} \vert x_i\vert\) as one of its terms. Furthermore, the absolute value \(\vert x_i\vert\) of each component is less than or equal to the maximum absolute value over all components. This allows us to prove an inequality in the reverse direction as well, albeit with a multiplicative factor of \(n\).

\[\sum_{i=1}^n \vert x_i\vert \leq \sum_{i=1}^n \max_{1\leq j\leq n} \vert x_j\vert = \sum_{i=1}^n \lVert x\rVert_\infty = n\cdot\lVert x\rVert_\infty\]

The above two inequalities can be compactly written as \(\lVert x\rVert_\infty \leq \lVert x\rVert_1\leq n\cdot\lVert x\rVert_\infty\). Equality holds on the right when all components of \(x\) have the same absolute value, and on the left when at most one component is nonzero, as can easily be verified.

Example 1

Consider the vector \(v = (5,2,1,4,3)\). The \(L_1\) and \(L_\infty\)-norms of \(v\) are

\[\lVert v\rVert_1 = 15\enspace\enspace\enspace\enspace \mbox{and} \enspace\enspace\enspace\enspace \lVert v\rVert_\infty = 5\]

respectively. Note that \(\lVert v\rVert_\infty = 5\leq \lVert v\rVert_1 = 15 \leq 5\cdot\lVert v\rVert_\infty = 5\cdot 5 = 25\).

Definition

Two vector norms \(\lVert x\rVert_a\) and \(\lVert x\rVert_b\) are called equivalent if there exist real numbers \(c,d>0\), such that

\[c\lVert x\rVert_a \leq \lVert x\rVert_b \leq d\lVert x\rVert_a\]

It follows from this definition that the \(L_1\) and \(L_\infty\)-norms are equivalent, as proved above (with \(c=1\) and \(d=n\)). Now consider the square of the \(L_1\)-norm

\[\begin{split}\lVert x\rVert_1^2 = (\vert x_1\vert + \vert x_2\vert + \ldots + \vert x_n\vert)^2 = \sum_{i=1}^n\vert x_i\vert^2 + 2\sum_{i<j} \vert x_i\vert\cdot\vert x_j\vert \geq \sum_{i=1}^n\vert x_i\vert^2 = \lVert x\rVert_2^2\end{split}\]

The above inequality implies that \(\lVert x\rVert_1\geq \lVert x\rVert_2\). For the reverse direction of the inequality, consider the AM-GM inequality:

\[\begin{split}2\vert x_i\vert\cdot\vert x_j\vert \enspace &\leq \enspace \vert x_i\vert^2 + \vert x_j\vert^2 \\ \Rightarrow\enspace 2\sum_{i<j} \vert x_i\vert\cdot\vert x_j\vert \enspace &\leq \enspace (n-1)\cdot\sum_{i=1}^n \vert x_i\vert^2\end{split}\]

The second inequality follows from summing the first over all pairs \(i<j\): each component \(x_i\) is paired with exactly \(n-1\) other components \(x_j\), so each term \(\vert x_i\vert^2\) appears exactly \(n-1\) times on the right-hand side. Using the above inequality with the square of the \(L_1\)-norm gives

\[\begin{split}\lVert x\rVert_1^2 = \sum_{i=1}^n\vert x_i\vert^2 + 2\sum_{i<j} \vert x_i\vert\cdot\vert x_j\vert \leq n\sum_{i=1}^n\vert x_i\vert^2 = n\cdot\lVert x\rVert_2^2\end{split}\]

This implies that \(\lVert x\rVert_1\leq\sqrt{n}\lVert x\rVert_2\). Combining the two inequalities gives \(\lVert x\rVert_2\leq\lVert x\rVert_1\leq\sqrt{n}\lVert x\rVert_2\), proving that the \(L_1\) and \(L_2\)-norms are equivalent as well. Again, equality holds on the right when all components of \(x\) have the same absolute value, and on the left when at most one component is nonzero.

Example 2

Consider the same vector \(v = (5,2,1,4,3)\). The \(L_1\) and \(L_2\)-norms of \(v\) are

\[\lVert v\rVert_1 = 15 \enspace\enspace\enspace\enspace \mbox{and} \enspace\enspace\enspace\enspace \lVert v\rVert_2 = 7.4162\]

respectively. Note that \(\lVert v\rVert_2 = 7.4162 \leq \lVert v\rVert_1 = 15 \leq \sqrt{5}\cdot 7.4162 = 16.5831\).
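The same check can be performed in NumPy (a brief sketch, with output rounded for display):

>>> import numpy as np
>>> v = np.array([5, 2, 1, 4, 3])
>>> n1, n2 = np.linalg.norm(v, 1), np.linalg.norm(v, 2)
>>> print(round(n2, 4), n1, round(np.sqrt(len(v)) * n2, 4))   # L2 <= L1 <= sqrt(n)*L2
7.4162 15.0 16.5831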