.. _sec-matrices:

Matrices
--------

Suppose we have :math:`n` vectors :math:`v^1,v^2,\ldots,v^n\in\mathbb R^m`.
There are several options available to us for denoting this set, either by writing each
one individually as a separate column vector:

.. math::

   v^1=\left(
   \begin{array}{c}
   v^1_1 \\
   v^1_2 \\
   \vdots \\
   v^1_m
   \end{array}
   \right), v^2=\left(
   \begin{array}{c}
   v^2_1 \\
   v^2_2 \\
   \vdots \\
   v^2_m
   \end{array}
   \right),\ldots,v^n=\left(
   \begin{array}{c}
   v^n_1 \\
   v^n_2 \\
   \vdots \\
   v^n_m
   \end{array}
   \right)

or by adopting the more efficient notation of combining all of them into a single :math:`m\times n` matrix:

.. math::

   \left(
   \begin{array}{cccc}
   \vert & \vert & & \vert \\
   v^1 & v^2 & \ldots & v^n \\
   \vert & \vert & & \vert
   \end{array}
   \right) = \left(
   \begin{array}{cccc}
   v^1_1 & v^2_1 & \ldots & v^n_1 \\
   v^1_2 & v^2_2 & \ldots & v^n_2 \\
   \vdots & \vdots & \ddots & \vdots \\
   v^1_m & v^2_m & \ldots & v^n_m
   \end{array}
   \right)

In general, an :math:`m\times n` matrix :math:`A\in\mathbb R^{m\times n}` is written as:

.. math::

   A = \left(
   \begin{array}{cccc}
   A_{11} & A_{12} & \ldots & A_{1n} \\
   A_{21} & A_{22} & \ldots & A_{2n} \\
   \vdots & \vdots & \ddots & \vdots \\
   A_{m1} & A_{m2} & \ldots & A_{mn}
   \end{array}
   \right)

In the example above, :math:`A_{ij}=v^j_i`. NumPy has a ``matrix`` class to conveniently define
:math:`m\times n` matrices, as shown below: ::

    >>> a = np.matrix([[1,2,3],[4,5,6],[7,8,9]])
    >>> a
    matrix([[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]])

Note that NumPy defines the individual elements of
the matrix by specifying the rows, *not* the columns. Such an ordering of the
matrix elements is called a *row-major* ordering. If, on the other hand, the data
were specified column by column, it would be called a *column-major* ordering.
You are probably familiar with the :math:`n\times n` *identity matrix*, defined as:

.. math::

   I_{n\times n} = \left(\begin{array}{cccc}
   \vert & \vert & & \vert \\
   e^1 & e^2 & \ldots & e^n \\
   \vert & \vert & & \vert
   \end{array}
   \right) = \left(\begin{array}{ccccc}
   1 & 0 & \ldots & 0 & 0 \\
   0 & 1 & \ldots & 0 & 0 \\
   \vdots & \vdots & \ddots & \vdots & \vdots \\
   0 & 0 & \ldots & 1 & 0 \\
   0 & 0 & \ldots & 0 & 1
   \end{array}
   \right)

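The identity matrix can be constructed directly in NumPy with ``np.identity`` (or the more general ``np.eye``); a quick sketch:

```python
import numpy as np

# The 4-by-4 identity matrix: its columns are the standard basis vectors e^1, ..., e^4
I = np.identity(4)

# Multiplying any vector by the identity matrix leaves it unchanged
x = np.array([3.0, -1.0, 2.0, 5.0])
print(I.dot(x))
```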
Matrix-Vector Product
~~~~~~~~~~~~~~~~~~~~~
Let :math:`x\in\mathbb R^{n\times 1}` be an :math:`n`-dimensional column vector
and :math:`A\in\mathbb R^{m\times n}` be an :math:`m\times n` matrix. The
matrix-vector product :math:`b=Ax` is the :math:`m`-dimensional vector defined
as:

.. math::

   b = \left(\begin{array}{c}
   b_1 \\
   b_2 \\
   \vdots \\
   b_m
   \end{array}
   \right),\mbox{ where }b_i = \sum_{j=1}^n A_{ij}x_j\mbox{ for all }1\leq i\leq m

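The defining formula for :math:`b_i` translates directly into a double loop over the rows and columns of :math:`A`; a minimal sketch, using a small matrix chosen here for illustration:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])   # m = 2, n = 3
x = np.array([1., 0., -1.])

# b_i = sum over j of A_ij * x_j, for each row i
m, n = A.shape
b = np.zeros(m)
for i in range(m):
    for j in range(n):
        b[i] += A[i, j] * x[j]

print(b)   # matches A.dot(x)
```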
Note that *all* matrices are *linear* operators, i.e., if :math:`x,y\in\mathbb R^{n\times 1}`
are two arbitrary :math:`n`-dimensional vectors, and :math:`\alpha\in\mathbb R`
is an arbitrary scalar, then it follows that:

.. math::

   A(x+y) &= Ax + Ay \\
   A(\alpha x) &= \alpha Ax

Matrix-vector products can be computed in NumPy using
the ``dot`` operator, as shown below: ::

    >>> a = np.matrix([[1,2,3],[4,5,6],[7,8,9]])
    >>> b = np.array([4,5,9])
    >>> a.dot(b)
    matrix([[ 41,  95, 149]])

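With ``dot`` in hand, the two linearity identities above are easy to verify numerically; a small sketch with randomly generated data (the seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((3, 4))
x = rng.standard_normal(4)
y = rng.standard_normal(4)
alpha = 2.5

# A(x + y) = Ax + Ay
print(np.allclose(A.dot(x + y), A.dot(x) + A.dot(y)))   # True
# A(alpha x) = alpha Ax
print(np.allclose(A.dot(alpha * x), alpha * A.dot(x)))  # True
```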
.. topic:: Example 1: Image Blurring

   To illustrate the power of the abstraction that matrix-vector products
   provide, we will show an example from image processing in which we
   blur an image by casting the problem as a matrix-vector multiplication.
   First, however, we must outline how to read an image in Python
   using the ``matplotlib`` package. Consider reading this `image <_images/chill.jpg>`_: ::

      >>> import matplotlib.pyplot as plt
      >>> I = plt.imread('chill.jpg')
      >>> plt.imshow(I)
      >>> plt.show()

   The above sequence of commands should display the following image:

   .. image:: images/color_plot.png
      :height: 612px
      :width: 812px
      :scale: 75%
      :align: center

   In our case, it is more convenient to operate on grayscale images. We
   will use the PIL package in Python for this purpose. ::

      >>> import numpy as np
      >>> import matplotlib.pyplot as plt
      >>> from PIL import Image
      >>> fname = 'chill.jpg'
      >>> image = Image.open(fname).convert("L")
      >>> arr = np.asarray(image)
      >>> plt.imshow(arr, cmap='gray')
      >>> plt.show()

   The above commands should display the following image:

   .. image:: images/grayscale_plot.png
      :height: 612px
      :width: 812px
      :scale: 75%
      :align: center

   To blur the image, we will perform the following simple operation, which
   replaces the grayscale value at every pixel by a weighted average of
   the pixel and its neighbors:

   .. math::

      \textsf{pixel} \leftarrow \frac{1}{8}(4\cdot\textsf{pixel} + \textsf{pixel}_\textsf{north} + \textsf{pixel}_\textsf{south} + \textsf{pixel}_\textsf{east} + \textsf{pixel}_\textsf{west})

   Since our blurring code is substantially more complex than anything else we
   have encountered so far, for convenience of execution, we will copy all the
   code into a file named ``blur.py``, as shown below: ::

      # blur.py
      import numpy as np
      import matplotlib.pyplot as plt
      from PIL import Image
      from scipy.sparse import lil_matrix

      # read the image file into a writable grayscale array
      fname = 'chill.jpg'
      image = Image.open(fname).convert("L")
      arr = np.array(image)

      # initialize the blurring matrix
      m = arr.shape[0]
      n = arr.shape[1]
      dofs = m*n
      A = lil_matrix((dofs, dofs))
      A.setdiag(np.ones(dofs))
      for i in range(1, m-1):
          for j in range(1, n-1):
              A[n*i+j, n*i+j]     = 4./8.
              A[n*i+j, n*(i-1)+j] = 1./8.
              A[n*i+j, n*(i+1)+j] = 1./8.
              A[n*i+j, n*i+j-1]   = 1./8.
              A[n*i+j, n*i+j+1]   = 1./8.
      A = A.tocsr()

      # Blurring function - converts the image to a vector, multiplies it by
      # the blurring matrix, and copies the result back into the image
      def blur():
          x = np.zeros(shape=(dofs, 1))
          for i in range(0, m):
              for j in range(0, n):
                  x[n*i+j] = arr[i, j]
          y = A.dot(x)
          for i in range(0, m):
              for j in range(0, n):
                  arr[i, j] = y[n*i+j]

      # Execute the blurring function 20 times
      for i in range(0, 20):
          blur()

      # Display the blurred image
      plt.imshow(arr, cmap='gray')
      plt.show()

   Several new concepts have been introduced in the code above. The image array
   ``arr`` can be thought of as a column vector ``x`` so that it can be
   multiplied by a matrix. Here we see the generality of the concept of a
   vector -- an :math:`m\times n` image corresponds to an :math:`mn`-dimensional
   vector. We first construct an :math:`mn\times mn` matrix :math:`A` to encode
   the blurring action. Since densely allocating this matrix would incur a
   substantial memory overhead, we allocate a *sparse* matrix instead using the
   ``scipy.sparse`` package. A sparse matrix stores only the non-zero entries of
   a matrix, resulting in significant memory savings. The ``lil_matrix`` format
   allows us to set individual elements of the matrix, and the ``tocsr``
   function converts it to the *compressed sparse row* format. Several other
   formats are available, as explained in the ``scipy.sparse`` documentation.
   The ``blur`` function first converts the :math:`m\times n` image array into
   an :math:`mn\times 1` vector, multiplies it by the blurring matrix,
   and stores the result back into the image. We run this function :math:`20`
   times to exaggerate the overall effect. To run the above code, simply
   execute the following command: ::

      $ python blur.py

   This may take some time depending upon the speed of your computer, because
   the matrices and vectors involved are somewhat large.
   Finally, the following blurred result should be displayed upon completion:

   .. image:: images/blurred_plot.png
      :height: 612px
      :width: 812px
      :scale: 75%
      :align: center

Sparse matrices are extremely useful in practice, because the majority of
matrices encountered in real-world applications tend to be sparse. We will
see many more examples of their usage in later sections.
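As a small, self-contained illustration of the savings (a sketch, with a tridiagonal matrix chosen for simplicity), we can build a matrix element by element in ``lil_matrix`` format and convert it to CSR; only the non-zero entries are stored:

```python
import numpy as np
from scipy.sparse import lil_matrix

# Build a 5-by-5 tridiagonal matrix element by element in LIL format
n = 5
A = lil_matrix((n, n))
A.setdiag(2.0 * np.ones(n))          # main diagonal
A.setdiag(-np.ones(n - 1), k=1)      # superdiagonal
A.setdiag(-np.ones(n - 1), k=-1)     # subdiagonal

# Convert to CSR for fast matrix-vector products
A = A.tocsr()
print(A.nnz)   # 13 stored entries instead of 25
```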
Suppose we consider the individual columns :math:`v^1,v^2,\ldots,v^n` of
:math:`A`, as shown above. Then the matrix-vector product :math:`b` can be
re-written as a *linear combination* of the columns of :math:`A`, i.e., :math:`b=Ax=\sum_{i=1}^n x_iv^i`.
This relation can be schematically depicted as follows:

.. math::

   \left(\begin{array}{c}
   \vert \\
   b \\
   \vert
   \end{array}\right) = \left(\begin{array}{cccc}
   \vert & \vert & & \vert \\
   v^1 & v^2 & \ldots & v^n \\
   \vert & \vert & & \vert
   \end{array}\right)\left(\begin{array}{c}
   x_1 \\
   x_2 \\
   \vdots \\
   x_n
   \end{array}\right) = x_1\left(\begin{array}{c}
   \vert \\
   v^1 \\
   \vert
   \end{array}\right) + x_2\left(\begin{array}{c}
   \vert \\
   v^2 \\
   \vert
   \end{array}\right) + \ldots + x_n\left(\begin{array}{c}
   \vert \\
   v^n \\
   \vert
   \end{array}\right)

Traditionally, we are used to viewing the relation :math:`Ax=b` as the *action*
of :math:`A` on :math:`x` to produce :math:`b`. The equation above is a
different interpretation that suggests, in contrast, that :math:`x` acts on
:math:`A` to produce :math:`b`. The set of all :math:`m`-dimensional vectors
:math:`b` that can be written as :math:`Ax` for some :math:`n`-dimensional
vector :math:`x` constitutes the *column space* of :math:`A`.
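The column-space view of the product is easy to verify numerically; a sketch with a small :math:`3\times 2` matrix chosen for illustration:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
x = np.array([2., -1.])

# Ax equals the linear combination x_1 * v^1 + x_2 * v^2 of the columns of A
b = A.dot(x)
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
print(b)   # identical to combo
```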
.. topic:: Example 2

   Consider the following matrix:

   .. math::

      A = \left[\begin{array}{ccc}
      1 & 0 & 0 \\
      0 & 1 & 0 \\
      0 & 0 & 0
      \end{array}\right]

   For any :math:`3`-dimensional vector :math:`(u,v,w)`, this matrix
   zeroes out the Z-component, essentially projecting the vector onto the XY-plane.
   In this case, the XY-plane corresponds to the column space of :math:`A`.
   Further, any vector of the form :math:`(0,0,w)` is projected to the :math:`0`-vector
   (or *null* vector). The set of all such vectors comprises the *null space* of
   :math:`A`.
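The projection in Example 2 can be checked directly; a minimal sketch:

```python
import numpy as np

# The projection matrix from Example 2
A = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 0.]])

v = np.array([3., 4., 7.])
print(A.dot(v))   # (3, 4, 0): the Z-component is zeroed out

z = np.array([0., 0., 5.])
print(A.dot(z))   # (0, 0, 0): z lies in the null space of A
```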
As shown in Example 2, the set of all vectors :math:`x` such that :math:`Ax=0`
comprises the *null space* of :math:`A`. The *rank* of
:math:`A` is the dimension of its column space, i.e., the number of linearly
independent columns of :math:`A`, and is denoted by :math:`\textsf{rank }A`.
Likewise, the dimension of the null space of :math:`A` is denoted by
:math:`\textsf{null }A`. In Example 2, :math:`\textsf{rank }A=2` and
:math:`\textsf{null }A=1`, so :math:`\textsf{rank }A + \textsf{null }A = 3`,
which is the number of columns of :math:`A`. In fact, this generalizes to
arbitrary matrices and is regarded as one of the fundamental theorems of
linear algebra:
.. topic:: Rank-Nullity Theorem

   If :math:`A` is an :math:`m\times n` matrix, then

   .. math::

      \textsf{rank }A + \textsf{null }A = n
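The theorem can be checked numerically with ``np.linalg.matrix_rank``, here on the projection matrix from Example 2:

```python
import numpy as np

A = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 0.]])

n = A.shape[1]
rank = np.linalg.matrix_rank(A)
nullity = n - rank   # dimension of the null space, by the rank-nullity theorem
print(rank, nullity)   # 2 1
```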
Note how the dimensions of the result
:math:`b` are dependent on the dimensions of :math:`A` and :math:`x`:

.. math::

   b_{m\times 1} = A_{m\times n}\cdot x_{n\times 1}

The "middle" dimension :math:`n` has to be the same for both :math:`A` and
:math:`x`, and it is consumed by the multiplication. The same principle applies
to the product of two arbitrary matrices, i.e., if :math:`A\in\mathbb R^{m\times n}`
and :math:`B\in\mathbb R^{n\times p}` are two matrices, then their product
:math:`C=A\cdot B\in\mathbb R^{m\times p}`. The ``dot`` operator
in NumPy can also be used for multiplying two matrices, as shown below: ::

    >>> a = np.matrix([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
    >>> a
    matrix([[ 1,  2,  3,  4],
            [ 5,  6,  7,  8],
            [ 9, 10, 11, 12]])
    >>> b = np.matrix([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
    >>> b
    matrix([[ 1,  2,  3],
            [ 4,  5,  6],
            [ 7,  8,  9],
            [10, 11, 12]])
    >>> a.dot(b)
    matrix([[ 70,  80,  90],
            [158, 184, 210],
            [246, 288, 330]])

Matrix Transpose
~~~~~~~~~~~~~~~~
Another operator on matrices that is quite useful is the *transpose* operator.
The transpose of a matrix :math:`A\in\mathbb R^{m\times n}` is denoted as
:math:`A^T\in\mathbb R^{n\times m}`, and is defined as:

.. math::

   A^T = \left(\begin{array}{cccc}
   A_{11} & A_{21} & \ldots & A_{m1} \\
   A_{12} & A_{22} & \ldots & A_{m2} \\
   \vdots & \vdots & \ddots & \vdots \\
   A_{1n} & A_{2n} & \ldots & A_{mn}
   \end{array}\right)

Thus, :math:`(A^T)_{ij}=A_{ji}`, i.e., the transpose of a matrix is obtained by
*flipping* the original matrix over its diagonal. The ``transpose`` method in NumPy
can be used for computing the transpose of a given matrix, as shown below: ::

    >>> a = np.matrix([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
    >>> a
    matrix([[ 1,  2,  3,  4],
            [ 5,  6,  7,  8],
            [ 9, 10, 11, 12]])
    >>> a.transpose()
    matrix([[ 1,  5,  9],
            [ 2,  6, 10],
            [ 3,  7, 11],
            [ 4,  8, 12]])

The number of linearly independent columns in :math:`A` is *equal* to the number
of linearly independent rows in :math:`A`. Thus, it follows that

.. math::

   \textsf{rank }A = \textsf{rank }A^T
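This equality is easy to confirm with ``np.linalg.matrix_rank``, using the matrix from the transpose example above (its third row is :math:`2\cdot\text{row}_2 - \text{row}_1`, so the rank is :math:`2`):

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

# The rank of a matrix equals the rank of its transpose
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # 2 2
```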