Recently, in my work I needed to construct a rational approximation of a matrix-valued
function. The best algorithm I know for this is AAA¹, and it requires the
singular value decomposition of \(m \times n\) matrices with \(m \gg n\).
For scalar functions this is not a problem, since in most cases the
matrix is reasonably sized and the svd call from Julia's standard LinearAlgebra does the job.
To use the AAA algorithm for an \(r \times c\) matrix-valued function, we have to
deal with matrices \(rc\) times taller, which is already painful.
The continuous version of the AAA algorithm² allows one to increase the number
of rows adaptively, but even then I had a hard time just storing the full
matrix in memory: a typical size was millions of rows by tens of columns.
Luckily, the matrix in the AAA algorithm is quite specific, and I don't need to keep all of it in memory. Also, I don't need the full SVD, only the one right singular vector which corresponds to the smallest singular value. Of course, the right singular vectors of a matrix \(A\) coincide with the eigenvectors of \(A^\dagger A\). But forming \(A^\dagger A\) squares the condition number, so the latter is extremely ill-conditioned and diagonalizing it is pretty much a dead end.
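A minimal Julia illustration of the dead end, with a random stand-in matrix instead of the actual AAA one:

```julia
using LinearAlgebra

# Random stand-in with singular values spread over nine orders of
# magnitude, so cond(A) is about 1e9.
A = randn(1000, 20) * Diagonal(exp10.(LinRange(0, -9, 20)))

cond(A)     # ≈ 1e9: large but still workable in Float64
cond(A'A)   # cond(A)^2 ≈ 1e18 in exact arithmetic; in Float64 the
            # smallest eigenvalues of A'A are pure round-off
```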
Since we don't need the left singular vectors, we can mess with the matrix a bit. Let's make a QR decomposition: \[ A = Q R, \] where \(Q\) is an \(m \times n\) matrix (let's incorporate \(r\) and \(c\) into \(m\)) with orthonormal columns, \(Q^\dagger Q = I\), and \(R\) is an upper triangular \(n \times n\) matrix. The point is that the matrices \(A\) and \(R\) have the same singular values and right singular vectors: indeed, \(A^\dagger A = R^\dagger Q^\dagger Q R = R^\dagger R\). The matrix \(R\) is small, so one can find its SVD straightforwardly. The problem is that while constructing the QR decomposition, we store all the Householder/Givens rotations which constitute the matrix \(Q\), and that takes memory.
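As a sanity check, here is the principle as a minimal Julia sketch, on a random matrix that still fits in memory:

```julia
using LinearAlgebra

A = randn(10_000, 20)
R = qr(A).R                # small n×n upper triangular factor
v = svd(R).V[:, end]       # right singular vector of R for the
                           # smallest singular value...
norm(A * v) ≈ svdvals(A)[end]   # ...is also the one of A: true
```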
But we don't really need \(Q\). And the idea of replacing the matrix \(A\) with
some simpler matrix while preserving singular values and right singular vectors
is really nice. So let's split the matrix \(A\) into \(k\) blocks:
\[
A = \begin{bmatrix}
A_1 \\
A_2 \\
\vdots \\
A_k
\end{bmatrix},
\]
where \(A_j\) is an \(m_j \times n\) matrix for \(j = 1, \ldots, k\). Each
of the blocks is still “tall” (\(m_j > n\)) but fits in memory nicely. Let's
factorize the first block as
\[
A_1 = Q_1 R_1.
\]
We notice that the block matrix
\[
\begin{bmatrix}
R_1 \\
A_2 \\
\vdots \\
A_k
\end{bmatrix}
\]
has the same singular values and right singular vectors. And it's a bit shorter, since
\(R_1\) is a square \(n \times n\) matrix and \(m_1 > n\). Let's go further:
\[
\begin{bmatrix}
R_1 \\
A_2
\end{bmatrix} = Q_2 R_2.
\]
Then the original matrix becomes
\[
\begin{bmatrix}
R_2 \\
A_3 \\
\vdots \\
A_k
\end{bmatrix}.
\]
Continuing this way, we construct \(R_k\), which coincides with the factor \(R\)
of the QR factorization of the full matrix. And this one is easy to deal with.
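In Julia this sequential pass is a few lines. A minimal sketch, where make_block is a hypothetical generator standing in for the actual AAA rows:

```julia
using LinearAlgebra

# Keep only an n×n running R factor while streaming the blocks A_j;
# the full m×n matrix never exists in memory.
function streamed_R(make_block, k)
    R = qr(make_block(1)).R
    for j in 2:k
        R = qr(vcat(R, make_block(j))).R   # [R; A_j] = Q_j R_j; keep R_j
    end
    return R
end

# Random blocks as stand-ins for the actual AAA matrix:
R = streamed_R(j -> randn(100_000, 20), 10)
v_min = svd(R).V[:, end]   # the right singular vector we are after
```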
In principle, we could also factorize each of the blocks separately,
\(A_j = \tilde Q_j \tilde R_j\), and then stack the factors:
\[
\begin{bmatrix}
\tilde R_1 \\
\tilde R_2 \\
\vdots \\
\tilde R_k
\end{bmatrix}
\]
This would shorten the matrix significantly while preserving its singular
values and right singular vectors. Since the factorizations are independent,
this operation parallelizes nicely. If the resulting stack is short enough,
one can do a brute-force SVD. Otherwise, nothing stops us from decimating the
stack in the same fashion again and again. This algorithm is known as tall
and skinny QR (TSQR) and can be used in distributed calculations. But for my
purposes it's overkill.
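Still, here is roughly what it could look like, as a sketch with a pairwise reduction tree; the comprehension over the blocks is the part that parallelizes (e.g. with Threads or Distributed):

```julia
using LinearAlgebra

# Pairwise TSQR reduction: factorize the blocks independently, stack the
# small R factors two at a time, and decimate until a single R remains.
function tsqr_R(blocks)
    Rs = [qr(B).R for B in blocks]   # independent; parallelizable
    length(Rs) == 1 && return Rs[1]
    pairs = [reduce(vcat, Rs[i:min(i + 1, end)]) for i in 1:2:length(Rs)]
    return tsqr_R(pairs)
end

R = tsqr_R([randn(100_000, 20) for _ in 1:8])   # random stand-in blocks
v_min = svd(R).V[:, end]
```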
1. I. V. Gosea and S. Güttel, “Algorithms for the Rational Approximation of Matrix-Valued Functions,” SIAM Journal on Scientific Computing, vol. 43, no. 5, pp. A3033–A3054, Jan. 2021.
2. T. A. Driscoll, Y. Nakatsukasa, and L. N. Trefethen, “AAA Rational Approximation on a Continuum,” SIAM Journal on Scientific Computing, vol. 46, no. 2, pp. A929–A952, Mar. 2024.