
Low Rank and Compressed Sensing

1. Sherman-Morrison-Woodbury Formula

  • \(\displaystyle (I-\mathbf{u}\mathbf{v}^\top)^{-1}=I+\frac{\mathbf{u}\mathbf{v}^\top}{1-\mathbf{v}^\top\mathbf{u}}\).
Proof. We have \[\begin{aligned} (I-\mathbf{u}\mathbf{v}^\top)(I-\mathbf{u}\mathbf{v}^\top)^{-1}&=(I-\mathbf{u}\mathbf{v}^\top)\left(I+\frac{\mathbf{u}\mathbf{v}^\top}{1-\mathbf{v}^\top\mathbf{u}}\right) \\&=I-\mathbf{u}\mathbf{v}^\top+\frac{(I-\mathbf{u}\mathbf{v}^\top)\mathbf{u}\mathbf{v}^\top}{1-\mathbf{v}^\top\mathbf{u}} \\&=I-\mathbf{u}\mathbf{v}^\top+\frac{\mathbf{u}\mathbf{v}^\top-\mathbf{u}(\mathbf{v}^\top\mathbf{u})\mathbf{v}^\top}{1-\mathbf{v}^\top\mathbf{u}} \\&=I-\mathbf{u}\mathbf{v}^\top+\frac{(1-\mathbf{v}^\top\mathbf{u})\mathbf{u}\mathbf{v}^\top}{1-\mathbf{v}^\top\mathbf{u}} \\&=I \end{aligned}\] and \[\begin{aligned} (I-\mathbf{u}\mathbf{v}^\top)^{-1}(I-\mathbf{u}\mathbf{v}^\top)&=\left(I+\frac{\mathbf{u}\mathbf{v}^\top}{1-\mathbf{v}^\top\mathbf{u}}\right)(I-\mathbf{u}\mathbf{v}^\top) \\&=I-\mathbf{u}\mathbf{v}^\top+\frac{\mathbf{u}\mathbf{v}^\top(I-\mathbf{u}\mathbf{v}^\top)}{1-\mathbf{v}^\top\mathbf{u}} \\&=I-\mathbf{u}\mathbf{v}^\top+\frac{\mathbf{u}\mathbf{v}^\top-\mathbf{u}(\mathbf{v}^\top\mathbf{u})\mathbf{v}^\top}{1-\mathbf{v}^\top\mathbf{u}} \\&=I. \end{aligned}\]

\(\square\)
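
A quick numerical check of this rank-one formula, as a minimal NumPy sketch (the random \(\mathbf{u}\) and \(\mathbf{v}\) are only for illustration; the formula requires \(\mathbf{v}^\top\mathbf{u}\neq 1\), which holds here almost surely):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
u = rng.standard_normal((n, 1))
v = rng.standard_normal((n, 1))

# (I - u v^T)^{-1} = I + u v^T / (1 - v^T u)
lhs = np.linalg.inv(np.eye(n) - u @ v.T)
rhs = np.eye(n) + (u @ v.T) / (1 - v.T @ u)
print(np.allclose(lhs, rhs))  # True
```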

  • \((I_n-UV^\top)^{-1}=I_n+U(I_k-V^\top U)^{-1}V^\top\), where \(U\) and \(V\) are \(n \times k\) matrices.
Proof. We have \[\begin{aligned} (I_n-UV^\top)(I_n-UV^\top)^{-1}&=I_n-UV^\top+(I_n-UV^\top)U(I_k-V^\top U)^{-1}V^\top \\&=I_n-UV^\top+(U-UV^\top U)(I_k-V^\top U)^{-1}V^\top \\&=I_n-UV^\top+U(I_k-V^\top U)(I_k-V^\top U)^{-1}V^\top \\&=I_n-UV^\top+UV^\top \\&=I_n \end{aligned}\] and \[\begin{aligned} (I_n-UV^\top)^{-1}(I_n-UV^\top)&=I_n+U(I_k-V^\top U)^{-1}V^\top-(I_n+U(I_k-V^\top U)^{-1}V^\top)UV^\top \\&=I_n+U(I_k-V^\top U)^{-1}V^\top-UV^\top-U(I_k-V^\top U)^{-1}V^\top UV^\top \\&=I_n-UV^\top+U(I_k-V^\top U)^{-1}V^\top(I_n-UV^\top) \\&=I_n-UV^\top+U(I_k-V^\top U)^{-1}(V^\top-V^\top UV^\top) \\&=I_n-UV^\top+U(I_k-V^\top U)^{-1}(I_k-V^\top U)V^\top \\&=I_n-UV^\top+UV^\top \\&=I_n. \end{aligned}\]

\(\square\)

  • \((A-UV^\top)^{-1}=A^{-1}+A^{-1}U(I-V^\top A^{-1}U)^{-1}V^\top A^{-1}\).
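
A sketch that verifies the general formula numerically (the shifted random \(A\) is an assumption chosen so that the needed inverses exist; taking \(A=I_n\) recovers the formula of the previous bullet):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # shifted so A and A - U V^T are invertible
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))
A_inv = np.linalg.inv(A)

# (A - U V^T)^{-1} = A^{-1} + A^{-1} U (I_k - V^T A^{-1} U)^{-1} V^T A^{-1}
lhs = np.linalg.inv(A - U @ V.T)
rhs = A_inv + A_inv @ U @ np.linalg.inv(np.eye(k) - V.T @ A_inv @ U) @ V.T @ A_inv
print(np.allclose(lhs, rhs))  # True
```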

1.1. Application: Solving a Perturbation Problem

  • Suppose \(A\mathbf{w}=\mathbf{b}\) has already been solved for \(\mathbf{w}\), and we now want to solve the perturbed system \((A-\mathbf{u}\mathbf{v}^\top)\mathbf{x}=\mathbf{b}\). We first solve \(A\mathbf{z}=\mathbf{u}\); then, by the Sherman-Morrison formula, \[\mathbf{x}=\mathbf{w}+\frac{(\mathbf{v}^\top\mathbf{w})\,\mathbf{z}}{1-\mathbf{v}^\top\mathbf{z}},\] so only one extra solve with \(A\) is needed.
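
A minimal sketch of this update (random data for illustration; it assumes \(A\) is invertible and \(\mathbf{v}^\top\mathbf{z}\neq 1\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)   # assumed invertible
u, v, b = rng.standard_normal((3, n))

w = np.linalg.solve(A, b)            # old solution:   A w = b
z = np.linalg.solve(A, u)            # one extra solve: A z = u
x = w + (v @ w) / (1 - v @ z) * z    # updated solution

print(np.allclose((A - np.outer(u, v)) @ x, b))  # True
```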

1.2. Application: Recursive Least Squares

  • Suppose a new measurement row \(\mathbf{v}\) with value \(b_\text{new}\) is added to a least squares problem: \(\begin{bmatrix} A \\ \mathbf{v} \end{bmatrix}\mathbf{x}=\begin{bmatrix} \mathbf{b} \\ b_\text{new} \end{bmatrix}\). The new normal equation is \[\begin{bmatrix} A^\top & \mathbf{v}^\top \end{bmatrix}\begin{bmatrix} A \\ \mathbf{v} \end{bmatrix}\widehat{\mathbf{x}}_\text{new}=\begin{bmatrix} A^\top & \mathbf{v}^\top \end{bmatrix}\begin{bmatrix} \mathbf{b} \\ b_\text{new} \end{bmatrix},\] i.e., \((A^\top A+\mathbf{v}^\top\mathbf{v})\widehat{\mathbf{x}}_\text{new}=A^\top\mathbf{b}+\mathbf{v}^\top b_\text{new}\). By the Sherman-Morrison formula, the update is \[(A^\top A+\mathbf{v}^\top\mathbf{v})^{-1}=(A^\top A)^{-1}-c(A^\top A)^{-1}\mathbf{v}^\top\mathbf{v}(A^\top A)^{-1}\] with \(\displaystyle c=\frac{1}{1+\mathbf{v}(A^\top A)^{-1}\mathbf{v}^\top}\). To find \(c\), we only need to solve the old-size system \((A^\top A)\mathbf{y}=\mathbf{v}^\top\). The same idea applies when several new rows are appended to \(A\).
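
A sketch of the row update (the sizes and random data are assumptions for illustration; forming \((A^\top A)^{-1}\) explicitly is only for clarity, in practice one would keep a factorization):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 20, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
v = rng.standard_normal(n)          # new measurement row
b_new = rng.standard_normal()

G_inv = np.linalg.inv(A.T @ A)      # old (A^T A)^{-1}

# Sherman-Morrison update of (A^T A + v v^T)^{-1}
y = G_inv @ v                       # solves the old-size system (A^T A) y = v
c = 1.0 / (1.0 + v @ y)
G_new_inv = G_inv - c * np.outer(y, y)
x_new = G_new_inv @ (A.T @ b + v * b_new)

# compare with re-solving the enlarged least-squares problem from scratch
A_aug = np.vstack([A, v])
b_aug = np.append(b, b_new)
print(np.allclose(x_new, np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]))  # True
```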

1.3. Application: Kalman Filter

  • The same update idea also applies to dynamic least squares, which is the setting of the Kalman filter.

2. Derivative of \(A^{-1}\)

  • Suppose \(A\) and \(B\) are invertible. Then \(B^{-1}-A^{-1}=B^{-1}(A-B)A^{-1}\).
Proof. Since \(A\) and \(B\) are invertible, \(B^{-1}(A-B)A^{-1}=B^{-1}AA^{-1}-B^{-1}BA^{-1}=B^{-1}-A^{-1}\).

\(\square\)

  • Let \(\Delta A=B-A\) and \(\Delta(A^{-1})=B^{-1}-A^{-1}\). By the previous identity with \(B=A+\Delta A\), \(\displaystyle \frac{\Delta(A^{-1})}{\Delta t}=(A+\Delta A)^{-1}\left(-\frac{\Delta A}{\Delta t}\right)A^{-1}\). Therefore, letting \(\Delta t \to 0\) (so that \(\Delta A \to 0\)), \[\frac{\text{d}A^{-1}}{\text{d}t}=-A^{-1}\frac{\text{d}A}{\text{d}t}A^{-1}.\]
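
A finite-difference check of this derivative formula, as a sketch with \(A(t)=A_0+t\,\dot A\) chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A0 = rng.standard_normal((n, n)) + n * np.eye(n)   # A(0), assumed invertible
dA = rng.standard_normal((n, n))                   # dA/dt, taken constant in t

A = lambda t: A0 + t * dA
h = 1e-6
numeric = (np.linalg.inv(A(h)) - np.linalg.inv(A(0))) / h
exact = -np.linalg.inv(A0) @ dA @ np.linalg.inv(A0)
print(np.max(np.abs(numeric - exact)))             # O(h), i.e. tiny
```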

3. Change in Eigenvalues and Singular Values

  • \(A(t)\mathbf{x}(t)=\lambda(t)\mathbf{x}(t)\), \(\mathbf{y}^\top(t)A(t)=\lambda(t)\mathbf{y}^\top(t)\), and \(\mathbf{y}^\top(t)\mathbf{x}(t)=1\). In matrix notation, \(AX=X\Lambda\), \(Y^\top A=\Lambda Y^\top\), and \(Y^\top X=I\).

  • \(\lambda(t)=\mathbf{y}^\top(t)A(t)\mathbf{x}(t)\).

Proof. \(\mathbf{y}^\top(t)A(t)\mathbf{x}(t)=\lambda(t)\mathbf{y}^\top(t)\mathbf{x}(t)=\lambda(t)\).

\(\square\)

  • \(\displaystyle \frac{\text{d}\lambda}{\text{d}t}=\mathbf{y}^\top(t)\frac{\text{d}A}{\text{d}t}\mathbf{x}(t)\).
Proof. Since \(\lambda(t)=\mathbf{y}^\top(t)A(t)\mathbf{x}(t)\), the product rule gives \[\begin{aligned} \frac{\text{d}\lambda}{\text{d}t}&=\frac{\text{d}\mathbf{y}^\top}{\text{d}t}A(t)\mathbf{x}(t)+\mathbf{y}^\top(t)\frac{\text{d}A}{\text{d}t}\mathbf{x}(t)+\mathbf{y}^\top(t)A(t)\frac{\text{d}\mathbf{x}}{\text{d}t} \\&=\mathbf{y}^\top(t)\frac{\text{d}A}{\text{d}t}\mathbf{x}(t)+\lambda(t)\left(\frac{\text{d}\mathbf{y}^\top}{\text{d}t}\mathbf{x}(t)+\mathbf{y}^\top(t)\frac{\text{d}\mathbf{x}}{\text{d}t}\right) \\&=\mathbf{y}^\top(t)\frac{\text{d}A}{\text{d}t}\mathbf{x}(t)+\lambda(t)\frac{\text{d}(\mathbf{y}^\top\mathbf{x})}{\text{d}t} \\&=\mathbf{y}^\top(t)\frac{\text{d}A}{\text{d}t}\mathbf{x}(t)+\lambda(t)\frac{\text{d}(1)}{\text{d}t} \\&=\mathbf{y}^\top(t)\frac{\text{d}A}{\text{d}t}\mathbf{x}(t). \end{aligned}\]

\(\square\)
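
A finite-difference check of \(\text{d}\lambda/\text{d}t\) (a sketch using a symmetric \(A(t)\), so that \(\mathbf{y}=\mathbf{x}\) and the normalization \(\mathbf{y}^\top\mathbf{x}=1\) is automatic; it also assumes the tracked eigenvalue is simple):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
S0 = rng.standard_normal((n, n))
S0 = (S0 + S0.T) / 2               # symmetric A(0)
dS = rng.standard_normal((n, n))
dS = (dS + dS.T) / 2               # symmetric dA/dt

h = 1e-6
lam0, X0 = np.linalg.eigh(S0)
lam_h = np.linalg.eigvalsh(S0 + h * dS)
x = X0[:, -1]                      # unit eigenvector of the largest eigenvalue

numeric = (lam_h[-1] - lam0[-1]) / h
exact = x @ dS @ x                 # y^T (dA/dt) x with y = x in the symmetric case
print(abs(numeric - exact))        # O(h), i.e. tiny
```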

  • Similarly, \(\displaystyle \frac{\text{d}\sigma}{\text{d}t}=\mathbf{u}^\top(t)\frac{\text{d}A}{\text{d}t}\mathbf{v}(t)\), where \(\mathbf{u}(t)\) and \(\mathbf{v}(t)\) are the unit left and right singular vectors belonging to \(\sigma(t)\).
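
The same kind of check for \(\text{d}\sigma/\text{d}t\) (again a sketch; it assumes the largest singular value is simple):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 6, 4
A0 = rng.standard_normal((m, n))
dA = rng.standard_normal((m, n))
h = 1e-6

U, s, Vt = np.linalg.svd(A0)
s_h = np.linalg.svd(A0 + h * dA, compute_uv=False)

numeric = (s_h[0] - s[0]) / h      # derivative of the largest singular value
exact = U[:, 0] @ dA @ Vt[0, :]    # u^T (dA/dt) v
print(abs(numeric - exact))        # O(h), i.e. tiny
```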

4. Interlacing

  • Suppose the eigenvalues of a symmetric matrix \(S\) are \(\mu_1 \geq \cdots \geq \mu_n\), and the eigenvalues of \(S+\mathbf{u}\mathbf{u}^\top\) are \(\lambda_1 \geq \cdots \geq \lambda_n\). Then \(\lambda_1 \geq \mu_1 \geq \lambda_2 \geq \mu_2 \geq \cdots \geq \lambda_n \geq \mu_n\): adding the rank-one positive semidefinite term \(\mathbf{u}\mathbf{u}^\top\) can only move each eigenvalue up, and the new eigenvalues interlace the old ones.
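
A quick numerical illustration of the interlacing inequalities (the random symmetric \(S\) and vector \(\mathbf{u}\) are only for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
S = rng.standard_normal((n, n))
S = (S + S.T) / 2                  # symmetric S
u = rng.standard_normal(n)

mu = np.linalg.eigvalsh(S)[::-1]                    # mu_1 >= ... >= mu_n
lam = np.linalg.eigvalsh(S + np.outer(u, u))[::-1]  # lambda_1 >= ... >= lambda_n

# lambda_i >= mu_i  and  mu_i >= lambda_{i+1}
print(np.all(lam >= mu), np.all(mu[:-1] >= lam[1:]))  # True True
```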