Matrix factorization is a class of algorithms used for recommendation systems in machine learning. Matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower-dimensional matrices. Commonly known matrix factorization algorithms are SVD and PCA.
For example, consider a sparse user-movie rating matrix in which only a few entries $w_{ij}^{um}$ are observed:

$$\begin{array}{c|cccc}
 & m_1 & m_2 & m_3 & m_4 \\
\hline
u_1 & & w_{12}^{um} & & \\
u_2 & w_{21}^{um} & & & \\
u_3 & & & w_{32}^{um} &
\end{array}$$
How do we compute matrix factorizations?
Possible approach:
$$\min_{W, Z}\left\|W Z^{T}-X\right\|_{\mathrm{Fro}}^{2}$$
Here $\|\cdot\|_{\mathrm{Fro}}$ denotes the so-called Frobenius norm, i.e.:

$$\|X\|_{\mathrm{Fro}} := \sqrt{\sum_{i=1}^{s} \sum_{j=1}^{n}\left|x_{ij}\right|^{2}}$$
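As a quick sanity check of this definition, here is a minimal NumPy sketch (the matrix `X` is made up for illustration):

```python
import numpy as np

# Example matrix (made up for illustration)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Frobenius norm computed directly from the definition
fro_manual = np.sqrt(np.sum(np.abs(X) ** 2))

# NumPy's built-in Frobenius norm
fro_numpy = np.linalg.norm(X, ord='fro')

print(fro_manual, fro_numpy)  # both ≈ 9.5394
```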
Then we can add regularization to the model for $W$ and $Z$:
$$\min_{W, Z}\left\|W Z^{T}-X\right\|_{\mathrm{Fro}}^{2}+R_{1}(W)+R_{2}(Z)$$
Here we need to choose suitable regularization functions $R_1$ and $R_2$.
Since the factorization $W \cdot Z^{T}$ is not unique (many different pairs $(W, Z)$ give the same product), this optimization problem is not convex. We can use coordinate descent, also called alternating minimization, to solve this problem.
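To see the non-uniqueness concretely: for any invertible matrix $A$ of matching inner dimension,

$$(W A)\left(Z A^{-T}\right)^{T} = W A A^{-1} Z^{T} = W Z^{T},$$

so the pair $(W A,\, Z A^{-T})$ attains exactly the same objective value as $(W, Z)$.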
Coordinate Descent / Alternating Minimization
Optimize over one variable at a time while keeping the other one fixed. The basic idea is to optimize one parameter (coordinate) at a time, cycling through them in turn, which breaks a complex optimization problem into simple subproblems.
Example:
$$W^{k+1}=\arg\min_{W}\left\|W\left(Z^{k}\right)^{T}-X\right\|^{2}$$

$$Z^{k+1}=\arg\min_{Z}\left\|W^{k+1} Z^{T}-X\right\|^{2}$$
$$\Rightarrow\left\{\begin{array}{l}
W^{k+1}=X Z^{k}\left(\left(Z^{k}\right)^{T} Z^{k}\right)^{-1} \\
Z^{k+1}=\left(\left(\left(W^{k+1}\right)^{T} W^{k+1}\right)^{-1}\left(W^{k+1}\right)^{T} X\right)^{T}
\end{array}\right.$$
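A minimal NumPy sketch of these alternating updates, assuming a fully observed matrix `X` and a chosen rank `k` (the function name and defaults are illustrative):

```python
import numpy as np

def alternating_minimization(X, k, n_iters=50, seed=0):
    """Factor X ≈ W @ Z.T by alternating least squares (fully observed X)."""
    rng = np.random.default_rng(seed)
    s, n = X.shape
    W = rng.standard_normal((s, k))
    Z = rng.standard_normal((n, k))
    for _ in range(n_iters):
        # W-update with Z fixed: W = X Z ((Z^T Z)^{-1})
        W = X @ Z @ np.linalg.inv(Z.T @ Z)
        # Z-update with W fixed: Z = ((W^T W)^{-1} W^T X)^T
        Z = (np.linalg.inv(W.T @ W) @ W.T @ X).T
    return W, Z

# Usage: factor a random rank-5 matrix and check the reconstruction error
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5)) @ rng.standard_normal((5, 30))
W, Z = alternating_minimization(X, k=5)
print(np.linalg.norm(W @ Z.T - X))  # close to 0
```

In practice one would also include the regularization terms $R_1(W)$ and $R_2(Z)$; with ridge penalties this simply adds a multiple of the identity inside each inverse.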
Singular Value Decomposition
We can obtain other kinds of matrix factorizations. A very popular one is the singular value decomposition:
The singular value decomposition factors an arbitrary matrix into the product of two orthogonal matrices and a (rectangular) diagonal matrix, and it reveals many properties of the matrix.
Every matrix $X \in \mathbb{R}^{s \times n}$ can be written as

$$X=U \Sigma V^{T}$$
$U \in \mathbb{R}^{s \times s}$ is an orthogonal matrix, i.e. $U^{T} U=I$
$V \in \mathbb{R}^{n \times n}$ is an orthogonal matrix, i.e. $V^{T} V=I$
$$\Sigma=\left(\begin{array}{ccccccc}
\sigma_{1} & 0 & \ldots & 0 & 0 & \ldots & 0 \\
0 & \sigma_{2} & \ldots & 0 & 0 & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \ldots & \sigma_{s} & 0 & \ldots & 0
\end{array}\right) \in \mathbb{R}^{s \times n} \quad \text{(diagonal, for } s < n\text{)}$$
$\sigma_1$ is the largest singular value, $\sigma_s$ is the smallest.
$$\Sigma=\left(\begin{array}{cccc}
\sigma_{1} & 0 & \ldots & 0 \\
0 & \sigma_{2} & \ldots & 0 \\
0 & 0 & \ddots & 0 \\
0 & 0 & \ldots & \sigma_{n} \\
0 & 0 & \ldots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \ldots & 0
\end{array}\right) \in \mathbb{R}^{s \times n} \quad \text{(diagonal, for } s > n\text{)}$$
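A short NumPy check of these shapes and of the factorization itself (a minimal sketch; the dimensions are chosen arbitrarily):

```python
import numpy as np

# s = 4 rows, n = 6 columns, so here s < n
X = np.random.default_rng(0).standard_normal((4, 6))

# full_matrices=True returns the square U (s x s) and V (n x n) described above;
# the singular values come back as a 1-D array, sorted in descending order
U, sigma, Vt = np.linalg.svd(X, full_matrices=True)
print(U.shape, sigma.shape, Vt.shape)  # (4, 4) (4,) (6, 6)

# Rebuild the rectangular diagonal Sigma and verify X = U Sigma V^T
Sigma = np.zeros(X.shape)
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)
print(np.allclose(U @ Sigma @ Vt, X))  # True
```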
Now for orthogonal matrices, we have:
$$\|U x\|^{2}=\langle U x, U x\rangle=\left\langle x, U^{T} U x\right\rangle=\langle x, x\rangle=\|x\|^{2}$$
i.e., rotations do not affect the Euclidean norm (Frobenius norm)
$$\|X V w\|^{2}=\left\|U \Sigma V^{T} V w\right\|^{2}=\langle U \Sigma w, U \Sigma w\rangle=\langle\Sigma w, \Sigma w\rangle=\sum_{i=1}^{s} \sigma_{i}^{2}\left|w_{i}\right|^{2}$$
Principal Component Analysis
Principal Component Analysis is another matrix factorization method that is closely related to the SVD: the principal components of a dataset can be read off from the SVD of the mean-centered data matrix.
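A minimal NumPy sketch of this connection (assuming the rows of `X` are samples and the columns are features; the function name and data are illustrative):

```python
import numpy as np

def pca_via_svd(X, n_components):
    """Project data onto its leading principal components using the SVD."""
    # Center each feature (column) at zero mean
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions
    U, sigma, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]        # principal axes
    scores = X_centered @ components.T    # coordinates in the new basis
    return components, scores

# Usage: reduce 10-dimensional data to its 2 leading principal components
X = np.random.default_rng(0).standard_normal((100, 10))
components, scores = pca_via_svd(X, n_components=2)
print(components.shape, scores.shape)  # (2, 10) (100, 2)
```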