EM Algorithm and Mixture Models III: Latent Linear Models

Author: 引线小白. Permanent link: https://www.limoncc.com/post/387bda4e04291667/
License: This blog is published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

This post continues the previous one, EM Algorithm and Mixture Models II: Mixture Model Examples, and discusses a concrete application of the EM algorithm: factor analysis.

1. Factor Analysis (FA)

We are given a data set $\displaystyle \mathcal{D}=\{\bm{x}_i\}_{i=1}^N$. Introducing a latent variable $\displaystyle \bm{z}$ gives the augmented data set $\displaystyle \mathcal{D}^+=\{\bm{x}_i,\bm{z}_i\}_{i=1}^N$ as well as the latent data set $\displaystyle \mathcal{D}^{\bm{z}}=\{\bm{z}_i\}_{i=1}^N$. A factor model uses latent variables, whose dimension is smaller than that of the observed variables, to explain most of the covariance among the observed variables. It is defined by the decomposition:
$$\begin{align}
\bm{x}_i=\bm{Q}\bm{z}_i+\bm{\mu}+\bm{\varepsilon}_i
\end{align}$$where $\displaystyle \bm{x}_i$ is the $\displaystyle D\times 1$ observed variable, with $\displaystyle \bm{x}\sim \mathcal{N}\big(\bm{\mu},\bm{C}\big)$; $\displaystyle \bm{z}_i$ is the $\displaystyle K\times 1$ latent factor; $\displaystyle \bm{Q}$ is the $\displaystyle D\times K$ matrix of factor loadings; the specific factor satisfies $\displaystyle\bm{\varepsilon}_i\sim \mathcal{N}\big(\bm{0},\bm{\varPsi} \big)$; and the specific covariance matrix is $\displaystyle \bm{\varPsi}=\mathrm{diag}\big[\psi_{11}^2,\cdots,\psi_{DD}^2\big]$.

In probabilistic language we have the conditional distribution $\displaystyle p \big(\bm{x}\mid \bm{z},\bm{\theta}\big)=\mathcal{N}\big(\bm{Q}\bm{z}+\bm{\mu},\bm{\varPsi}\big)$ and the latent distribution $\displaystyle p \big(\bm{z}\big)=\mathcal{N}\big(\bm{0},\bm{E}\big)$; why the latent distribution is chosen this way will be explained below.
$$\begin{align}
p \big(\bm{x}_i\big)=\mathcal{N}\big(\bm{x}_i\mid \bm{\mu},\bm{C}\big)\iff \int p \big(\bm{x}_i\mid \bm{z}_i\big)p \big(\bm{z}\big)d \bm{z}_i=\int\mathcal{N}\big(\bm{x}_i\mid\bm{Q}\bm{z}_i+\bm{\mu},\bm{\varPsi} \big)\mathcal{N}\big(\bm{z}_i\mid \bm{0}, \bm{E}\big)d \bm{z}_i
\end{align}$$Now we focus on the covariance. From the results for the Gaussian linear model we know:
$$\begin{align}
\bm{C}=\bm{Q}\bm{Q}^\text{T}+\bm{\varPsi}
\end{align}$$so that:
$$\begin{align}
\sigma_{dd}^2=\sum_{k=1}^Kq_{dk}^2+\psi_{dd}^2
\end{align}$$that is:

variance = common-factor variance + specific-factor variance
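As a quick numerical check of the decomposition $\displaystyle \bm{C}=\bm{Q}\bm{Q}^\text{T}+\bm{\varPsi}$, here is a minimal NumPy sketch; the dimensions and parameter values are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, N = 5, 2, 200_000          # hypothetical dimensions and sample size
Q = rng.normal(size=(D, K))      # factor loadings
mu = rng.normal(size=D)          # mean
psi = rng.uniform(0.1, 0.5, D)   # diagonal of the specific covariance Psi

# generative model: x = Q z + mu + eps,  z ~ N(0, I),  eps ~ N(0, Psi)
Z = rng.normal(size=(N, K))
eps = rng.normal(size=(N, D)) * np.sqrt(psi)
X = Z @ Q.T + mu + eps

C_model = Q @ Q.T + np.diag(psi)            # C = Q Q^T + Psi
C_sample = np.cov(X, rowvar=False)          # empirical covariance of x
print(np.max(np.abs(C_model - C_sample)))   # small, and shrinks as N grows
```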

Of course, there is also the conditional distribution of the latent variable:
$$\begin{align}
p \big(\bm{z}\mid \bm{x},\bm{\theta}\big)=\mathcal{N}\big(\bm{z}\mid \bm{\mu}_{\bm{z}\mid \bm{x}},\bm{C}_{\bm{z}\mid \bm{x}}\big)
\end{align}$$
where
$\displaystyle \bm{C}_{\bm{z}\mid \bm{x}}=\big[\bm{C}_{\bm{z}}^{-1}+\bm{Q}^\text{T}\bm{C}_{\bm{x}\mid \bm{z}}^{-1}\bm{Q}\big]^{-1}=\big[\bm{E}+\bm{Q}^\text{T}\bm{\varPsi}^{-1}\bm{Q}\big]^{-1}$

$\displaystyle \bm{\mu}_{\bm{z}\mid \bm{x}}=\bm{C}_{\bm{z}\mid \bm{x}}\bigg[\bm{Q}^\text{T}\bm{C}_{\bm{x}\mid \bm{z}}^{-1}\big[\bm{x}-\bm{\mu}\big]+\bm{C}_{\bm{z}}^{-1}\bm{\mu}_{\bm{z}}\bigg]
=\bm{C}_{\bm{z}\mid \bm{x}}\bigg[\bm{Q}^\text{T}\bm{\varPsi}^{-1}\big[\bm{x}-\bm{\mu}\big]\bigg]
$

In particular, the special case $\displaystyle \bm{\varPsi}=\sigma^2 \bm{E} $ is called probabilistic principal component analysis $\displaystyle \textit{(PPCA)} $. The reason for this name will become clear later.
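The posterior moments $\displaystyle \bm{\mu}_{\bm{z}\mid \bm{x}}$ and $\displaystyle \bm{C}_{\bm{z}\mid \bm{x}}$ above are cheap to compute because $\displaystyle \bm{\varPsi}$ is diagonal. A minimal sketch, reusing the hypothetical Q, mu, psi, X from the previous snippet (for PPCA one would simply set psi to a constant vector):

```python
import numpy as np

def fa_posterior(x, Q, mu, psi):
    """Posterior p(z | x) = N(mu_zx, C_zx) for the FA model:
    C_zx  = (I + Q^T Psi^{-1} Q)^{-1}
    mu_zx = C_zx Q^T Psi^{-1} (x - mu)
    """
    K = Q.shape[1]
    QtPinv = Q.T / psi                           # Q^T Psi^{-1}, Psi diagonal
    C_zx = np.linalg.inv(np.eye(K) + QtPinv @ Q)
    mu_zx = C_zx @ QtPinv @ (x - mu)
    return mu_zx, C_zx

mu_zx, C_zx = fa_posterior(X[0], Q, mu, psi)     # posterior for the first sample
```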

2. Mixture of Factor Analyzers

Next we introduce a categorical variable $\displaystyle y\in\{1,\cdots,C\}$, which yields the mixture factor model and lets us handle a wider range of data sets: $\displaystyle \mathcal{D}=\mathcal{D}_1\cup \cdots \cup \mathcal{D}_C$, where $\displaystyle \mathcal{D}_c=\{\bm{x}_i,\bm{z}_i,y_i\}_{i=1}^{N_c}$. We apply factor analysis to each $\displaystyle \mathcal{D}_c$ separately. Let $\displaystyle \bm{\theta}=\{\bm{\pi},\bm{\mu},\bm{Q},\bm{\varPsi}\}$:
$$\begin{align}
p \big(\bm{x}\mid \bm{z},y=c,\bm{\theta}\big)
&=\mathcal{N}\big(\bm{x}\mid \bm{Q}_c \cdot\bm{z}+\bm{\mu}_c,\bm{\varPsi}\big)\\
p \big(y\mid \bm{\theta}\big)
&=\mathrm{Cat}\big(y\mid \bm{\pi}\big)\\
p \big(\bm{z}\mid \bm{\theta}\big)
&=\mathcal{N}\big(\bm{z}\mid \bm{0},\bm{E}\big)
\end{align}$$


Figure: the mixture of factor analyzers.

The conditional distribution of $\displaystyle \bm{z}$ is:
$$\begin{align}
p \big(\bm{z}_i\mid \bm{x}_i,y=c,\bm{\theta}\big)=\mathcal{N}\big(\bm{z}_i\mid\bm{\mu}_{\bm{z}\mid \bm{x}_i,y=c},\bm{C}_{\bm{z}\mid \bm{x}_i,y=c}\big)
\end{align}$$where
$\displaystyle \bm{C}_{\bm{z}\mid \bm{x}_i,y=c}=\big[\bm{E}+\bm{Q}_c^\text{T}\bm{\varPsi}^{-1}\bm{Q}_c\big]^{-1}$

$\displaystyle \bm{\mu}_{\bm{z}\mid \bm{x}_i,y=c}
=\bm{C}_{\bm{z}\mid \bm{x}_i,y=c}\bigg[\bm{Q}_c^\text{T}\bm{\varPsi}^{-1}\big[\bm{x}_i-\bm{\mu}_c\big]\bigg]$

We use the EM algorithm to estimate the parameters; let us now work this out explicitly.

$$\begin{align}
p \big(\mathcal{D}^+\mid \bm{\theta}\big)
&=\prod_{i=1}^N p \big(\bm{x}_i,\bm{z}_i,y_i\mid \bm{\theta}\big)
=\prod_{i=1}^Np \big(\bm{x}_i\mid \bm{z}_i,y_i,\bm{\theta}\big)p \big(\bm{z}_i\mid \bm{\theta}\big)p \big(y_i\mid \bm{\theta}\big)\\
&=\prod_{i=1}^N\Bigg[\prod_{c=1}^C\mathcal{N}^{\mathbb{I}(y_i=c)}\big(\bm{x}_i\mid \bm{Q}_c \cdot\bm{z}_i+\bm{\mu}_c,\bm{\varPsi}\big)\cdot\big[\mathcal{N}\big(\bm{z}_i\mid \bm{0},\bm{E}\big)\mathrm{Cat}\big(y_i\mid \bm{\pi}\big)\big]\Bigg]
\end{align}$$
Let us now simplify the complete-data log-likelihood:

$$\begin{align}
&\ell \big(\mathcal{D}^+\mid \bm{\theta}\big)=\ln \prod_{i=1}^Np \big(y_i\mid \bm{\theta}\big)+\ln \prod_{i=1}^Np \big(\bm{z}_i\mid \bm{\theta}\big)+\ln \prod_{i=1}^Np \big(\bm{x}_i\mid \bm{z}_i,y_i,\bm{\theta}\big)\\
&=-\frac{NK}{2}\ln 2\pi+\bm{I}^\text{T}\bm{Y}\ln \bm{\pi}- \frac{1}{2}\mathrm{tr}\big(\bm{Z}^\text{T}\bm{Z}\big)+\ln \prod_{i=1}^Np \big(\bm{x}_i\mid \bm{z}_i,y_i,\bm{\theta}\big)\\
&=A+\bm{I}^\text{T}\bm{Y}\ln \bm{\pi}+\ln \prod_{i=1}^Np \big(\bm{x}_i\mid \bm{z}_i,y_i,\bm{\theta}\big)
\end{align}$$

where $\displaystyle A=-\frac{NK}{2}\ln 2\pi- \frac{1}{2}\mathrm{tr}\big(\bm{Z}^\text{T}\bm{Z}\big)$. We now analyze $\displaystyle \ln \prod_{i=1}^Np \big(\bm{x}_i\mid \bm{z}_i,y_i,\bm{\theta}\big)$ on its own, writing $\displaystyle \bm{\Lambda}=\bm{\varPsi}^{-1}$:
$$\begin{align}
&\ln \prod_{i=1}^Np \big(\bm{x}_i\mid \bm{z}_i,y_i,\bm{\theta}\big)
=\sum_{i=1}^N\sum_{c=1}^Cy_{ic}\ln\mathcal{N}\big(\bm{x}_i\mid \bm{Q}_c \cdot\bm{z}_i+\bm{\mu}_c,\bm{\varPsi}\big)\\
&=\frac{1}{2}\sum_{c=1}^C\bigg[-D \times\bm{I}^\text{T}\bm{y}_c\ln 2\pi+\bm{I}^\text{T}\bm{y}_c\ln \big|\bm{\Lambda}\big|- \mathrm{tr}\big(\bm{y}_c\odot \bm{S}_c\bm{\Lambda}\big)\bigg]\\
&=\frac{1}{2}\sum_{c=1}^C\bigg[B+\bm{I}^\text{T}\bm{y}_c\ln \big|\bm{\Lambda}\big|- \sum_{i=1}^Ny_{ic}\big[\bm{x}_i-\bm{Q}_c \bm{z}_i-\bm{\mu}_c\big]^\text{T}\bm{\Lambda}\big[\bm{x}_i-\bm{Q}_c \bm{z}_i-\bm{\mu}_c\big]\bigg]\\
&=\frac{1}{2}\sum_{c=1}^C\bigg[B+\bm{I}^\text{T}\bm{y}_c\ln \big|\bm{\Lambda}\big|- \mathrm{tr}\big(\bm{y}_c\odot \bm{X}^\text{T}\bm{X}\bm{\Lambda}\big)+2 \mathrm{tr}\big(\bm{y}_c\odot \bm{Z}^\text{T}\bm{X}\bm{\Lambda}\bm{Q}_c\big)- \mathrm{tr}\big(\bm{y}_c\odot \bm{Z}^\text{T}\bm{Z}\bm{Q}_c ^\text{T}\bm{\Lambda}\bm{Q}_c\big)\bigg]
\end{align}$$

where $\displaystyle B=-D \times\bm{I}^\text{T}\bm{y}_c\ln 2\pi$, $\displaystyle \bm{S}_c=\big[\bm{X}-\bm{M}_c\big]^\text{T}\big[\bm{X}-\bm{M}_c\big]$, and $\displaystyle \bm{M}_c$ is the $\displaystyle N\times D$ matrix whose $i$-th row is $\displaystyle \big[\bm{Q}_c \bm{z}_i+\bm{\mu}_c\big]^\text{T}$.
For brevity, redefine $\displaystyle \bm{z}:=[\bm{z};1]$ and $\displaystyle \bm{Q}_c:=\big[\bm{Q}_c,\bm{\mu}_c\big]$, so that:
$$\begin{align}
\ell(\mathcal{D}^+\mid \bm{\theta})
&=A+\bm{I}^\text{T}\bm{Y}\ln \bm{\pi}+\frac{1}{2}\sum_{c=1}^C\bigg[B+\bm{I}^\text{T}\bm{y}_c\ln \big|\bm{\Lambda}\big|- \mathrm{tr}\big(\bm{y}_c\odot \bm{X}^\text{T}\bm{X}\bm{\Lambda}\big)+\\
&2 \mathrm{tr}\big(\bm{y}_c\odot \bm{Z}^\text{T}\bm{X}\bm{\Lambda}\bm{Q}_c\big)- \mathrm{tr}\big(\bm{y}_c\odot \bm{Z}^\text{T}\bm{Z}\bm{Q}_c ^\text{T}\bm{\Lambda}\bm{Q}_c\big)\bigg]
\end{align}$$Since $\displaystyle \bm{z}$ and $\displaystyle \bm{Q}_c$ have been redefined in this way, the required expectations are
$$\begin{align}
\mathrm{E}\big[\bm{z}_{ic}\big]
=\mathrm{E}\big[\bm{z}_i\mid \bm{x}_i,y=c\big]
&=\begin{bmatrix} \bm{\mu}_{\bm{z}\mid \bm{x}_i,y=c}\\\bm{I} \end{bmatrix}\\
\mathrm{E}\big[\bm{z}_{ic}\bm{z}_{ic}^\text{T}\big]
=\mathrm{E}\big[\bm{z}_i \bm{z}_i ^\text{T}\mid \bm{x}_i,y=c\big]
&=\begin{bmatrix}
\bm{C}_{\bm{z}\mid \bm{x}_i,y=c}+\bm{\mu}_{\bm{z}\mid \bm{x}_i,y=c}\bm{\mu}_{\bm{z}\mid \bm{x}_i,y=c}^\text{T} & \mathrm{E}\big[\bm{z}_i\mid \bm{x}_i,y=c\big]\\
\mathrm{E}^\text{T}\big[\bm{z}_i\mid \bm{x}_i,y=c\big]& \bm{E}
\end{bmatrix}
\end{align}$$
For brevity, define
$$\begin{align}
\mathrm{E}\big[\bm{Z}_c\big]_{N\times (K+1)}
&=\bigg[\mathrm{E}\big[\bm{z}_1\mid \bm{x}_1,y=c\big],\cdots,\mathrm{E}\big[\bm{z}_N\mid \bm{x}_N,y=c\big]\bigg]^\text{T}\\
\mathrm{E}\big[\bm{Z}_c ^\text{T}\bm{Z}_c\big]_{(K+1)\times (K+1)}
&=\sum_{i=1}^N\mathrm{E}\big[\bm{z}_i \bm{z}_i ^\text{T}\mid \bm{x}_i,y=c\big]
\end{align}$$
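For a single component $c$ these expected sufficient statistics can be computed as in the sketch below; the function name is mine, and it uses the standard identity $\displaystyle \mathrm{E}\big[\bm{z}\bm{z}^\text{T}\mid \bm{x}\big]=\bm{C}_{\bm{z}\mid \bm{x}}+\bm{\mu}_{\bm{z}\mid \bm{x}}\bm{\mu}_{\bm{z}\mid \bm{x}}^\text{T}$:

```python
import numpy as np

def expected_latent_stats(X, Q_c, mu_c, psi):
    """E-step moments for component c with the augmented latent z~ = [z; 1].

    Returns:
      EZ  : (N, K+1)        rows are E[z~_i | x_i, y=c]
      EZZ : (N, K+1, K+1)   per-sample E[z~_i z~_i^T | x_i, y=c]
    """
    N = X.shape[0]
    K = Q_c.shape[1]
    QtPinv = Q_c.T / psi                              # Q_c^T Psi^{-1}
    C_zx = np.linalg.inv(np.eye(K) + QtPinv @ Q_c)    # posterior covariance
    M = (X - mu_c) @ (C_zx @ QtPinv).T                # (N, K) posterior means
    EZ = np.hstack([M, np.ones((N, 1))])              # append the constant 1
    EZZ = EZ[:, :, None] * EZ[:, None, :]             # outer products E[z~] E[z~]^T
    EZZ[:, :K, :K] += C_zx                            # add Cov(z | x) to the z-z block
    return EZ, EZZ
```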
Now, using the latent-variable distribution
$$\begin{align}
q \big(\mathcal{D}^{\bm{z}},\mathcal{D}^y\mid \mathcal{D}^{\bm{x}},\bm{\theta}\big)
=q \big(\mathcal{D}^y\mid \mathcal{D}^{\bm{x}},\bm{\theta}\big)q \big(\mathcal{D}^{\bm{z}}\mid \mathcal{D}^{\bm{x}},\mathcal{D}^y,\bm{\theta}\big)
\end{align}$$
we take the expectation of the complete-data log-likelihood. Since $\displaystyle \mathrm{E}\big[\mathrm{tr}(\bm{A})\big]=\mathrm{tr}\big(\mathrm{E}\big[\bm{A}\big]\big)$, we have:
$$\begin{align}
&\mathrm{E}\big[\ell \big(\mathcal{D}^+\mid \bm{\theta}\big)\big]
=\int \ell \big(\mathcal{D}^+\mid \bm{\theta}\big)q \big(\mathcal{D}^{\bm{z}},\mathcal{D}^y\mid \mathcal{D}^{\bm{x}},\bm{\theta}\big)d \mathcal{D}^yd \mathcal{D}^{\bm{z}}\\
&=\mathrm{E}[A]+\bm{I}^\text{T}\mathrm{E}\big[\bm{Y}\big]\ln \bm{\pi}+\frac{1}{2}\sum_{c=1}^C\bigg[\mathrm{E}[B]+\bm{I}^\text{T}\mathrm{E}\big[\bm{y}_c\big]\ln \big|\bm{\Lambda}\big|- \\
&\mathrm{tr}\big(\mathrm{E}\big[\bm{y}_c\big]\odot \bm{X}^\text{T}\bm{X}\bm{\Lambda}\big)+2 \mathrm{tr}\big(\mathrm{E}\big[\bm{y}_c\big]\odot \mathrm{E}^\text{T}\big[\bm{Z}_c\big]\bm{X}\bm{\Lambda}\bm{Q}_c\big)- \mathrm{tr}\big(\mathrm{E}\big[\bm{y}_c\big]\odot \mathrm{E} \big[\bm{Z}_c^\text{T}\bm{Z}_c\big]\bm{Q}_c ^\text{T}\bm{\Lambda}\bm{Q}_c\big)\bigg]
\end{align}$$
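Here $\displaystyle \mathrm{E}\big[\bm{y}_c\big]$ collects the usual mixture responsibilities, which follow from the component marginals $\displaystyle p \big(\bm{x}\mid y=c\big)=\mathcal{N}\big(\bm{\mu}_c,\bm{Q}_c\bm{Q}_c^\text{T}+\bm{\varPsi}\big)$. A minimal sketch (function and variable names are mine):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def responsibilities(X, pi, Qs, mus, psi):
    """E[y_c]: posterior class probabilities, from the component marginals
    p(x | y=c) = N(mu_c, Q_c Q_c^T + Psi)."""
    N, C = X.shape[0], len(pi)
    log_r = np.empty((N, C))
    for c in range(C):
        cov_c = Qs[c] @ Qs[c].T + np.diag(psi)
        log_r[:, c] = np.log(pi[c]) + multivariate_normal.logpdf(X, mean=mus[c], cov=cov_c)
    # normalize in log space for numerical stability
    return np.exp(log_r - logsumexp(log_r, axis=1, keepdims=True))
```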

For mixture models with a discrete latent variable, the EM algorithm gives the following solution for $\displaystyle \bm{\pi}$:
$$\begin{align}
\bm{\pi}^{EM}=\frac{\mathrm{E}^\text{T}\big[\bm{Y}\big]\bm{I}}{N}=\frac{\bm{N}_c}{N}
\end{align}$$Note that $\displaystyle \bm{y}$ is here written in one-of-$C$ (one-hot) encoding. We now turn our attention to $\displaystyle \bm{Q}_c:=\big[\bm{Q}_c,\bm{\mu}_c\big]$:
$$\begin{align}
\mathrm{E}_{\mathcal{D}^+}
\propto\sum_{c=1}^C\bigg[2 \mathrm{tr}\big(\mathrm{E}\big[\bm{y}_c\big]\odot \mathrm{E}^\text{T}\big[\bm{Z}_c\big]\bm{X}\bm{\Lambda}\bm{Q}_c\big)- \mathrm{tr}\big(\mathrm{E}\big[\bm{y}_c\big]\odot \mathrm{E} \big[\bm{Z}_c^\text{T}\bm{Z}_c\big]\bm{Q}_c ^\text{T}\bm{\Lambda}\bm{Q}_c\big)\bigg]
\end{align}$$Using $\displaystyle \frac{\partial \mathrm{tr}\big(\bm{B}\bm{X}^\text{T}\bm{A}\bm{X}\big)}{\partial \bm{X}}=\big[\bm{A}+\bm{A}^\text{T}\big]\bm{X}\bm{B}^\text{T}$, we obtain:
$$\begin{align}
&\frac{\partial \mathrm{E}_{\mathcal{D}^+}}{\partial \bm{Q}_c}
=2\bigg[\mathrm{E}\big[\bm{y}_c\big]\odot \mathrm{E}^\text{T}\big[\bm{Z}_c\big]\bm{X}\bm{\Lambda}\bigg]^\text{T}-2 \bm{\Lambda}\bm{Q}_c \bigg[\mathrm{E}\big[\bm{y}_c\big]\odot \mathrm{E} \big[\bm{Z}_c^\text{T}\bm{Z}_c\big]\bigg]^\text{T}=\bm{0}\\
&\iff \bm{\Lambda}\bm{Q}_c \mathrm{E}^\text{T}\big[\bm{y}_c\big]\odot\mathrm{E}\big[\bm{Z}_c^\text{T}\bm{Z}_c\big]= \mathrm{E}^\text{T}\big[\bm{y}_c\big]\odot\bm{\Lambda}\bm{X}^\text{T}\mathrm{E}\big[\bm{Z}_c\big]\\
&\iff \bm{Q}_c^{EM}=\bigg[\mathrm{E}^\text{T}\big[\bm{y}_c\big]\odot \bm{X}^\text{T}\mathrm{E}\big[\bm{Z}_c\big] \bigg]\bigg[ \mathrm{E}^\text{T}\big[\bm{y}_c\big]\odot\mathrm{E}\big[\bm{Z}_c^\text{T}\bm{Z}_c\big]\bigg]^{-1}
\end{align}$$
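With responsibilities $r_{ic}=\mathrm{E}\big[y_{ic}\big]$ and the per-sample moments from the earlier sketch, this update can be coded as a weighted solve; the weighted-sum form below is equivalent to the $\odot$ notation above (names are mine):

```python
import numpy as np

def update_Q_aug(X, R_c, EZ, EZZ):
    """M-step for the augmented loading matrix Q~_c = [Q_c, mu_c]:

    Q~_c = (sum_i r_ic x_i E[z~_i]^T) (sum_i r_ic E[z~_i z~_i^T])^{-1}
    """
    A = X.T @ (R_c[:, None] * EZ)             # (D, K+1): sum_i r_ic x_i E[z~_i]^T
    B = np.einsum('i,ijk->jk', R_c, EZZ)      # (K+1, K+1): sum_i r_ic E[z~_i z~_i^T]
    return A @ np.linalg.inv(B)
```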
We now turn our attention to $\displaystyle \bm{\Lambda}$, subject to the constraint $\displaystyle \bm{\varPsi}\odot \bm{E}=\bm{\varPsi}$ (i.e. $\displaystyle \bm{\varPsi}$ is diagonal):

$$\begin{align}
\mathrm{E}_{\mathcal{D}^+}
\propto &\sum_{c=1}^C\bigg[\bm{I}^\text{T}\mathrm{E}\big[\bm{y}_c\big]\ln \big|\bm{\Lambda}\big|-\mathrm{tr}\big(\mathrm{E}\big[\bm{y}_c\big]\odot \bm{X}^\text{T}\bm{X}\bm{\Lambda}\big)+\\
&2 \mathrm{tr}\big(\mathrm{E}\big[\bm{y}_c\big]\odot \mathrm{E}^\text{T}\big[\bm{Z}_c\big]\bm{X}\bm{\Lambda}\bm{Q}_c\big)- \mathrm{tr}\big(\mathrm{E}\big[\bm{y}_c\big]\odot \mathrm{E} \big[\bm{Z}_c^\text{T}\bm{Z}_c\big]\bm{Q}_c ^\text{T}\bm{\Lambda}\bm{Q}_c\big)\bigg]-\\
&\lambda \times\mathrm{tr}\big(\bm{\varPsi}\odot \big[\bm{I}-\bm{E}\big]\big)
\end{align}$$
Using also $\displaystyle \frac{\partial \mathrm{tr}\big(\bm{A}\bm{X}\bm{B}\big)}{\partial \bm{X}}=\bm{A} ^\text{T}\bm{B}^\text{T}$, we obtain:
$$\begin{align}
&\frac{\partial \mathrm{E}_{\mathcal{D}^+}}{\partial \bm{\Lambda}}
=\bm{I}^\text{T}\mathrm{E}\big[\bm{Y}\big]\bm{I}\cdot\bm{\varPsi}
-\sum_{c=1}^C \mathrm{E}^\text{T}\big[\bm{y}_c\big]\odot \bm{X}^\text{T}\bm{X}+\sum_{c=1}^C \mathrm{E}^\text{T}\big[\bm{y}_c\big]\odot \bm{X}^\text{T}\mathrm{E}\big[\bm{Z}_c\big]\bm{Q}_c ^\text{T}-\lambda\big|\bm{\Lambda}\big|\bm{\varPsi}\odot\big[\bm{I}-\bm{E}\big]=\bm{0}\\
&\frac{\partial \mathrm{E}_{\mathcal{D}^+}}{\partial \lambda}
=\mathrm{tr}\big(\bm{\varPsi}\odot \big[\bm{I}-\bm{E}\big]\big)=0
\end{align}$$
Writing out the details:
$$\begin{align}
&\bm{\varPsi}=\frac{1}{N}\sum_{c=1}^C\mathrm{E}^\text{T}\big[\bm{y}_c\big]\odot \bigg[ \bm{X}^\text{T}\bm{X}- \bm{X}^\text{T}\mathrm{E}\big[\bm{Z}_c\big]\bm{Q}_c ^\text{T}\bigg]+\frac{1}{N}\lambda\big|\bm{\Lambda}\big|\bm{\varPsi}\odot\big[\bm{I}-\bm{E}\big]\\
&\iff \bm{\varPsi}=\frac{1}{N}\bm{A}+\frac{1}{N}\lambda\big|\bm{\Lambda}\big|\bm{\varPsi}\odot\big[\bm{I}-\bm{E}\big]\\
&\iff N\bm{\varPsi}-\lambda\big|\bm{\Lambda}\big|\big[\bm{I}-\bm{E}\big]\odot\bm{\varPsi}=\bm{A}\\
&\iff \bigg[N \bm{I}-\lambda\big|\bm{\Lambda}\big|\big[\bm{I}-\bm{E}\big]\bigg]\odot\bm{\varPsi}=\bm{A}\\
&\iff \bm{\varPsi}=\frac{1}{\bigg[N \bm{I}-\lambda\big|\bm{\Lambda}\big|\big[\bm{I}-\bm{E}\big]\bigg]}\bm{A}\\
&\iff \bm{\varPsi}=\bigg[\frac{1}{N} \bm{I}-\frac{1}{\lambda\big|\bm{\Lambda}\big|}\big[\bm{I}-\bm{E}\big]\bigg]\odot \bm{A}
\end{align}$$
Substituting the constraint gives:
$$\begin{align}
&\bm{E}\odot\bigg[\frac{1}{N} \bm{I}-\frac{1}{\lambda\big|\bm{\Lambda}\big|}\big[\bm{I}-\bm{E}\big]\bigg]\odot \bm{A}
=\bigg[\frac{1}{N} \bm{I}-\frac{1}{\lambda\big|\bm{\Lambda}\big|}\big[\bm{I}-\bm{E}\big]\bigg]\odot \bm{A}\\
&\iff \frac{1}{N}\bm{E}\odot\bm{A}=\bigg[\frac{1}{N} \bm{I}-\frac{1}{\lambda\big|\bm{\Lambda}\big|}\big[\bm{I}-\bm{E}\big]\bigg]\odot \bm{A}\\
& \iff \frac{1}{N}\bm{E}=\frac{1}{N} \bm{I}-\frac{1}{\lambda\big|\bm{\Lambda}\big|}\big[\bm{I}-\bm{E}\big]\\
&\iff \lambda=\frac{N}{\big|\bm{\Lambda}\big|}
\end{align}$$
Thus:
$$\begin{align}
\bm{\varPsi}^{EM}=\frac{1}{N}\bm{E}\odot \bm{A}=\frac{1}{N}\sum_{c=1}^C\mathrm{E}^\text{T}\big[\bm{y}_c\big]\odot \bigg[ \bm{X}^\text{T}\bm{X}- \bm{X}^\text{T}\mathrm{E}\big[\bm{Z}_c\big]\bm{Q}_c ^\text{T}\bigg]\odot \bm{E}
\end{align}$$
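Numerically only the diagonal of the bracketed matrix is needed. A sketch of this update, using the freshly updated $\displaystyle \hat{\bm{Q}}_c$ and the same helper names as before:

```python
import numpy as np

def update_psi(X, R, Qs_aug, EZs):
    """M-step for the diagonal specific covariance Psi:

    Psi = (1/N) diag( sum_c sum_i r_ic ( x_i x_i^T - Q~_c E[z~_i] x_i^T ) ),
    with the freshly updated augmented loadings Q~_c.
    """
    N, D = X.shape
    S = np.zeros((D, D))
    for c in range(len(Qs_aug)):
        Xw = R[:, c][:, None] * X                      # responsibility-weighted data
        S += X.T @ Xw - Qs_aug[c] @ (EZs[c].T @ Xw)    # sum_i r_ic (x_i x_i^T - Q~_c E[z~_i] x_i^T)
    return np.diag(S) / N                              # keep only the diagonal
```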
The parameters of the mixture factor model are therefore:
$$\begin{align}
\begin{cases}
\hat{\bm{\pi}}&\displaystyle=\frac{\mathrm{E}^\text{T}\big[\bm{Y}\big]\bm{I}}{N}=\frac{\bm{N}_c}{N} \\
\hat{\bm{Q}}_c&\displaystyle=\bigg[\mathrm{E}^\text{T}\big[\bm{y}_c\big]\odot \bm{X}^\text{T}\mathrm{E}\big[\bm{Z}_c\big] \bigg]\bigg[ \mathrm{E}^\text{T}\big[\bm{y}_c\big]\odot\mathrm{E}\big[\bm{Z}_c^\text{T}\bm{Z}_c\big]\bigg]^{-1}\\
\hat{\bm{\varPsi}}&\displaystyle=\frac{1}{N}\sum_{c=1}^C\mathrm{E}^\text{T}\big[\bm{y}_c\big]\odot \bigg[ \bm{X}^\text{T}\bm{X}- \bm{X}^\text{T}\mathrm{E}\big[\bm{Z}_c\big]\hat{\bm{Q}}_c ^\text{T}\bigg]\odot \bm{E}
\end{cases}
\end{align}$$
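Putting the pieces together, one EM iteration for the mixture of factor analyzers can be sketched by chaining the helpers defined above; initialization, log-likelihood monitoring, and convergence checks are omitted:

```python
def mfa_em_step(X, pi, Qs_aug, psi):
    """One EM iteration for the mixture of factor analyzers (a sketch)."""
    C = len(pi)
    # E-step: responsibilities and per-component latent moments
    R = responsibilities(X, pi,
                         [Q[:, :-1] for Q in Qs_aug],   # Q_c
                         [Q[:, -1] for Q in Qs_aug],    # mu_c
                         psi)
    EZs, EZZs = [], []
    for c in range(C):
        EZ, EZZ = expected_latent_stats(X, Qs_aug[c][:, :-1], Qs_aug[c][:, -1], psi)
        EZs.append(EZ)
        EZZs.append(EZZ)
    # M-step: closed-form updates derived above
    pi_new = R.mean(axis=0)                                  # pi_c = N_c / N
    Qs_new = [update_Q_aug(X, R[:, c], EZs[c], EZZs[c]) for c in range(C)]
    psi_new = update_psi(X, R, Qs_new, EZs)
    return pi_new, Qs_new, psi_new
```

Iterating mfa_em_step until the observed-data log-likelihood stops improving yields the estimates summarized above.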


Copyright Notice
The 柠檬CC blog, created and maintained by 引线小白, is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
This article was first published on 柠檬CC [ https://www.limoncc.com ]. All rights reserved.
Permanent link: https://www.limoncc.com/post/387bda4e04291667/
To cite this article, please use:
引线小白. (Mar. 15, 2017). 《EM算法和混合模型三:隐线性模型》[Blog post]. Retrieved from https://www.limoncc.com/post/387bda4e04291667
@online{limoncc-387bda4e04291667,
title={EM算法和混合模型三:隐线性模型},
author={引线小白},
year={2017},
month={Mar},
date={15},
url={\url{https://www.limoncc.com/post/387bda4e04291667}},
}
