Gaussian Discriminant Analysis (GDA)

Multidimensional Gaussian Model

$z \sim N(\vec\mu,\Sigma)$
$z \in R^n,\ \vec\mu \in R^n,\ \Sigma \in R^{n\times n}$
$z$ – variable
$\vec\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}$ – mean vector
$\Sigma$ – covariance matrix
In GDA, the class-conditional Gaussian models share a single covariance matrix $\Sigma$.

$E(z) = \vec\mu,\quad Cov(z)=E[(z-\vec\mu)(z-\vec\mu)^T]=E(zz^T)-E(z)E(z)^T$
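As a quick numerical check of these two identities, we can sample from an example 2-D Gaussian with NumPy (the values of $\vec\mu$ and $\Sigma$ below are made up for illustration):

```python
import numpy as np

# Example mean vector and covariance matrix (made-up values)
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

z = rng.multivariate_normal(mu, Sigma, size=200_000)   # shape (m, n)

# E(z) ≈ mu
mean_est = z.mean(axis=0)

# Cov(z) = E(zz^T) - E(z)E(z)^T ≈ Sigma
E_zzT = z.T @ z / len(z)
cov_est = E_zzT - np.outer(mean_est, mean_est)

print(mean_est)   # close to [1, -2]
print(cov_est)    # close to Sigma
```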

Intro

GDA assumes:
$x|y=0 \sim N(\mu_0,\Sigma)$
$x|y=1 \sim N(\mu_1,\Sigma)$
$y \sim Ber(\phi), \phi = P(y=1)$

GDA model(binary classification)

Multivariate Gaussian distribution:
$P(x) = \frac{1}{(2\pi)^{\frac n2}|\Sigma|^{\frac12}}\exp\left(-\frac12(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$
where $|\Sigma|$ is the determinant of $\Sigma$.
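The density formula translates directly to code. A minimal sketch (function name is my own):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at point x, following the formula above."""
    n = len(mu)
    diff = x - mu
    # (x - mu)^T Sigma^{-1} (x - mu), computed via solve instead of an explicit inverse
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm
```

At $x = \mu$ the quadratic form vanishes, so the density is just the reciprocal of the normalizing constant, e.g. $\frac{1}{2\pi}$ for a 2-D standard normal.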


Parameters: $\mu_0,\mu_1, \Sigma, \phi$
$P(y) = \phi^y(1-\phi)^{1-y}$
$\phi$ is the prior probability $P(y=1)$, estimated from the proportion of the two classes in the training set.


Joint likelihood:
$$
L(\phi, \mu_0, \mu_1, \Sigma) = \prod\limits_{i=1}^m P(x^{(i)},y^{(i)};\phi, \mu_0, \mu_1, \Sigma) = \prod\limits_{i=1}^m P(x^{(i)}|y^{(i)})P(y^{(i)})
$$
MLE maximizes the log-likelihood $l = \log L$:
$$
\arg\max\limits_{\phi, \mu_0, \mu_1, \Sigma} l(\phi, \mu_0, \mu_1, \Sigma)
$$
which has the closed-form solution
$$
\begin{aligned}
\phi &= \frac1m\sum\limits_{i=1}^m y^{(i)} = \frac1m\sum\limits_{i=1}^m \mathbf{1}\{y^{(i)}=1\} \\
\mu_k &= \frac{\sum\limits_{i=1}^m \mathbf{1}\{y^{(i)}=k\}\,x^{(i)}}{\sum\limits_{i=1}^m \mathbf{1}\{y^{(i)}=k\}},\quad k\in\{0,1\} \\
\Sigma &= \frac1m\sum\limits_{i=1}^m (x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T
\end{aligned}
$$
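These estimators are just class frequencies, class means, and a pooled covariance, so fitting needs no iterative optimization. A direct sketch (variable names are my own):

```python
import numpy as np

def fit_gda(X, y):
    """MLE for GDA parameters. X: (m, n) features, y: (m,) labels in {0, 1}."""
    m = len(y)
    phi = y.mean()                              # fraction of class-1 examples
    mu0 = X[y == 0].mean(axis=0)                # mean of class-0 examples
    mu1 = X[y == 1].mean(axis=0)                # mean of class-1 examples
    mus = np.where(y[:, None] == 1, mu1, mu0)   # mu_{y^{(i)}} for each sample
    diff = X - mus
    Sigma = diff.T @ diff / m                   # shared (pooled) covariance
    return phi, mu0, mu1, Sigma
```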

Based on the two Gaussian models, we can draw a decision boundary between the classes.

Prediction

$\arg\max\limits_yP(y|x) = \arg\max\limits_y \frac{P(x|y)P(y)}{P(x)}=\arg\max\limits_yP(x|y)P(y)$
($P(x)$ does not depend on $y$, so it is a constant for the argmax)
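Since $P(x)$ drops out, prediction only compares $P(x|y)P(y)$ for the two classes; working in log space and dropping constants shared by both classes keeps this numerically simple. A sketch (function name is my own):

```python
import numpy as np

def gda_predict(x, phi, mu0, mu1, Sigma):
    """Return argmax_y P(x|y)P(y); the shared normalizer P(x) is dropped."""
    Sinv = np.linalg.inv(Sigma)

    def log_joint(mu, prior):
        d = x - mu
        # log N(x; mu, Sigma) + log P(y), ignoring terms shared by both classes
        return -0.5 * d @ Sinv @ d + np.log(prior)

    return int(log_joint(mu1, phi) > log_joint(mu0, 1 - phi))
```

With equal priors and a shared $\Sigma$, this reduces to assigning $x$ to the nearer mean in Mahalanobis distance.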

GDA & Logistic Regression


  • When the data is 1-D, the posterior $P(y=1|x)$ has the shape of the sigmoid function. In fact it is exactly a sigmoid, and this also holds in higher dimensions; I won't prove it here.
  • GDA makes stronger assumptions than logistic regression, because the class-conditional data has to follow a Gaussian distribution.
  • When the data follows a Gaussian distribution, or the dataset is very large (so it is approximately Gaussian by the central limit theorem), GDA works better than logistic regression.
  • Also, the MLE has a closed-form solution, so fitting GDA requires no iterative optimization and there are no local optima to worry about.
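The sigmoid claim can be checked numerically in the 1-D case: with a shared variance, the log-odds $\log\frac{P(y=1|x)}{P(y=0|x)}$ is linear in $x$, so the posterior is $\sigma(\theta x + b)$. A sketch with made-up parameter values:

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

mu0, mu1 = 0.0, 2.0       # example class means (made-up)
sigma2, phi = 1.0, 0.5    # shared variance and prior

x = np.linspace(-3, 5, 9)

# Posterior P(y=1|x) via Bayes' rule
p1 = phi * np.exp(-(x - mu1) ** 2 / (2 * sigma2))
p0 = (1 - phi) * np.exp(-(x - mu0) ** 2 / (2 * sigma2))
post = p1 / (p0 + p1)

# The same posterior written as a sigmoid of a linear function of x:
# expanding the quadratics, the x^2 terms cancel because sigma2 is shared
theta = (mu1 - mu0) / sigma2
b = (mu0**2 - mu1**2) / (2 * sigma2) + np.log(phi / (1 - phi))
assert np.allclose(post, sigmoid(theta * x + b))
```

The cancellation of the $x^2$ terms is exactly where the shared covariance matters; with per-class covariances the boundary would be quadratic, not sigmoid-shaped.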