Gaussian Discriminant Analysis
Gaussian Discriminant Analysis (GDA)

Multidimensional Gaussian Model

$z \sim N(\vec\mu, \Sigma)$, with $z \in R^n$, $\vec\mu \in R^n$, $\Sigma \in R^{n \times n}$

$z$ – random variable
$\vec\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}$ – mean vector
$\Sigma$ – covariance matrix

All the Gaussian models share one covariance matrix.
$E(z) = \vec\mu,\quad Cov(z) = E[(z-\vec\mu)(z-\vec\mu)^T] = E(zz^T) - E(z)E(z)^T$
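As a quick numerical check of these identities (a minimal sketch in NumPy; the values of $\vec\mu$ and $\Sigma$ below are illustrative, not from the notes):

```python
import numpy as np

# Check E(z) = mu and Cov(z) = E(zz^T) - E(z)E(z)^T by sampling.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])                  # illustrative mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])              # illustrative covariance matrix

z = rng.multivariate_normal(mu, Sigma, size=100_000)       # (N, 2) samples

emp_mean = z.mean(axis=0)                                   # approaches mu
emp_cov = z.T @ z / len(z) - np.outer(emp_mean, emp_mean)   # E(zz^T) - E(z)E(z)^T

print(emp_mean)   # ~ [ 1. -2.]
print(emp_cov)    # ~ Sigma
```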
Intro

GDA assumes:
$x|y=0 \sim N(\mu_0,\Sigma)$
$x|y=1 \sim N(\mu_1,\Sigma)$
…
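Under these assumptions the maximum-likelihood estimates have closed forms; here is a minimal fitting sketch (my own NumPy code, assuming 0/1 labels; `fit_gda` is a hypothetical name, not from the notes):

```python
import numpy as np

def fit_gda(X, y):
    """Fit GDA with one shared covariance matrix.
    X: (N, n) samples, y: (N,) labels in {0, 1}. A sketch, not a library API."""
    phi = y.mean()                        # estimate of P(y = 1)
    mu0 = X[y == 0].mean(axis=0)          # mean of class 0
    mu1 = X[y == 1].mean(axis=0)          # mean of class 1
    centered = X - np.where(y[:, None] == 1, mu1, mu0)  # subtract each class mean
    Sigma = centered.T @ centered / len(X)               # shared covariance
    return phi, mu0, mu1, Sigma
```

Prediction then compares the two class posteriors $P(y|x)$ via Bayes' rule.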
Principal Component Analysis
Principal Component Analysis (PCA)
Intro

PCA is one of the most important and widely used dimensionality reduction algorithms. Other dimensionality reduction algorithms include LDA, LLE, and Laplacian Eigenmaps. You can regard PCA as a one-layer neural network with an nD input and an mD output. PCA projects nD data down to mD while keeping the maximum variance.
PCA

We want to reduce data from nD to mD.

$$Y = AX,\quad Y \in R^{m \times 1},\ A \in R^{m \times n},\ X \in R^{n \times 1}$$

$$A = \begin{bmatrix} -a_1- \\ -a_2- \\ \vdots \\ -a_m- \end{bmatrix},\quad Y = \begin{bmatrix} \dots \end{bmatrix}$$ …
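A minimal sketch of this projection (my own NumPy code; taking the rows $a_1 \dots a_m$ to be the top-$m$ eigenvectors of the sample covariance, which is the standard maximum-variance choice):

```python
import numpy as np

def pca_project(X, m):
    """Reduce rows of X from nD to mD. X: (N, n). A sketch, not a library API."""
    Xc = X - X.mean(axis=0)                  # center the data first
    cov = Xc.T @ Xc / (len(X) - 1)           # (n, n) sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    A = eigvecs[:, ::-1][:, :m].T            # rows a_1..a_m: top-m eigenvectors
    return Xc @ A.T, A                       # Y = AX for each centered sample
```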
Naive Bayes
Naive Bayes

We will use an example to show this algorithm; this example is still used in practice today.
Spam email classifier

$X$ is a 0/1 vector corresponding to a dictionary, where each dimension represents a word. If a word shows up in an email, its corresponding value equals 1. We assume that the $X_i$ are conditionally independent given the label (the naive Bayes assumption), although it is obvious that they are not, since the words of an email are related in meaning. We choose the top 10,000 most commonly used words as the dictionary.

$P(x_1 \dots x_{10000}|y) = \prod_{i=1}^{10000} P(x_i|y)$ …
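A minimal sketch of this classifier (my own NumPy code; the add-one Laplace smoothing is an assumption I added to avoid zero probabilities, and the function names are hypothetical):

```python
import numpy as np

def fit_nb(X, y):
    """X: (N, 10000) binary word-presence matrix, y: (N,) labels, 1 = spam."""
    phi_y = y.mean()                                            # P(y = 1)
    phi1 = (X[y == 1].sum(axis=0) + 1) / ((y == 1).sum() + 2)   # P(x_i=1 | y=1)
    phi0 = (X[y == 0].sum(axis=0) + 1) / ((y == 0).sum() + 2)   # P(x_i=1 | y=0)
    return phi_y, phi0, phi1

def predict_nb(x, phi_y, phi0, phi1):
    """Compare log-posteriors; the product over words becomes a sum of logs."""
    log1 = np.log(phi_y) + np.sum(x * np.log(phi1) + (1 - x) * np.log(1 - phi1))
    log0 = np.log(1 - phi_y) + np.sum(x * np.log(phi0) + (1 - x) * np.log(1 - phi0))
    return int(log1 > log0)
```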
Logistic Regression and Newton's Method
Logistic Regression

Logistic regression is a classification model, but you can use it to solve regression problems if you want to.

WARNING: do not use linear regression to solve classification problems.
Logistic regression

Sigmoid function: $g(z) = \frac{1}{1+e^{-z}}$

Define $h_{\theta}(x) = g(\theta^Tx) = \frac{1}{1+e^{-\theta^Tx}}$, $P(y=1|x;\theta) = h_{\theta}(x)$, $P(y=0|x;\theta) = 1-h_{\theta}(x)$.

Combining these two equations: $P(y|x;\theta) = (h_{\theta}(x))^y(1-h_{\theta}(x))^{1-y}$ …
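Since the title also mentions Newton's method, here is a minimal sketch of fitting $\theta$ with Newton updates on the log-likelihood (my own NumPy code; the gradient $X^T(y-h)$ and Hessian $-X^TSX$ with $S = \mathrm{diag}(h(1-h))$ follow from the model above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg_newton(X, y, steps=10):
    """X: (N, n) design matrix, y: (N,) labels in {0, 1}. A sketch."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        h = sigmoid(X @ theta)              # h_theta(x) for every sample
        grad = X.T @ (y - h)                # gradient of the log-likelihood
        H = -(X.T * (h * (1 - h))) @ X      # Hessian of the log-likelihood
        theta -= np.linalg.solve(H, grad)   # Newton step: theta - H^{-1} grad
    return theta
```

Newton's method typically needs far fewer iterations than gradient ascent here, at the cost of solving an $n \times n$ linear system per step.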
SVM
Support Vector Machine (SVM)

Invented by Vapnik (USSR). Suitable for prediction with a small number of samples.
0. No Free Lunch Theorem

If there is no prior assumption about the feature space, the average performance of all algorithms is the same.

1. SVM introduction

(Image from Baidu Baike)
There are infinitely many lines that can separate a linearly separable sample space. The line that maximizes the margin $d$, with $d_1 = d_2 = \frac{d}{2}$, is the best. The samples on the dotted lines in the figure are called support vectors, for they …
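This picture can be checked numerically: for a hyperplane $\omega^Tx + b = 0$, each sample's distance is $|\omega^Tx + b| / \lVert\omega\rVert$, and the support vectors are the closest samples. A small sketch (my own NumPy code; the hyperplane and points are illustrative):

```python
import numpy as np

w, b = np.array([1.0, -1.0]), 0.0          # illustrative hyperplane w^T x + b = 0
X = np.array([[2.0, 0.0], [0.0, 2.0],
              [3.0, -1.0], [-1.0, 3.0]])   # illustrative samples

dist = np.abs(X @ w + b) / np.linalg.norm(w)  # point-to-hyperplane distances
support = X[np.isclose(dist, dist.min())]     # closest samples: support vectors
margin = 2 * dist.min()                       # d = d_1 + d_2 with d_1 = d_2 = d/2
print(dist, margin)
```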
Primal Problem and Dual Problem
Primal Problem and Dual Problem

Recommended textbooks:
Convex Optimization, Stephen Boyd
Nonlinear Programming
Primal Problem

$$\left\{\begin{array}{l} \text{minimize } f(\boldsymbol\omega) \\ \text{s.t. } \left\{\begin{array}{l} g_i(\boldsymbol\omega) \leq 0 \quad (i = 1 \sim K) \\ h_i(\boldsymbol\omega) = 0 \quad (i = 1 \sim N) \end{array}\right. \end{array}\right.$$
Dual Problem

$$\left\{\begin{array}{l} \Theta(\boldsymbol\alpha, \boldsymbol\beta) = \min\limits_{\text{all } \boldsymbol\omega} L(\boldsymbol\omega, \boldsymbol\alpha, \boldsymbol\beta) \\ \text{s.t. } \alpha_i \geq 0 \quad (i = 1 \sim K) \end{array}\right.$$ …
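The function $L$ here is the Lagrangian; its definition is cut off in this excerpt, so for reference this is the standard form for the primal above (as in Boyd's Convex Optimization):

$$L(\boldsymbol\omega, \boldsymbol\alpha, \boldsymbol\beta) = f(\boldsymbol\omega) + \sum_{i=1}^{K}\alpha_i g_i(\boldsymbol\omega) + \sum_{i=1}^{N}\beta_i h_i(\boldsymbol\omega)$$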