Gaussian Discriminant Analysis
Gaussian Discriminant Analysis (GDA)

Multidimensional Gaussian Model

$z \sim N(\vec\mu, \Sigma)$, with $z \in R^n$, $\vec\mu \in R^n$, $\Sigma \in R^{n \times n}$

$z$ – random variable
$\vec\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}$ – mean vector
$\Sigma$ – covariance matrix

All the Gaussian models share one covariance matrix.
$E(z) = \vec\mu,\quad Cov(z) = E[(z-\vec\mu)(z-\vec\mu)^T] = E(zz^T) - E(z)E(z)^T$
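As a quick numerical check of these identities (a minimal sketch in NumPy; the values of $\vec\mu$ and $\Sigma$ below are illustrative, not from the notes):

```python
import numpy as np

# Check E(z) = mu and Cov(z) = E(zz^T) - E(z)E(z)^T by sampling.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])                  # illustrative mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])              # illustrative covariance matrix

z = rng.multivariate_normal(mu, Sigma, size=100_000)       # (N, 2) samples

emp_mean = z.mean(axis=0)                                   # approaches mu
emp_cov = z.T @ z / len(z) - np.outer(emp_mean, emp_mean)   # E(zz^T) - E(z)E(z)^T

print(emp_mean)   # ~ [ 1. -2.]
print(emp_cov)    # ~ Sigma
```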
Intro

GDA assumes:
$x|y=0 \sim N(\mu_0,\Sigma)$
$x|y=1 \sim N(\mu_1,\Sigma)$
…
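Under these assumptions the maximum-likelihood estimates have closed forms; here is a minimal fitting sketch (my own NumPy code, assuming 0/1 labels; `fit_gda` is a hypothetical name, not from the notes):

```python
import numpy as np

def fit_gda(X, y):
    """Fit GDA with one shared covariance matrix.
    X: (N, n) samples, y: (N,) labels in {0, 1}. A sketch, not a library API."""
    phi = y.mean()                        # estimate of P(y = 1)
    mu0 = X[y == 0].mean(axis=0)          # mean of class 0
    mu1 = X[y == 1].mean(axis=0)          # mean of class 1
    centered = X - np.where(y[:, None] == 1, mu1, mu0)  # subtract each class mean
    Sigma = centered.T @ centered / len(X)               # shared covariance
    return phi, mu0, mu1, Sigma
```

Prediction then compares the two class posteriors $P(y|x)$ via Bayes' rule.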
Principal Component Analysis
Principal Component Analysis (PCA)
Intro

PCA is one of the most important and widely used dimensionality reduction algorithms. Other dimensionality reduction algorithms include LDA, LLE, and Laplacian Eigenmaps. You can regard PCA as a one-layer neural network with an nD input and an mD output. PCA projects nD data down to mD while keeping the maximum variance.
PCA

We want to reduce data from nD to mD.

$$Y = AX,\quad Y \in R^{m \times 1},\ A \in R^{m \times n},\ X \in R^{n \times 1}$$

$$A = \begin{bmatrix} -a_1- \\ -a_2- \\ \vdots \\ -a_m- \end{bmatrix},\quad Y = \begin{bmatrix} \dots \end{bmatrix}$$ …
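A minimal sketch of this projection (my own NumPy code; taking the rows $a_1 \dots a_m$ to be the top-$m$ eigenvectors of the sample covariance, which is the standard maximum-variance choice):

```python
import numpy as np

def pca_project(X, m):
    """Reduce rows of X from nD to mD. X: (N, n). A sketch, not a library API."""
    Xc = X - X.mean(axis=0)                  # center the data first
    cov = Xc.T @ Xc / (len(X) - 1)           # (n, n) sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    A = eigvecs[:, ::-1][:, :m].T            # rows a_1..a_m: top-m eigenvectors
    return Xc @ A.T, A                       # Y = AX for each centered sample
```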
Naive Bayes
Naive Bayes

We will use an example to show this algorithm; this example is still used in practice today.
Spam email classifier

$X$ is a 0/1 vector corresponding to a dictionary, where each dimension represents a word. If a word shows up in an email, its corresponding value equals 1. We assume that the $X_i$ are conditionally independent given the label (the naive Bayes assumption), although it is obvious that they are not, since the words of an email are related in meaning. We choose the top 10,000 most commonly used words as the dictionary.

$P(x_1 \dots x_{10000}|y) = \prod_{i=1}^{10000} P(x_i|y)$ …
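A minimal sketch of this classifier (my own NumPy code; the add-one Laplace smoothing is an assumption I added to avoid zero probabilities, and the function names are hypothetical):

```python
import numpy as np

def fit_nb(X, y):
    """X: (N, 10000) binary word-presence matrix, y: (N,) labels, 1 = spam."""
    phi_y = y.mean()                                            # P(y = 1)
    phi1 = (X[y == 1].sum(axis=0) + 1) / ((y == 1).sum() + 2)   # P(x_i=1 | y=1)
    phi0 = (X[y == 0].sum(axis=0) + 1) / ((y == 0).sum() + 2)   # P(x_i=1 | y=0)
    return phi_y, phi0, phi1

def predict_nb(x, phi_y, phi0, phi1):
    """Compare log-posteriors; the product over words becomes a sum of logs."""
    log1 = np.log(phi_y) + np.sum(x * np.log(phi1) + (1 - x) * np.log(1 - phi1))
    log0 = np.log(1 - phi_y) + np.sum(x * np.log(phi0) + (1 - x) * np.log(1 - phi0))
    return int(log1 > log0)
```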
Logistic Regression and Newton's Method
Logistic Regression

Logistic regression is a classification model, but you can use it to solve regression problems if you want to.

WARNING: do not use linear regression to solve classification problems.
Logistic regression

Sigmoid function: $g(z) = \frac{1}{1+e^{-z}}$

Define $h_{\theta}(x) = g(\theta^Tx) = \frac{1}{1+e^{-\theta^Tx}}$, $P(y=1|x;\theta) = h_{\theta}(x)$, $P(y=0|x;\theta) = 1-h_{\theta}(x)$.

Combining these two equations: $P(y|x;\theta) = (h_{\theta}(x))^y(1-h_{\theta}(x))^{1-y}$ …
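Since the title also mentions Newton's method, here is a minimal sketch of fitting $\theta$ with Newton updates on the log-likelihood (my own NumPy code; the gradient $X^T(y-h)$ and Hessian $-X^TSX$ with $S = \mathrm{diag}(h(1-h))$ follow from the model above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg_newton(X, y, steps=10):
    """X: (N, n) design matrix, y: (N,) labels in {0, 1}. A sketch."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        h = sigmoid(X @ theta)              # h_theta(x) for every sample
        grad = X.T @ (y - h)                # gradient of the log-likelihood
        H = -(X.T * (h * (1 - h))) @ X      # Hessian of the log-likelihood
        theta -= np.linalg.solve(H, grad)   # Newton step: theta - H^{-1} grad
    return theta
```

Newton's method typically needs far fewer iterations than gradient ascent here, at the cost of solving an $n \times n$ linear system per step.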
SVM
Support Vector Machine (SVM)

Invented by Vapnik (USSR). Suitable for prediction with a small number of samples.
0. No Free Lunch Theorem

If there is no prior assumption about the feature space, the average performance of all algorithms is the same.

1. SVM introduction

(Image from Baidu Baike)
There are infinitely many lines that can separate a linearly separable sample space. The line that maximizes the margin $d$, with $d_1 = d_2 = \frac{d}{2}$, is the best. The samples on the dotted lines in the figure are called support vectors, for they …
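This picture can be checked numerically: for a hyperplane $\omega^Tx + b = 0$, each sample's distance is $|\omega^Tx + b| / \lVert\omega\rVert$, and the support vectors are the closest samples. A small sketch (my own NumPy code; the hyperplane and points are illustrative):

```python
import numpy as np

w, b = np.array([1.0, -1.0]), 0.0          # illustrative hyperplane w^T x + b = 0
X = np.array([[2.0, 0.0], [0.0, 2.0],
              [3.0, -1.0], [-1.0, 3.0]])   # illustrative samples

dist = np.abs(X @ w + b) / np.linalg.norm(w)  # point-to-hyperplane distances
support = X[np.isclose(dist, dist.min())]     # closest samples: support vectors
margin = 2 * dist.min()                       # d = d_1 + d_2 with d_1 = d_2 = d/2
print(dist, margin)
```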
Primal Problem and Dual Problem
Primal Problem and Dual Problem

Recommended textbooks:
Convex Optimization, Stephen Boyd
Nonlinear Programming
Primal Problem

$$\left\{\begin{array}{l} \text{minimize } f(\boldsymbol\omega) \\ \text{s.t. } \left\{\begin{array}{l} g_i(\boldsymbol\omega) \leq 0 \quad (i = 1 \sim K) \\ h_i(\boldsymbol\omega) = 0 \quad (i = 1 \sim N) \end{array}\right. \end{array}\right.$$
Dual Problem

$$\left\{\begin{array}{l} \Theta(\boldsymbol\alpha, \boldsymbol\beta) = \min\limits_{\text{all } \boldsymbol\omega} L(\boldsymbol\omega, \boldsymbol\alpha, \boldsymbol\beta) \\ \text{s.t. } \alpha_i \geq 0 \quad (i = 1 \sim K) \end{array}\right.$$ …
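The function $L$ here is the Lagrangian; its definition is cut off in this excerpt, so for reference this is the standard form for the primal above (as in Boyd's Convex Optimization):

$$L(\boldsymbol\omega, \boldsymbol\alpha, \boldsymbol\beta) = f(\boldsymbol\omega) + \sum_{i=1}^{K}\alpha_i g_i(\boldsymbol\omega) + \sum_{i=1}^{N}\beta_i h_i(\boldsymbol\omega)$$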