AdaBoost
AdaBoost Algorithm (Adaptive Boosting)
Intro
AdaBoost is short for adaptive boosting.
AdaBoost combines many weak classifiers to form a single strong classifier.
AdaBoost usually uses a one-level decision tree (a decision stump) as the weak classifier, as in the sketch below.
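As a quick illustration (a minimal sketch assuming scikit-learn is installed; the toy dataset and parameter values are arbitrary), scikit-learn's `AdaBoostClassifier` uses a depth-1 decision tree as its default weak learner:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy binary-classification data; AdaBoostClassifier's default weak learner
# is a one-level decision tree (a decision stump).
X, y = make_classification(n_samples=200, random_state=0)
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.score(X, y))  # training accuracy of the boosted ensemble
```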
AdaBoost (for binary classification)
- data $T = \{(x_i, y_i)\}_{i=1 \sim n}$, with labels $y_i \in \{-1, +1\}$
- weight of data point $i$ at round $m$: $w_i^{(m)}$
- weight of classifier $G_m$: $\alpha_m$ (with normalization factor $z_m$)
- weighted error rate: $e_m$
$$
\begin{aligned}
& \textbf{Initialization:} \quad D_1 = (w_1^{(1)}, w_2^{(1)}, \dots, w_n^{(1)}), \quad w_i^{(1)} = \tfrac{1}{n} \\
& \textbf{for } m = 1 \sim M: \\
& \qquad \text{train the weak classifier: } G_m(x) \in \{-1, +1\} \\
& \qquad \text{compute the weighted error rate:} \\
& \qquad\qquad e_m = P(G_m(x_i) \neq y_i) = \sum_{i=1}^{n} w_i^{(m)} \, I(G_m(x_i) \neq y_i) \\
& \qquad \text{compute the classifier weight:} \\
& \qquad\qquad \alpha_m = \frac{1}{2} \log \frac{1 - e_m}{e_m} \\
& \qquad \text{update the weight distribution (aka resampling):} \\
& \qquad\qquad z_m = \sum_{i=1}^{n} w_i^{(m)} \exp[-\alpha_m y_i G_m(x_i)] \\
& \qquad\qquad w_i^{(m+1)} = \frac{w_i^{(m)}}{z_m} \exp[-\alpha_m y_i G_m(x_i)] \\
& \textbf{final classifier:} \\
& \qquad f(x) = \sum_{m=1}^{M} \alpha_m G_m(x), \qquad G(x) = \operatorname{sign}(f(x))
\end{aligned}
$$
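The loop above translates almost line for line into code. Below is a minimal from-scratch sketch in Python (NumPy assumed); the exhaustive stump search, the helper names (`train_stump`, `adaboost_train`, etc.), and the default `M=20` are illustrative choices, not part of the algorithm itself.

```python
import numpy as np

def train_stump(X, y, w):
    """Search a one-level decision stump (feature, threshold, polarity)
    that minimizes the weighted error rate under the current weights w."""
    n, d = X.shape
    best, best_err = None, np.inf
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                err = np.sum(w[pred != y])      # weighted error e_m
                if err < best_err:
                    best_err, best = err, (j, thr, polarity)
    return best, best_err

def stump_predict(stump, X):
    j, thr, polarity = stump
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)

def adaboost_train(X, y, M=20):
    """y must be in {-1, +1}. Returns the list of (alpha_m, G_m)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # D_1: uniform initial weights
    ensemble = []
    for m in range(M):
        stump, e_m = train_stump(X, y, w)       # weak classifier G_m
        e_m = max(e_m, 1e-10)                   # guard against log(0)/division by zero
        alpha_m = 0.5 * np.log((1 - e_m) / e_m) # classifier weight
        pred = stump_predict(stump, X)
        w = w * np.exp(-alpha_m * y * pred)     # re-weight the samples
        w /= w.sum()                            # divide by z_m (normalization)
        ensemble.append((alpha_m, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    # f(x) = sum_m alpha_m G_m(x), G(x) = sign(f(x))
    f = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
    return np.sign(f)
```

Note that normalizing `w` by its sum is exactly the division by $z_m$ in the update rule, so the weights always form a distribution over the training points.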
It can be proved that AdaBoost keeps reducing the error rate on the training set as more weak classifiers are added, and that the error rate on the validation set is bounded so that it does not grow too fast.
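One standard way to state the training-set part of this claim (a sketch of the usual bound, not derived here) is:

$$
\frac{1}{n}\sum_{i=1}^{n} I(G(x_i) \neq y_i) \;\le\; \prod_{m=1}^{M} z_m \;\le\; \exp\!\left(-2\sum_{m=1}^{M}\gamma_m^2\right), \qquad \gamma_m = \frac{1}{2} - e_m
$$

So as long as each weak classifier is slightly better than random guessing ($\gamma_m > 0$), the training error decreases exponentially fast in the number of rounds $M$.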