Logistic Regression (LR): Principle, Loss Function, and Solution Method (Memorization Edition)
Principle of LR
Logistic regression is, at its core, linear regression with a logistic (sigmoid) function added to the mapping from features to output: $g(z) = \frac{1}{1 + e^{-z}}$. That is, the features are first combined linearly, $z = w_0 + w_1 x_1 + \dots + w_n x_n$, and $g(z)$ is then used as the hypothesis function for prediction.
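The two steps above (linear sum, then logistic function) can be sketched in plain Python. The function names `sigmoid`, `linear_sum`, and `hypothesis` are illustrative choices, not from the original note, and the branch on the sign of `z` is only an overflow precaution, not part of the definition.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    # Branching on the sign keeps math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_sum(w, x):
    """z = w_0 + w_1*x_1 + ... + w_n*x_n, where w has one more entry than x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def hypothesis(w, x):
    """Hypothesis function: the logistic function applied to the linear sum."""
    return sigmoid(linear_sum(w, x))
```

For example, `hypothesis([0.0, 1.0, -1.0], [2.0, 2.0])` has a linear sum of zero, so it returns 0.5, the midpoint of the logistic curve.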
Logistic regression is used for 0/1 classification, that is, binary classification in which the predicted outcome belongs to either class 0 or class 1. Writing the linear sum as $w^T x$ (with $x_0 = 1$ so that $w_0$ is absorbed into the product), the model is:
$$p(y = 1 \mid x) = g(w^T x) = \frac{1}{1 + e^{-w^T x}}$$
$$p(y = 0 \mid x) = 1 - g(w^T x) = \frac{e^{-w^T x}}{1 + e^{-w^T x}}$$
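A minimal sketch of the two class probabilities, assuming each input vector carries a leading 1 so the bias is part of `w`. The helper names and the 0.5 decision threshold are assumptions for illustration; the note itself does not specify a threshold.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    """Return (p(y=0|x), p(y=1|x)); x[0] is assumed to be 1 for the bias term."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    p1 = sigmoid(z)
    return 1.0 - p1, p1

def predict_label(w, x, threshold=0.5):
    """Assign class 1 when p(y=1|x) reaches the (assumed) threshold."""
    return 1 if predict_proba(w, x)[1] >= threshold else 0
```

The two probabilities sum to one by construction, so only $p(y = 1 \mid x)$ needs to be computed.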
Loss Function
Consider a training set with feature data $x = \{x_1, x_2, \dots, x_m\}$ and corresponding class labels $y = \{y_1, \dots, y_m\}$. Assuming the $m$ samples are mutually independent, their joint distribution is the product of the individual distributions, which gives the likelihood function:
$$L(w) = \prod_{i=1}^{m} g(w^T x_i)^{y_i} \, \bigl(1 - g(w^T x_i)\bigr)^{1 - y_i}$$
Taking the logarithm:
$$e(w) = \ln L(w) = \sum_{i=1}^{m} \Bigl[ y_i \ln g(w^T x_i) + (1 - y_i) \ln\bigl(1 - g(w^T x_i)\bigr) \Bigr]$$
Maximizing $e(w)$ is equivalent to minimizing the negative log-likelihood $-e(w)$, which is the cross-entropy loss.
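The log-likelihood can be evaluated directly from its definition. The sketch below assumes each row of `X` starts with a 1 for the bias term; the function names and the clipping constant `eps` (added only to keep `math.log` away from zero) are assumptions, not part of the original derivation.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(w, X, y, eps=1e-12):
    """e(w) = sum_i [ y_i ln g(w^T x_i) + (1 - y_i) ln(1 - g(w^T x_i)) ]."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def negative_log_likelihood(w, X, y):
    """The quantity to minimize when the problem is phrased as a loss."""
    return -log_likelihood(w, X, y)
```

With all weights at zero every sample has probability 0.5, so the log-likelihood of $m$ samples is $m \ln 0.5$, which is a convenient sanity check.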
Solution Method
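As with linear regression, we use a gradient-based method. Because the goal is to maximize the log-likelihood, this is gradient ascent (the counterpart of gradient descent), with update rule $w := w + \alpha \nabla_w e(w)$, where $\alpha$ is the learning rate. The stochastic variant applies the same update using the gradient of a single sample at a time. The partial derivative with respect to each $w_j$ is: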
$$\begin{aligned}
\frac{\partial}{\partial w_j} e(w) &= \frac{\partial}{\partial w_j} \sum_{i=1}^{m} \Bigl[ y_i \ln g(w^T x_i) + (1 - y_i) \ln\bigl(1 - g(w^T x_i)\bigr) \Bigr] \\
&= \sum_{i=1}^{m} \left[ \frac{y_i}{g(w^T x_i)} - \frac{1 - y_i}{1 - g(w^T x_i)} \right] \frac{\partial}{\partial w_j} g(w^T x_i) \\
&= \sum_{i=1}^{m} \bigl[ y_i - g(w^T x_i) \bigr] \frac{\partial}{\partial w_j} (w^T x_i) \\
&= \sum_{i=1}^{m} \bigl[ y_i - g(w^T x_i) \bigr] x_{ij}
\end{aligned}$$
Here $x_{ij}$ is the $j$-th component of sample $x_i$. The third line uses the identity $g'(z) = g(z)\bigl(1 - g(z)\bigr)$, and the last line uses $\frac{\partial}{\partial w_j}(w^T x_i) = x_{ij}$.
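Putting the update rule and the gradient together gives a short batch gradient ascent loop. This is a sketch under stated assumptions: each row of `X` starts with a 1 for the bias term, the weights start at zero, and the learning rate `alpha`, the iteration count `n_iters`, and the toy data in the usage note are all illustrative values rather than anything from the original note.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def gradient(w, X, y):
    """d e(w) / d w_j = sum_i [ y_i - g(w^T x_i) ] * x_ij, for every j."""
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        for j, xij in enumerate(xi):
            grad[j] += err * xij
    return grad

def fit(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient ascent: w := w + alpha * grad e(w)."""
    w = [0.0] * len(X[0])
    for _ in range(n_iters):
        g = gradient(w, X, y)
        w = [wj + alpha * gj for wj, gj in zip(w, g)]
    return w
```

A small usage example on a toy one-feature data set:

```python
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # leading 1 is the bias input
y = [0, 0, 1, 1]
w = fit(X, y)
preds = [1 if sigmoid(sum(a * b for a, b in zip(w, xi))) >= 0.5 else 0 for xi in X]
```

Replacing the sum over all samples in `gradient` with a single randomly chosen sample would turn this into the stochastic variant.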