No Fluff | Gradient Descent Variants in Machine Learning: Classification and Comparison (with Source Code)


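We use gradient descent to minimize the objective function J(θ). The method first initializes the parameter values and then keeps adjusting them until J(θ) reaches a minimum (the global minimum when J is convex, as it is for linear regression). At each iteration we compute the derivative of the cost function and then update all parameters simultaneously using the following formula: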

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

where α is the learning rate.

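Linear regression

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$

Here $\theta_i$ are the parameters and $x_i$ are the input features. To solve a linear regression model, we need to find parameters that make the fitted function match the data as closely as possible, which we do by minimizing the cost function J(θ) with gradient descent.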

Cost function:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where m is the number of training samples and $(x^{(i)}, y^{(i)})$ is the i-th sample.
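To make the cost function concrete, here is a minimal sketch that evaluates it for a model with one input feature. It assumes the common 1/(2m) squared-error form of J(θ); the function name `cost` and the three-point dataset are made up for illustration.

```python
import numpy as np


def cost(theta0, theta1, x, y):
    """J(theta) = 1/(2m) * sum((theta0 + theta1 * x - y)^2) for one input feature."""
    m = len(x)
    residual = theta0 + theta1 * x - y
    return np.sum(residual ** 2) / (2 * m)


# Tiny made-up dataset that lies exactly on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])

cost_exact = cost(1.0, 2.0, x, y)  # parameters that fit the data perfectly
cost_zero = cost(0.0, 0.0, x, y)   # both parameters set to zero
```

The perfect fit gives a cost of 0, while the all-zero parameters give 35/6 (about 5.83). The closer the fitted line is to the data, the smaller J(θ) becomes, which is why minimizing it yields good parameters.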

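The following pseudocode outlines how gradient descent works:

1. Initialize the parameter values.

2. Iteratively update the parameters so that the objective function J(θ) keeps decreasing.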
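The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: the helper name `gradient_descent_1d`, the toy objective J(θ) = (θ − 3)², the step size 0.1, and the stopping tolerance are all assumptions chosen for the example.

```python
def gradient_descent_1d(grad, theta0, alpha=0.1, tol=1e-8, max_iter=1000):
    """Minimize a one-parameter function given its derivative `grad`."""
    theta = theta0                   # step 1: initialize the parameter
    for _ in range(max_iter):        # step 2: update it repeatedly
        step = alpha * grad(theta)
        theta = theta - step         # move against the gradient
        if abs(step) < tol:          # stop once the updates become negligible
            break
    return theta


# Toy objective J(theta) = (theta - 3)^2, whose derivative is 2 * (theta - 3).
theta_min = gradient_descent_1d(lambda t: 2.0 * (t - 3.0), theta0=0.0)
```

Starting from 0, `theta_min` ends up very close to 3, the minimizer of the toy objective. Each step shrinks the distance to the minimum by a constant factor here because the objective is a simple quadratic.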

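Depending on the amount of data used, the time complexity, and the accuracy of the algorithm, gradient descent comes in three variants:

1. Batch gradient descent

2. Stochastic gradient descent

3. Mini-batch gradient descent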
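The practical difference between the three variants is how many training samples feed each parameter update. The sketch below only illustrates that bookkeeping. The generator name `update_batches` and the mini-batch size of 32 are assumptions for the example, not values taken from the article.

```python
import numpy as np


def update_batches(m, variant, batch_size=32, seed=0):
    """Yield one index array per parameter update for a single pass over m samples."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(m)       # shuffled sample indices
    if variant == "batch":
        size = m                     # every sample in a single update
    elif variant == "stochastic":
        size = 1                     # one sample per update
    elif variant == "mini-batch":
        size = batch_size            # a small subset per update
    else:
        raise ValueError(f"unknown variant: {variant}")
    for start in range(0, m, size):
        yield order[start:start + size]


# Count how many updates one pass over 100 samples produces for each variant.
updates_per_pass = {
    variant: sum(1 for _ in update_batches(100, variant))
    for variant in ("batch", "stochastic", "mini-batch")
}
```

With 100 samples, one pass gives 1 update for the batch variant, 100 for the stochastic variant, and 4 for mini-batches of 32. Fewer samples per update means cheaper but noisier updates.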

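Batch gradient descent uses the entire dataset to compute the gradient of the cost function for every parameter update, which makes it slow:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

where m is the number of training samples and $x_0^{(i)} = 1$.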

3. Then repeat each of the steps above;

4. This means it takes a long time to converge.


Batch gradient descent is not suitable for large datasets. The following Python code implements batch gradient descent:

import numpy as np


def gradient_descent(alpha, x, y, ep=0.0001, max_iter=10000):
    """Batch gradient descent for the model y = t0 + t1 * x (one input feature)."""
    # flatten the inputs so that both (m,) and (m, 1) arrays are accepted
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    m = x.shape[0]  # number of samples

    # initial theta: random intercept t0 and slope t1
    t0 = np.random.random()
    t1 = np.random.random()

    # total squared error, J(theta)
    J = np.sum((t0 + t1 * x - y) ** 2)

    for n_iter in range(1, max_iter + 1):
        # gradient of the cost, averaged over all m training samples
        residual = t0 + t1 * x - y
        grad0 = np.sum(residual) / m
        grad1 = np.sum(residual * x) / m

        # update both parameters simultaneously
        t0, t1 = t0 - alpha * grad0, t1 - alpha * grad1

        # total squared error after the update
        e = np.sum((t0 + t1 * x - y) ** 2)

        # stop once the error changes by no more than ep
        if abs(J - e) <= ep:
            print('Converged, iterations:', n_iter)
            break

        J = e  # update error
    else:
        print('Max iterations exceeded!')

    return t0, t1
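The listing above handles a single input feature. As a rough sketch, the same batch update can be written in matrix form for any number of features and checked against a least-squares solver. The function name `batch_gd`, the learning rate, the iteration count, and the synthetic data are all assumptions made for this check.

```python
import numpy as np


def batch_gd(X, y, alpha=0.1, n_iter=5000):
    """Vectorized batch gradient descent; X must include a column of ones."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / m  # uses all m samples in every update
        theta -= alpha * grad
    return theta


# Synthetic data made up for this check: y = 4 + 3x plus small Gaussian noise.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 2.0, size=200)
y = 4.0 + 3.0 * x + rng.normal(0.0, 0.1, size=200)
X = np.column_stack([np.ones_like(x), x])

theta_gd = batch_gd(X, y)
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares reference
```

On this small problem the two parameter vectors agree closely, which is a quick way to confirm the update rule is implemented correctly. Note that every one of the 5000 updates touches all 200 samples, which is the cost that grows painful on large datasets.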

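Batch gradient descent turns out to be a relatively slow algorithm, so we can use stochastic gradient descent to compute faster. The first step of stochastic gradient descent is to shuffle the entire dataset. Each iteration then uses only one training sample to compute the gradient of the cost function and update the parameters. Because each update is so cheap, stochastic gradient descent makes fast progress even on large datasets. Its result may not be the most accurate, but it is computed quickly. After randomly initializing the parameters, the gradient of the cost function is computed as follows: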