Alibaba Cloud Technology Community [Yunqi]

Applying Keras to ImageNet: VGGNet, ResNet, Inception, and Xception

For more in-depth articles, follow: https://yq.aliyun.com/cloud

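A few months ago I wrote a tutorial on classifying images with Convolutional Neural Networks (CNNs), specifically VGG16, a model that can recognize 1,000 different categories of everyday objects with high accuracy.

Back then, the pre-trained models were not part of the Keras package: we had to download them from a free-standing GitHub repo and install them manually. Now that the models are integrated into Keras, the old tutorial no longer applies, so I decided to write a new one.

In this tutorial you will learn how to write a Python script that classifies your own images.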

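Structure of this post

1. A brief overview of each network architecture;

2. Writing Python code that loads a pre-trained model and classifies an input image;

3. Reviewing the classification results on a few sample images.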

State-of-the-art deep learning image classifiers in Keras

Keras ships with five CNNs pre-trained on ImageNet, ready to use out of the box:

1. VGG16

2. VGG19

3. ResNet50

4. Inception V3

5. Xception

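What is ImageNet?

ImageNet is formally a computer vision research project aimed at (manually) labeling and categorizing images into roughly 22,000 object categories. However, when we talk about deep learning and CNNs, "ImageNet" usually means the ImageNet Large Scale Visual Recognition Challenge, or ILSVRC for short.

The goal of ILSVRC is to train a model that can correctly classify an input image into one of 1,000 categories. Models are trained on about 1.2 million images, with another 50,000 images used for validation and 100,000 images for testing.

These 1,000 categories cover objects we encounter in everyday life; the full list of classes is available via the link in the original article.

For image classification, the ImageNet challenge has become the de facto benchmark for computer vision classification algorithms, and since 2012 its leaderboard has been dominated by CNNs and other deep learning techniques.

The models that performed best in the ImageNet challenge over the past few years are all included in Keras. Through transfer learning, these models also perform well on datasets outside of ImageNet.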

VGG16 and VGG19

Figure 1: The VGG network architecture (source)

The VGG network architecture was introduced by Simonyan and Zisserman in their 2014 paper, "Very Deep Convolutional Networks for Large Scale Image Recognition."

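The architecture uses only 3×3 convolutional layers stacked on top of each other in increasing depth, with max pooling used to reduce the volume size. These are followed by two fully-connected layers with 4,096 nodes each and, finally, a softmax classifier. The "16" and "19" stand for the number of weight layers in the network (columns D and E in Table 2).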
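To make the "16" and "19" concrete, here is a minimal sketch that counts weight layers in a VGG-style configuration list. The lists follow configurations D and E as they are commonly reproduced from the VGG paper (integers are 3×3 conv output channels, "M" marks a max pooling layer); the helper name `count_weight_layers` is hypothetical, and pooling layers are not counted because they have no weights.

```python
# Hypothetical helper: count weight layers in a VGG-style configuration.
# Integers = 3x3 conv layers (output channels), "M" = max pooling (no weights).
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

NUM_FC_LAYERS = 3  # FC-4096, FC-4096, FC-1000 (the softmax classifier)


def count_weight_layers(cfg, num_fc=NUM_FC_LAYERS):
    """Return the number of conv + fully-connected layers in a config."""
    num_conv = sum(1 for layer in cfg if layer != "M")
    return num_conv + num_fc


if __name__ == "__main__":
    print("VGG16 weight layers:", count_weight_layers(VGG16_CFG))
    print("VGG19 weight layers:", count_weight_layers(VGG19_CFG))
```

VGG16 has 13 conv layers and VGG19 has 16; both add the same three fully-connected layers.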

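In 2014, 16- and 19-layer networks were considered very deep, and Simonyan and Zisserman found VGG16 and VGG19 difficult to train. They therefore trained smaller versions first (columns A and C). Once these smaller networks converged, they were used as the initialization for the larger, deeper networks, a process called pre-training.

Pre-training is a sensible approach, but it is time-consuming and tedious: an entire network has to be trained before it can serve as the initialization for a deeper one.

Today, in most cases, we no longer use pre-training and instead rely on Xavier/Glorot initialization or MSRA initialization (sometimes called He et al. initialization, from the paper "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification"). If you are interested, "All you need is a good init" (Mishkin and Matas, 2015) explains the importance of weight initialization and its effect on the convergence of deep neural networks.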
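As a rough illustration of the two initialization schemes, the sketch below computes the Glorot/Xavier uniform bound, sqrt(6 / (fan_in + fan_out)), and the He et al. (MSRA) normal standard deviation, sqrt(2 / fan_in), for a 3×3 convolution, then draws a few sample weights with Python's `random` module. It is a simplified sketch, not any library's initializer; the fan convention used here (kernel height × width × channels) is the usual one for conv layers, and the helper names are hypothetical.

```python
import math
import random


def conv_fans(kernel_size, in_channels, out_channels):
    """Fan-in/fan-out of a square conv kernel: (k*k*C_in, k*k*C_out)."""
    receptive_field = kernel_size * kernel_size
    return receptive_field * in_channels, receptive_field * out_channels


def glorot_uniform_limit(fan_in, fan_out):
    """Glorot/Xavier uniform draws weights from U(-limit, limit)."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def he_normal_std(fan_in):
    """Standard deviation for He et al. (MSRA) normal init, aimed at ReLU nets."""
    return math.sqrt(2.0 / fan_in)


if __name__ == "__main__":
    fan_in, fan_out = conv_fans(3, 64, 64)  # a 3x3 conv, 64 -> 64 channels
    limit = glorot_uniform_limit(fan_in, fan_out)
    std = he_normal_std(fan_in)
    print(f"fan_in={fan_in}, Glorot limit={limit:.4f}, He std={std:.4f}")

    rng = random.Random(0)  # fixed seed so runs are reproducible
    glorot_sample = [rng.uniform(-limit, limit) for _ in range(5)]
    he_sample = [rng.gauss(0.0, std) for _ in range(5)]
    print("Glorot sample:", [round(w, 4) for w in glorot_sample])
    print("He sample:    ", [round(w, 4) for w in he_sample])
```

Both schemes scale the initial weights by the layer's fan-in (and, for Glorot, fan-out), which is what lets deep networks converge without the staged pre-training described above.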

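VGGNet has two major drawbacks:

1. It is painfully slow to train;

2. Its weights are very large.

Because of its depth and the number of fully-connected nodes, VGG16's weights exceed 533MB and VGG19's exceed 574MB, which makes deploying VGG a chore. Although we still use the VGG architecture in many deep learning image classification problems, smaller network architectures are often preferred (for example SqueezeNet, GoogLeNet, and so on).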
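Most of that size comes from the fully-connected layers. The minimal sketch below counts VGG16's parameters under my own assumptions: a 224×224×3 input, biases on every layer, and FC-4096 → FC-4096 → FC-1000 on top. Multiplying by 4 bytes gives a float32 estimate in the same range as the figures above; actual weight files add some format overhead. The function name `vgg_param_counts` is hypothetical.

```python
# Parameter count for VGG16 (configuration D), assuming a 224x224x3 input,
# 3x3 convs with biases, and FC-4096 -> FC-4096 -> FC-1000.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg_param_counts(cfg, input_size=224, in_channels=3,
                     fc_sizes=(4096, 4096, 1000)):
    """Return (conv_params, fc_params) for a VGG-style config."""
    conv_params, channels, size = 0, in_channels, input_size
    for layer in cfg:
        if layer == "M":
            size //= 2  # 2x2 max pooling halves the spatial size
        else:
            conv_params += 3 * 3 * channels * layer + layer  # weights + biases
            channels = layer
    fc_params, fan_in = 0, size * size * channels  # flattened 7x7x512 volume
    for units in fc_sizes:
        fc_params += fan_in * units + units
        fan_in = units
    return conv_params, fc_params


if __name__ == "__main__":
    conv, fc = vgg_param_counts(VGG16_CFG)
    total = conv + fc
    print(f"conv: {conv:,}  fc: {fc:,}  total: {total:,}")
    print(f"float32 size: {total * 4 / 2**20:.1f} MiB")
```

Under these assumptions about 89% of VGG16's roughly 138 million parameters sit in the three fully-connected layers, and the first FC layer (25,088 → 4,096) alone accounts for over 100 million of them.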

ResNet

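Unlike traditional sequential network architectures such as AlexNet, OverFeat, and VGG, ResNet relies on micro-architecture modules (also called network-in-network architectures).

A micro-architecture module is a "building block" of a network: a set of these building blocks (together with your standard CONV, POOL, and other layers) is combined to form the larger architecture, i.e. the final network.

ResNet was introduced by He et al. in their 2015 paper, "Deep Residual Learning for Image Recognition." It was groundbreaking because it demonstrated that extremely deep networks can be trained with standard SGD (and a reasonable initialization function):

Figure 3: The residual module proposed by He et al. in 2015

In their 2016 follow-up paper, "Identity Mappings in Deep Residual Networks," they showed that updating the residual module to use identity mappings further improves accuracy.

Figure 4: (Left) The original residual module. (Right) The updated residual module using pre-activation.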
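To show how the two modules in Figure 4 differ in data flow, the sketch below wires toy stand-in layers together in both orders. The "layers" are hypothetical element-wise functions on Python lists (a fixed scale for conv, a simple standardization for batch norm), chosen only to make the ordering runnable; they are not real convolutions or a real batch norm.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def toy_conv(scale: float) -> Layer:
    """Hypothetical stand-in for a conv layer: multiply by a fixed weight."""
    return lambda x: [scale * v for v in x]


def toy_bn(x: Vector) -> Vector:
    """Stand-in for batch norm: standardize the vector (eps for stability)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + 1e-5) ** 0.5 for v in x]


def add(a: Vector, b: Vector) -> Vector:
    return [u + v for u, v in zip(a, b)]


def original_residual(x: Vector, conv1: Layer, conv2: Layer) -> Vector:
    """2015 module: y = ReLU(x + BN(conv2(ReLU(BN(conv1(x))))))."""
    f = toy_bn(conv2(relu(toy_bn(conv1(x)))))
    return relu(add(x, f))  # activation applied after the addition


def preact_residual(x: Vector, conv1: Layer, conv2: Layer) -> Vector:
    """2016 pre-activation module: y = x + conv2(ReLU(BN(conv1(ReLU(BN(x))))))."""
    f = conv2(relu(toy_bn(conv1(relu(toy_bn(x))))))
    return add(x, f)  # pure identity shortcut, nothing applied after the sum


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0, 0.5]
    c1, c2 = toy_conv(0.5), toy_conv(-0.25)
    print("original:", original_residual(x, c1, c2))
    print("pre-act: ", preact_residual(x, c1, c2))
```

If the residual branch outputs zeros, the pre-activation module passes x through unchanged, while the original module still applies a ReLU to it; that unobstructed identity path is the point of the 2016 update.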

Although ResNet is much deeper than VGG16 and VGG19, its weights are much smaller (102MB), because it uses global average pooling rather than fully-connected layers.
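Global average pooling collapses each channel of the final feature map to its mean, so the classifier sees one number per channel instead of a flattened H×W×C volume. The sketch below compares classifier parameter counts assuming a ResNet50-style final feature map of 7×7×2048 and a 1,000-way output (shapes stated as assumptions for illustration); `global_average_pool` and `dense_params` are hypothetical pure-Python helpers.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [row][col][channel]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions: one value per channel."""
    rows, cols = len(fmap), len(fmap[0])
    channels = len(fmap[0][0])
    return [
        sum(fmap[r][c][k] for r in range(rows) for c in range(cols)) / (rows * cols)
        for k in range(channels)
    ]


def dense_params(in_features: int, out_features: int) -> int:
    """Weights + biases of a fully-connected layer."""
    return in_features * out_features + out_features


if __name__ == "__main__":
    h, w, c, classes = 7, 7, 2048, 1000
    print("flatten + FC:", f"{dense_params(h * w * c, classes):,}")
    print("GAP + FC:    ", f"{dense_params(c, classes):,}")

    tiny = [[[1.0, 10.0], [3.0, 20.0]], [[5.0, 30.0], [7.0, 40.0]]]  # 2x2x2
    print("GAP of tiny map:", global_average_pool(tiny))
```

With these shapes, pooling first cuts the classifier from about 100 million parameters to about 2 million.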

Inception V3

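The "Inception" micro-architecture was first introduced by Szegedy et al. in their 2014 paper, "Going Deeper with Convolutions."

Figure 5: The original Inception module used in GoogLeNet

The goal of the Inception module is to act as a "multi-level feature extractor" by computing 1×1, 3×3, and 5×5 convolutions within the same module of the network. The outputs of these filters are then stacked along the channel dimension before being fed into the next layer of the network.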
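The channel stacking described above can be sketched at the level of tensor shapes. Assuming "same" padding, every branch keeps the input's spatial size, so the module's output depth is simply the sum of the branch depths. The branch widths below (64, 128, and 32 filters, plus a 32-channel projection of the max pooling branch that GoogLeNet's module also contains) are the values commonly cited for GoogLeNet's first Inception block and serve only as an illustration; `inception_output_shape` and `concat_channels` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def inception_output_shape(height: int, width: int,
                           branch_channels: Dict[str, int]) -> Shape:
    """Output shape of an Inception-style module with 'same' padding.

    Every branch preserves height and width, so the outputs can be
    stacked along the channel dimension and their depths add up.
    """
    return height, width, sum(branch_channels.values())


def concat_channels(*branches: List[List[List[float]]]) -> List[List[List[float]]]:
    """Stack per-pixel channel lists from several [row][col][channel] maps."""
    rows, cols = len(branches[0]), len(branches[0][0])
    return [[[v for b in branches for v in b[r][c]] for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    branches = {"1x1": 64, "3x3": 128, "5x5": 32, "pool_proj": 32}
    print(inception_output_shape(28, 28, branches))

    a = [[[1.0]]]          # 1x1 spatial map with 1 channel
    b = [[[2.0, 3.0]]]     # 1x1 spatial map with 2 channels
    print(concat_channels(a, b))
```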

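The original incarnation of this architecture was called GoogLeNet; its successors are simply named Inception vN, where N is the version number released by Google.

The Inception V3 architecture in Keras comes from the later paper by Szegedy et al., "Rethinking the Inception Architecture for Computer Vision" (2015), which proposes updates to the Inception module to further improve ImageNet classification accuracy.

Inception V3's weights are smaller than both VGG's and ResNet's, at about 96MB.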

Xception

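Figure 6: The Xception architecture

Xception was proposed by François Chollet, the creator and chief maintainer of the Keras library.

Xception is an extension of the Inception architecture that replaces the standard Inception modules with depthwise separable convolutions.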
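The appeal of depthwise separable convolutions shows up clearly in parameter counts. A standard k×k convolution learns k·k·C_in·C_out weights, while a depthwise separable one learns a single k×k filter per input channel (k·k·C_in) followed by a 1×1 pointwise convolution (C_in·C_out). The sketch below ignores biases for simplicity; the helper names are hypothetical and the layer sizes are only an example.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k conv (one filter per input channel) + 1x1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 256, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = separable_conv_params(k, c_in, c_out)
    print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
```

For a 3×3 layer with 256 input and output channels, the separable version needs roughly 8.7 times fewer weights, which helps explain why Xception's weights are smaller than VGG's or ResNet's.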

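The Xception paper, "Xception: Deep Learning with Depthwise Separable Convolutions," describes the architecture in full.

Xception has the smallest weights of the five Keras networks, at only 91MB.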

SqueezeNet

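Figure 7: The "fire" module in SqueezeNet, consisting of a "squeeze" module and an "expand" module (Iandola et al., 2016).

At only 4.9MB, the SqueezeNet architecture reaches AlexNet-level accuracy (~57% rank-1 and ~80% rank-5), thanks to its use of "fire" modules. However, SqueezeNet is tricky to train; I will cover how to train SqueezeNet on the ImageNet dataset in my upcoming book, Deep Learning for Computer Vision with Python.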
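A fire module first "squeezes" its input with 1×1 filters and then "expands" it with a mix of 1×1 and 3×3 filters whose outputs are concatenated. The sketch below counts a fire module's parameters (weights plus biases) and compares it with a plain 3×3 convolution producing the same number of output channels. The filter counts (16 squeeze, 64 + 64 expand, on a 96-channel input) are illustrative assumptions in the spirit of the SqueezeNet paper, and `fire_module_params` is a hypothetical helper.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights + biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out


def fire_module_params(c_in: int, squeeze: int, expand1x1: int,
                       expand3x3: int) -> int:
    """Parameters in a fire module: 1x1 squeeze, then 1x1 and 3x3 expand."""
    return (conv_params(1, c_in, squeeze)
            + conv_params(1, squeeze, expand1x1)
            + conv_params(3, squeeze, expand3x3))


if __name__ == "__main__":
    fire = fire_module_params(c_in=96, squeeze=16, expand1x1=64, expand3x3=64)
    plain = conv_params(3, 96, 128)  # plain 3x3 conv with the same 128 outputs
    print(f"fire module: {fire:,} params, plain 3x3 conv: {plain:,} params")
```

The squeeze layer shrinks the channel count before the expensive 3×3 filters run, which is how SqueezeNet keeps its weights so small.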

Classifying images with VGGNet, ResNet, Inception, and Xception using Python and Keras

Open a new file, name it classify_image.py, and insert the following code:

1	# import the necessary packages
2	from keras.applications import ResNet50
3	from keras.applications import InceptionV3
4	from keras.applications import Xception # TensorFlow ONLY
5	from keras.applications import VGG16
6	from keras.applications import VGG19
7	from keras.applications import imagenet_utils
8	from keras.applications.inception_v3 import preprocess_input
9	from keras.preprocessing.image import img_to_array
10	from keras.preprocessing.image import load_img
11	import numpy as np
12	import argparse
13	import cv2

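Lines 2-13 import the packages we need; most of them belong to Keras.

Lines 2-6 import ResNet50, Inception V3, Xception, VGG16, and VGG19, respectively. Note that Xception is only compatible with the TensorFlow backend.

Line 7 imports the imagenet_utils module, which provides a set of convenience functions that make it easier to pre-process input images and decode classification results.

The remaining imports bring in other helpers: the Inception-specific preprocess_input function (Line 8), Keras's image-loading utilities (Lines 9-10), NumPy for numerical processing, argparse for command-line arguments, and cv2 for our OpenCV bindings.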

15	# construct the argument parser and parse the arguments
16	ap = argparse.ArgumentParser()
17	ap.add_argument("-i", "--image", required=True,
18	    help="path to the input image")
19	ap.add_argument("-m", "--model", type=str, default="vgg16",
20	    help="name of pre-trained network to use")
21	args = vars(ap.parse_args())

--image: the path to the input image we want to classify.

--model: the name of the pre-trained CNN to use; it defaults to VGG16.
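To check what these two arguments produce without going through the command line, a parser with the same two long options can be fed an explicit argument list. This is a small standalone sketch; the image path is a made-up example and `build_parser` is a hypothetical wrapper, not part of the original script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical wrapper mirroring the script's --image/--model options."""
    ap = argparse.ArgumentParser()
    ap.add_argument("--image", required=True,
                    help="path to the input image")
    ap.add_argument("--model", type=str, default="vgg16",
                    help="name of pre-trained network to use")
    return ap


if __name__ == "__main__":
    parser = build_parser()
    # parse_args accepts an explicit list instead of reading sys.argv
    args = vars(parser.parse_args(["--image", "images/example.jpg"]))
    print(args)  # --model was omitted, so it falls back to its default
```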

23	# define a dictionary that maps model names to their classes
24	# inside Keras
25	MODELS = {
26	    "vgg16": VGG16,
27	    "vgg19": VGG19,
28	    "inception": InceptionV3,
29	    "xception": Xception,  # TensorFlow ONLY
30	    "resnet": ResNet50
31	}
32	
33	# ensure a valid model name was supplied via command line argument
34	if args["model"] not in MODELS.keys():
35	    raise AssertionError("The --model command line argument should "
36	        "be a key in the `MODELS` dictionary")

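Lines 25-31 define a dictionary that maps each model name (a string) to its corresponding Keras network class.

If the value of --model is not a key in this dictionary, an AssertionError is raised.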
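As an alternative design (not what the original script does), argparse can validate the model name itself through the `choices` parameter, which also lists the valid names in the auto-generated help text. The sketch below uses a stand-in dictionary of strings instead of the Keras classes so that it runs without Keras installed; `MODEL_NAMES` and `build_parser` are hypothetical.

```python
import argparse

# Stand-in for the MODELS dictionary: same keys, placeholder values,
# so this sketch runs without Keras installed.
MODEL_NAMES = {
    "vgg16": "VGG16",
    "vgg19": "VGG19",
    "inception": "InceptionV3",
    "xception": "Xception",
    "resnet": "ResNet50",
}


def build_parser() -> argparse.ArgumentParser:
    ap = argparse.ArgumentParser()
    ap.add_argument("--image", required=True, help="path to the input image")
    # choices makes argparse reject unknown names with a usage error
    ap.add_argument("--model", default="vgg16", choices=sorted(MODEL_NAMES),
                    help="name of pre-trained network to use")
    return ap


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args(["--image", "x.jpg", "--model", "resnet"]).model)
    try:
        parser.parse_args(["--image", "x.jpg", "--model", "alexnet"])
    except SystemExit as exc:
        # argparse prints the error to stderr and exits with status 2
        print("rejected, exit status:", exc.code)
```

Both approaches stop the script before any weights are loaded; the practical difference is mostly the error message and the help output.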

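Feeding an image into a CNN returns a set of probabilities, one for each class label.

Typical input image sizes for CNNs trained on ImageNet are 224×224, 227×227, 256×256, and 299×299, although other sizes are possible.

VGG16, VGG19, and ResNet accept 224×224 input images, while Inception V3 and Xception require 299×299 inputs, as the following code shows: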

38	# initialize the input image shape (224x224 pixels) along with
39	# the pre-processing function (this might need to be changed
40	# based on which model we use to classify our image)
41	inputShape = (224, 224)
42	preprocess = imagenet_utils.preprocess_input
43	
44	# if we are using the InceptionV3 or Xception networks, then we
45	# need to set the input shape to (299x299) [rather than (224x224)]
46	# and use a different image processing function
47	if args["model"] in ("inception", "xception"):
48	    inputShape = (299, 299)
49	    preprocess = preprocess_input

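Here we initialize inputShape to 224×224 pixels and set the pre-processing function to imagenet_utils.preprocess_input, which performs mean subtraction.

If we are using Inception or Xception, we instead set inputShape to 299×299 pixels and switch to the separate preprocess_input function imported from keras.applications.inception_v3 (Line 8), which scales pixel intensities to the range [-1, 1] rather than subtracting the mean.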
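The two pre-processing styles can be sketched on a single RGB pixel. This is a simplified, pure-Python illustration rather than the Keras implementation: the mean-subtraction function flips RGB to BGR and subtracts the commonly used per-channel ImageNet means (103.939, 116.779, 123.68 in BGR order, assumed here), while the Inception-style function scales each value from [0, 255] to [-1, 1]. The function names are hypothetical, and the input-size table repeats the sizes given above.

```python
from typing import Tuple

Pixel = Tuple[float, float, float]

# Input sizes from the text: 224x224 for VGG/ResNet, 299x299 otherwise.
INPUT_SHAPES = {
    "vgg16": (224, 224),
    "vgg19": (224, 224),
    "resnet": (224, 224),
    "inception": (299, 299),
    "xception": (299, 299),
}

# Assumed per-channel ImageNet means, in BGR order.
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)


def mean_subtract(pixel: Pixel) -> Pixel:
    """RGB -> BGR, then subtract the ImageNet mean (no scaling)."""
    r, g, b = pixel
    return tuple(v - m for v, m in zip((b, g, r), IMAGENET_MEAN_BGR))


def inception_scale(pixel: Pixel) -> Pixel:
    """Scale each channel from [0, 255] to [-1, 1]."""
    return tuple(v / 127.5 - 1.0 for v in pixel)


def preprocessing_for(model_name: str):
    """Pick the input shape and pre-processing style the way the script does."""
    if model_name in ("inception", "xception"):
        return INPUT_SHAPES[model_name], inception_scale
    return INPUT_SHAPES[model_name], mean_subtract


if __name__ == "__main__":
    pixel = (255.0, 128.0, 0.0)  # an orange-ish RGB pixel
    for name in ("vgg16", "xception"):
        shape, fn = preprocessing_for(name)
        print(name, shape, [round(v, 3) for v in fn(pixel)])
```

Whichever style a network was trained with must also be used at inference time, which is why the script switches the function together with the input size.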

Next, we load the pre-trained network weights from disk and instantiate the model:

51	# load the network weights from disk (NOTE: if this is the
52	# first time you are running this script for a given network, the
53	# weights will need to be downloaded first -- depending on which
54	# network you are using, the weights can be 90-575MB, so be
55	# patient; the weights will be cached and subsequent runs of this
56	# script will be *much* faster)
57	print("[INFO] loading {}...".format(args["model"]))
58	Network = MODELS[args["model"]]
59	model = Network(weights="imagenet")

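Note: the VGG16 and VGG19 weights are over 500MB, ResNet's are about 100MB, and Inception's and Xception's are between 90 and 100MB. The first time you run the script with a given network, its weights are downloaded to your disk automatically. How long this takes depends on your connection speed, but once the download completes, subsequent runs will not need to download the weights again.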

61	# load the input image using the Keras helper utility while ensuring
62	# the image is resized to `inputShape`, the required input dimensions
63	# for the ImageNet pre-trained network
64	print("[INFO] loading and pre-processing image...")
65	image = load_img(args["image"], target_size=inputShape)
66	image = img_to_array(image)
67	
68	# our input image is now represented as a NumPy array of shape
69	# (inputShape[0], inputShape[1], 3) however we need to expand the
70	# dimension by making the shape (1, inputShape[0], inputShape[1], 3)
71	# so we can pass it through the network
72	image = np.expand_dims(image, axis=0)
73	
74	# pre-process the image using the appropriate function based on the
75	# model that has been loaded (i.e., mean subtraction, scaling, etc.)
76	image = preprocess(image)

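Line 65 loads the input image from disk and resizes it to the supplied inputShape.

Line 66 converts the image from a PIL/Pillow instance to a NumPy array of shape (inputShape[0], inputShape[1], 3).

Because CNNs typically train on and classify images in batches, Line 72 uses np.expand_dims to add an extra dimension to the array, giving it shape (1, inputShape[0], inputShape[1], 3). If you forget to add this dimension, calling the model's .predict method will raise an error.

Finally, Line 76 applies the appropriate pre-processing function to perform mean subtraction or scaling.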
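A small NumPy sketch of the batch dimension discussed above: a zero-filled array stands in for the loaded 224×224 image (an assumption, so no image file is needed). `np.expand_dims(image, axis=0)` and indexing with `np.newaxis` give the same (1, H, W, 3) shape, and `np.stack` builds a batch of several same-sized images if you want to classify more than one at a time.

```python
import numpy as np

# Stand-in for img_to_array(load_img(...)) with a 224x224 target size.
image = np.zeros((224, 224, 3), dtype="float32")

batch = np.expand_dims(image, axis=0)      # shape (1, 224, 224, 3)
same_batch = image[np.newaxis, ...]        # equivalent indexing form

# Several images of the same size can be stacked into one batch instead.
images = [image, image + 1.0, image + 2.0]
multi_batch = np.stack(images, axis=0)     # shape (3, 224, 224, 3)

print(batch.shape, same_batch.shape, multi_batch.shape)
```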

Now we pass the image through the network and obtain the classification results:

78	# classify the image
79	print("[INFO] classifying image with '{}'...".format(args["model"]))
80	preds = model.predict(image)
81	P = imagenet_utils.decode_predictions(preds)
82	
83	# loop over the predictions and display the rank-5 predictions +
84	# probabilities to our terminal
85	for (i, (imagenetID, label, prob)) in enumerate(P[0]):
86	    print("{}. {}: {:.2f}%".format(i + 1, label, prob * 100))

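Line 80 calls the model's .predict method to obtain the predictions from the CNN.

Line 81 uses decode_predictions to turn the raw predictions into a human-readable form: for each image, a list of (imagenetID, label, probability) tuples.

Lines 85 and 86 loop over the top-5 predictions (the number decode_predictions returns by default) and print them to the terminal.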
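Conceptually, decoding predictions amounts to sorting one row of class probabilities and pairing the top indices with their labels. The pure-Python sketch below mimics the (imagenetID, label, probability) tuples described above, but it is not the Keras implementation: the six-class index, its IDs, and the probabilities are all made up for illustration, whereas Keras uses its own 1,000-class ImageNet index.

```python
from typing import List, Sequence, Tuple

# Hypothetical 6-class index of (imagenetID, label) pairs, made up for the demo.
CLASS_INDEX = [
    ("id_0", "soccer_ball"),
    ("id_1", "volleyball"),
    ("id_2", "rugby_ball"),
    ("id_3", "tennis_ball"),
    ("id_4", "golf_ball"),
    ("id_5", "basketball"),
]


def decode_top_k(probs: Sequence[float], k: int = 5) -> List[Tuple[str, str, float]]:
    """Return the k most probable (imagenetID, label, probability) tuples."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(*CLASS_INDEX[i], probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    preds = [0.06, 0.70, 0.10, 0.02, 0.08, 0.04]  # made-up scores for one image
    for (i, (imagenet_id, label, prob)) in enumerate(decode_top_k(preds)):
        print("{}. {}: {:.2f}%".format(i + 1, label, prob * 100))
```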

The final step of this example is to load the input image from disk with OpenCV, draw the top prediction on the image, and display it on our screen.

88	# load the image via OpenCV, draw the top prediction on the image,
89	# and display the image to our screen
90	orig = cv2.imread(args["image"])
91	(imagenetID, label, prob) = P[0][0]
92	cv2.putText(orig, "Label: {}, {:.2f}%".format(label, prob * 100),
93	    (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
94	cv2.imshow("Classification", orig)
95	cv2.waitKey(0)

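Classification results with VGGNet, ResNet, Inception, and Xception

All examples were run with Keras 2.0 or later and the TensorFlow backend. Make sure your TensorFlow version is 1.0 or later, otherwise you will run into errors. All examples were also tested with the Theano backend and work correctly (except Xception, which, as noted above, requires TensorFlow).

The images and code used in these examples are available from the original article.

Classifying with VGG16:

$ python classify_image.py --image images/soccer_ball.jpg --model vgg16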

Figure 8: Classifying a soccer ball with VGG16 (source)

The output is soccer_ball, with 93.43% probability.

To use VGG19 instead, simply change the --model argument:

$ python classify_image.py --image images/bmw.png --model vgg19

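Figure 9: Classifying a car with VGG19 (source)

The output is convertible, with 91.76% probability. Now take a look at the rest of the top-5 predictions: sports car at 4.98% (also correct), limousine at 1.06% (incorrect, but still reasonable), and car wheel at 0.75% (technically correct, since the image does contain wheels).

The next example, using ResNet, shows similar results:

$ python classify_image.py --image images/clint_eastwood.jpg --model resnet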

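Figure 10: Classifying with ResNet (source).

ResNet correctly classifies the image as revolver, with 69.79% probability. Interestingly, rifle comes in at 7.74% and assault rifle at 5.63%. Given the viewing angle of the revolver and the substantial length of its barrel (for a handgun), it is understandable that the CNN also assigns relatively high probabilities to the rifle classes.

$ python classify_image.py --image images/jemma.png --model resnet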

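Figure 11: Classifying a dog with ResNet

The dog's breed is correctly identified as beagle, with 94.48% probability.

Next I tried classifying a frame from Pirates of the Caribbean:

$ python classify_image.py --image images/boat.png --model inception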

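Figure 12: Classifying a shipwreck with Inception V3 (source)

Although ImageNet has a "boat" class, the Inception network correctly recognizes the scene as a "(ship) wreck", with 96.29% probability. The other labels, such as "seashore", "canoe", "paddle", and "breakwater", are all relevant too, and in certain contexts would be entirely correct.

The next example is the couch in my office, classified with Inception:

$ python classify_image.py --image images/office.png --model inception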

Figure 13: Classifying with Inception V3

Inception correctly identifies a "table lamp" in the image, with 69.68% probability. The other labels are entirely correct as well: "studio couch", "window shade", "lampshade", and "pillow".

Next up is Xception:

$ python classify_image.py --image images/scotch.png --model xception

Figure 14: Classifying with Xception (source)

Xception correctly classifies the image as "barrels".

The final example uses VGG16:

$ python classify_image.py --image images/tv.png --model vgg16

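Figure 15: Classifying with VGG16.

This image is from The Witcher 3: Wild Hunt. VGG16's top prediction is "home theatre", and its top-5 predictions also include "television/monitor", which is a very reasonable prediction.

As you can see from these examples, models trained on the ImageNet dataset can accurately recognize a wide variety of everyday objects. I hope you can put this code to use in your own projects.

What's next?

What if you want to train your own deep learning networks from scratch?

My new book can help you do exactly that, taking you from deep learning beginner to expert.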

Summary

In today's blog post we reviewed five convolutional neural networks (CNNs):

1. VGG16

2. VGG19

3. ResNet50

4. Inception V3

5. Xception

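We then used these architectures to classify our own images.

If you are interested in learning more about deep learning and convolutional neural networks, be sure to take a look at my new book, Deep Learning for Computer Vision with Python, which is available for order now.

Download the code for this blog post.

About the author:

Adrian Rosebrock is an entrepreneur and PhD who has launched two successful image search engines: ID My Pill and Chic Engine.

This article was recommended by @愛可可-愛生活 of BUPT and translated by the Alibaba Cloud Yunqi community.

Original title: "ImageNet: VGGNet, ResNet, Inception, and Xception with Keras"; author: Adrian Rosebrock; translator: 楊輝 (Yang Hui); reviewer: not listed. The attachment is a PDF of the original article.

This is an abridged translation; see the original article for full details.

Last updated: 2017-05-07 14:31:28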
