GoogLeNet for Image Classification

TensorFlow implementation of GoogLeNet and Inception for image classification.

  • This repository contains examples of natural image classification using a pre-trained model, as well as training an Inception network from scratch on the CIFAR-10 dataset (93.64% accuracy on the test set). The pre-trained model on CIFAR-10 can be downloaded from here.
  • Architecture of GoogLeNet from the paper:
    [Figure: GoogLeNet architecture]

Requirements

Implementation Details

For testing the pre-trained model

  • Images are rescaled so that the smallest side equals 224 before being fed into the model. This differs from the original paper, which uses an ensemble of 7 similar models with 144 224x224 crops per image at test time, so the performance will not be as good as reported in the paper.
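
The rescaling rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the repository's own pre-processing code: the function name `rescaled_size` and the choice to round the longer side are assumptions.

```python
def rescaled_size(height, width, target=224):
    """Return (new_height, new_width) so that the smaller side equals `target`.

    Hypothetical helper: the aspect ratio is preserved and the longer
    side is rounded to the nearest integer (an assumption).
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    scale = target / min(height, width)
    if height <= width:
        # Height is the smaller side, so it becomes exactly `target`.
        return target, max(target, round(width * scale))
    # Width is the smaller side, so it becomes exactly `target`.
    return max(target, round(height * scale)), target


# Example: a 480 x 640 image keeps its aspect ratio after rescaling.
print(rescaled_size(480, 640))
```

The resulting size could then be passed to whatever image-resizing routine is in use.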

For training from scratch on CIFAR-10

  • All the LRN layers are removed from the convolutional layers.
  • Batch normalization and ReLU activation are used in all convolutional layers, including those inside the Inception modules, except the output layer.
  • Two auxiliary classifiers are used as described in the paper, but with 512 instead of 1024 hidden units in the two fully connected layers to reduce computation. However, I found that the results on CIFAR-10 are almost the same with and without the auxiliary classifiers.
  • Since the 32 x 32 images are down-sampled to 1 x 1 before being fed into inception_5a, the multi-scale structure of the Inception layers becomes less useful, which harms performance (around 80% accuracy). To make full use of the multi-scale structure, the stride of the first convolutional layer is reduced to 1 and the first two max pooling layers are removed. The feature map (32 x 32 x channels) then has almost the same size as described in Table 1 of the paper (28 x 28 x channels) before being fed into inception_3a. I also tried only reducing the stride or only removing one max pooling layer, but the current setting gave the best performance on the test set.
  • During training, dropout with keep probability 0.4 is applied to the two fully connected layers, and weight decay of 5e-4 is used as well.
  • The network is trained with the Adam optimizer and a batch size of 128. The initial learning rate is 1e-3; it decays to 1e-4 after 30 epochs and to 1e-5 after 50 epochs.
  • The mean value of each color channel, computed from the training set, is subtracted from the input images.
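
The down-sampling argument in the list above can be checked with a little arithmetic. The sketch below traces the spatial size through the stride-2 stages listed in Table 1 of the paper, assuming "same" padding so that each stage maps a size n to ceil(n / stride). The stage names and the `trace_sizes` helper are hypothetical labels for illustration, not identifiers from this repository.

```python
import math


def trace_sizes(input_size, stages):
    """Return [(stage_name, output_size)], assuming 'same' padding."""
    sizes = []
    size = input_size
    for name, stride in stages:
        size = math.ceil(size / stride)
        sizes.append((name, size))
    return sizes


# Strides of the stages before inception_5a, following Table 1 of the paper.
ORIGINAL = [
    ("conv1", 2), ("pool1", 2), ("conv2", 1), ("pool2", 2),
    ("inception_3", 1), ("pool3", 2), ("inception_4", 1), ("pool4", 2),
]
# CIFAR-10 variant: conv1 stride reduced to 1, first two max pooling layers removed.
MODIFIED = [
    ("conv1", 1), ("conv2", 1),
    ("inception_3", 1), ("pool3", 2), ("inception_4", 1), ("pool4", 2),
]

for label, stages in (("original", ORIGINAL), ("modified", MODIFIED)):
    print(label, trace_sizes(32, stages))
```

Under these assumptions a 32 x 32 input reaches inception_5a at 1 x 1 with the original strides, while the modified layout keeps 32 x 32 through the first Inception stage and still leaves an 8 x 8 map for inception_5a.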
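
The learning-rate schedule described above is a piecewise-constant function of the epoch. A minimal sketch follows; the function name `learning_rate`, the 0-based epoch index, and the exact boundary handling (the drop takes effect at epoch 30 and epoch 50) are assumptions, since the text only says "after 30 epochs" and "after 50 epochs".

```python
def learning_rate(epoch):
    """Piecewise-constant schedule: 1e-3, then 1e-4, then 1e-5.

    `epoch` is assumed to be 0-based; the boundaries at 30 and 50
    are one reading of "after 30 epochs" and "after 50 epochs".
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if epoch < 30:
        return 1e-3
    if epoch < 50:
        return 1e-4
    return 1e-5


# Print the rate at a few representative epochs.
for epoch in (0, 29, 30, 49, 50, 99):
    print(epoch, learning_rate(epoch))
```

The returned value would be fed to the optimizer at the start of each epoch.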

Usage

ImageNet Classification

Preparation

  • Download the pre-trained parameters here. They originally come from here.
  • Set the paths in examples/inception_pretrained.py: PRETRINED_PATH is the path to the pre-trained model, and DATA_PATH is the directory containing the test images.

Run

Go to examples/, put the test images in DATA_PATH, then run the script:

python inception_pretrained.py --im_name PART_OF_IMAGE_NAME
  • --im_name selects the images to test by matching part of the file name. If the test images are all PNG files, this can be png. The default is .jpg.
  • The output is the top-5 class labels and their probabilities.
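
For readers unfamiliar with top-5 output, the selection step can be sketched as follows. This is a generic illustration, not the script's actual code: `top_k`, the label names, and the probability values are all made up for the example.

```python
def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability.

    Hypothetical helper; probabilities are rounded to two decimal places.
    """
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return [(label, round(prob, 2)) for label, prob in ranked[:k]]


# Made-up class probabilities for six classes.
labels = ["cat", "dog", "car", "boat", "bird", "tree"]
probabilities = [0.05, 0.60, 0.10, 0.02, 0.20, 0.03]
for label, prob in top_k(probabilities, labels):
    print(label, prob)
```

In a real run the probabilities would come from the softmax output of the network, and the labels from the ImageNet class list.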

Train the network on CIFAR-10

Preparation

  • Download the CIFAR-10 dataset from here.
  • Set the paths in examples/inception_cifar.py: DATA_PATH is the directory containing CIFAR-10, and SAVE_PATH is the directory where summary files and trained models are saved and loaded.

Train the model

Go to examples/ and run the script:

python inception_cifar.py --train \
  --lr LEARNING_RATE \
  --bsize BATCH_SIZE \
  --keep_prob KEEP_PROB_OF_DROPOUT \
  --maxepoch MAX_TRAINING_EPOCH
  • Summaries and the trained model will be saved in SAVE_PATH. A pre-trained model on CIFAR-10 can be downloaded from here.

Evaluate the model

Go to examples/ and put the pre-trained model in SAVE_PATH. Then run the script:

python inception_cifar.py --eval \
  --load PRE_TRAINED_MODEL_ID
  • The pre-trained ID is the epoch ID shown in the saved model file name. The default value is 99, which corresponds to the model I uploaded.
  • The output is the accuracy on the training and test sets.
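
To make the train and evaluate commands above concrete, here is a sketch of how flags with these names could be declared using argparse. It is not the repository's actual parser: the argument types, the defaults for --lr, --bsize and --keep_prob (taken from the training settings described earlier), the --maxepoch default, and the mutual exclusion of --train and --eval are all assumptions. Only the flag names and the --load default of 99 come from this README.

```python
import argparse


def build_parser():
    """Build a hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Inception on CIFAR-10 (sketch)")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", action="store_true", help="train the model")
    mode.add_argument("--eval", action="store_true", help="evaluate a saved model")
    parser.add_argument("--lr", type=float, default=1e-3, help="initial learning rate")
    parser.add_argument("--bsize", type=int, default=128, help="batch size")
    parser.add_argument("--keep_prob", type=float, default=0.4,
                        help="dropout keep probability")
    # Assumed default; the README does not state one for --maxepoch.
    parser.add_argument("--maxepoch", type=int, default=100,
                        help="maximum number of training epochs")
    parser.add_argument("--load", type=int, default=99,
                        help="epoch ID of the saved model to load")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--eval", "--load", "99"])
print(args.eval, args.load)
```

With a parser like this, `--train` and `--eval` select the mode, and the remaining flags fall back to their defaults when omitted.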

Results

Image classification using pre-trained model

  • The top five predictions are shown, with probabilities given to two decimal places. Note that the pre-trained model is trained on ImageNet.
  • Results of VGG19 for the same images can be found here.
    The image pre-processing is the same for both experiments.

Data Source

Key Metrics

Overview
Name and owner: conan7882/GoogLeNet-Inception
Primary language: Python
Languages: Python (1 language)
Platforms: Linux, Mac, Windows
License: MIT License
Owner activity
Created: 2017-12-06 19:26:24
Pushed: 2020-03-11 20:56:53
Last commit: 2020-03-11 13:56:52
Releases: 0
User engagement
Stars: 287
Watchers: 13
Forks: 120
Commits: 42
Issues: 12
Open issues: 5
Pull requests: 1
Open pull requests: 0
Closed pull requests: 0