convnet-benchmarks

Easy benchmarking of all publicly accessible implementations of convnets

  • Owner: soumith/convnet-benchmarks
  • License: MIT License



Easy benchmarking of all public open-source implementations of convnets.
A summary is provided in the section below.

Machine: 6-core Intel Core i7-5930K CPU @ 3.50GHz + NVIDIA Titan X + Ubuntu 14.04 x86_64

ImageNet Winners Benchmarking

I pick some popular ImageNet models and clock the time for a full forward + backward pass, averaged over 10 runs. I ignore dropout and softmax layers.
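The measurement described above (time a full forward + backward pass and average over 10 runs) can be sketched as a small harness. This is a minimal illustration, not the repository's benchmark code: `forward`, `backward` and the optional `sync` hook are hypothetical callables supplied by the caller, and the untimed warm-up iteration is an assumption the text does not state. On a GPU, `sync` should block until all queued kernels have finished; otherwise asynchronous kernel launches can make the measured times far too small.

```python
import statistics
import time
from typing import Callable, Dict, Optional


def _no_sync() -> None:
    """Default hook: nothing to wait for on a synchronous (CPU) backend."""


def time_forward_backward(
    forward: Callable[[], object],
    backward: Callable[[], object],
    runs: int = 10,
    warmup: int = 1,
    sync: Optional[Callable[[], None]] = None,
) -> Dict[str, float]:
    """Return mean forward, backward and total wall-clock times in milliseconds."""
    if sync is None:
        sync = _no_sync
    # Untimed warm-up (an assumption): lets caches, allocators and autotuners settle.
    for _ in range(warmup):
        forward()
        backward()
    sync()

    fwd_ms, bwd_ms = [], []
    for _ in range(runs):
        t0 = time.perf_counter()
        forward()
        sync()  # wait until the forward pass has really finished
        t1 = time.perf_counter()
        backward()
        sync()  # wait until the backward pass has really finished
        t2 = time.perf_counter()
        fwd_ms.append((t1 - t0) * 1e3)
        bwd_ms.append((t2 - t1) * 1e3)

    fwd = statistics.mean(fwd_ms)
    bwd = statistics.mean(bwd_ms)
    return {"forward_ms": fwd, "backward_ms": bwd, "total_ms": fwd + bwd}


# Usage with a toy CPU workload standing in for a real network:
data = list(range(100_000))
print(time_forward_backward(lambda: sum(data), lambda: sorted(data, reverse=True)))
```

Timing forward and backward separately, with a sync after each, is what allows the tables to report forward (ms) and backward (ms) columns alongside the total.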

Notation

Input is described as {batch_size}x{num_filters}x{filter_width}x{filter_height}, where batch_size is the number of images in a minibatch, num_filters is the number of channels in an image, filter_width is the width of the image, and filter_height is the height of the image.
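As an illustration of this notation, the sketch below parses a shape string into its four named dimensions and computes the size of the input tensor alone (not weights or intermediate activations). The names `InputShape`, `parse_shape` and `input_bytes` are hypothetical; the field names follow the document's own naming, in which num_filters means channels and filter_width/filter_height mean the image size.

```python
from math import prod
from typing import NamedTuple


class InputShape(NamedTuple):
    batch_size: int     # images per minibatch
    num_filters: int    # channels per image (this document's naming)
    filter_width: int   # image width
    filter_height: int  # image height


def parse_shape(spec: str) -> InputShape:
    """Parse a string such as '128x3x224x224' into its four named dimensions."""
    parts = [int(p) for p in spec.lower().split("x")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 dimensions, got {spec!r}")
    return InputShape(*parts)


def input_bytes(shape: InputShape, bytes_per_element: int = 4) -> int:
    """Bytes needed for the input tensor: 4 per element for fp32, 2 for fp16."""
    return prod(shape) * bytes_per_element


shape = parse_shape("128x3x224x224")  # the AlexNet and GoogleNet V1 input below
print(shape)
print(input_bytes(shape) / 2**20, "MiB in fp32")  # 73.5 MiB in fp32
```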

One small note:

The CuDNN benchmarks are done using Torch bindings; the same could be done via the Caffe bindings or the bindings of any other library. This note is here to clarify that Caffe (native) and Torch (native) refer to the convolution kernels each library ships as its default fallback. Some frameworks, such as TensorFlow and Chainer, are benchmarked with CuDNN without this being mentioned explicitly, so one might conclude that these frameworks as a whole are faster than, for example, Caffe, which might not be the case.

AlexNet (One Weird Trick paper) - Input 128x3x224x224

| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| CuDNN[R4]-fp16 (Torch) | cudnn.SpatialConvolution | 71 | 25 | 46 |
| Nervana-neon-fp16 | ConvLayer | 78 | 25 | 52 |
| CuDNN[R4]-fp32 (Torch) | cudnn.SpatialConvolution | 81 | 27 | 53 |
| TensorFlow | conv2d | 81 | 26 | 55 |
| Nervana-neon-fp32 | ConvLayer | 87 | 28 | 58 |
| fbfft (Torch) | fbnn.SpatialConvolution | 104 | 31 | 72 |
| Chainer | Convolution2D | 177 | 40 | 136 |
| cudaconvnet2* | ConvLayer | 177 | 42 | 135 |
| CuDNN[R2] | cudnn.SpatialConvolution | 231 | 70 | 161 |
| Caffe (native) | ConvolutionLayer | 324 | 121 | 203 |
| Torch-7 (native) | SpatialConvolutionMM | 342 | 132 | 210 |
| CL-nn (Torch) | SpatialConvolutionMM | 963 | 388 | 574 |
| Caffe-CLGreenTea | ConvolutionLayer | 1442 | 210 | 1232 |

Overfeat [fast] - Input 128x3x231x231

| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| Nervana-neon-fp16 | ConvLayer | 176 | 58 | 118 |
| Nervana-neon-fp32 | ConvLayer | 211 | 69 | 141 |
| CuDNN[R4]-fp16 (Torch) | cudnn.SpatialConvolution | 242 | 86 | 156 |
| CuDNN[R4]-fp32 (Torch) | cudnn.SpatialConvolution | 268 | 94 | 174 |
| TensorFlow | conv2d | 279 | 90 | 189 |
| fbfft (Torch) | SpatialConvolutionCuFFT | 342 | 114 | 227 |
| Chainer | Convolution2D | 620 | 135 | 484 |
| cudaconvnet2 | ConvLayer | 723 | 176 | 547 |
| CuDNN[R2] | cudnn.SpatialConvolution | 810 | 234 | 576 |
| Caffe | ConvolutionLayer | 823 | 355 | 468 |
| Torch-7 (native) | SpatialConvolutionMM | 878 | 379 | 499 |
| CL-nn (Torch) | SpatialConvolutionMM | 963 | 388 | 574 |
| Caffe-CLGreenTea | ConvolutionLayer | 2857 | 616 | 2240 |

OxfordNet [Model-A] - Input 64x3x224x224

| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| Nervana-neon-fp16 | ConvLayer | 254 | 82 | 171 |
| Nervana-neon-fp32 | ConvLayer | 320 | 103 | 217 |
| CuDNN[R4]-fp16 (Torch) | cudnn.SpatialConvolution | 471 | 140 | 331 |
| CuDNN[R4]-fp32 (Torch) | cudnn.SpatialConvolution | 529 | 162 | 366 |
| TensorFlow | conv2d | 540 | 158 | 382 |
| Chainer | Convolution2D | 885 | 251 | 632 |
| fbfft (Torch) | SpatialConvolutionCuFFT | 1092 | 355 | 737 |
| cudaconvnet2 | ConvLayer | 1229 | 408 | 821 |
| CuDNN[R2] * | cudnn.SpatialConvolution | 1099 | 342 | 757 |
| Caffe | ConvolutionLayer | 1068 | 323 | 745 |
| Torch-7 (native) | SpatialConvolutionMM | 1105 | 350 | 755 |
| CL-nn (Torch) | SpatialConvolutionMM | 3437 | 875 | 2562 |
| Caffe-CLGreenTea | ConvolutionLayer | 5620 | 988 | 4632 |

GoogleNet V1 - Input 128x3x224x224

| Library | Class | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| Nervana-neon-fp16 | ConvLayer | 230 | 72 | 157 |
| Nervana-neon-fp32 | ConvLayer | 270 | 84 | 186 |
| TensorFlow | conv2d | 445 | 135 | 310 |
| CuDNN[R4]-fp16 (Torch) | cudnn.SpatialConvolution | 462 | 112 | 349 |
| CuDNN[R4]-fp32 (Torch) | cudnn.SpatialConvolution | 470 | 130 | 340 |
| Chainer | Convolution2D | 687 | 189 | 497 |
| Caffe | ConvolutionLayer | 1935 | 786 | 1148 |
| CL-nn (Torch) | SpatialConvolutionMM | 7016 | 3027 | 3988 |
| Caffe-CLGreenTea | ConvolutionLayer | 9462 | 746 | 8716 |

Layer-wise Benchmarking (Last Updated April 2015)

Spatial Convolution layer (3D input 3D output, densely connected)

forward + backprop (wrt input and weights)

| Original Library | Class/Function Benchmarked | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| fbfft | SpatialConvolutionCuFFT | 256 | 101 | 155 |
| cuda-convnet2 | ConvLayer | 977 | 201 | 776 |
| cuda-convnet* | pylearn2.cuda_convnet | 1077 | 312 | 765 |
| CuDNN R2 | cudnn.SpatialConvolution | 1019 | 269 | 750 |
| Theano | CorrMM | 1225 | 407 | 818 |
| Caffe | ConvolutionLayer | 1231 | 396 | 835 |
| Torch-7 | SpatialConvolutionMM | 1265 | 418 | 877 |
| DeepCL | ConvolutionLayer | 6280 | 2648 | 3632 |
| cherry-picking | best per layer | 235 | 79 | 155 |

The following table is NOT UPDATED for Titan X. Its numbers were measured on a Titan Black and are kept only for informational and legacy purposes.

| Original Library | Class/Function Benchmarked | Time (ms) | forward (ms) | backward (ms) |
|:---:|:---:|---:|---:|---:|
| Theano (experimental) | conv2d_fft | 1178 | 304 | 874 |
| Torch-7 | nn.SpatialConvolutionBHWD | 1892 | 581 | 1311 |
| ccv | ccv_convnet_layer | 809+bw | 809 | |
| Theano (legacy) | conv2d | 70774 | 3833 | 66941 |

  • * indicates that the library was tested with Torch bindings of the specific kernels.
  • ** indicates that the library was tested with Pylearn2 bindings.
  • *** This is an experimental module which uses FFTs to calculate convolutions. According to @benanne, it uses a lot of memory.
  • **** The last row shows results obtainable when choosing the best-performing library for each layer.
  • L1 - Input: 128x128, Batch size: 128, Feature maps: 3->96, Kernel Size: 11x11, Stride: 1x1
  • L2 - Input: 64x64, Batch size: 128, Feature maps: 64->128, Kernel Size: 9x9, Stride: 1x1
  • L3 - Input: 32x32, Batch size: 128, Feature maps: 128->128, Kernel Size: 9x9, Stride: 1x1
  • L4 - Input: 16x16, Batch size: 128, Feature maps: 128->128, Kernel Size: 7x7, Stride: 1x1
  • L5 - Input: 13x13, Batch size: 128, Feature maps: 384->384, Kernel Size: 3x3, Stride: 1x1
  • The table is ranked by the total forward + backward time summed over layers L1 + L2 + L3 + L4 + L5.
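The five layer configurations can be turned into output sizes and arithmetic cost with the standard dense-convolution formulas. This is a sketch under assumptions the text does not state: square inputs and kernels, and no zero-padding, so with stride 1 the output side is input - kernel + 1 (for L1, 128 - 11 + 1 = 118). The forward cost is counted as batch x out_maps x out x out x in_maps x kernel x kernel multiply-accumulates. `LayerSpec`, `output_size` and `forward_macs` are hypothetical names.

```python
from typing import NamedTuple


class LayerSpec(NamedTuple):
    """One benchmarked layer (hypothetical container; square inputs and kernels)."""
    name: str
    input_size: int   # input is input_size x input_size
    batch_size: int
    in_maps: int      # input feature maps
    out_maps: int     # output feature maps
    kernel: int       # kernel is kernel x kernel
    stride: int = 1


# L1-L5 as listed above.
LAYERS = [
    LayerSpec("L1", 128, 128, 3, 96, 11),
    LayerSpec("L2", 64, 128, 64, 128, 9),
    LayerSpec("L3", 32, 128, 128, 128, 9),
    LayerSpec("L4", 16, 128, 128, 128, 7),
    LayerSpec("L5", 13, 128, 384, 384, 3),
]


def output_size(layer: LayerSpec, padding: int = 0) -> int:
    """Output side length; padding=0 ('valid' convolution) is an assumption."""
    return (layer.input_size + 2 * padding - layer.kernel) // layer.stride + 1


def forward_macs(layer: LayerSpec, padding: int = 0) -> int:
    """Multiply-accumulate count of one dense forward convolution over the batch."""
    out = output_size(layer, padding)
    return (layer.batch_size * layer.out_maps * out * out
            * layer.in_maps * layer.kernel * layer.kernel)


for layer in LAYERS:
    out = output_size(layer)
    print(f"{layer.name}: output {out}x{out}, "
          f"{forward_macs(layer) / 1e9:.1f} G multiply-accumulates")
```

Under these assumptions L2 has the largest multiply-accumulate count of the five layers.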
Breakdown
forward

Columns L1, L2, L3, L4, L5 and Total are times in milliseconds.

| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|:---:|:---:|---:|---:|---:|---:|---:|---:|
| fbfft | SpatialConvolutionCuFFT | 57 | 27 | 6 | 2 | 9 | 101 |
| cuda-convnet2 | ConvLayer | 36 | 113 | 40 | 4 | 8 | 201 |
| cuda-convnet* | pylearn2.cuda_convnet | 38 | 183 | 68 | 7 | 16 | 312 |
| CuDNN R2 | cudnn.SpatialConvolution | 56 | 143 | 53 | 6 | 11 | 269 |
| Theano | CorrMM | 91 | 143 | 121 | 24 | 28 | 407 |
| Caffe | ConvolutionLayer<Dtype> | 93 | 136 | 116 | 24 | 27 | 396 |
| Torch-7 | nn.SpatialConvolutionMM | 94 | 149 | 123 | 24 | 28 | 418 |
| DeepCL | ConvolutionLayer | 738 | 1241 | 518 | 47 | 104 | 2648 |
| cherry-picking | best per layer | 36 | 27 | 6 | 2 | 8 | 79 |

backward (gradInput + gradWeight)
Columns L1, L2, L3, L4, L5 and Total are times in milliseconds.

| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|:---:|:---:|---:|---:|---:|---:|---:|---:|
| fbfft | SpatialConvolutionCuFFT | 76 | 45 | 12 | 4 | 18 | 155 |
| cuda-convnet2 | ConvLayer | 103 | 467 | 162 | 15 | 29 | 776 |
| cuda-convnet* | pylearn2.cuda_convnet | 136 | 433 | 147 | 15 | 34 | 765 |
| CuDNN R2 | cudnn.SpatialConvolution | 139 | 401 | 159 | 19 | 32 | 750 |
| Theano | CorrMM | 179 | 405 | 174 | 29 | 31 | 818 |
| Caffe | ConvolutionLayer<Dtype> | 200 | 405 | 172 | 28 | 30 | 835 |
| Torch-7 | nn.SpatialConvolutionMM | 206 | 432 | 178 | 29 | 32 | 877 |
| DeepCL | ConvolutionLayer | 484 | 2144 | 747 | 59 | 198 | 3632 |
| cherry-picking | best per layer | 76 | 45 | 12 | 4 | 18 | 155 |
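The "best per layer" rows can be reproduced by taking, for each layer column, the minimum time over all libraries and summing those minima. The sketch below does this for the forward and backward breakdowns, with times copied from the tables (footnote asterisks dropped from the library names); the helper name `cherry_pick` is hypothetical. It yields 36 + 27 + 6 + 2 + 8 = 79 ms forward and 76 + 45 + 12 + 4 + 18 = 155 ms backward, matching the cherry-picking rows.

```python
from typing import Dict, List, Tuple

# Per-layer times in ms (L1..L5), copied from the forward and backward tables.
FORWARD: Dict[str, List[int]] = {
    "fbfft": [57, 27, 6, 2, 9],
    "cuda-convnet2": [36, 113, 40, 4, 8],
    "cuda-convnet": [38, 183, 68, 7, 16],
    "CuDNN R2": [56, 143, 53, 6, 11],
    "Theano": [91, 143, 121, 24, 28],
    "Caffe": [93, 136, 116, 24, 27],
    "Torch-7": [94, 149, 123, 24, 28],
    "DeepCL": [738, 1241, 518, 47, 104],
}
BACKWARD: Dict[str, List[int]] = {
    "fbfft": [76, 45, 12, 4, 18],
    "cuda-convnet2": [103, 467, 162, 15, 29],
    "cuda-convnet": [136, 433, 147, 15, 34],
    "CuDNN R2": [139, 401, 159, 19, 32],
    "Theano": [179, 405, 174, 29, 31],
    "Caffe": [200, 405, 172, 28, 30],
    "Torch-7": [206, 432, 178, 29, 32],
    "DeepCL": [484, 2144, 747, 59, 198],
}


def cherry_pick(table: Dict[str, List[int]]) -> Tuple[List[int], List[str], int]:
    """Pick the fastest library independently for each layer.

    Returns (best time per layer, winning library per layer, sum of best times).
    Ties go to the library listed first.
    """
    num_layers = len(next(iter(table.values())))
    best, winners = [], []
    for layer in range(num_layers):
        winner = min(table, key=lambda name: table[name][layer])
        winners.append(winner)
        best.append(table[winner][layer])
    return best, winners, sum(best)


for label, times in (("forward", FORWARD), ("backward", BACKWARD)):
    print(label, *cherry_pick(times))
```

Because forward and backward are picked independently, the "best" library for a layer can differ between the two passes (cuda-convnet2 wins forward L1 and L5, fbfft wins every backward layer).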

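Key metrics

Overview
Name and owner: soumith/convnet-benchmarks
Primary language: Python
Languages: Makefile (8 languages in total)
License: MIT License
Owner activity
Created: 2014-07-12 03:18:46
Pushed: 2017-06-09 15:12:02
Last commit: 2017-06-09 18:12:01
Releases: 0
User engagement
Stars: 2.7k
Watchers: 284
Forks: 573
Commits: 445
Issues: 77
Open issues: 34
Pull requests: 57
Open pull requests: 0
Closed pull requests: 5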