paper review: Going Deeper with Convolutions (Szegedy et al.)
17 Sep 2014
pdf
abstract

We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
Hebbian principle? (Roughly "neurons that fire together, wire together" — the paper invokes it to justify clustering highly correlated units into the same building block.)
graphical view
A figure that appears in the mxnet tutorial.
[Network graph omitted: the in3a Inception module as rendered by mx.viz.plot_network. From a 3x28x28 input, four parallel branches run: a 1x1 convolution (64 filters); a 1x1 reduce (64) followed by a 3x3 convolution (64); a 1x1 reduce (64) followed by two stacked 3x3 convolutions (96 each); and a 3x3 average pool followed by a 1x1 projection (32). Each convolution is followed by BatchNorm and a ReLU activation, and the four branch outputs are concatenated along the channel axis into a 256x28x28 output.]
The code for the network above is available on GitHub. No matter how I look at it, it's more concise and intuitive than TensorFlow...
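For reference, here is a minimal sketch of how this module could be written with mxnet's symbol API, loosely following the inception-bn example in the mxnet repository. The factory-function names and argument lists below are reconstructed from the node names in the graph above, so treat them as assumptions rather than a verbatim copy of that code.

```python
import mxnet as mx

def ConvFactory(data, num_filter, kernel, stride=(1, 1), pad=(0, 0), name=None, suffix=''):
    """Convolution -> BatchNorm -> ReLU, the repeating unit visible in the graph above."""
    conv = mx.sym.Convolution(data=data, num_filter=num_filter, kernel=kernel,
                              stride=stride, pad=pad, name='conv_%s%s' % (name, suffix))
    bn = mx.sym.BatchNorm(data=conv, name='bn_%s%s' % (name, suffix))
    act = mx.sym.Activation(data=bn, act_type='relu', name='relu_%s%s' % (name, suffix))
    return act

def InceptionFactoryA(data, num_1x1, num_3x3red, num_3x3,
                      num_d3x3red, num_d3x3, pool, proj, name):
    # branch 1: 1x1 convolution
    c1x1 = ConvFactory(data, num_1x1, kernel=(1, 1), name='%s_1x1' % name)
    # branch 2: 1x1 reduce -> 3x3
    c3x3r = ConvFactory(data, num_3x3red, kernel=(1, 1), name='%s_3x3' % name, suffix='_reduce')
    c3x3 = ConvFactory(c3x3r, num_3x3, kernel=(3, 3), pad=(1, 1), name='%s_3x3' % name)
    # branch 3: 1x1 reduce -> 3x3 -> 3x3 (the "double 3x3" path)
    cd3x3r = ConvFactory(data, num_d3x3red, kernel=(1, 1), name='%s_double_3x3' % name, suffix='_reduce')
    cd3x3 = ConvFactory(cd3x3r, num_d3x3, kernel=(3, 3), pad=(1, 1), name='%s_double_3x3_0' % name)
    cd3x3 = ConvFactory(cd3x3, num_d3x3, kernel=(3, 3), pad=(1, 1), name='%s_double_3x3_1' % name)
    # branch 4: 3x3 pool -> 1x1 projection
    pooling = mx.sym.Pooling(data=data, kernel=(3, 3), stride=(1, 1), pad=(1, 1),
                             pool_type=pool, name='%s_pool_%s_pool' % (pool, name))
    cproj = ConvFactory(pooling, proj, kernel=(1, 1), name='%s_proj' % name)
    # concatenate the four branches along the channel axis
    return mx.sym.Concat(c1x1, c3x3, cd3x3, cproj, name='ch_concat_%s_chconcat' % name)

# The in3a module as drawn above: 64 + 64 + 96 + 32 = 256 output channels at 28x28.
data = mx.sym.Variable('data')
in3a = InceptionFactoryA(data, 64, 64, 64, 64, 96, 'avg', 32, name='in3a')
mx.viz.plot_network(in3a, shape={'data': (1, 3, 28, 28)})
```

The final mx.viz.plot_network call (given graphviz is installed) should reproduce a graph like the figure above.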