VGG

Visual Geometry Group [2014]

Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition[J]. Computer Science, 2014. DOI: 10.48550/arXiv.1409.1556.

VGG uses small 3×3 convolution kernels throughout, in place of large kernels (such as the 11×11 kernels in AlexNet).
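The trade-off behind small kernels can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions and a stack in which every layer maps `C` channels to `C` channels; the helper names and the value of `C` are hypothetical and chosen only for illustration. It compares three stacked 3×3 layers with a single 7×7 layer, which cover the same input window.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def stack_weights(num_layers: int, kernel: int, channels: int) -> int:
    """Weight count (biases ignored) when each layer maps channels -> channels."""
    return num_layers * kernel * kernel * channels * channels


C = 256  # hypothetical channel width, for illustration only

# Three 3x3 layers see the same 7x7 window as one 7x7 layer.
assert stacked_receptive_field(3) == stacked_receptive_field(1, kernel=7) == 7

three_3x3 = stack_weights(3, 3, C)  # 27 * C^2 weights
one_7x7 = stack_weights(1, 7, C)    # 49 * C^2 weights
print(three_3x3, one_7x7)
print(f"ratio: {three_3x3 / one_7x7:.2f}")  # about 0.55
```

Under these assumptions the stacked version needs roughly 55% of the weights and also inserts two extra non-linearities between the layers.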

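Features:

  • Every configuration takes a 224×224 input image
  • Every configuration has 5 max-pooling layers
  • Every convolutional layer is followed by a ReLU activation
  • Local response normalization (LRN) is not used, apart from the A-LRN configuration, where the paper reports that it brought no improvement
  • The convolutional part is always followed by 3 fully connected layers (FC-4096 → FC-4096 → FC-1000)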
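The fixed input size and the five pooling stages determine the size of the feature map that reaches the first fully connected layer. The following is a minimal sketch of that arithmetic, assuming 3×3 convolutions with stride 1 and padding 1 and 2×2 max pooling with stride 2 (the settings listed for VGG-16 in the table that follows); the helper names are hypothetical.

```python
def conv3x3_same(size: int) -> int:
    # 3x3 kernel, stride 1, padding 1: the spatial size is unchanged
    return (size + 2 * 1 - 3) // 1 + 1


def max_pool_2x2(size: int) -> int:
    # 2x2 window, stride 2: the spatial size is halved
    return (size - 2) // 2 + 1


size = 224
sizes = [size]
for _ in range(5):  # five conv blocks, each ending in one max-pooling layer
    size = max_pool_2x2(conv3x3_same(size))
    sizes.append(size)

flattened = size * size * 512  # 512 channels leave the last conv block

print(sizes)      # [224, 112, 56, 28, 14, 7]
print(flattened)  # 25088
```

The flattened length of 25088 is the input width of the first FC-4096 layer.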

VGG-16

Architecture

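| Layer | Type | Configuration | Output size | Parameters |
|---|---|---|---|---|
| input | Input | 224×224 RGB image | 224×224×3 | 0 |
| conv1_1 | Convolution | 3×3×64, stride=1, padding=1 | 224×224×64 | 1,792 |
| conv1_2 | Convolution | 3×3×64, stride=1, padding=1 | 224×224×64 | 36,928 |
| pool1 | Max pooling | 2×2, stride=2 | 112×112×64 | 0 |
| conv2_1 | Convolution | 3×3×128, stride=1, padding=1 | 112×112×128 | 73,856 |
| conv2_2 | Convolution | 3×3×128, stride=1, padding=1 | 112×112×128 | 147,584 |
| pool2 | Max pooling | 2×2, stride=2 | 56×56×128 | 0 |
| conv3_1 | Convolution | 3×3×256, stride=1, padding=1 | 56×56×256 | 295,168 |
| conv3_2 | Convolution | 3×3×256, stride=1, padding=1 | 56×56×256 | 590,080 |
| conv3_3 | Convolution | 3×3×256, stride=1, padding=1 | 56×56×256 | 590,080 |
| pool3 | Max pooling | 2×2, stride=2 | 28×28×256 | 0 |
| conv4_1 | Convolution | 3×3×512, stride=1, padding=1 | 28×28×512 | 1,180,160 |
| conv4_2 | Convolution | 3×3×512, stride=1, padding=1 | 28×28×512 | 2,359,808 |
| conv4_3 | Convolution | 3×3×512, stride=1, padding=1 | 28×28×512 | 2,359,808 |
| pool4 | Max pooling | 2×2, stride=2 | 14×14×512 | 0 |
| conv5_1 | Convolution | 3×3×512, stride=1, padding=1 | 14×14×512 | 2,359,808 |
| conv5_2 | Convolution | 3×3×512, stride=1, padding=1 | 14×14×512 | 2,359,808 |
| conv5_3 | Convolution | 3×3×512, stride=1, padding=1 | 14×14×512 | 2,359,808 |
| pool5 | Max pooling | 2×2, stride=2 | 7×7×512 | 0 |
| flatten | Flatten | - | 25088 (= 7×7×512) | 0 |
| fc6 | Fully connected | 25088 → 4096 | 4096 | 102,764,544 |
| fc7 | Fully connected | 4096 → 4096 | 4096 | 16,781,312 |
| fc8 | Fully connected | 4096 → 1000 | 1000 | 4,097,000 |
| softmax | Classifier | - | 1000 | 0 |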

Total parameters: 138,357,544
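As a quick sanity check, the total can be reproduced by summing the per-layer counts from the table. The sketch below only copies those numbers and adds them up; the variable names are hypothetical.

```python
# Per-layer parameter counts copied from the VGG-16 table above.
conv_counts = [
    1_792, 36_928,                    # conv1_1, conv1_2
    73_856, 147_584,                  # conv2_1, conv2_2
    295_168, 590_080, 590_080,        # conv3_1 .. conv3_3
    1_180_160, 2_359_808, 2_359_808,  # conv4_1 .. conv4_3
    2_359_808, 2_359_808, 2_359_808,  # conv5_1 .. conv5_3
]
fc_counts = [102_764_544, 16_781_312, 4_097_000]  # fc6, fc7, fc8

conv_total = sum(conv_counts)
fc_total = sum(fc_counts)
total = conv_total + fc_total

print(conv_total)  # 14714688
print(fc_total)    # 123642856
print(total)       # 138357544
print(f"{fc_total / total:.1%} of the parameters are in the FC layers")
```

The split shows that the three fully connected layers hold close to nine tenths of the parameters, with fc6 alone accounting for most of them.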

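Parameter count of a convolutional layer

$(\text{kernel}_{\text{width}} \times \text{kernel}_{\text{height}} \times \text{channels}_{\text{input}} + 1) \times \text{channels}_{\text{output}}$

Each output channel has one kernel of size $\text{kernel}_{\text{width}} \times \text{kernel}_{\text{height}} \times \text{channels}_{\text{input}}$ plus a single bias, which is the $+1$ term.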
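The count can be checked against the table: every output channel owns kernel_width × kernel_height × channels_input weights plus one bias. The function below is a minimal sketch of that rule; its name and argument names are hypothetical.

```python
def conv_layer_params(k_w: int, k_h: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of one convolutional layer.

    Each of the c_out output channels has k_w * k_h * c_in weights
    and exactly one bias term.
    """
    return (k_w * k_h * c_in + 1) * c_out


# Rows taken from the VGG-16 table.
print(conv_layer_params(3, 3, 3, 64))     # conv1_1 -> 1792
print(conv_layer_params(3, 3, 128, 256))  # conv3_1 -> 295168
print(conv_layer_params(3, 3, 512, 512))  # conv5_3 -> 2359808
```

For conv1_1 this is (3 × 3 × 3 + 1) × 64 = 28 × 64 = 1,792, matching the table.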