PytorchInsight
This is a PyTorch library with state-of-the-art architectures, pretrained models, and continuously updated results.
This repository aims to accelerate deep learning research by making results reproducible and experiments easier to run, all in PyTorch.
Included papers (to be updated):
Attention Models
- SENet: Squeeze-and-excitation Networks (paper)
- SKNet: Selective Kernel Networks (paper)
- CBAM: Convolutional Block Attention Module (paper)
- GCNet: GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond (paper)
- BAM: Bottleneck Attention Module (paper)
- SGENet: Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks (paper)
- SRMNet: SRM: A Style-based Recalibration Module for Convolutional Neural Networks (paper)
Non-Attention Models
- OctNet: Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution (paper)
- imagenet_tricks.py: Bag of Tricks for Image Classification with Convolutional Neural Networks (paper)
- Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer (to appear)
- Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay (to appear)
- mixup: Beyond Empirical Risk Minimization (paper)
- CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (paper)
Trained Models and Performance Table
Single crop validation error on ImageNet-1k (center 224x224 crop from a resized image with shorter side = 256).

| Classification training settings for medium and large models | |
|:-:|:-:|
| Details | RandomResizedCrop, RandomHorizontalFlip; 0.1 init lr, total 100 epochs, decay at every 30 epochs; SGD with naive softmax cross entropy loss, 1e-4 weight decay, 0.9 momentum, 8 gpus, 32 images per gpu |
| Examples | ResNet50 |
| Note | The newest code adds one default operation: setting all bias wd = 0. Please refer to the theoretical analysis of "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay" (to appear). With this change, the training accuracy can be slightly boosted. |

| Classification training settings for mobile/small models | |
|:-:|:-:|
| Details | RandomResizedCrop, RandomHorizontalFlip; 0.4 init lr, total 300 epochs, 5 linear warm up epochs, cosine lr decay; SGD with softmax cross entropy loss and label smoothing 0.1, 4e-5 weight decay on conv weights, 0 weight decay on all other weights, 0.9 momentum, 8 gpus, 128 images per gpu |
| Examples | ShuffleNetV2 |

## Typical Training & Testing Tips:
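The two learning-rate schedules named in the training settings above (step decay every 30 epochs from 0.1, and 5 linear warm-up epochs followed by cosine decay from 0.4) can be sketched in plain Python. This is only an illustration: the function names are hypothetical, the schedule is evaluated once per epoch, and the warm-up starting value is an assumption. The training scripts in this repository may update the rate per iteration or differ in other details.

```python
import math


def step_lr(epoch, base_lr=0.1, decay_every=30, gamma=0.1):
    """Step schedule: multiply the rate by `gamma` every `decay_every` epochs."""
    return base_lr * gamma ** (epoch // decay_every)


def warmup_cosine_lr(epoch, base_lr=0.4, total_epochs=300, warmup_epochs=5):
    """Linear warm-up for `warmup_epochs`, then cosine decay towards zero."""
    if epoch < warmup_epochs:
        # Assumed warm-up shape: ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 29, 30, 60, 90):
        print(f"step   epoch {e:3d}: lr = {step_lr(e):.6f}")
    for e in (0, 4, 5, 150, 299):
        print(f"cosine epoch {e:3d}: lr = {warmup_cosine_lr(e):.6f}")
```

With 8 GPUs, the two settings correspond to global batch sizes of 8 x 32 = 256 and 8 x 128 = 1024, so both use the same ratio of initial learning rate to batch size (0.1 / 256 = 0.4 / 1024).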
Small Models
ShuffleNetV2_1x
python -m torch.distributed.launch --nproc_per_node=8 imagenet_mobile.py --cos -a shufflenetv2_1x --data /path/to/imagenet1k/ \
--epochs 300 --wd 4e-5 --gamma 0.1 -c checkpoints/imagenet/shufflenetv2_1x --train-batch 128 --opt-level O0 --nowd-bn # Training
python -m torch.distributed.launch --nproc_per_node=2 imagenet_mobile.py -a shufflenetv2_1x --data /path/to/imagenet1k/ \
-e --resume ../pretrain/shufflenetv2_1x.pth.tar --test-batch 100 --opt-level O0 # Testing, ~69.6% top-1 Acc
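The small-model recipe applies 4e-5 weight decay to conv weights and 0 weight decay to all other weights. A minimal sketch of how such a split could be expressed with PyTorch optimizer parameter groups is shown below. The helper `split_weight_decay` and the toy model are hypothetical and are not taken from this repository's scripts, so the exact grouping performed by `imagenet_mobile.py` may differ.

```python
import torch
import torch.nn as nn


def split_weight_decay(model, weight_decay=4e-5):
    """Build optimizer param groups: decay on conv weights only, 0 elsewhere."""
    decay, no_decay = [], []
    for module in model.modules():
        # recurse=False visits only the parameters owned directly by this module.
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            if isinstance(module, nn.Conv2d) and name == "weight":
                decay.append(param)
            else:
                # BatchNorm scale/shift, all biases, and linear weights.
                no_decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Toy model standing in for a real backbone (illustration only).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(8),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
groups = split_weight_decay(model)
optimizer = torch.optim.SGD(groups, lr=0.4, momentum=0.9)
```

Per-group `weight_decay` values override the optimizer default, so no global `weight_decay` argument is needed here.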
Large Models
SGE-ResNet
python -W ignore imagenet.py -a sge_resnet101 --data /path/to/imagenet1k/ --epochs 100 --schedule 30 60 90 \
--gamma 0.1 -c checkpoints/imagenet/sge_resnet101 --gpu-id 0,1,2,3,4,5,6,7 # Training
python -m torch.distributed.launch --nproc_per_node=8 imagenet_fast.py -a sge_resnet101 --data /path/to/imagenet1k/ \
--epochs 100 --schedule 30 60 90 --wd 1e-4 --gamma 0.1 -c checkpoints/imagenet/sge_resnet101 --train-batch 32 \
--opt-level O0 --wd-all --label-smoothing 0. --warmup 0 # Training (faster)
python -W ignore imagenet.py -a sge_resnet101 --data /path/to/imagenet1k/ --gpu-id 0,1 -e --resume ../pretrain/sge_resnet101.pth.tar \
# Testing ~78.8% top-1 Acc
python -m torch.distributed.launch --nproc_per_node=2 imagenet_fast.py -a sge_resnet101 --data /path/to/imagenet1k/ -e --resume \
../pretrain/sge_resnet101.pth.tar --test-batch 100 --opt-level O0 # Testing (faster) ~78.8% top-1 Acc
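The top-1 and top-5 accuracies quoted in the comments above count a prediction as correct when the true label is among the 1 or 5 highest-scoring classes. The sketch below shows the metric in plain Python on made-up scores. The function name `topk_accuracy` and the sample data are hypothetical and are not part of this repository's evaluation code.

```python
def topk_accuracy(scores, targets, ks=(1, 5)):
    """Percentage of samples whose true label is among the k top-scoring classes."""
    results = {}
    for k in ks:
        hits = 0
        for row, target in zip(scores, targets):
            # Class indices sorted by descending score, truncated to the best k.
            topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
            hits += target in topk
        results[k] = 100.0 * hits / len(targets)
    return results


# Three made-up samples over six classes (illustration only).
scores = [
    [0.10, 0.70, 0.05, 0.05, 0.06, 0.04],  # true class 1 ranks first
    [0.60, 0.15, 0.02, 0.03, 0.12, 0.08],  # true class 5 ranks fourth
    [0.30, 0.25, 0.01, 0.20, 0.14, 0.10],  # true class 2 ranks last
]
targets = [1, 5, 2]
accuracy = topk_accuracy(scores, targets)
print(accuracy)
```

The corresponding error rate is 100 minus the accuracy.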
WS-ResNet with e-shifted L2 regularizer, e = 1e-3
python -m torch.distributed.launch --nproc_per_node=8 imagenet_fast.py -a ws_resnet50 --data /path/to/imagenet1k/ \
--epochs 100 --schedule 30 60 90 --wd 1e-4 --gamma 0.1 -c checkpoints/imagenet/es1e-3_ws_resnet50 --train-batch 32 \
--opt-level O0 --label-smoothing 0. --warmup 0 --nowd-conv --mineps 1e-3 --el2
Results of "SGENet: Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks"
Note: the following (older) results for large models were obtained without setting the bias wd = 0.
### Classification

| Model | #P | GFLOPs | Top-1 Acc | Top-5 Acc | Download1 | Download2 | log |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| ShuffleNetV2_1x | 2.28M | 0.151 | 69.6420 | 88.7200 | | GoogleDrive | shufflenetv2_1x.log |
| ResNet50 | 25.56M | 4.122 | 76.3840 | 92.9080 | BaiduDrive(zuvx) | GoogleDrive | old_resnet50.log |
| SE-ResNet50 | 28.09M | 4.130 | 77.1840 | 93.6720 | | | |
| SK-ResNet50 | 26.15M | 4.185 | 77.5380 | 93.7000 | BaiduDrive(tfwn) | GoogleDrive | sk_resnet50.log |
| BAM-ResNet50 | 25.92M | 4.205 | 76.8980 | 93.4020 | BaiduDrive(z0h3) | GoogleDrive | bam_resnet50.log |
| CBAM-ResNet50 | 28.09M | 4.139 | 77.6260 | 93.6600 | BaiduDrive(bram) | GoogleDrive | cbam_resnet50.log |
| SGE-ResNet50 | 25.56M | 4.127 | 77.5840 | 93.6640 | BaiduDrive(gxo9) | GoogleDrive | sge_resnet50.log |
| ResNet101 | 44.55M | 7.849 | 78.2000 | 93.9060 | BaiduDrive(js5t) | GoogleDrive | old_resnet101.log |
| SE-ResNet101 | 49.33M | 7.863 | 78.4680 | 94.1020 | BaiduDrive(j2ox) | GoogleDrive | se_resnet101.log |
| SK-ResNet101 | 45.68M | 7.978 | 78.7920 | 94.2680 | BaiduDrive(boii) | GoogleDrive | sk_resnet101.log |
| BAM-ResNet101 | 44.91M | 7.933 | 78.2180 | 94.0180 | BaiduDrive(4bw6) | GoogleDrive | bam_resnet101.log |
| CBAM-ResNet101 | 49.33M | 7.879 | 78.3540 | 94.0640 | BaiduDrive(syj3) | GoogleDrive | cbam_resnet101.log |
| SGE-ResNet101 | 44.55M | 7.858 | 78.7980 | 94.3680 | BaiduDrive(wqn6) | GoogleDrive | sge_resnet101.log |

### Detection

| Model | #P | GFLOPs | Detector | Neck | AP50:95 (%) | AP50 (%) | AP75 (%) | Download |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| ResNet50 | 23.51M | 88.0 | Faster RCNN | FPN | 37.5 | 59.1 | 40.6 | GoogleDrive |
| SGE-ResNet50 | 23.51M | 88.1 | Faster RCNN | FPN | 38.7 | 60.8 | 41.7 | GoogleDrive |
| ResNet50 | 23.51M | 88.0 | Mask RCNN | FPN | 38.6 | 60.0 | 41.9 | GoogleDrive |
| SGE-ResNet50 | 23.51M | 88.1 | Mask RCNN | FPN | 39.6 | 61.5 | 42.9 | GoogleDrive |
| ResNet50 | 23.51M | 88.0 | Cascade RCNN | FPN | 41.1 | 59.3 | 44.8 | GoogleDrive |
| SGE-ResNet50 | 23.51M | 88.1 | Cascade RCNN | FPN | 42.6 | 61.4 | 46.2 | GoogleDrive |
| ResNet101 | 42.50M | 167.9 | Faster RCNN | FPN | 39.4 | 60.7 | 43.0 | GoogleDrive |
| SE-ResNet101 | 47.28M | 168.3 | Faster RCNN | FPN | 40.4 | 61.9 | 44.2 | GoogleDrive |
| SGE-ResNet101 | 42.50M | 168.1 | Faster RCNN | FPN | 41.0 | 63.0 | 44.3 | GoogleDrive |
| ResNet101 | 42.50M | 167.9 | Mask RCNN | FPN | 40.4 | 61.6 | 44.2 | GoogleDrive |
| SE-ResNet101 | 47.28M | 168.3 | Mask RCNN | FPN | 41.5 | 63.0 | 45.3 | GoogleDrive |
| SGE-ResNet101 | 42.50M | 168.1 | Mask RCNN | FPN | 42.1 | 63.7 | 46.1 | GoogleDrive |
| ResNet101 | 42.50M | 167.9 | Cascade RCNN | FPN | 42.6 | 60.9 | 46.4 | GoogleDrive |
| SE-ResNet101 | 47.28M | 168.3 | Cascade RCNN | FPN | 43.4 | 62.2 | 47.2 | GoogleDrive |
| SGE-ResNet101 | 42.50M | 168.1 | Cascade RCNN | FPN | 44.4 | 63.2 | 48.4 | GoogleDrive |

--------------------------------------------------------
Results of "Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer"
Note that the following models are with bias wd = 0.
### Classification

| Model | Top-1 | Download |
|:-:|:-:|:-:|
| WS-ResNet50 | 76.74 | GoogleDrive |
| WS-ResNet50(e = 1e-3) | 76.86 | GoogleDrive |
| WS-ResNet101 | 78.07 | GoogleDrive |
| WS-ResNet101(e = 1e-6) | 78.29 | GoogleDrive |
| WS-ResNeXt50(e = 1e-3) | 77.88 | GoogleDrive |
| WS-ResNeXt101(e = 1e-3) | 78.80 | GoogleDrive |
| WS-DenseNet201(e = 1e-8) | 77.59 | GoogleDrive |
| WS-ShuffleNetV1(e = 1e-8) | 68.09 | GoogleDrive |
| WS-ShuffleNetV2(e = 1e-8) | 69.70 | GoogleDrive |
| WS-MobileNetV1(e = 1e-6) | 73.60 | GoogleDrive |

--------------------------------------------------------
Results of "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay"
To appear
Citation
If you find our related work useful in your research, please consider citing the corresponding papers:
@inproceedings{li2019selective,
title={Selective Kernel Networks},
author={Li, Xiang and Wang, Wenhai and Hu, Xiaolin and Yang, Jian},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
year={2019}
}
@article{li2019spatial,
title={Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks},
author={Li, Xiang and Hu, Xiaolin and Xia, Yan and Yang, Jian},
journal={arXiv preprint arXiv:1905.09646},
year={2019}
}
@article{li2019understanding,
title={Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer},
author={Li, Xiang and Chen, Shuo and Yang, Jian},
journal={arXiv preprint arXiv:},
year={2019}
}
@article{li2019generalization,
title={Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay},
author={Li, Xiang and Chen, Shuo and Gong, Chen and Xia, Yan and Yang, Jian},
journal={arXiv preprint arXiv:},
year={2019}
}