TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision

@misc{you2019torchcv,
    author = {Ansheng You and Xiangtai Li and Zhen Zhu and Yunhai Tong},
    title = {TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision},
    howpublished = {\url{https://github.com/donnyyou/torchcv}},
    year = {2019}
}

This repository provides source code for most deep-learning-based computer vision (CV) tasks. We will do our best to keep it up to date. If you find a problem with the repository, please raise an issue or submit a pull request.

Implemented Papers

Image Classification
- VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition
- ResNet: Deep Residual Learning for Image Recognition
- DenseNet: Densely Connected Convolutional Networks
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
- Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
Semantic Segmentation
- DeepLabV3: Rethinking Atrous Convolution for Semantic Image Segmentation
- PSPNet: Pyramid Scene Parsing Network
- DenseASPP: DenseASPP for Semantic Segmentation in Street Scenes
- Asymmetric Non-local Neural Networks for Semantic Segmentation
Object Detection
- SSD: Single Shot MultiBox Detector
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- YOLOv3: An Incremental Improvement
- FPN: Feature Pyramid Networks for Object Detection
Pose Estimation
- CPM: Convolutional Pose Machines
- OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
Instance Segmentation
- Mask R-CNN
Generative Adversarial Networks
- Pix2pix: Image-to-Image Translation with Conditional Adversarial Nets
- CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

QuickStart with TorchCV

Currently, only Python 3.x and PyTorch 1.3 are supported.

pip3 install -r requirements.txt
cd lib/exts
sh make.sh
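
Before building the extensions, it can help to confirm that the interpreter and PyTorch match the versions stated above. The following is a minimal sketch, not part of TorchCV: the helper names `parse_version` and `check_environment` are hypothetical, and the check only compares the major and minor version numbers.

```python
import sys


def parse_version(text):
    """Return (major, minor) from a version string such as '1.3.0+cu100'."""
    major, minor = text.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def check_environment(python_version, torch_version):
    """Return a list of problems; an empty list means the setup matches."""
    problems = []
    if python_version[0] != 3:
        problems.append("Python 3.x is required")
    if parse_version(torch_version) != (1, 3):
        problems.append("PyTorch 1.3 is expected, found " + torch_version)
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        for problem in check_environment(sys.version_info, torch.__version__):
            print(problem)
```

Run as a script, it prints nothing when both versions match.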

Performances with TorchCV

All results shown below were obtained with TorchCV and fully reproduce the results reported in the corresponding papers.

Image Classification

ImageNet (Center Crop Test): 224x224

In the Iters columns, W denotes 10,000 iterations (for example, 30W = 300,000).

| Model | Train | Test | Top-1 | Top-5 | BS | Iters | Scripts |
|:---|:---|:---|:---|:---|:---|:---|:---|
| ResNet50 | train | val | 77.54 | 93.59 | 512 | 30W | ResNet50 |
| ResNet101 | train | val | 78.94 | 94.56 | 512 | 30W | ResNet101 |
| ShuffleNetV2x0.5 | train | val | 60.90 | 82.54 | 1024 | 40W | ShuffleNetV2x0.5 |
| ShuffleNetV2x1.0 | train | val | 69.71 | 88.91 | 1024 | 40W | ShuffleNetV2x1.0 |
| DFNetV1 | train | val | 70.99 | 89.68 | 1024 | 40W | DFNetV1 |
| DFNetV2 | train | val | 74.22 | 91.61 | 1024 | 40W | DFNetV2 |

Semantic Segmentation

Cityscapes (Single Scale Whole Image Test): Base LR 0.01, Crop Size 769

| Model | Backbone | Train | Test | mIOU | BS | Iters | Scripts |
|:---|:---|:---|:---|:---|:---|:---|:---|
| PSPNet | 3x3-Res101 | train | val | 78.20 | 8 | 4W | PSPNet |
| DeepLabV3 | 3x3-Res101 | train | val | 79.13 | 8 | 4W | DeepLabV3 |

ADE20K (Single Scale Whole Image Test): Base LR 0.02, Crop Size 520

| Model | Backbone | Train | Test | mIOU | PixelACC | BS | Iters | Scripts |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| PSPNet | 3x3-Res50 | train | val | 41.52 | 80.09 | 16 | 15W | PSPNet |
| DeepLabV3 | 3x3-Res50 | train | val | 42.16 | 80.36 | 16 | 15W | DeepLabV3 |
| PSPNet | 3x3-Res101 | train | val | 43.60 | 81.30 | 16 | 15W | PSPNet |
| DeepLabV3 | 3x3-Res101 | train | val | 44.13 | 81.42 | 16 | 15W | DeepLabV3 |

Object Detection

Pascal VOC2007/2012 (Single Scale Test): 20 Classes

| Model | Backbone | Train | Test | mAP | BS | Epochs | Scripts |
|:---|:---|:---|:---|:---|:---|:---|:---|
| SSD300 | VGG16 | 07+12_trainval | 07_test | 0.786 | 32 | 235 | SSD300 |
| SSD512 | VGG16 | 07+12_trainval | 07_test | 0.808 | 32 | 235 | SSD512 |
| Faster R-CNN | VGG16 | 07_trainval | 07_test | 0.706 | 1 | 15 | Faster R-CNN |

Pose Estimation

OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Instance Segmentation

Mask R-CNN

Generative Adversarial Networks

Pix2pix
CycleGAN

DataSets with TorchCV

TorchCV defines a dataset format for every task; see the subdirectories of data for details. The following is an example dataset directory tree for training semantic segmentation. You can preprocess the public datasets with the scripts in data/seg/preprocess.

Dataset
    train
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
    val
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
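
The layout above can be checked before training with a short script. This is a minimal sketch, not part of TorchCV: the function name `find_unpaired` is hypothetical, and it assumes that each image and its label share the same file stem (for example 00001.jpg and 00001.png), as the tree suggests.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}


def find_unpaired(dataset_root, splits=("train", "val")):
    """Map each split to the sorted file stems that lack an image or a label."""
    unpaired = {}
    for split in splits:
        split_dir = Path(dataset_root) / split
        images = {
            p.stem
            for p in (split_dir / "image").glob("*")
            if p.suffix.lower() in IMAGE_SUFFIXES
        }
        labels = {p.stem for p in (split_dir / "label").glob("*.png")}
        # Symmetric difference: stems that appear on only one side.
        unpaired[split] = sorted(images ^ labels)
    return unpaired
```

Calling `find_unpaired("Dataset")` returns an empty list for every split when all images have labels and vice versa.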

Commands with TorchCV

Take PSPNet as an example. ("tag" can be any string, including an empty one.)

Training

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag

Resume Training

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag

Validate

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh val tag

Testing

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh test tag
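
The commands above can also be chained from Python, for example to run training, validation, and testing in sequence. This is a minimal sketch under stated assumptions: the wrapper functions `build_command` and `run_phases` are hypothetical helpers, not part of TorchCV, and only the script name, directory, and the train, val, and test arguments come from the commands above. It must be run from the repository root when `dry_run` is disabled.

```python
import subprocess

SCRIPT_DIR = "scripts/seg/cityscapes"
SCRIPT = "run_fs_pspnet_cityscapes_seg.sh"
PHASES = ("train", "val", "test")


def build_command(phase, tag=""):
    """Build the argument list for one phase of the PSPNet script."""
    if phase not in PHASES:
        raise ValueError("unknown phase: " + phase)
    return ["bash", SCRIPT, phase, tag]


def run_phases(phases, tag="", dry_run=True):
    """Run the phases in order; with dry_run=True only print the commands."""
    # All commands are built first, so an invalid phase fails before any run.
    commands = [build_command(phase, tag) for phase in phases]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the sequence if a phase exits with an error.
            subprocess.run(command, cwd=SCRIPT_DIR, check=True)
    return commands
```

Keeping `dry_run=True` as the default makes it easy to inspect the commands before launching a long training job.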

Repository: donnyyou/torchcv
License: Apache License 2.0
