Coordinate Attention for Efficient Mobile Network Design (preprint)

This repository is a PyTorch implementation of our coordinate attention block (to appear in CVPR 2021).

Our coordinate attention block can be easily plugged into classic building blocks as a tool to augment feature representations. If you want to train a classification model on ImageNet, pytorch-image-models is a code base you might want to use.
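For reference, below is a minimal PyTorch sketch of a coordinate attention block, following the description in the paper: pool along each spatial direction, encode the two pooled features jointly, then split them into two direction-aware attention maps. The class name, the `reduction` parameter, and the use of `nn.Hardswish` are illustrative assumptions; see the model file in this repository for the exact implementation.

```python
import torch
import torch.nn as nn


class CoordAtt(nn.Module):
    """Minimal coordinate attention sketch (names and defaults are illustrative)."""

    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        # Shared 1x1 conv that encodes the concatenated directional features.
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()  # stands in for the h_swish used in the code
        # Direction-specific 1x1 convs producing the two attention maps.
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.size()
        # Pool along the width -> (n, c, h, 1); pool along the height -> (n, c, w, 1).
        x_h = x.mean(dim=3, keepdim=True)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)
        # Concatenate along the spatial dimension and encode jointly.
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        # Split back into the two directions.
        y_h, y_w = torch.split(y, [h, w], dim=2)
        y_w = y_w.permute(0, 1, 3, 2)
        # Two direction-aware attention maps, applied multiplicatively.
        a_h = torch.sigmoid(self.conv_h(y_h))  # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w))  # (n, c, 1, w)
        return x * a_h * a_w
```

The module takes and returns a feature map of shape (N, C, H, W), so it can be dropped into a building block as, e.g., `y = CoordAtt(64)(x)`.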

Note that the results reported in the paper are based on a regular training setting (200 training epochs, random crop, and a cosine learning rate schedule) without extra tricks such as label smoothing, random augmentation, random erasing, or mixup. For the specific numbers on ImageNet classification, COCO object detection, and semantic segmentation, please refer to our paper.
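As a rough illustration of that schedule (not the exact training script), a cosine learning rate decay over 200 epochs can be set up in PyTorch as follows; the model and the hyperparameter values shown here are placeholders.

```python
import torch
import torch.nn as nn

# Placeholder model; in practice this would be MobileNetV2 (+ CA) from this repo.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(16, 1000))

# Hyperparameter values are illustrative, not the exact ones used for the paper.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=4e-5)
# Cosine learning rate schedule over the 200 training epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    # ... run one epoch of training with random-crop augmentation ...
    scheduler.step()
```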

Updates


Comparison to Squeeze-and-Excitation block and CBAM

Figure: (a) Squeeze-and-Excitation block, (b) CBAM, (c) Coordinate attention block.

How to plug the proposed CA block into the inverted residual block and the sandglass block

Figure: (a) MobileNetV2 (inverted residual block), (b) MobileNeXt (sandglass block).
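As a rough sketch of case (a): in the MobileNetV2 inverted residual block, the CA block can be inserted on the expanded features after the depthwise 3x3 convolution and before the 1x1 linear projection. The sketch below reuses the illustrative `CoordAtt` module defined earlier; the exact placement and layer details in this repository's model file may differ.

```python
import torch.nn as nn

# Assumes the CoordAtt sketch shown above is defined in the same file.


class InvertedResidualWithCA(nn.Module):
    """MobileNetV2 inverted residual block with a CA block inserted (illustrative)."""

    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        mid = in_ch * expand
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            # 1x1 expansion
            nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution
            nn.Conv2d(mid, mid, 3, stride, 1, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            # coordinate attention on the expanded features
            CoordAtt(mid),
            # 1x1 linear projection
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```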

Numerical results on ImageNet

| Method | Params | M-Adds | Top-1 (%) | Model |
| --- | --- | --- | --- | --- |
| MobileNetV2-1.0 | 3.5M | 300M | 72.3 | |
| MobileNetV2 + SE | 3.89M | 300M | 73.5 | |
| MobileNetV2 + CBAM | 3.89M | 300M | 73.6 | |
| MobileNetV2 + CA | 3.95M | 310M | 74.3 | link |

| Method | Params | M-Adds | Top-1 (%) |
| --- | --- | --- | --- |
| MobileNeXt-1.0 | 3.5M | 300M | 74.0 |
| MobileNeXt + SE | 3.89M | 300M | 74.7 |
| MobileNeXt + CA | 4.09M | 330M | 75.2 |

Pretrained model (MobileNetV2 with CA) and model file are both available.
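A minimal loading sketch, assuming the released checkpoint is a plain state dict; the module path, constructor name, and file name below are placeholders, not the actual names in this repository.

```python
import torch

# Placeholder names: replace with the actual model file and checkpoint from this repo.
from mbv2_ca import mbv2_ca  # hypothetical model constructor

model = mbv2_ca()
state_dict = torch.load("mbv2_ca.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```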

Some tips for designing lightweight attention blocks

  • SiLU activation (h_swish in the code) works better than ReLU6 (see the sketch after this list)
  • Attention along either the horizontal or the vertical direction alone performs about the same as SE attention
  • When applied to MobileNeXt, adding the attention block after the first depthwise 3x3 convolution works better
  • Not sure whether the results would be better if a softmax were applied between the horizontal and vertical features
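Regarding the first tip, h_swish is the hard approximation of SiLU/swish built from ReLU6. A minimal sketch (class names are illustrative):

```python
import torch.nn as nn


class HSigmoid(nn.Module):
    """Hard sigmoid: ReLU6(x + 3) / 6."""

    def __init__(self):
        super().__init__()
        self.relu6 = nn.ReLU6(inplace=True)

    def forward(self, x):
        return self.relu6(x + 3) / 6


class HSwish(nn.Module):
    """Hard swish (h_swish): x * hard_sigmoid(x), a cheap approximation of SiLU."""

    def __init__(self):
        super().__init__()
        self.hsigmoid = HSigmoid()

    def forward(self, x):
        return x * self.hsigmoid(x)
```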

Object detection

We use this repo (ssdlite-pytorch-mobilenext).

Results on COCO val

| Model | Method | Params | AP |
| --- | --- | --- | --- |
| MobileNetV1 | SSDLite320 | 5.1M | 22.2 |
| MobileNetV2 | SSDLite320 | 4.3M | 22.3 |
| MobileNetV2 | SSDLite320 | 5.0M | 22.0 |
| MnasNet-A1 | SSDLite320 | 4.9M | 23.0 |
| MobileNeXt | SSDLite320 | 4.4M | 23.3 |
| MobileNetV2 + SE | SSDLite320 | 4.7M | 23.7 |
| MobileNetV2 + CBAM | SSDLite320 | 4.7M | 23.0 |
| MobileNetV2 + CA | SSDLite320 | 4.8M | 24.5 |

Results on Pascal VOC 2007 test

| Model | Method | Params | mAP |
| --- | --- | --- | --- |
| MobileNetV2 | SSDLite320 | 4.3M | 71.7 |
| MobileNetV2 + SE | SSDLite320 | 4.7M | 71.7 |
| MobileNetV2 + CBAM | SSDLite320 | 4.7M | 71.7 |
| MobileNetV2 + CA | SSDLite320 | 4.8M | 73.1 |

Semantic segmentation

We use this repo. Alternatively, you can also refer to mmsegmentation.

Results on Pascal VOC 2012 val

| Model | Method | Stride | Params (M) | mIoU |
| --- | --- | --- | --- | --- |
| MobileNetV2 | DeepLabV3 | 16 | 4.5 | 70.84 |
| MobileNetV2 + SE | DeepLabV3 | 16 | 4.9 | 71.69 |
| MobileNetV2 + CBAM | DeepLabV3 | 16 | 4.9 | 71.28 |
| MobileNetV2 + CA | DeepLabV3 | 16 | 5.0 | 73.32 |
| MobileNetV2 | DeepLabV3 | 8 | 4.5 | 71.82 |
| MobileNetV2 + SE | DeepLabV3 | 8 | 4.9 | 72.52 |
| MobileNetV2 + CBAM | DeepLabV3 | 8 | 4.9 | 71.67 |
| MobileNetV2 + CA | DeepLabV3 | 8 | 5.0 | 73.96 |

Results on Cityscapes val

| Model | Method | Stride | Params (M) | mIoU |
| --- | --- | --- | --- | --- |
| MobileNetV2 | DeepLabV3 | 8 | 4.5 | 71.4 |
| MobileNetV2 + SE | DeepLabV3 | 8 | 4.9 | 72.2 |
| MobileNetV2 + CBAM | DeepLabV3 | 8 | 4.9 | 71.4 |
| MobileNetV2 + CA | DeepLabV3 | 8 | 5.0 | 74.0 |

We observe that our coordinate attention yields a larger improvement on semantic segmentation than on ImageNet classification and object detection. We argue that this is because our coordinate attention is able to capture long-range dependencies with precise positional information, which is more beneficial to vision tasks with dense predictions, such as semantic segmentation.

Citation

You may want to cite:

@inproceedings{hou2021coordinate,
  title={Coordinate Attention for Efficient Mobile Network Design},
  author={Hou, Qibin and Zhou, Daquan and Feng, Jiashi},
  booktitle={CVPR},
  year={2021}
}

@inproceedings{sandler2018mobilenetv2,
  title={Mobilenetv2: Inverted residuals and linear bottlenecks},
  author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={4510--4520},
  year={2018}
}

@inproceedings{zhou2020rethinking,
  title={Rethinking bottleneck structure for efficient mobile network design},
  author={Zhou, Daquan and Hou, Qibin and Chen, Yunpeng and Feng, Jiashi and Yan, Shuicheng},
  booktitle={ECCV},
  year={2020}
}

@inproceedings{hu2018squeeze,
  title={Squeeze-and-excitation networks},
  author={Hu, Jie and Shen, Li and Sun, Gang},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={7132--7141},
  year={2018}
}

@inproceedings{woo2018cbam,
  title={Cbam: Convolutional block attention module},
  author={Woo, Sanghyun and Park, Jongchan and Lee, Joon-Young and Kweon, In So},
  booktitle={Proceedings of the European conference on computer vision (ECCV)},
  pages={3--19},
  year={2018}
}