UVTR

Unifying Voxel-based Representation with Transformer for 3D Object Detection

Yanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, Jiaya Jia

[arXiv] [BibTeX]

This project provides an implementation of the NeurIPS 2022 paper "Unifying Voxel-based Representation with Transformer for 3D Object Detection", based on mmDetection3D. UVTR aims to unify multi-modality representations in the voxel space for accurate and robust single- or cross-modality 3D detection.

Preparation

This project is based on mmDetection3D and can be set up as follows.

Install PyTorch v1.7.1 and mmDetection3D v0.17.3 following the instructions.

Copy our project and related files into the installed mmDetection3D:

    cp -r projects mmdetection3d/
    cp -r extra_tools mmdetection3d/

Prepare the nuScenes dataset following the structure.

Generate the unified data info and sampling database for the nuScenes dataset:

    python3 extra_tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes_unified

Training

You can train the model following the instructions. If you want to train the model from scratch, you can find the pretrained models here. For example, to launch UVTR training on multiple GPUs, run:

    cd /path/to/mmdetection3d
    bash extra_tools/dist_train.sh ${CFG_FILE} ${NUM_GPUS}

or train with a single GPU:

    python3 extra_tools/train.py ${CFG_FILE}

Evaluation

You can evaluate the model following the instructions. For example, to launch UVTR evaluation with a pretrained checkpoint on multiple GPUs, run:

    bash extra_tools/dist_test.sh ${CFG_FILE} ${CKPT} ${NUM_GPUS} --eval=bbox

or evaluate with a single GPU:

    python3 extra_tools/test.py ${CFG_FILE} ${CKPT} --eval=bbox

nuScenes 3D Object Detection Results

We provide results on the nuScenes val set with pretrained models.
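As a reading aid for the metrics reported in this section: NDS is not measured independently of the other columns. Under the nuScenes detection benchmark definition (which this README does not restate), NDS is a weighted combination of mAP and the five true-positive error metrics, NDS = (5 * mAP + sum(1 - min(1, err))) / 10. The sketch below recomputes NDS from the published per-model numbers; the helper name `nds` is hypothetical and is not part of UVTR or mmDetection3D.

```python
def nds(map_score, tp_errors):
    """Recompute the nuScenes detection score (NDS).

    map_score: mAP as a fraction in [0, 1] (e.g. 0.313 for 31.3%).
    tp_errors: the five mean true-positive errors, in the order
               (mATE, mASE, mAOE, mAVE, mAAE). Lower is better.
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five TP error values")
    # Each error is clipped to 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    # mAP carries weight 5, each TP score weight 1, normalised by 10.
    return (5.0 * map_score + sum(tp_scores)) / 10.0


# Values taken from the UVTR-C-R50-H5 row of the results in this section.
value = nds(0.313, [0.810, 0.281, 0.486, 0.793, 0.187])
print(f"NDS = {100 * value:.1f}%")  # prints "NDS = 40.1%"
```

The same check applied to the UVTR-M-V0075-R101 row (mAP 65.4%) reproduces its reported NDS of 70.2%. Because the inputs are already rounded to three decimals, small last-digit differences are possible for other rows.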

| Model | NDS(%) | mAP(%) | mATE↓ | mASE↓ | mAOE↓ | mAVE↓ | mAAE↓ | download |
|---|---|---|---|---|---|---|---|---|
| Camera-based | | | | | | | | |
| UVTR-C-R50-H5 | 40.1 | 31.3 | 0.810 | 0.281 | 0.486 | 0.793 | 0.187 | GoogleDrive |
| UVTR-C-R50-H11 | 41.8 | 33.3 | 0.795 | 0.276 | 0.452 | 0.761 | 0.196 | GoogleDrive |
| UVTR-C-R101 | 44.1 | 36.1 | 0.761 | 0.271 | 0.409 | 0.756 | 0.203 | GoogleDrive |
| UVTR-CS-R50 | 47.2 | 36.2 | 0.756 | 0.276 | 0.399 | 0.467 | 0.189 | GoogleDrive |
| UVTR-CS-R101 | 48.3 | 37.9 | 0.739 | 0.267 | 0.350 | 0.510 | 0.200 | GoogleDrive |
| UVTR-L2C-R101 | 45.0 | 37.2 | 0.735 | 0.269 | 0.397 | 0.761 | 0.193 | GoogleDrive |
| UVTR-L2CS3-R101 | 48.8 | 39.2 | 0.720 | 0.268 | 0.354 | 0.534 | 0.206 | GoogleDrive |
| LiDAR-based | | | | | | | | |
| UVTR-L-V0075 | 67.6 | 60.8 | 0.335 | 0.257 | 0.303 | 0.206 | 0.183 | GoogleDrive |
| Multi-modality | | | | | | | | |
| UVTR-M-V0075-R101 | 70.2 | 65.4 | 0.333 | 0.258 | 0.270 | 0.216 | 0.176 | GoogleDrive |

Acknowledgement

We would like to thank the authors of mmDetection3D and DETR3D for their open-source releases.

License

UVTR is released under the Apache 2.0 license.

Citing UVTR

Please consider citing UVTR in your publications if it helps your research.

    @inproceedings{li2022uvtr,
      title={Unifying Voxel-based Representation with Transformer for 3D Object Detection},
      author={Li, Yanwei and Chen, Yilun and Qi, Xiaojuan and Li, Zeming and Sun, Jian and Jia, Jiaya},
      booktitle={Advances in Neural Information Processing Systems},
      year={2022}
    }