ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

This repo is the official implementation of the paper ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention. It achieves state-of-the-art performance on the large-scale Waymo Open Dataset with real-time inference speed.

Chenhang He*, Ruihuang Li, Guowen Zhang, Lei Zhang

Introduction

Window-based transformers have demonstrated strong ability in large-scale point cloud understanding by capturing context-aware representations with affordable attention computation in a more localized manner. However, because of the sparse nature of point clouds, the number of voxels per window varies significantly. Current methods partition the voxels in each window into multiple subsets of equal size, which incurs expensive overhead in sorting and padding the voxels and makes them run slower than sparse-convolution-based methods. In this paper, we present ScatterFormer, which, to the best of our knowledge, is the first method to perform attention directly on voxel sets of variable length. The key to ScatterFormer is the Scattered Linear Attention (SLA) module, which leverages the linear attention mechanism to process in parallel all voxels scattered across different windows. By exploiting the hierarchical computation units of the GPU and a matrix blocking algorithm, we reduce the latency of the proposed SLA module to less than 1 ms on moderate GPUs. In addition, we develop a cross-window interaction module that simultaneously enhances the local representation and allows information to flow across windows, eliminating the need for window shifting. ScatterFormer achieves 73 mAP (L2) on the large-scale Waymo Open Dataset and 70.5 NDS on the NuScenes dataset, running at a detection rate of 28 FPS.
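To make the idea of attention over variable-length voxel sets concrete, here is a minimal, dependency-free sketch of linear attention grouped by window. It is an illustration only and not the repository's SLA kernel: the function names, the ELU(x) + 1 feature map, and the dictionary-based accumulation are assumptions made for this example, whereas the actual module runs as a blocked GPU kernel. The point it shows is that each window only needs a running sum of key-value outer products, so no sorting or padding to equal-size subsets is required.

```python
import math

def feature_map(x):
    # Positive feature map ELU(x) + 1, a common choice in linear attention
    # (assumed here; the repository may use a different one).
    return [t + 1.0 if t > 0 else math.exp(t) for t in x]

def scattered_linear_attention(q, k, v, window_ids):
    """Linear attention restricted to voxels that share a window id.

    q, k: lists of N vectors of length d_k; v: list of N vectors of length d_v;
    window_ids: list of N integers. Windows may hold any number of voxels.
    """
    d_k, d_v = len(k[0]), len(v[0])
    kv = {}    # window id -> d_k x d_v matrix: sum_j phi(k_j)^T v_j
    ksum = {}  # window id -> d_k vector:       sum_j phi(k_j)

    # Pass 1: scatter-add every voxel's contribution into its window state.
    for kj, vj, w in zip(k, v, window_ids):
        pk = feature_map(kj)
        if w not in kv:
            kv[w] = [[0.0] * d_v for _ in range(d_k)]
            ksum[w] = [0.0] * d_k
        for a in range(d_k):
            ksum[w][a] += pk[a]
            for b in range(d_v):
                kv[w][a][b] += pk[a] * vj[b]

    # Pass 2: every query reads only the state of its own window.
    out = []
    for qi, w in zip(q, window_ids):
        pq = feature_map(qi)
        denom = sum(pq[a] * ksum[w][a] for a in range(d_k))
        out.append([
            sum(pq[a] * kv[w][a][b] for a in range(d_k)) / denom
            for b in range(d_v)
        ])
    return out

# Three voxels: two share window 0, one sits alone in window 1.
q = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.9]]
k = [[0.3, 0.7], [-0.6, 0.2], [0.8, -0.1]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
window_ids = [0, 0, 1]
out = scattered_linear_attention(q, k, v, window_ids)
```

In this toy input the third voxel is the only member of its window, so its output is simply its own value vector, while the first two voxels each receive a weighted average of the two values in window 0. The cost is linear in the number of voxels regardless of how unevenly they are distributed across windows.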

Main results

Waymo Open Dataset validation

| Model | #Sweeps | mAP/H_L1 | mAP/H_L2 | Veh_L1 | Veh_L2 | Ped_L1 | Ped_L2 | Cyc_L1 | Cyc_L2 | Log |
|---|---|---|---|---|---|---|---|---|---|---|
| ScatterFormer (20%) | 1 | 79.5/77.1 | 73.2/71.0 | 79.3/78.8 | 70.9/70.5 | 82.8/77.0 | 75.2/69.8 | 76.4/75.4 | 73.6/72.7 | Log |
| ScatterFormer (20%) | 4 | 79.5/77.1 | 73.2/71.0 | 79.3/78.8 | 70.9/70.5 | 82.8/77.0 | 75.2/69.8 | 76.4/75.4 | 73.6/72.7 | Log |
| ScatterFormer (100%) | 1 | 79.5/77.1 | 73.2/71.0 | 79.3/78.8 | 70.9/70.5 | 82.8/77.0 | 75.2/69.8 | 76.4/75.4 | 73.6/72.7 | Log |
| ScatterFormer (100%) | 4 | 79.5/77.1 | 73.2/71.0 | 79.3/78.8 | 70.9/70.5 | 82.8/77.0 | 75.2/69.8 | 76.4/75.4 | 73.6/72.7 | Log |
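
As a reading aid for the Waymo numbers above, the short check below recomputes the overall mAP/mAPH columns from the per-class columns. It assumes that each cell is an AP/APH pair and that the overall score is the unweighted mean over the Vehicle, Pedestrian and Cyclist classes; the helper name `mean_ap_aph` is hypothetical and not part of this repository.

```python
# Per-class (AP, APH) pairs copied from a row of the table above.
level_1 = {"Vehicle": (79.3, 78.8), "Pedestrian": (82.8, 77.0), "Cyclist": (76.4, 75.4)}
level_2 = {"Vehicle": (70.9, 70.5), "Pedestrian": (75.2, 69.8), "Cyclist": (73.6, 72.7)}

def mean_ap_aph(per_class):
    # Unweighted mean over classes, rounded to one decimal like the table
    # (the averaging rule is an assumption of this sketch).
    n = len(per_class)
    ap = sum(pair[0] for pair in per_class.values()) / n
    aph = sum(pair[1] for pair in per_class.values()) / n
    return round(ap, 1), round(aph, 1)

l1_overall = mean_ap_aph(level_1)
l2_overall = mean_ap_aph(level_2)
print("L1 mAP/mAPH:", l1_overall)
print("L2 mAP/mAPH:", l2_overall)
```

Under that assumption the recomputed values agree with the mAP/H_L1 and mAP/H_L2 columns of the table.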

NuScenes validation

| Model | mAP | NDS | mATE | mASE | mAOE | mAVE | mAAE | ckpt | Log |
|---|---|---|---|---|---|---|---|---|---|
| ScatterFormer | 66.4 | 71.1 | 27.0 | 24.8 | 27.2 | 22.6 | 18.9 | ckpt | Log |

Usage

Installation

Please refer to INSTALL.md for installation.

Dataset Preparation

Please follow the dataset preparation instructions from OpenPCDet; we adopt the same data generation process.

Training

# multi-GPU training (the first argument, 8, is the number of GPUs)
cd tools
bash scripts/dist_train.sh 8 --cfg_file <CONFIG_FILE> [other optional arguments]

Testing

# multi-GPU testing (the first argument, 8, is the number of GPUs)
cd tools
bash scripts/dist_test.sh 8 --cfg_file <CONFIG_FILE> --ckpt <CHECKPOINT_FILE>