/CudaSift

A CUDA implementation of SIFT for NVidia GPUs

Primary LanguageCuda

This is the fourth version of a SIFT (Scale Invariant Feature Transform) implementation using CUDA for GPUs from NVidia. The first version is from 2007 and GPUs have evolved since then. This version is slightly more precise and considerably faster than the previous versions and has been optimized for Tesla K40 using larger images.

On a Tesla K40 GPU the code takes about 5.3 ms on a 1280x960 pixel image and 6.4 ms on a 1920x1080 pixel image, while the third version required respectively 11.2 ms and 14.5 ms. An additional 1.5 ms and 1.0 ms is needed for image transfers from CPU to GPU. There is also code for brute-force matching of features and homography computation that takes about 2.5 ms and 3 ms for two sets of around 1250 SIFT features each.

The code relies on CMake for compilation and OpenCV for image containers. OpenCV can however be quite easily changed to something else. The code can be relatively hard to read, given the way things have been parallelized for maximum speed.

The code is free to use for non-commercial applications. If you use the code for research, please refer to the following paper.

M. Björkman, N. Bergström and D. Kragic, "Detecting, segmenting and tracking unknown objects using multi-label MRF inference", CVIU, 118, pp. 111-127, January 2014.


Computational cost (in milliseconds) on different GPUs (latest benchmark marked with *):

                            1280x960  1920x1080   GFLOPS  Bandwidth  Matching
Maxwell  GeForce GTX 970       5.0*      6.5*      3494      224       1.6
Maxwell  GeForce GTX 750 Ti   10.6*     14.7*      1306       86       3.2
Kepler   Tesla K40             5.3*      6.4*      4291      288       2.5
Kepler   GeForce GTX TITAN     4.6*      5.8*      4500      288       2.5
Kepler   GeForce GTX 780       5.3       6.7       3977      288

Matching is done between two sets of 1050 and 1202 features respectively.