Chapter 6 - SIFT

In this chapter, we introduce the scale invariant feature transform (SIFT), a landmark computer vision method for detecting and matching features in images. It is designed from the ground up to be invariant to translation, scaling, rotation in the image plane, and changes in brightness and contrast. The method is carefully engineered and works out of the box for a multitude of applications. For this reason, the SIFT paper is one of the most frequently cited papers in computer vision. It is also the culmination of decades of research on this topic by many different people.

Table of contents

Part 1 - Introduction

Slides 1-10

We start with an overview of the scale invariant feature transform (SIFT), its properties and applications, and discuss its importance in computer vision. The SIFT framework consists of four steps, which are presented in the upcoming parts.


Index

00:00 Overview of the scale invariant feature transform (SIFT)
04:40 Optional: the SIFT paper in relation to other publications
11:43 Engineering highlights and applications of SIFT
15:25 The four steps of SIFT, roadmap for the chapter
19:30 Outlook


Part 2 - SIFT feature detection

Slides 11-27

SIFT features are blobs of different sizes. The blob detector whose response is normalized across all scales is the scale-normalized Laplacian of Gaussian (LoG). We show that this filter is approximately equal to a difference of Gaussians (DoG), which we already have in our scale space pyramid, so the DoG is what SIFT actually computes. We look at different examples for illustration, and also investigate the relationship between heat diffusion and Gaussian filtering in an optional section.
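The DoG/LoG relation can be checked numerically. The Gaussian satisfies the heat equation dG/dsigma = sigma * Laplacian(G), so a finite difference in sigma gives G(x, y, k*sigma) - G(x, y, sigma) ≈ (k - 1) * sigma^2 * Laplacian(G), i.e. the DoG is (k - 1) times the normalized LoG. The minimal NumPy sketch below evaluates both sides in closed form on a grid; the grid size, sigma = 1.6 and the tested values of k are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import numpy as np

def gaussian_2d(x, y, sigma):
    """Isotropic 2D Gaussian G(x, y, sigma) with unit integral."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def normalized_log(x, y, sigma):
    """Scale-normalized Laplacian of Gaussian, sigma^2 * (G_xx + G_yy), in closed form."""
    r2 = x**2 + y**2
    return gaussian_2d(x, y, sigma) * (r2 - 2 * sigma**2) / sigma**2

def dog_vs_log_error(sigma, k, half_width=8.0, n=161):
    """Max |DoG - (k - 1) * normalized LoG| on a grid, relative to max |DoG|."""
    xs = np.linspace(-half_width, half_width, n)
    x, y = np.meshgrid(xs, xs)
    dog = gaussian_2d(x, y, k * sigma) - gaussian_2d(x, y, sigma)
    approx = (k - 1) * normalized_log(x, y, sigma)
    return np.max(np.abs(dog - approx)) / np.max(np.abs(dog))

# The approximation error shrinks as the scale ratio k approaches 1.
for k in (2 ** (1 / 3), 1.1, 1.01):
    print(f"k = {k:.3f}: relative error {dog_vs_log_error(1.6, k):.3f}")
```

The factor (k - 1) is the same for every scale in an octave when k is fixed, which is why it does not change where the extrema of the DoG lie.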


Index

00:00 Corners vs. SIFT features, Difference of Gaussians scale space employed in SIFT
04:30 Gaussian scale space pyramid in SIFT, the octaves
07:30 The Difference of Gaussians approximates the normalized Laplacian of a Gaussian (LoG)
12:50 "Proof by Mathematica" for the previous claim
17:00 Summary: a new interpretation of the DoG filter and the Laplacian pyramid
20:00 The Laplacian as a detector for blobs of different sizes
25:27 Example scale space, detour (optional): relation of Gaussian filtering to heat diffusion
37:00 Example: normalized Laplacian of Gaussian and detected blobs in sunflower image
41:50 Notes regarding feature detection with the normalized LoG
43:40 Extrema in scale space: non-maximum suppression in three dimensions
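Non-maximum suppression in scale space compares each DoG sample with its 26 neighbours: 8 in its own scale and 9 in each of the two adjacent scales. A minimal NumPy sketch, assuming the DoG images of one octave are stacked into an array of shape (scales, height, width); the function name scale_space_extrema and the simple magnitude threshold are hypothetical illustrations, not a fixed part of the method.

```python
import numpy as np

def scale_space_extrema(dog, threshold=0.0):
    """Return (s, y, x) indices of strict local extrema in a DoG stack.

    dog has shape (num_scales, height, width). A sample is kept if it is
    strictly larger (or strictly smaller) than all 26 neighbours in the
    3x3x3 cube around it and its magnitude exceeds `threshold`.
    Samples on the first/last scale and on the image border are skipped.
    """
    S, H, W = dog.shape
    center = dog[1:-1, 1:-1, 1:-1]
    is_max = np.ones(center.shape, dtype=bool)
    is_min = np.ones(center.shape, dtype=bool)
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if ds == dy == dx == 0:
                    continue
                # Shifted view of the stack, aligned with `center`.
                neighbour = dog[1 + ds:S - 1 + ds, 1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
                is_max &= center > neighbour
                is_min &= center < neighbour
    keep = (is_max | is_min) & (np.abs(center) > threshold)
    s, y, x = np.nonzero(keep)
    # Shift back from interior coordinates to full-stack coordinates.
    return list(zip((s + 1).tolist(), (y + 1).tolist(), (x + 1).tolist()))

rng = np.random.default_rng(0)
stack = 0.01 * rng.standard_normal((5, 20, 20))
stack[2, 10, 7] = 1.0    # planted maximum
stack[3, 5, 12] = -1.0   # planted minimum
print(scale_space_extrema(stack, threshold=0.5))  # [(2, 10, 7), (3, 5, 12)]
```

Comparing against shifted views of the whole stack keeps the loop at 26 iterations instead of looping over every pixel.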


Part 3 - Accurate localization of keypoints

Slides 28-33

In step two of the SIFT pipeline, we refine the location of feature detections in scale space using a second-order approximation of the DoG function. To do this, we first develop the necessary mathematical background, which you can of course skip if you are already familiar with it. After refining feature locations, we discard detections that are too weak and those that are unstable, in the sense that they are well localized in only one direction. This can be analyzed by looking at the curvature in different directions, and we show how it can be computed using a quadratic form based on the Hessian matrix. The eigenvectors of the Hessian give the directions of largest and smallest curvature, and the corresponding eigenvalues are the principal curvatures. Their ratio determines whether the feature is stable.
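The sub-pixel refinement fits the second-order Taylor expansion D(p + h) ≈ D(p) + g^T h + 1/2 h^T H h around a sampled extremum p = (s, y, x), where g and H are the gradient and Hessian of the DoG function. Setting the derivative with respect to h to zero gives the offset h = -H^(-1) g, and the interpolated value D(p) + 1/2 g^T h is what a contrast test would threshold. A minimal sketch with g and H taken from central finite differences; the helper name refine_extremum is hypothetical.

```python
import numpy as np

def refine_extremum(dog, s, y, x):
    """Quadratic (2nd-order Taylor) refinement of a scale-space extremum.

    Uses central finite differences for the gradient g and Hessian H of the
    DoG function D at the integer sample (s, y, x), then solves H @ h = -g.
    Returns (h, value) with h = (ds, dy, dx) and value = D + 0.5 * g . h.
    """
    D = dog.astype(float)
    c = D[s, y, x]
    # Gradient via central differences, ordered as (scale, row, column).
    g = 0.5 * np.array([
        D[s + 1, y, x] - D[s - 1, y, x],
        D[s, y + 1, x] - D[s, y - 1, x],
        D[s, y, x + 1] - D[s, y, x - 1],
    ])
    # Second derivatives (pure and mixed).
    dss = D[s + 1, y, x] - 2 * c + D[s - 1, y, x]
    dyy = D[s, y + 1, x] - 2 * c + D[s, y - 1, x]
    dxx = D[s, y, x + 1] - 2 * c + D[s, y, x - 1]
    dsy = 0.25 * (D[s + 1, y + 1, x] - D[s + 1, y - 1, x] - D[s - 1, y + 1, x] + D[s - 1, y - 1, x])
    dsx = 0.25 * (D[s + 1, y, x + 1] - D[s + 1, y, x - 1] - D[s - 1, y, x + 1] + D[s - 1, y, x - 1])
    dyx = 0.25 * (D[s, y + 1, x + 1] - D[s, y + 1, x - 1] - D[s, y - 1, x + 1] + D[s, y - 1, x - 1])
    H = np.array([[dss, dsy, dsx],
                  [dsy, dyy, dyx],
                  [dsx, dyx, dxx]])
    h = -np.linalg.solve(H, g)
    value = c + 0.5 * g @ h
    return h, value

# Synthetic quadratic "DoG" with its true maximum at (2.3, 5.4, 6.8).
S, Y, X = np.meshgrid(np.arange(5), np.arange(12), np.arange(12), indexing="ij")
dog = 1.0 - 0.5 * (S - 2.3) ** 2 - 0.1 * (Y - 5.4) ** 2 - 0.1 * (X - 6.8) ** 2
h, value = refine_extremum(dog, 2, 5, 7)
print(h, value)  # offset close to [0.3, 0.4, -0.2], value close to 1.0
```

Central differences are exact for quadratics, so the planted maximum is recovered exactly here. In a full implementation, an offset larger than 0.5 in any coordinate would typically move the sample to the neighbouring grid point and repeat the fit; that loop is omitted in this sketch.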


Index

00:00 Summary of current results, goals for refining feature location accuracy
05:00 Mathematical background: quadratic approximation of a 1D function
13:00 Mathematical background: second order Taylor expansion in nD, the Hessian matrix
22:10 Sub-pixel refinement of feature locations in scale space using 2nd order Taylor expansion
30:00 Elimination of weak and unstable detections
32:15 Mathematical background: geometric interpretation of the Hessian
41:55 Application: elimination of unstable detections based on principal curvature


Part 4 - Orientation assignment, SIFT descriptor and object detection

Slides 34-55

The next step of SIFT is the assignment of a dominant orientation. For this, we sample the gradient orientations in the feature region into a histogram, weighting each sample by its gradient magnitude and by its distance to the keypoint location. The maxima of the histogram give candidate orientations, which are refined by quadratic interpolation, as for the location. Once the dominant orientation is known, we can finally, in the last step, construct a rotation- and scale-invariant descriptor by rotating the patch into a canonical orientation. The SIFT descriptor then consists of gradient histograms in 16 blocks (a 4x4 grid), each with 8 orientation bins, so 128 values in total. We consider the remaining invariance properties (photometric changes, out-of-plane rotation) and look at object detection as an application example from the paper.
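Orientation assignment can be sketched as a magnitude- and distance-weighted orientation histogram whose peaks are refined by fitting a parabola through each peak bin and its two neighbours. In this minimal NumPy sketch, the distance weighting is a Gaussian window of width sigma_w around the patch centre, and the defaults (36 bins of 10 degrees, extra orientations for peaks within 80% of the maximum) follow the values commonly cited from Lowe's paper; treat these, the function name and sigma_w as assumptions.

```python
import numpy as np

def dominant_orientations(patch, sigma_w, num_bins=36, peak_ratio=0.8):
    """Dominant gradient orientations of a square patch centred on a keypoint.

    Each pixel votes into an orientation histogram with weight
    gradient magnitude * Gaussian(distance to centre, sigma_w).
    Every local peak within `peak_ratio` of the global maximum is refined by a
    parabola through it and its two neighbouring bins (circularly).
    Returns angles in radians in [0, 2*pi), measured from the column axis.
    """
    P = patch.astype(float)
    gy, gx = np.gradient(P)  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)

    h, w = P.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    weight = mag * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma_w ** 2))

    bin_width = 2 * np.pi / num_bins
    bins = np.floor(ang / bin_width).astype(int) % num_bins
    hist = np.bincount(bins.ravel(), weights=weight.ravel(), minlength=num_bins)

    result = []
    for i in range(num_bins):
        left, right = hist[(i - 1) % num_bins], hist[(i + 1) % num_bins]
        if hist[i] > left and hist[i] > right and hist[i] >= peak_ratio * hist.max():
            # Vertex of the parabola through (-1, left), (0, hist[i]), (1, right).
            delta = 0.5 * (left - right) / (left - 2 * hist[i] + right)
            # Bin i covers [i, i + 1) * bin_width, so its centre is at i + 0.5.
            result.append(((i + 0.5 + delta) * bin_width) % (2 * np.pi))
    return result

yy, xx = np.mgrid[0:16, 0:16].astype(float)
theta = np.radians(47.0)
ramp = xx * np.cos(theta) + yy * np.sin(theta)  # constant gradient at 47 degrees
print(np.degrees(dominant_orientations(ramp, sigma_w=4.0)))
```

For the linear ramp every vote falls into the single bin [40, 50) degrees, so the parabola is symmetric and the result is about 45 degrees, the bin centre. This shows that the quadratic interpolation can only improve on the bin resolution when neighbouring bins also receive votes, as they do for real image patches. Each returned angle beyond the first would correspond to a cloned keypoint with the same location and scale.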


Index

00:00 Overview, histogram of gradients (HoG) for dominant orientation detection
06:05 Dominant orientation assignment and refinement, feature cloning
09:45 Example visualizations of scale-space detections and orientations
11:20 The SIFT descriptor: orientation and scale normalization
14:15 The SIFT descriptor: histograms of gradients in different blocks
18:25 Distribution of gradient samples to blocks and histogram bins
23:23 Final SIFT descriptor from concatenating the HoGs and normalizing
25:10 Photometric invariance properties, contrast and brightness invariance
27:15 Example: features and visualization of the descriptor
28:30 Summary of SIFT properties
30:50 Example application: object detection
32:30 Summary of the chapter and outlook
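The descriptor layout (4x4 blocks, 8 orientation bins each, 128 values) and its photometric invariance can be illustrated with a simplified sketch. It assumes the patch has already been rotated to the dominant orientation and resampled to 16x16 pixels at the keypoint scale. Unlike full SIFT, it uses hard binning instead of distributing each sample over neighbouring blocks and bins, and it omits the Gaussian window. The clipping value 0.2 follows the figure commonly cited from Lowe's paper; the function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch16, clip=0.2):
    """Simplified 128-D SIFT-style descriptor from a canonical 16x16 patch.

    The patch is split into 4x4 blocks of 4x4 pixels; each block gets an
    8-bin gradient orientation histogram weighted by gradient magnitude.
    Hard binning only (no interpolation between blocks or bins).
    """
    P = np.asarray(patch16, dtype=float)
    if P.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    gy, gx = np.gradient(P)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    obin = np.floor(ang / (2 * np.pi / 8)).astype(int) % 8

    desc = np.zeros((4, 4, 8))
    for by in range(4):
        for bx in range(4):
            block = (slice(4 * by, 4 * by + 4), slice(4 * bx, 4 * bx + 4))
            desc[by, bx] = np.bincount(obin[block].ravel(),
                                       weights=mag[block].ravel(), minlength=8)
    v = desc.ravel()                  # 4 * 4 * 8 = 128 values
    v = v / (np.linalg.norm(v) + 1e-12)  # unit length: cancels contrast scaling
    v = np.minimum(v, clip)           # damp very large gradient magnitudes
    v = v / (np.linalg.norm(v) + 1e-12)  # renormalize to unit length
    return v

rng = np.random.default_rng(1)
patch = rng.random((16, 16))
d1 = sift_like_descriptor(patch)
d2 = sift_like_descriptor(3.0 * patch + 0.5)  # contrast and brightness change
print(d1.shape, np.allclose(d1, d2))  # (128,) True
```

Brightness invariance comes from using gradients, which ignore an additive offset; contrast invariance comes from the normalization, which cancels a multiplicative factor on all gradient magnitudes.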