Chapter 2 - Image Features
This chapter is intended to be watched within two weeks; however, it is overall pretty long. The reason is that it contains detailed reminders of the mathematical background you should have in order to get the most out of the lecture. You are probably already familiar with many of these topics, so you can skim them or set a higher playback speed on YouTube; I just want to make sure that everyone is on the same level. Feel free to focus on the sections required for the exercises for now and study the optional background material later, since the next two chapters are pretty relaxed. However, make sure you understand how to apply our two most important tools, PCA and SVD, and that you have understood how the structure tensor is constructed in at least one way (statistically or analytically).
Table of contents
- Part 1: Image derivatives
- Part 2: Image edges, scale space, idea of features
- Part 3: The singular value decomposition, split in two due to length
- Part 4: The principal component analysis
- Part 5: The structure tensor of an image
- Part 6: Pattern matching and autocorrelation
Part 1 - Image derivatives
Slides 1-17
We discuss the basics of edge detection and how low-level edge detection, based on the gradient of the image function, differs qualitatively from high-level edge detection as performed by the human visual system. As mathematical background, we need derivatives of functions on a two-dimensional domain, and we introduce the gradient as the vector of the two partial derivatives, which are essentially 1D derivatives in the directions of the coordinate axes. To compute a discrete version of the gradient on a pixel grid, we can use convolutions with special filters. Applying these filters to noisy images amplifies the noise, so it is a good idea to pre-filter the image. Equivalently, we can pre-filter the kernel to obtain the "Derivative of Gaussian (DoG)" filter, which allows a slightly more efficient implementation.
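To make the discrete picture concrete, here is a minimal Python sketch (not taken from the slides) of both variants: central-difference derivative filters applied by convolution, and the DoG gradient computed with SciPy's Gaussian filter. The image and the value of $\sigma$ are placeholders.

```python
# Minimal sketch: discrete image gradient and Derivative-of-Gaussian (DoG) filtering.
# Assumes a grayscale image stored as a float 2D numpy array; sigma is illustrative.
import numpy as np
from scipy import ndimage

def gradient_central_differences(img):
    """Approximate partial derivatives f_x, f_y with central-difference kernels."""
    dx = np.array([[0.5, 0.0, -0.5]])        # derivative along the x (column) axis
    dy = dx.T                                # derivative along the y (row) axis
    fx = ndimage.convolve(img, dx, mode="nearest")
    fy = ndimage.convolve(img, dy, mode="nearest")
    return fx, fy

def dog_gradient(img, sigma=1.5):
    """DoG gradient: Gaussian smoothing and differentiation combined in one filter."""
    # order=(0, 1): first derivative along columns; order=(1, 0): along rows
    fx = ndimage.gaussian_filter(img, sigma, order=(0, 1))
    fy = ndimage.gaussian_filter(img, sigma, order=(1, 0))
    return fx, fy

img = np.random.rand(64, 64)                 # placeholder image
fx, fy = dog_gradient(img, sigma=1.5)
grad_mag = np.sqrt(fx**2 + fy**2)            # gradient magnitude, large near edges
```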
Index
00:00 Introduction, overview of the chapter
01:25 What are image edges? "Human" vs. "Technical" point of view
05:48 Characterization of edges in the image function
08:00 Mathematical background: partial derivatives
17:05 Mathematical background: gradient and gradient magnitude
21:45 Gradient-based edge detection
28:00 Mathematical background: difference quotient and discretization of derivatives
40:50 Discrete derivatives and convolutions
46:10 Example: visualization of partial derivatives and gradient magnitude of an image
48:48 Derivatives in the presence of noise, the idea of pre-smoothing
53:08 Increasing efficiency, the discrete "Derivative of Gaussian (DoG)" filter
Notes and errata
19:33 I meant "the gradient is the direction of steepest ascent", not descent, i.e. the direction in which the intensity function increases as fast as possible.
35:48 Variant 2 is in a sense actually a special case of variant 1, where $f$ is interpolated with a line.
27:20 Note: for visualization of the partial derivatives, average gray means zero, darker values are below zero, brighter ones above.
Part 2 - Image edges, scale space, idea of features
Slides 18-49
We first take a little detour into material which only becomes important in a later chapter and look at the continuous version of convolution, defined for functions instead of pixel grids. The discussion fits here, since we have already talked about the discretization of derivatives and now introduce the discretization of integrals. We show that discretizing the continuous convolution yields exactly the previous discrete definition.
After this detour, we return to the Derivative of Gaussian filters, which we can now also properly define in a continuous setting with continuous derivatives. Based on these filters, we start investigating the idea of scale space by introducing the Canny edge detector, which performs edge detection based on the DoG gradient magnitude at a certain $\sigma$. We learn a bit about the behaviour of edges when we increase $\sigma$ and discuss the axioms of scale space theory.
We then start the next section and introduce the task of feature detection and matching, which gives us a roadmap for the upcoming chapters. In particular, we talk about the requirements on good feature detectors and the multitude of applications.
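As an illustration of the first steps of the Canny detector, here is a simplified Python sketch (not taken from the slides): DoG gradient magnitude followed by non-maximum suppression along the gradient direction. The value of $\sigma$ is illustrative, and edge linking with hysteresis thresholding is omitted.

```python
# Simplified sketch of the first Canny steps: DoG gradient magnitude and
# non-maximum suppression. Hysteresis thresholding / edge linking is omitted.
import numpy as np
from scipy import ndimage

def canny_nms(img, sigma=1.5):
    fx = ndimage.gaussian_filter(img, sigma, order=(0, 1))   # DoG derivative along columns
    fy = ndimage.gaussian_filter(img, sigma, order=(1, 0))   # DoG derivative along rows
    mag = np.hypot(fx, fy)
    # quantize the gradient orientation into 4 bins (0, 45, 90, 135 degrees, modulo 180)
    angle = (np.rad2deg(np.arctan2(fy, fx)) + 180.0) % 180.0
    out = np.zeros_like(mag)
    H, W = mag.shape
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:         # gradient roughly along columns
                n1, n2 = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:                     # diagonal
                n1, n2 = mag[i + 1, j + 1], mag[i - 1, j - 1]
            elif a < 112.5:                    # gradient roughly along rows
                n1, n2 = mag[i - 1, j], mag[i + 1, j]
            else:                              # other diagonal
                n1, n2 = mag[i + 1, j - 1], mag[i - 1, j + 1]
            # keep the pixel only if it is a local maximum along the gradient direction
            if mag[i, j] >= n1 and mag[i, j] >= n2:
                out[i, j] = mag[i, j]
    return out

edges = canny_nms(np.random.rand(64, 64))      # placeholder image
```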
Index
00:00 Foundations for later chapters: continuous convolution
02:10 Mathematical background: Riemann integral and its discretization in 1D
09:00 Mathematical background: integrals over infinite and multi-dimensional domains
16:00 Mathematical background: continuous convolution and its discretization
22:25 Mathematical background: interplay of derivative and convolution in the continuous setting
29:39 Back to the Derivative of Gaussian filter: a look at it in the continuous setting
35:50 The Canny edge detector
- on the origins of the "Lena" test image
- gradient magnitude using DoG filters
- non-maximum suppression
- edge linking and thresholding
46:00 Edge detection and the scale parameter $\sigma$
47:12 Relation of Gaussian filtering to heat diffusion, scale space theory
51:36 Fundamentals of feature detection
55:20 Outlook: applications of feature matching, goals of a detector
62:45 Preview and motivation of the next parts of the chapter
Part 3A - The singular value decomposition (SVD), application to a linear map $A$
Slides 50-52
In part 3, we work out what the singular value decomposition of a matrix means. Our first interpretation is geometric: we investigate the implications of considering the matrix $A$ as a linear map. Along the way, we give reminders of basic concepts of linear algebra that will be important here and in later chapters. Essentially, we find that the SVD implies that any matrix can be written as a composition of three maps: a rotation, a scaling operation, and another rotation. We note some immediate consequences regarding the rank of the matrix $A$ and the products $AA^T$ and $A^T A$.
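The statements above can be checked numerically; the following small NumPy example (with an arbitrary $3 \times 2$ matrix, not one from the lecture) verifies the decomposition, the orthogonality of $U$ and $V$, the rank criterion, and the relationship to the Eigenvalues of $A^T A$.

```python
# Numerical check of the SVD statements: A = U S V^T with orthogonal U, V,
# rank(A) = number of nonzero singular values, and eig(A^T A) = singular values squared.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 0.0]])                    # a 3x2 example matrix of rank 2

U, s, Vt = np.linalg.svd(A, full_matrices=True)

S = np.zeros_like(A)                          # the "diagonal" scaling part, same shape as A
S[:len(s), :len(s)] = np.diag(s)

assert np.allclose(U @ S @ Vt, A)             # reconstruction: rotate, scale, rotate
assert np.allclose(U.T @ U, np.eye(3))        # U is orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(2))      # V is orthogonal

rank = np.sum(s > 1e-12)                      # rank = number of nonzero singular values
# the Eigenvalues of A^T A are the squared singular values
assert np.allclose(np.sort(np.linalg.eigvalsh(A.T @ A)), np.sort(s**2))
```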
Index
00:00 A tour of the SVD theorem
06:35 Understanding matrices as linear maps
09:40 Images of unit vectors, inner product and orthogonality
21:00 Geometric interpretation of the linear map $A$
25:57 Understanding orthogonal transformations
33:00 Geometric interpretation of orthogonal maps
36:06 Understanding how diagonal matrices transform vectors
40:05 Putting it all together: geometric meaning of the SVD
49:42 Immediate implications, the rank of the matrix $A$
56:42 Another implication: relationship to Eigenvalue decomposition of $A^T A$ or $AA^T$.
Part 3B - The singular value decomposition (SVD), application to a dataset $A$
Slides 53-59
We look at a second interpretation of the SVD, which yields an approximation of $A$ in terms of rank-$1$ components of decreasing importance. We can use this alternative view of the decomposition for dimensionality reduction of data and for finding the most characteristic features of each vector in the dataset. As an extensive example from image analysis, we use the SVD to analyze a dataset of face images in order to distinguish persons.
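The following sketch illustrates this view with NumPy on random placeholder data (the face dataset from the lecture is not included): the best rank-$k$ approximation as a sum of rank-$1$ components, the "Eigenfaces" as rows of $V^T$, and the expansion coefficients of each face.

```python
# Sketch of the low-rank approximation view; the data matrix and k are placeholders.
import numpy as np

# rows of A = face images flattened into vectors, as in the lecture example
A = np.random.rand(100, 32 * 32)               # 100 fake "faces" of 32x32 pixels
A_centered = A - A.mean(axis=0)                # subtract the mean face

U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)

k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # best rank-k approximation (sum of k rank-1 terms)
eigenfaces = Vt[:k]                            # the "Eigenfaces": rows of V^T, i.e. columns of V
coeffs = A_centered @ eigenfaces.T             # expansion coefficients of each face
```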
Index
00:00 Reinterpretation of the decomposition as an approximation of $A$
05:30 Proof of the approximation formula, finally an analysis of the cases $m>n$ and $m<n$
18:17 Example: approximation of a dataset of faces, written in the rows of $A$
22:58 The "Eigenfaces", columns of $V$ in the decomposition of $A$
24:30 The meaning of the singular values: "importance" of the respective component
25:30 Approximation of a face in terms of Eigenfaces, computing coefficients with respect to a basis
32:30 First coefficients in the expansion capture characteristics of a person
35:30 Summary and Outlook
Part 4 - The principal component analysis (PCA)
Slides 60-67
We review the central definitions of statistics for multivariate distributions, i.e. probability distributions which depend on several variables. It turns out that the covariance matrix is exactly $M^T M$, where $M$ is the matrix of (centered) measurements. The SVD of $M$ thus yields a transformation of measurement space such that the new covariance matrix is diagonal. This means that in the rotated distribution, all variables are decorrelated and ordered by decreasing variance. This process is called the principal component analysis (PCA) of a distribution.
We show how to compute standard deviations in arbitrary directions and describe a way to visualize the PCA using the ellipsoids shown in the diagrams.
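A minimal NumPy sketch of the procedure, on synthetic 2D data; the normalization of the covariance matrix by $1/n$ is one possible convention and may differ from the one used on the slides.

```python
# Minimal PCA sketch: center the measurements, compute the covariance matrix,
# and diagonalize it via the SVD of the measurement matrix M.
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(500, 2)) @ np.array([[2.0, 1.2],
                                          [0.0, 0.5]])   # correlated samples, one per row
M = M - M.mean(axis=0)                         # centering

n = M.shape[0]
C = (M.T @ M) / n                              # covariance matrix (1/n normalization convention)

U, s, Vt = np.linalg.svd(M, full_matrices=False)
principal_axes = Vt                            # rows: principal directions
variances = s**2 / n                           # variance along each principal direction

M_rotated = M @ Vt.T                           # data in the decorrelated coordinate system
C_rotated = (M_rotated.T @ M_rotated) / n      # numerically diagonal, entries = variances
assert np.allclose(C_rotated, np.diag(variances), atol=1e-10)
```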
Index
00:00 Introduction, probability distributions with several variables
03:50 Joint and marginal distributions
07:55 Independence of random variables
14:40 Visual preview of PCA: making the variables independent
17:50 Sample expectation, variance and standard deviation
21:35 Covariance and centering of measurements
30:00 Computing the covariance matrix from the matrix of measurements
38:39 Properties of the covariance matrix, diagonalization using the SVD
48:00 Visual interpretation in the diagram
50:55 Standard deviation in arbitrary directions
56:30 Summary and outlook
Part 5 - The structure tensor of an image
Slides 68-78
We introduce the structure tensor of an image as the covariance matrix of the gradient at a certain scale, computed in every pixel. By analyzing its Eigenvalues, we can obtain a classification of the local image structure into flat regions, edges and corners. We introduce two corner detectors, i.e. fast tests for whether there is a corner at a certain pixel. Finally, we take a look at how the corner detector performs on a practical example.
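For illustration, here is a small Python sketch of the structure tensor and a Harris-style corner response; the two $\sigma$ values and the constant $\kappa$ are common choices, not necessarily those used on the slides.

```python
# Sketch: per-pixel structure tensor and Harris/Foerstner-style corner response.
import numpy as np
from scipy import ndimage

def structure_tensor(img, sigma_d=1.0, sigma_i=2.0):
    """Per-pixel structure tensor: Gaussian-weighted statistics of the DoG gradient."""
    fx = ndimage.gaussian_filter(img, sigma_d, order=(0, 1))   # DoG derivatives
    fy = ndimage.gaussian_filter(img, sigma_d, order=(1, 0))
    # collect gradient statistics in a neighborhood by Gaussian-weighted averaging
    Jxx = ndimage.gaussian_filter(fx * fx, sigma_i)
    Jxy = ndimage.gaussian_filter(fx * fy, sigma_i)
    Jyy = ndimage.gaussian_filter(fy * fy, sigma_i)
    return Jxx, Jxy, Jyy

def harris_response(img, kappa=0.05):
    """Cornerness measure det(J) - kappa * trace(J)^2, evaluated in every pixel."""
    Jxx, Jxy, Jyy = structure_tensor(img)
    det = Jxx * Jyy - Jxy**2
    trace = Jxx + Jyy
    return det - kappa * trace**2

img = np.random.rand(64, 64)                   # placeholder image
R = harris_response(img)
corners = np.argwhere(R > 0.5 * R.max())       # crude threshold; in practice NMS would follow
```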
Index
00:00 Reminder: idea of corner detection, gradient statistics
02:18 Reminder and notation: Gaussian filters, scale derivatives (DoG)
04:35 Collecting statistics of the gradient in every pixel using convolution
12:39 Definition of the structure tensor
15:22 Analyzing local image structure with the structure tensor
19:40 Classification into flat regions, edges and corners based on Eigenvalues
22:50 Corner detectors (Shi/Tomasi and Harris/Foerstner)
28:40 Examples for Harris/Foerstner corner detection in practice
31:50 Outlook
Notes
11:30 To clarify, the correct version of the formula without taking shortcuts would be $$\sum_{k,l\in\mathbb{Z}} p_{k,l} (f_x^\sigma f_y^\sigma)(i-k, j-l),$$ where the sum is of course finite in practice.
26:30 Remark: the reason for this is that a transformation of the form $VAV^T$ with an orthogonal matrix $V$ changes neither the determinant nor the trace of $A$.
Part 6 - Pattern matching and autocorrelation
Slides 79-105
We give an overview of simple pattern matching metrics, the sum of squared differences (SSD) and the normalized cross-correlation, and compare some of their properties. Computing the SSD between a patch and a slightly shifted copy of it within the same image yields the autocorrelation function. We compute a linear approximation of the autocorrelation using a first-order Taylor expansion and obtain a quadratic form; the matrix of this quadratic form is again the structure tensor. We note that the maximum and minimum of a quadratic form over unit vectors are attained at the Eigenvectors corresponding to the largest and smallest Eigenvalues, respectively, so the SVD can be used to solve problems of this type. Applying this to the autocorrelation, we obtain the same results about patch stability as before, but from a different perspective. We conclude with an interesting comparison of this analytic approach to corner detection with the previous statistical method.
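To make the two metrics concrete, here is a small NumPy sketch (with placeholder patches, not from the slides) comparing the SSD and the normalized cross-correlation on a patch and a brightness-changed copy of it.

```python
# Sketch of the two patch-comparison metrics: SSD and normalized cross-correlation (NCC).
import numpy as np

def ssd(p, q):
    """Sum of squared differences: small for similar patches, 0 for identical ones."""
    return np.sum((p - q) ** 2)

def ncc(p, q):
    """Normalized cross-correlation: invariant to affine brightness changes, in [-1, 1]."""
    p0 = p - p.mean()
    q0 = q - q.mean()
    return np.sum(p0 * q0) / (np.linalg.norm(p0) * np.linalg.norm(q0))

patch = np.random.rand(11, 11)                 # placeholder patch
brighter = 2.0 * patch + 0.3                   # same pattern, different brightness/contrast
print(ssd(patch, brighter))                    # large: SSD is sensitive to brightness changes
print(ncc(patch, brighter))                    # 1.0: NCC is not
```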
Index
00:00 Introduction, comparison of patches in different images using SSD
10:00 Comparing image patches using the normalized cross-correlation
21:34 Definition of the autocorrelation function
27:10 Interpretation of the function on an example image
32:40 Mathematical background: first-order Taylor expansion in multiple dimensions
42:14 Application to the linearization of the autocorrelation function
45:20 Quadratic forms, the structure tensor reappears
48:50 Directions which maximize and minimize autocorrelation
54:45 Interpretation of the results, comparison PCA vs. autocorrelation maximum/minimum
57:30 Short summary of this long chapter, outlook