Chapter 8 - Variational Inverse Problems
Table of contents
- Part 1: Deconvolution, Fourier-based approaches
- Part 2: Algebraic approaches and regularization
- Part 3: Variational inverse problems
- Part 4: Outlook: some advanced variational models
Part 1 - Deconvolution: approach based on the Fourier-transform
Slides 1-23
Among the most common forms of image degradation are blur and noise. Both arise unavoidably in optical systems, sometimes as a deliberate effect chosen by the photographer, but often as something you would prefer to get rid of again. Deconvolution in the presence of noise is therefore one of the most fundamental and most thoroughly researched problems in computer vision. Interestingly, it turns out that for the most common choice of a Gaussian kernel, the problem of pure deconvolution is mathematically exactly solvable in the Fourier domain. However, in the presence of even small amounts of noise, the mathematically exact solution breaks down completely: it requires taking reciprocals of the kernel's Fourier coefficients, which is numerically unstable wherever those coefficients are very small. In addition, the filter which inverts the convolution is a high-pass filter, which substantially amplifies all sources of noise. Thus, the naive approach is basically useless in practice, although remedies such as clamping the reciprocals or weighting them according to the noise level, as in Wiener deconvolution, improve the results somewhat.
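As a rough illustration of these three variants (not the lecture's reference code), a minimal NumPy sketch might look as follows; the kernel width, the noise level, and the `eps` and `snr` parameters are placeholder choices:

```python
import numpy as np

def gaussian_kernel(shape, sigma=2.0):
    """Gaussian blur kernel centered at the origin, padded to the image shape."""
    ys = np.fft.fftfreq(shape[0]) * shape[0]
    xs = np.fft.fftfreq(shape[1]) * shape[1]
    Y, X = np.meshgrid(ys, xs, indexing="ij")
    h = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
    return h / h.sum()

def deconvolve(g, h, method="wiener", eps=1e-3, snr=1e2):
    """Deconvolve observation g, assuming circular convolution with kernel h."""
    G, H = np.fft.fft2(g), np.fft.fft2(h)
    if method == "naive":        # exact inverse filter: unstable where |H| is tiny
        F = G / H
    elif method == "clamped":    # clamp small magnitudes before taking reciprocals
        Hc = np.where(np.abs(H) < eps, eps, H)
        F = G / Hc
    else:                        # Wiener-type filter; 1/snr plays the role of the noise level
        F = np.conj(H) * G / (np.abs(H) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft2(F))

# toy usage: blur an image, add noise, try to recover it
rng = np.random.default_rng(0)
f = rng.random((64, 64))
h = gaussian_kernel(f.shape, sigma=2.0)
g = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h))) + 0.01 * rng.standard_normal(f.shape)
f_hat = deconvolve(g, h, method="wiener", snr=1e2)
```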
Index
00:00 Introduction, deconvolution as a prototype linear inverse problem
03:20 Sources of blur in optical systems: camera shake/motion, aperture, diffraction, atmospheric turbulence ...
12:20 Mathematical formulation of the deconvolution problem and the general inverse problem
18:00 Exact solution of deconvolution in the Fourier domain
22:05 Experiments on real images with different amounts of noise added to the observation
27:30 An analysis of a toy example: why does the approach fail so spectacularly?
31:40 First attempt at a remedy: clamping the reciprocal values of the kernel
36:30 Experiments: better, but still very noisy results since the reconstruction filter is a high-pass filter
40:00 Best attempt at a remedy: Wiener deconvolution with weight based on the signal-to-noise ratio
45:50 Discussion, outlook on further improvements
Part 2 - Algebraic approaches and regularization
Slides 24-46
As a second approach to deconvolution, we take the algebraic route of directly inverting the resulting linear system of equations. This is also theoretically possible, since convolution with a Gaussian is an invertible operation. However, it fails in a similarly spectacular way as the Fourier-based approach. The reason is the ill-posedness of the problem, which manifests itself in the singular values of the system matrix. Indeed, the SVD shows that inverting a matrix essentially amounts to taking reciprocals of the entries of the diagonal matrix in the decomposition, and this is again numerically unstable here because those values become too small. In effect, the matrix has a so-called numerical null space: changing the solution within this space leaves the predicted image numerically unchanged, so inverting the process strongly amplifies noise once more. A first remedy is implicit regularization via a truncated SVD, which, however, is infeasible due to its high computational cost and conceptually less satisfying, since it is a hack in the algorithm rather than a well-founded adjustment of the model. We therefore prefer explicit regularization, where the space of solutions is constrained by an additional term in the energy which models a prior assumption on the type of solutions we expect. We start with Tikhonov regularization as a simple example and analyze its behaviour on a toy problem, as sketched below. We also discuss how to strike an optimal balance between regularization and fidelity to the data.
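A small NumPy toy problem in the spirit of this part (sizes, noise level, and `lam` are illustrative, not the lecture's exact settings) contrasts naive inversion with Tikhonov regularization:

```python
import numpy as np

# Toy 1-D deconvolution: A is a circulant Gaussian blur matrix (sizes are illustrative)
n, sigma, lam = 64, 2.0, 1e-2
x = np.arange(n)
dist = np.minimum(np.abs(x[:, None] - x[None, :]), n - np.abs(x[:, None] - x[None, :]))
A = np.exp(-dist**2 / (2 * sigma**2))
A /= A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
f_true = np.sin(2 * np.pi * x / n) + (x > n // 2)      # smooth part plus a jump
g = A @ f_true + 0.01 * rng.standard_normal(n)          # blurred, noisy observation

# Naive inversion amplifies the noise, because A has tiny singular values ...
f_naive = np.linalg.solve(A, g)

# ... Tikhonov regularization solves (A^T A + lambda I) f = A^T g instead
f_tik = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ g)
```

Sweeping `lam` over several orders of magnitude and plotting the residual norm against the solution norm on log-log axes reproduces the L-curve heuristic mentioned at 55:20.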
Index
00:00 Introduction, directly solving the linear system of equations for deconvolution
04:07 The gradient of the least-squares energy and the normal equations
07:25 Derivation of the directional derivative
14:00 Derivation of the gradient: the role of the transpose matrix to liberate $h$
18:55 Application to the deconvolution problem: analysis of the complete failure via SVD
27:35 Idea of the pseudo-inverse: inversion of the diagonal matrix "where it is possible"
28:50 Inversion fails numerically if eigenvalues become too small
32:10 Numerical null space and consequences for the solution, ill-posedness and the condition number
37:25 The truncated SVD as a form of regularization and a proposed solution to the problem
41:40 Discussion; the runtime in particular makes this approach infeasible in practice
44:15 A more efficient and well-founded approach: explicit regularization
48:44 Simplest form: Tikhonov regularization
50:50 Interactively exploring dependency on the parameter $\lambda$
55:20 Semi-automatic parameter selection: the L-curve criterion as a heuristic
60:55 Summary and outlook
Remarks:
50:10 Here, one can see that all the eigenvalues of $A^T A$ are precisely increased by the value $\lambda$. The reason is that $A^T A$ is symmetric and positive definite, so it can be written as $V \Lambda V^T$ with an orthogonal matrix $V$ and a diagonal matrix $\Lambda$ containing the eigenvalues. Substituting $\lambda I = V (\lambda I) V^T$ in the normal equations and rearranging a bit yields the claim.
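Written out, the substitution gives $A^T A + \lambda I = V \Lambda V^T + V (\lambda I) V^T = V (\Lambda + \lambda I) V^T$, so the eigenvectors stay the same while each eigenvalue $\mu_i$ of $A^T A$ is shifted to $\mu_i + \lambda$.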
Part 3a - Variational Inverse Problems: energy minimization problem and the adjoint operator
Slides 47-56
We generalize the finite-dimensional linear inverse problem to a variational problem. The energy thus takes a form similar to the one from Horn and Schunck, but it contains a new, distinct term: a linear map, or operator, applied to a function. The gradient of a function, which we have already used, is itself a linear operator and is thus a special case of the more general scenario we face now. Dealing with the new complication requires a new object, the so-called adjoint of an operator, which generalizes the transpose of a matrix. We consider three examples (see the sanity-check sketch below): first, the well-known finite-dimensional setting as a warm-up; second, the adjoint of convolution, which we will need later on; and finally, the adjoint of the gradient operator, which we have seen before and which shows how the adjoint supplements the Euler-Lagrange framework and gives a new understanding of the divergence theorem.
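As a quick sanity check of the defining property $\langle K u, v \rangle = \langle u, K^* v \rangle$ for the convolution example, a small NumPy sketch with circular boundary conditions and a random kernel (all choices purely illustrative) could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
u, v = rng.standard_normal(n), rng.standard_normal(n)
k = rng.standard_normal(7)                     # arbitrary kernel, purely illustrative

def conv(x, kern):
    """Circular convolution of x with kern (kernel zero-padded to length n)."""
    kp = np.zeros_like(x)
    kp[:len(kern)] = kern
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(kp)))

def corr(x, kern):
    """Circular correlation, i.e. convolution with the flipped kernel."""
    kp = np.zeros_like(x)
    kp[:len(kern)] = kern
    return np.real(np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(kp))))

# <K u, v> should equal <u, K* v>, with K* realized by correlation
lhs = np.dot(conv(u, k), v)
rhs = np.dot(u, corr(v, k))
assert np.isclose(lhs, rhs)
```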
Index
00:00 Introduction, the linear inverse problem as a variational problem
07:58 Preliminary remarks on the functional gradient of the variational energy
12:50 Derivation of the functional gradient, reminder and comparison to the discrete setting
18:10 The need to generalize the transpose of a matrix to the continuous setting
20:25 The actual generalization: the adjoint operator
27:05 First example: finite-dimensional space, the adjoint is the same as the transpose
32:28 Second example: the adjoint of convolution is correlation, or convolution with the flipped kernel
36:05 Illustration of what you need to show in order to prove this claim
40:05 Third example: the adjoint of the gradient operator
43:05 A sketch of how to derive this, reminder and reinterpretation of the divergence theorem
Part 3b - Variational Inverse Problems: the functional gradient, gradient descent and discretization
Slides 57-72
We show how to use the adjoint operator to compute functional gradients for the energy of a linear inverse problem with regularization based on the squared gradient norm. Applied to the deconvolution model, this immediately yields the functional gradient we need, so we could now solve the Euler-Lagrange equation using techniques we learned before. However, since the problem is convex and we now know the gradient, we instead apply a simple direct minimization approach: gradient descent. The solutions we obtain are far superior to the ones we have seen before.
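Assuming a normalized Gaussian blur kernel and the squared gradient norm as regularizer, a minimal NumPy sketch of the resulting gradient-descent scheme could look as follows; the step size `tau`, the weight `lam`, and the finite-difference discretization are illustrative choices, with the divergence paired with the forward-difference gradient so that the discrete divergence theorem discussed at 25:00 holds:

```python
import numpy as np

def grad(f):
    """Forward differences with Neumann boundary (last row/column set to zero)."""
    fx, fy = np.zeros_like(f), np.zeros_like(f)
    fx[:, :-1] = f[:, 1:] - f[:, :-1]
    fy[:-1, :] = f[1:, :] - f[:-1, :]
    return fx, fy

def div(px, py):
    """Backward differences chosen so that div = -grad^T on fields produced by grad."""
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[:, 0], dx[:, 1:] = px[:, 0], px[:, 1:] - px[:, :-1]
    dy[0, :], dy[1:, :] = py[0, :], py[1:, :] - py[:-1, :]
    return dx + dy

def deconvolve_gd(g, H, lam=1e-2, tau=0.2, iters=500):
    """Gradient descent on E(f) = 1/2 ||k*f - g||^2 + lam/2 ||grad f||^2.
    Functional gradient: flipped-kernel convolution of (k*f - g) minus lam * div(grad f)."""
    f = g.copy()
    for _ in range(iters):
        residual = np.real(np.fft.ifft2(np.fft.fft2(f) * H)) - g
        data_grad = np.real(np.fft.ifft2(np.fft.fft2(residual) * np.conj(H)))  # adjoint = flipped kernel
        f -= tau * (data_grad - lam * div(*grad(f)))
    return f

# usage (H is the FFT of the padded blur kernel, e.g. built as in the Part 1 sketch):
# f_hat = deconvolve_gd(g, np.fft.fft2(h))
```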
Index
00:00 Theorem: functional gradient for variational inverse problems
02:40 Examples: finite-dimensional setting, deconvolution, gradient-squared regularizer
07:10 Writing the Laplace operator as the divergence of the gradient
08:50 Application to the deconvolution model
12:30 Some example results: works very well compared to our previous attempts
14:40 Simple approach to compute a solution in practice: gradient descent
19:10 Notes on solvability: convex vs. non-convex problems
22:15 Some further properties and considerations for the gradient descent technique
23:40 Final algorithm to solve the deconvolution problem
25:00 Implementation notes: how to discretize gradient and divergence to satisfy the divergence theorem
32:00 Summary and outlook
Part 4 - Outlook: Advanced Variational Models
Slides 73-100
We consider several advanced variational models for different image analysis tasks. We start with super-resolution, where the goal is to increase the resolution beyond what a single input image provides. Obviously, this requires gathering the missing information from somewhere, and since we do not use learning in this lecture, we obtain it from multiple views of the scene. These views are warped onto a reference frame using optical flow, and the information is aggregated via multi-frame deconvolution. We then generalize super-resolution to the reconstruction of texture on a mesh, where the multi-frame deconvolution model is solved on a curved surface instead of the image plane. Second, we consider blind deconvolution, where the kernel is unknown. With sufficiently strong priors, this problem can be solved by alternating the optimization over kernel and image. Finally, we take a look at inpainting. Here, methods that are not based on learning typically fail and can only recover cartoonish images, since the missing details must be inferred from the boundaries of the damaged regions.
Note: this part is optional content for anyone who is interested and is not required for the exam.
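For orientation only, a per-frame version of the super-resolution observation model from 09:40 might be sketched like this; `flow_warp`, the blur transfer function `H`, and the down-sampling `factor` are placeholders, not the lecture's exact operators:

```python
import numpy as np

def observe(f_high, flow_warp, H, factor, noise_sigma, rng):
    """One frame of the assumed forward model: warp, blur, down-sample, add noise.
    flow_warp is a callable realizing the optical-flow warp for this frame (placeholder)."""
    w = flow_warp(f_high)                                   # motion towards this frame
    b = np.real(np.fft.ifft2(np.fft.fft2(w) * H))           # blur (circular convolution)
    d = b[::factor, ::factor]                               # simple down-sampling
    return d + noise_sigma * rng.standard_normal(d.shape)   # observation noise
```

The multi-frame model then sums a data term of this form over all frames and adds a regularizer on the high-resolution image $f$.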
Index
00:00 Introduction, the problem of super-resolution
03:20 Obtaining information for super-resolution from multiple images
09:40 Observation model for a single frame: motion, blur, down-sampling and noise
13:30 Assembling this into a multi-frame model
15:15 Details on the warping operator, implementation
21:00 Example results, $L^1$ vs. $L^2$-norm regularization
22:50 Application in computational photography: light field cameras
27:00 Generalization to texture map reconstruction on surfaces
37:00 Blind deconvolution model: both kernel and image are unknown
39:30 Paper with very good results which employs a more realistic image prior
46:50 Example results for image and kernel, and the optimization process
48:20 Remarks on the special cases of image zooming (one image) and denoising (no blur)
51:20 Final problem: image inpainting
57:10 Summary and wrapup