16:332:579 ADVANCED TOPICS IN COMPUTER ENGINEERING:
Computer Vision
Fall 2009. Index No. 30782
Description
The goal of the course is to provide a state-of-the-art
overview of some recent computer vision methods,
starting from cameras and ending with motion interpolation.
In general, the latest algorithms will be discussed,
but this course will not present anything done in our laboratory,
even if this sometimes gives better results.
Those results will be presented in 16:332:570 in Spring 2010.
The course is organized like a seminar. The material is
presented from different sources, and
papers will be assigned almost every week. The code for some of the
papers will also be available from the web, and
good knowledge of MATLAB (at least) is required.
Previous exposure to computer vision, for example the course
16:332:561 or equivalent, is needed in order to get
the most from this course.
Schedule
Tuesday 3:20--6:20 pm, CoRE 538.
Instructor
Peter Meer      
CoRE 519. Ph: (732) 445-5243. E-mail: meer@caip.rutgers.edu
Office hours: On the day of the lecture, or by appointment.
Textbook
R. I. Hartley and A. Zisserman. Multiple View Geometry in
Computer Vision. Cambridge University Press, 2000 or the second
edition, 2004.   Notation: HZSec.x.x. The references
from the second edition, 2Sec.x.x, are shown in parentheses.
This book will be used a lot. If you do graduate work in
a related area you probably should have it. The second book is not a
textbook.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery.
Numerical Recipes in C. Cambridge University Press,
second edition, 1992.   Notation: NRSec.x.x.
It is a wonderful collection of programs in C, which we will use in
a MATLAB version. Your laboratory probably has a copy of it.
Read for the Lectures
Tentative Outline. The slides and more details will be added as we go
along.
Lecture 1.
Projective geometry. Cameras. Different projections.
- The slides
of this and the following lecture. You also need to know
matrix inversion in block
form.
- Ref: HZChap.1 without Sec.1.6 (2Chap.2),
projective geometry in 2D;
      HZSec.7.6 (2Sec.8.6), vanishing points and lines;
      HZSec.2.1, Sec.2.2 but not from the Plücker
matrices on, Sec.2.4 to Sec.2.7 (2Sec.3.x), the 3D representations;
      HZChap.5 without Sec.5.3.6 and Sec.5.4 (2Chap.6),
camera models;       HZSec.A3.2.1 (2Sec.A4.2.1) and
NRSec.2.9, Cholesky factorization.
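As a small illustration of the 2D projective geometry in these references, here is a minimal Python sketch (not part of the course material; the function names are made up for this example). It uses the standard facts that the line through two homogeneous points is their cross product, and that the intersection of two lines is again a cross product. Parallel lines meet at an ideal point, whose third coordinate is zero.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line joining two homogeneous points: l = p x q."""
    return cross(p, q)


def intersection(l, m):
    """Homogeneous intersection point of two lines: x = l x m."""
    return cross(l, m)


def to_inhomogeneous(x):
    """Return (x/w, y/w), or None for an ideal point (w == 0)."""
    if x[2] == 0:
        return None
    return (x[0] / x[2], x[1] / x[2])


# Two parallel lines, y = 0 and y = 1, meet at an ideal point.
parallel_meet = intersection(line_through((0, 0, 1), (1, 0, 1)),
                             line_through((0, 1, 1), (1, 1, 1)))

# Two converging image lines meet at a finite point, in the same way
# as the images of parallel world lines meet at a vanishing point.
vanishing_point = to_inhomogeneous(
    intersection(line_through((0, 0, 1), (2, 1, 1)),
                 line_through((0, 2, 1), (4, 1, 1))))
```

The same two operations are all that is needed to build the vanishing line of a plane from two pairs of imaged parallel lines.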
-
Lecture 2.
Continuation of lecture 1.
-
- Homework.
We will try to implement HZSec.1.7.2 (2Sec.2.7.2),
recovery of affine properties from images; and
the projective/affine to metric rectification, HZSec.1.7.5
(2Sec.2.7.5), recovery of metric properties from images.
You can do it with the example
in the book, Fig.1.6c (2Fig.2.6c), or take a textured planar scene
from the web or from your own photo. You will have to measure a few lines
and find orthogonal lines in the world.
You can use Cholesky factorization (which will also be covered in
lecture 4) for the affine to metric rectification.
Apply the procedure to a non-planar image too and rectify the two planes,
one after the other. Assume that the windows are square.
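For the Cholesky step mentioned above, a minimal Python sketch is given below. It is only an illustration and not the course's MATLAB code. It factors a symmetric positive definite matrix S as S = L L^T with L lower triangular. The book's convention for the triangular factor may differ (upper versus lower triangular), so check which one is needed before plugging the result into the rectifying transformation.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T.

    A must be symmetric positive definite; otherwise ValueError is raised.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


# Hypothetical 2x2 example; in the homework S would be estimated from
# the orthogonal line pairs measured in the affinely rectified image.
S = [[4.0, 2.0],
     [2.0, 5.0]]
L = cholesky(S)
```

If noisy measurements give an S that is not positive definite, the factorization fails, which is a useful warning that the chosen line pairs are badly conditioned.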
-
Lecture 3.
Estimation of computer vision problems.
- The slides
of this lecture.
- Ref: NRSec.2.8 and Sec.3.1, polynomial interpolation
(the Vandermonde system, also well covered on the web, see Wikipedia's
"Vandermonde matrix"); other polynomial bases are in NRSec.5.8 and Sec.4.5,
and the Bernstein polynomials are in Wikipedia, but we will not cover them;
      NRSec.3.3, cubic spline interpolation;
      NRSec.15.1, least squares as maximum likelihood
estimator; HZSec.A3.3 and Sec.A3.4 (2Sec.A4.3, 2Sec.A4.4, 2Sec.A5.1,
2Sec.A5.2), but the specific least squares appendices
will be covered later;
      Wikipedia's "Lagrange multipliers" is a good start;
      NRSec.10.1 and Sec.10.2, bisection methods in 1D;
      NRSec.10.4, downhill simplex;
      NRSec.9.6 or HZSec.A4.1 (2Sec.A6.1), Newton's method;
      HZSec.A4.2 and Sec.A4.3 (2Sec.A6.2, 2Sec.A6.3,
2Sec.A6.6, 2Sec.A6.7) and NRSec.15.5, Levenberg-Marquardt iterations;
      a pattern recognition/machine learning technique,
principal components analysis, in any book which covers PCA.
- A C/C++ nonlinear least squares package
based on the Levenberg-Marquardt algorithm is on Manolis Lourakis's
webpage. From the top of that page you can also reach a
Generic Sparse Bundle Adjustment Package.
- Homework. A description of a Levenberg-Marquardt routine is
at this location. Compare it with the text in
HZSec.A4.2/3 (2Sec.A6.2/3) and comment on why the sparse L-M can be used
on many occasions in computer vision. If you want to read more
about L-M and some other non-linear least squares methods, the booklet by K. Madsen, H.B. Nielsen, O. Tingleff
from the Technical University of Denmark is very good. It can also be
reached from Lourakis's webpage.
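To make the comparison concrete, here is a minimal dense Levenberg-Marquardt sketch in Python. It is an illustration only, not the routine linked above. It fits the hypothetical two-parameter model y = a exp(b x) and follows the scheme of the textbook appendix: the diagonal of the normal matrix is multiplied by (1 + lambda), lambda is divided by 10 after a successful step and multiplied by 10 after a failed one. The sparse variant exploits block structure in the Jacobian, which this toy problem does not have.

```python
import math


def residuals(p, xs, ys):
    """Residuals of the model y = a * exp(b * x)."""
    a, b = p
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]


def jacobian(p, xs):
    """Rows (d r / d a, d r / d b) for each sample."""
    a, b = p
    return [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]


def levenberg_marquardt(p0, xs, ys, lam=1e-3, iters=100, tol=1e-18):
    p = list(p0)
    r = residuals(p, xs, ys)
    cost = sum(e * e for e in r)
    for _ in range(iters):
        J = jacobian(p, xs)
        # Normal equations: N = J^T J, g = J^T r.
        n00 = sum(j[0] * j[0] for j in J)
        n01 = sum(j[0] * j[1] for j in J)
        n11 = sum(j[1] * j[1] for j in J)
        g0 = sum(j[0] * e for j, e in zip(J, r))
        g1 = sum(j[1] * e for j, e in zip(J, r))
        # Augmented system: diagonal scaled by (1 + lam), solved in closed form.
        a00 = n00 * (1.0 + lam)
        a11 = n11 * (1.0 + lam)
        det = a00 * a11 - n01 * n01
        d0 = (-g0 * a11 + g1 * n01) / det
        d1 = (-g1 * a00 + g0 * n01) / det
        trial = [p[0] + d0, p[1] + d1]
        r_trial = residuals(trial, xs, ys)
        cost_trial = sum(e * e for e in r_trial)
        if cost_trial < cost:
            # Accept the step and move toward Gauss-Newton.
            improvement = cost - cost_trial
            p, r, cost = trial, r_trial, cost_trial
            lam /= 10.0
            if improvement < tol:
                break
        else:
            # Reject the step and move toward gradient descent.
            lam *= 10.0
    return p, cost


# Noise-free synthetic data generated with a = 2, b = -1.
xs = [0.5 * i for i in range(7)]
ys = [2.0 * math.exp(-x) for x in xs]
p_fit, final_cost = levenberg_marquardt([1.0, 0.0], xs, ys)
```

In bundle adjustment the same augmented normal equations appear, but each residual depends on only one camera and one point, so most entries of N are zero; that is the structure the sparse L-M takes advantage of.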
-
Lecture 4.
Camera calibration. A few methods.
- The slides
of this lecture.
- Ref: HZChap.6 (2Chap.7), camera calibration;
      HZSec.7.5-7.7
(2Sec.8.5, 2Sec.8.6, 2Sec.8.8-2Sec.8.10),
calibration with absolute conic or vanishing points and lines;
      HZSec.A3.1.1 (2Sec.A4.1.1),
Givens rotations and RQ decomposition.
- Z. Zhang.
Camera Calibration.
in G. Medioni and S.B. Kang, eds., Emerging Topics in Computer Vision,
Chapter 2, pages 4-43, Prentice Hall Professional Technical Reference,
2004.
- Different calibration programs are available
at this location.
- Homework.
Calibrate your camera (or a computer-based one) using
the 2D plane-based technique. Zhang's paper above has all the details.
You start the calibration programs with the checkerboard pattern
in the "Doing your own calibration" item. Extract the corners automatically
and also correct for image distortion. The distortion coefficient
here is not identical to the distortion coefficients used in
the main calibration routine.
The example in "First calibration example - Corner
extraction, calibration, additional tools", which can also be reached
from the item "Calibration examples", has all the steps. Take at
least 10-15 different positions of the checkerboard pattern.
After the first image with distortions, for the following images
you don't have to use another
distortion coefficient since this will be taken
care of in the calibration part. The routine has a linear
initialization (without lens distortion) followed by an iterative
gradient descent optimization. After an iteration, you project back
the estimate onto the calibration patterns and, if necessary, do
another non-linear iteration. After a few (fewer than ten) iterations
the solution converges. You should have the camera with skew=0
and the sixth order radial distortion not taken into account.
Verify that this is really true. Normally it should be.
Analyze the errors and see, in one of the checkerboard
images, how close a few of the reprojected points are
to the original (starting) points. Check the linearity with some
objects projected from 3D.
Note: I don't think that there is a need to have
the processing redone with different window sizes for different images.
The homework should contain only the programs written by you.
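The reprojection check in the homework can be sketched as follows. This is a Python illustration under stated assumptions, not the calibration toolbox itself: it assumes a pinhole camera with focal lengths fx, fy, principal point (cx, cy), an optional skew, and two radial distortion terms k1 and k2 (fourth order, no sixth order term). The exact distortion model of the programs you use may differ, for example by including tangential terms.

```python
import math


def project(point, fx, fy, cx, cy, k1=0.0, k2=0.0, skew=0.0):
    """Project a 3D point given in camera coordinates to pixel coordinates."""
    X, Y, Z = point
    x, y = X / Z, Y / Z                      # normalized image coordinates
    r2 = x * x + y * y
    d = 1.0 + k1 * r2 + k2 * r2 * r2         # radial distortion factor
    xd, yd = x * d, y * d
    return (fx * xd + skew * yd + cx, fy * yd + cy)


def rms_reprojection_error(points3d, observed, **camera):
    """Root mean square distance between projected and observed pixels."""
    total = 0.0
    for P, (u, v) in zip(points3d, observed):
        pu, pv = project(P, **camera)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(total / len(points3d))


# Hypothetical camera and points, for illustration only.
camera = dict(fx=800.0, fy=800.0, cx=320.0, cy=240.0, k1=-0.2)
points3d = [(0.0, 0.0, 2.0), (0.3, -0.1, 2.5), (-0.4, 0.2, 3.0)]

# Simulated detections: every corner is off by (0.3, -0.4) pixels,
# so each point contributes an error of 0.5 pixels.
observed = [(u + 0.3, v - 0.4)
            for (u, v) in (project(P, **camera) for P in points3d)]
rms = rms_reprojection_error(points3d, observed, **camera)
```

A well calibrated camera usually gives an RMS error well below one pixel on the checkerboard corners; looking at the individual errors per image also shows whether one view is an outlier.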
-
Lecture 5.
Interest points in 2D images.
- The slides
of this lecture.
- Ref: The currently most popular approaches are given
below.
- D. G. Lowe.
Distinctive image features from scale-invariant keypoints.
International Journal of Computer Vision,
vol. 60, no. 2, pp. 91-110, 2004.
- K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman,
J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool.
A comparison of affine region detectors.
International Journal of Computer Vision, vol. 65, no. 1/2,
pp. 43-72, 2005
- S. Baker and I. Matthews.
Lucas-Kanade 20 years on:
A unifying framework. International Journal of Computer Vision,
vol. 56, no. 3, pp. 221-255, 2004. Section 4 in the paper is an
excellent description of the various gradient descent approximations:
Gauss-Newton, Newton, diagonal Hessian, Levenberg-Marquardt and
steepest-descent.
- An implementation of SIFT (in C++ or MATLAB) can be taken from
the site of Andrea Vedaldi. It is similar to David Lowe's
implementation.
- The implementation of the affine invariant interest point
features is at
this location at Oxford.
- An implementation of the Kanade-Lucas-Tomasi feature tracker
(in C) is at the site of Stan Birchfield, while
the MATLAB programs based on Lucas-Kanade 20 years on are at
a CMU site.
- Homework. The 2D images are two-dimensional renditions of
three-dimensional objects. No interest point can be completely
invariant if, for example, the illumination changes drastically.
See, for example, the figure.
Can you find limitations of the methods in the papers above? Try,
at least theoretically, to apply the different methods to this figure
and/or to some really tough images from the web.
-
Lecture 6.
RANSAC and some other robust fittings.
- The slides
of this lecture.
- Ref: HZSec.3.7 (2Sec.4.7, 2Sec.A6.8), RANSAC.
- O. Chum and J. Matas.
Matching with
PROSAC - Progressive sample consensus.
Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR),
vol. 1, pp. 220-226, San Diego, CA, June 2005.
- A general implementation of RANSAC, by Peter Kovesi, affiliated
with the University of Western Australia, can
be taken from here.
- Homework. Explain through some simple theoretical
examples why the method in the above paper will not apply in the general case
and/or when there are several structures to estimate in a higher
(at least four) dimensional space and a lot of outliers.
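For reference when thinking about the homework, here is a minimal RANSAC sketch in Python for the simplest case, a single 2D line. It is not Kovesi's implementation and the data are made up. It also includes the standard formula for the number of samples N needed so that, with probability p, at least one sample of size s is free of outliers when the inlier ratio is w: N = log(1 - p) / log(1 - w^s).

```python
import math
import random


def ransac_trials(p_success, inlier_ratio, sample_size):
    """Samples needed so that, with probability p_success, one is outlier-free."""
    return math.ceil(math.log(1.0 - p_success)
                     / math.log(1.0 - inlier_ratio ** sample_size))


def ransac_line(points, threshold, trials, rng):
    """Fit a line a*x + b*y + c = 0 (with a^2 + b^2 = 1) by plain RANSAC."""
    best_line, best_inliers = None, []
    for _ in range(trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal sample
        a, b = y2 - y1, x1 - x2
        norm = math.hypot(a, b)
        if norm == 0:
            continue                                  # degenerate sample
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b * y + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers


# Synthetic data: 20 points on y = 2x + 1 and 10 gross outliers.
line_points = [(x, 2 * x + 1) for x in range(20)]
outliers = [(1, 30), (3, 50), (5, -20), (7, 45), (9, -10),
            (11, 60), (13, 0), (15, 70), (17, 5), (18, 80)]
points = line_points + outliers

(a, b, c), inliers = ransac_line(points, threshold=0.01, trials=100,
                                 rng=random.Random(0))
slope, intercept = -a / b, -c / b

# Samples needed at 50% inliers for p = 0.99: line (s = 2), homography (s = 4).
n_line = ransac_trials(0.99, 0.5, 2)
n_homography = ransac_trials(0.99, 0.5, 4)
```

The sample count grows quickly with the sample size and the outlier ratio, which is one reason guided sampling schemes are attractive and also why their assumptions matter when several structures are present.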
Fourier transform, Gabor filters. Textons.
- The slides
of this lecture.
- Ref: The following two papers give you a good
introduction to textons.
- M. Varma and A. Zisserman.
A statistical approach to texture classification from single images.
International Journal of Computer Vision, vol. 62, no. 1-2,
pp. 61-81, 2005.
- M. Varma and A. Zisserman.
A statistical approach to material classification
using image patch exemplars.
IEEE Trans. Pattern Anal. Machine Intell., vol. 31, no. 11,
pp. 2032-2047, 2009.
- Implementation of texture classification can be found in
this location at Oxford.
- Homework.
What happens if the texture appears on a projective plane in the
image? The following two images,
first image
and the second image
have texture with a strong tilt.
Can we still use texton-like classification, or do we have to
define a shape-from-texture type method? In the real world, a
shape-from-texture method is probably more useful, but it needs the projective
transformation to be known and the projection of the texture on a
(possibly curved) surface. We will not do it because it needs
a lot of background material, e.g., illumination, photogrammetry,
orientation from the phase of the planar texture, etc. The paper
J. Malik and R. Rosenholtz.
Computing local surface orientation and shape from texture
for curved surfaces. International Journal of Computer Vision,
vol. 23, no. 2, pp. 149-168, 1997, has good results,
while the slides (needing a lot of electrical engineering background)
for texture recovery from phase are at
this location.
Texture synthesis.
- The slides
of this lecture.
- Ref: The following two papers give a computer vision type
approach. Other methods also exist, mainly in computer graphics.
- A. Efros and T. K. Leung.
Texture synthesis by non-parametric sampling.
Proc. 7th International Conference on Computer Vision (ICCV),
pp. 1033-1039, Corfu, Greece, September 1999.
- A. Efros and W. T. Freeman.
Image quilting for texture synthesis and transfer.
Proc. of the 28th Annual Conference on Computer Graphics and
Interactive Techniques (SIGGRAPH 01), pp. 341-346, Los Angeles,
August 2001.
- A pseudo-code implementation of the Efros & Leung routine can be taken
from this location, while a MATLAB code for the
Efros & Freeman routine can be taken from
this location.
- Homework.
The described texture synthesis methods will not work on strongly
projective images, like the two above. Try to describe in theory
what you would need in order to
represent these textures and also to generate "random" textures
similar to them. While the synthesis part can maybe be dropped, the
texture representation part will still be around a
hundred years from now.
-
Lecture 7.
Homography and error analysis.
- The slides
of this lecture.
- Ref: HZSec.3.1-3.6 and Sec.3.8 (2Sec.4.1-4.6, 2Sec.4.8),
homography;       HZChap.4 (2Chap.5), error analysis;
      HZSec.A3.2 (2Sec.A4.2), the 3x3 skew-symmetric matrix
including its matrix of cofactors;
      HZSec.A3.3 and Sec.A3.4 (2Sec.A5.3, 2Sec.A5.4),
different variants of least-squares fitting;
      HZSec.A4.4 (2Sec.A6.4), Levenberg-Marquardt applied to
homography.
- The majority of the MATLAB programs based on the book of
Hartley and Zisserman
are here. Which routine does what is
in this place.
The routines use the Levenberg-Marquardt
algorithm too. For example, --vgg_H_from_x_nonlin--, which follows
the gold standard algorithm (HZPage98) and starts the minimization
from the Sampson approximation,
uses --lsqnonlin-- in MATLAB with --optimoptions-- setting
'LargeScale' to 'off'. That is, medium-scale optimization with the
L-M routine.
- Homework. Take two similar images. For example, these two
images:
one and
two,
or you can use images of your own choice. First you should find the
corresponding points with the Harris or Hessian corner detector
and filter them with RANSAC.
The (quasi-)inlier pairs are the input to the 2D homography estimation
between the two images. Find the homography for one large plane,
and also estimate the covariance of the homography, assuming the points have
the same variance in both images and along both directions.
The covariance is described in Appendix A of the doctoral thesis
of Antonio Criminisi, which we will also use in the
next lecture.
Can you come up with a
homography type method to rectify the texture distortion from Lecture
6? If yes, apply it to the
second image
from the texton part.
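As a starting point for the homography estimation, here is a minimal Python sketch that computes a homography from exactly four point correspondences, the minimal sample used inside a RANSAC loop. It is an illustration, not the book's normalized DLT or gold standard algorithm: it assumes h33 = 1 (so it fails for homographies with h33 = 0) and solves the resulting 8x8 linear system by Gaussian elimination. The matrix H_true and the points are made up.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def homography_from_4_points(src, dst):
    """Return the 3x3 H (with h33 = 1) mapping each src point to its dst point."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(H, p):
    """Apply H to an inhomogeneous 2D point."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical ground truth, used only to generate test correspondences.
H_true = [[1.0, 0.2, 3.0],
          [0.1, 1.5, -2.0],
          [0.001, 0.002, 1.0]]
src = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
dst = [apply_h(H_true, p) for p in src]
H_est = homography_from_4_points(src, dst)
```

With more than four inlier pairs the system becomes overdetermined, and the normalized DLT followed by the nonlinear refinement from the references is the better choice. For rectification, the same routine maps the four corners of a tilted planar patch to a fronto-parallel rectangle.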
-
Lecture 8.
What can be extracted from a single 2D image.
- The slides
of this lecture.
- Ref: HZSec.7.1 without Plücker line representation
and Sec.7.4 (2Sec.8.1 and 2Sec.8.4), single view geometry;
      HZSec.A5.2 (2Sec.A7.2), planar homologies;
      HZSec.A5.3 (2Sec.A7.3), elations.
- F. Schaffalitzky and A. Zisserman.
Geometric grouping of repeated elements within images.
"Shape, Contour and Grouping in Computer Vision", LNCS 1681, Springer,
pp. 165-181, 1999.
- A. Criminisi, I. Reid and A. Zisserman.
Single view metrology. International Journal of Computer
Vision, vol. 40, no. 2, pp. 123-148, 2000.
- The doctoral thesis of Antonio Criminisi,
Accurate Visual Metrology from Single
and Multiple Uncalibrated Images (1999), in the last chapter
applies the reconstruction methods, some of which we saw in the paper
above, to 3D reconstruction of renaissance paintings.
- Homework. In the
indoor image
you can take different camera orientations and generate a new view.
The translation in the world coordinates remains unchanged of course.
Generate two or three synthetic images through homographies where you
concentrate on different parts of the scene.
Assume now that you know one or two measurements
of the chairs in the scene, say, the height where you sit and, if you need it,
the height of a chair. Since chairs are almost all alike in the
chair-world, you can just measure one at home.
Can you recover the 3D location of the camera? See also Sec.3.3,
Fig.14 and Fig.22 in the Criminisi et al. paper and 2Sec.8.7.
-
Lecture 9.
Epipolar geometry in detail.
- The slides
of this lecture.
- Ref: HZChap.8 without Sec.8.4 (2Chap.9), epipolar
geometry;       HZSec.10.1 to Sec.10.6 without Sec.10.3
and Sec.10.4.2 (2Chap.11), computation of the fundamental matrix;
     
HZSec.A3.2 (2Sec.A4.2), symmetric and skew-symmetric matrices.
- Homework. While the HZ book programs (above at
lecture 7) have an epipolar routine for 7 points,
--vgg_F_from_7pts_2img--, that routine does not include RANSAC or
the Levenberg-Marquardt iterations.
If you want to use the HZ book programs, first you have to apply
RANSAC (given in lecture 6), and after that use the 7 points
algorithm from the book.
Take the following two images:
corridor one
and corridor two
and estimate the rank-2 fundamental matrix.
You can also try the programs of
Philip Torr,
who is at Oxford Brookes University.
The programs are called "Structure and Motion Toolkit in Matlab".
Find a version in the toolkit which also uses
robust fitting, but it will be MLESAC and not RANSAC. The
routine --torr_estimateF.m-- seems to be a good start.
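Whichever program you use, two sanity checks on the estimated fundamental matrix are easy to code: the epipolar residuals x'^T F x should be close to zero for the inlier pairs, and det F should be zero (rank 2). The Python sketch below illustrates both on a synthetic case. It assumes the simplest configuration, cameras P = [I | 0] and P' = [I | t] with identity calibration, for which F is the skew-symmetric matrix of t; the numbers are made up.

```python
def skew(t):
    """3x3 skew-symmetric matrix [t]_x such that [t]_x v = t x v."""
    return [[0, -t[2], t[1]],
            [t[2], 0, -t[0]],
            [-t[1], t[0], 0]]


def mat_vec(M, x):
    return [sum(M[i][j] * x[j] for j in range(3)) for i in range(3)]


def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def epipolar_residual(F, x, x_prime):
    """Algebraic residual x'^T F x for homogeneous image points."""
    Fx = mat_vec(F, x)
    return sum(a * b for a, b in zip(x_prime, Fx))


# Synthetic setup: second camera translated by t, no rotation, K = I.
t = (1, 2, 3)
F = skew(t)

X = (1, 2, 10)                                   # 3D point
x = X                                            # image in the first view
x_prime = tuple(a + b for a, b in zip(X, t))     # image in the second view

residual = epipolar_residual(F, x, x_prime)
determinant = det3(F)
epipole_check = mat_vec(F, t)                    # F e = 0 for the epipole e = t
```

With real data the residuals are never exactly zero, so compare them (or better, the Sampson distance from the book) against the threshold used in the robust fitting.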
-
Lecture 10.
Computations based on two images.
- The slides
of this lecture.
- Ref: HZChap.9 (2Chap.10), 3D reconstruction;
      HZChap.11, Sec.11.4 and Sec.11.5 will not be
treated in detail (2Chap.12), structure computation;
      HZChap.12 (2Chap.13), homography and scene planes;
      HZSec.A3.1.2 (2Sec.A4.1.2) Householder matrices.
- Homework. Take the result from the lecture 9 homework.
Do linear triangulation and obtain the 3D points. Using the
stratified method, reconstruct the affine and the metric (similarity)
reconstruction. You can use scene constraints, orthogonality, known internal
parameters, etc. in order to complete the reconstruction.
You may have to write some programs in MATLAB since here we start with
the fundamental matrix problem already solved.
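A minimal Python sketch of linear triangulation is given below, as an illustration of the first step. It uses the inhomogeneous variant: each view contributes the two equations u (p3 . X) - (p1 . X) = 0 and v (p3 . X) - (p2 . X) = 0, where p1, p2, p3 are the rows of the camera matrix, and with the last coordinate of X set to 1 the stacked system is solved by least squares through the normal equations. This variant breaks down for points at or near infinity, where the homogeneous (SVD based) method should be used instead. The cameras and the point are made up.

```python
def solve3(A, b):
    """Solve a 3x3 system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def project(P, X):
    """Project an inhomogeneous 3D point with a 3x4 camera matrix."""
    Xh = list(X) + [1.0]
    x = [sum(P[i][k] * Xh[k] for k in range(4)) for i in range(3)]
    return (x[0] / x[2], x[1] / x[2])


def triangulate(cameras, image_points):
    """Inhomogeneous linear triangulation from two or more views."""
    rows = []
    for P, (u, v) in zip(cameras, image_points):
        rows.append([u * P[2][k] - P[0][k] for k in range(4)])
        rows.append([v * P[2][k] - P[1][k] for k in range(4)])
    # Least squares for rows[:, :3] X = -rows[:, 3] via the normal equations.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * (-r[3]) for r in rows) for i in range(3)]
    return solve3(AtA, Atb)


# Hypothetical pair: second camera shifted by one unit along x.
P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
P2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]
X_true = (0.5, 0.2, 4.0)
X_est = triangulate([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

In the homework the camera matrices come from the fundamental matrix, so the triangulated points are only a projective reconstruction until the affine and metric upgrades are applied.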
-
Lecture 11.
Factorization. Auto-Calibration.
- The slides
of this lecture.
- Ref: HZSec.17.1 to Sec.17.3 and Sec.17.5
without Sec.17.2.1 (2Chap.18; 2Sec.18.3 on non-rigid
factorization is only in the second edition), factorization;
      HZSec.18.1, Sec.18.2 and Sec.18.5
(2Chap.19) describe our auto-calibration but a little bit differently,
so these sections are only for additional reading.
- P. Anandan and M. Irani. Factorization with uncertainty.
International Journal of Computer Vision, vol. 49, no. 2-3, pp.
101-116, 2002.
- Homework. The
corridor sequence can be taken from here. Do a 3D reconstruction
from the sequence, starting with the projective reconstruction, moving to
the metric conversion and ending with the refinement through bundle
adjustment. You can take the same camera for the 11 images and assume
at least that the skew is zero. See also Table 18.4 (2Table 19.4).
This sequence is processed in the book too, Fig.17.1 (2Fig.18.3), and
all the data, including a VRML model which we of course will not do,
can be seen at the Oxford data site.
-
Lecture 12.
Stereo vision.
- The slides
of this lecture.
- Ref: A chapter from the draft of R. Szeliski's book,
stereo correspondence, can also be consulted. The
references are a subset from here.
- D. Scharstein and R. Szeliski.
A taxonomy and evaluation of dense
two-frame stereo correspondence algorithms.
International Journal of
Computer Vision, vol. 47, no.1-2-3, pp. 7-42, 2002.
- Hai Tao and H. S. Sawhney. Global matching criterion and
color segmentation based stereo. Proc. Workshop on the Application
of Computer Vision (WACV2000), pp. 246-253, December 2000.
- The
Middlebury Stereo Vision Page can be accessed from here.
- Homework. A stereo rig with four (two and two) images
can be taken from here:
number 1,  
number 2,  
number 3,  
number 4.
HZSec.18.9 (2Sec.19.10) is about auto-calibration of a stereo rig
and solves this image sequence in Fig.18.7 (2Fig.19.9).
Directly related to stereo are HZSec.10.12 (2Sec.11.12),
image rectification, and Sec.12.4 (2Sec.13.4),
infinite homography, at the end of which there is
a little bit about stereo correspondence too.
Algorithm 18.5 (2Algorithm 19.5) describes most of the procedure.
Take the first two images and find the projective reconstruction
starting from the fundamental matrix and point correspondences.
Repeat it for the second pair of images. Find the 4x4
homography matrix which connects the two stereo pairs, followed by
finding the plane at infinity. This is the affine calibration.
The metric calibration uses only zero skew, see the book.
To verify the calibration, measure two lines which are 90 degrees
apart.
-
Lecture 13.
Motion.
- The slides
of this lecture.
- Ref: A chapter from the draft of R. Szeliski's book,
dense motion estimation, can also be consulted. The references are
in the previous lecture.
- T. S. Huang and A. N. Netravali.
Motion and structure from feature correspondences: A review.
Proceedings of the IEEE, vol. 82, no. 2, pp. 252-268, 1994.
- Th. B. Moeslund, A. Hilton and V. Krüger.
A survey of advances in vision-based human motion capture
and analysis. Computer Vision and Image Understanding, vol. 104,
no. 2-3, pp. 90-126, 2006.
- Homework. The
sequence has several
persons walking. We will concentrate only on the young guy wearing
a light sweater and coming toward the camera, from time 10:35:17:700
on. We don't know the camera's
calibration, so without additional constraints we cannot
completely recover the motion. Try optical flow
with 2D affine motion to recover the movements of the person.
Do you need affine motion, or is simple translation already
satisfactory?
-
Lecture 14.
Continuation of lecture 13.
What can and what cannot be achieved today in computer vision.
Additional Information
CVonline
Computer Vision Home Page
Computer Vision Industry
OpenCV Reference
MATLAB Processing Toolboxes start at
MATLAB On-line.
Grading
Active participation in the course (20%). Homeworks, presentations
and projects based on papers distributed in the course (80%).