Given are two images, I0 and I1, taken by a single camera while it is either stationary or moving
through a scene where objects in the scene may be moving. Your problem is to (a) estimate the
apparent motion vectors at a sparse set of points, (b) determine whether or not the camera is
moving, (c) if it is moving, find the “focus of expansion” induced by the camera’s motion, (d)
determine which points are moving independent of the camera’s movement, and (e) cluster such
points into coherent objects.
Looking in more detail:
• You may choose the points where you estimate motion based on the Harris criterion, the KLT
criterion, or any other method you wish (e.g. Shi-Tomasi). You may use OpenCV algorithms
to do this, but you already have the tools to do this going back to early in the semester.
• Estimate the image motion at these points. You may use the algorithm discussed in Lecture
19/20, but you may also use descriptor matching. Once again there is OpenCV code to do
this (calcOpticalFlowPyrLK), but you can also “roll your own”. Regardless of what you do,
be sure you can handle non-trivial image motion distances.
• Determine if the camera is moving. For this you need to assume that the significant majority
of the points in the image are stationary. Develop simple criteria to detect this.
• If the camera is moving, estimate the image position of the focus of expansion. If you consider
each of the sparse image points and its motion vector as a line, all correctly-estimated lines
for stationary points in the scene (i.e. all motion vectors induced solely by camera motion)
will intersect at a single point. For simplicity, we’ll assume the camera is roughly pointing
in the direction of motion, so the focus of expansion is in the image. You should be able
to adapt RANSAC to estimate this point, including a least-squares estimate of the point’s
position once all “inliers” are found.
• Once the focus of expansion point is found, motion vector lines that do not come close to
this point correspond either to errors in motion estimation or independently moving objects.
Similarly, if the camera is not moving, all non-trivial estimated motion vectors correspond
to errors or independently moving objects. In either case, see if you can figure out a way
to identify the points from independently moving objects and group them, throwing out
points in groups that are too small. One challenge to doing so is that the motion vectors of
points with small apparent motions are fairly unstable, so the orientations of the lines
you generate can have a great deal of error.
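To make the geometry of the camera-motion and focus-of-expansion steps concrete, here is a minimal sketch. It is one possible approach, not a required design. Everything in it is an assumption made for illustration: the function names, the median-flow test for camera motion, and the thresholds (min_median_flow, tol, min_flow, iters) are all hypothetical. It uses only the Python standard library, so the points and flows are plain lists of (x, y) and (u, v) tuples; a NumPy version would be shorter. Each motion vector defines a line with unit normal n, the distance from a candidate point q to that line is |n · (q − p)|, and the least-squares point solves the 2×2 normal equations built from those normals.

```python
import math
import random
import statistics


def camera_is_moving(flows, min_median_flow=1.0):
    """Assumed criterion: if most points are static scene points, a still
    camera gives a median flow magnitude near zero."""
    return statistics.median(math.hypot(u, v) for u, v in flows) >= min_median_flow


def least_squares_point(points, normals):
    """Point minimizing the sum of squared distances to the lines through
    each point with the given unit normal (2x2 normal equations)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x, y), (nx, ny) in zip(points, normals):
        c = nx * x + ny * y          # line equation: n . q = c
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:             # all lines (nearly) parallel
        return None
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def ransac_foe(points, flows, tol=2.0, iters=200, min_flow=1.0, seed=0):
    """Return (foe, inlier_indices), or (None, []) if no estimate is found.
    Short flow vectors are skipped because their directions are unstable."""
    rng = random.Random(seed)
    idx = [i for i, (u, v) in enumerate(flows) if math.hypot(u, v) >= min_flow]
    if len(idx) < 2:
        return None, []
    # Unit normal of each motion line: the flow direction rotated by 90 degrees.
    normals = {}
    for i in idx:
        u, v = flows[i]
        length = math.hypot(u, v)
        normals[i] = (-v / length, u / length)

    best = []
    for _ in range(iters):
        i, j = rng.sample(idx, 2)    # minimal sample: two lines
        cand = least_squares_point([points[i], points[j]],
                                   [normals[i], normals[j]])
        if cand is None:
            continue
        inliers = [
            k for k in idx
            if abs(normals[k][0] * (cand[0] - points[k][0])
                   + normals[k][1] * (cand[1] - points[k][1])) <= tol
        ]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        return None, []
    # Final least-squares refit over all inliers of the best hypothesis.
    foe = least_squares_point([points[k] for k in best],
                              [normals[k] for k in best])
    return foe, best
```

Points whose indices are not in the returned inlier list are the candidates for motion-estimation errors or independently moving objects. The pixel tolerance and the minimum flow length interact, so both are worth tuning on your own image pairs.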
For each pair of input images, I0 and I1, please generate two output images and diagnostic text:
• An image with the points, the motion vectors, and the focus of expansion drawn over top of
image I1. If there is no motion of the camera, show no focus of expansion, but make sure the
fact that there is no motion is documented and justified in your diagnostic output.
• A second image, similar to the first, but this one showing the independently moving objects
you detected. For each independently moving object, select a random color and use this to
color the points and motion vectors determined to be part of that cluster and to draw a
bounding box around the points.
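As an illustration of the grouping and per-cluster output described above, the following sketch links independently moving points that are both close in the image and similar in motion, discards small groups, and attaches a bounding box and a random color to each remaining cluster. The assignment does not prescribe a clustering method, so treat this as one hypothetical option: the function name, the single-linkage (union-find) grouping, and the thresholds max_dist, max_flow_diff, and min_size are all assumptions. It uses only the standard library (math.dist needs Python 3.8 or later); the actual drawing over I1 is left out.

```python
import math
import random


def cluster_moving_points(points, flows, max_dist=30.0, max_flow_diff=3.0,
                          min_size=3, seed=0):
    """Group points that are near each other and move alike.
    Returns a list of dicts with keys 'indices', 'bbox', and 'color'."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair that passes both the position and the motion test.
    for i in range(n):
        for j in range(i + 1, n):
            close = math.dist(points[i], points[j]) <= max_dist
            similar = math.dist(flows[i], flows[j]) <= max_flow_diff
            if close and similar:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    rng = random.Random(seed)
    clusters = []
    for members in groups.values():
        if len(members) < min_size:   # throw out groups that are too small
            continue
        xs = [points[i][0] for i in members]
        ys = [points[i][1] for i in members]
        clusters.append({
            "indices": members,
            "bbox": (min(xs), min(ys), max(xs), max(ys)),  # x0, y0, x1, y1
            "color": tuple(rng.randrange(256) for _ in range(3)),
        })
    return clusters
```

The all-pairs loop is quadratic in the number of points, which is usually acceptable for a sparse set of outlier points. Single linkage can chain separate objects together when they pass close to each other, which is one of the weaknesses worth examining in your write-up.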
In addition to the image output, please show brief but clear diagnostic output from your program.
What to Submit
Submit just two documents. The first is your Python program. The second is a description of your
algorithm and result. This should (a) explain your design decisions and any trade-offs involved,
(b) demonstrate your results, and (c) evaluate the strengths and weaknesses of your algorithm as
highlighted by these results. How many results will be needed? My answer is enough to make each
of your points without being redundant. How well does each step work? When and why might it
fail (or at least produce lower quality results)?
We will use the following rubric in grading your submission, so be sure your submission highlights each item:
• Selection of points to estimate motion
• Estimation of apparent motion
• Estimation of the focus of expansion, or deciding that the camera did not move
• Clustering of independently moving objects
• Quality of code
• Clarity of explanation
• Highlight of strengths and weaknesses
• Selection of illustrative examples