## Description

You are given two images, I0 and I1, taken by a single camera that is either stationary or moving through a scene in which objects may also be moving. Your problem is to (a) estimate the apparent motion vectors at a sparse set of points, (b) determine whether or not the camera is moving, (c) if it is moving, find the “focus of expansion” induced by the camera’s motion, (d) determine which points are moving independently of the camera’s movement, and (e) cluster such points into coherent objects.

Looking in more detail:

• You may choose the points where you estimate motion based on the Harris criteria, the KLT criteria, or any other method you wish (e.g. Shi-Tomasi). You may use OpenCV algorithms to do this, but you have already had the tools to do this since early in the semester.

• Estimate the image motion at these points. You may use the algorithm discussed in Lecture 19/20, but you may also use descriptor matching. Once again there is OpenCV code to do this (calcOpticalFlowPyrLK), but you can also “roll your own”. Regardless of what you do, be sure you can handle non-trivial image motion distances.

• Determine if the camera is moving. For this you need to assume that the significant majority of the points in the image belong to stationary parts of the scene. Develop simple criteria to detect camera motion.

• If the camera is moving, estimate the image position of the focus of expansion. If you consider each of the sparse image points and its motion vector as a line, all correctly-estimated lines for stationary points in the scene (i.e. all motion vectors induced solely by camera motion) will intersect at a single point. For simplicity, we’ll assume the camera is roughly pointing in the direction of motion, so the focus of expansion is in the image. You should be able to adapt RANSAC to estimate this point, including a least-squares estimate of the point’s position once all “inliers” are found.

• Once the focus of expansion point is found, motion vector lines that do not come close to this point correspond either to errors in motion estimation or to independently moving objects. Similarly, if the camera is not moving, all non-trivial estimated motion vectors correspond to errors or independently moving objects. In either case, see if you can figure out a way to identify the points from independently moving objects and group them, throwing out points in groups that are too small. One challenge in doing so is that the motion vectors of points with small apparent motions are fairly unstable, so the orientations of the lines you generate can have a great deal of error.
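
As one possible starting point for the camera-motion test and the focus of expansion step described above, here is a minimal NumPy sketch. It is not a required design. The function names (`camera_is_moving`, `line_normals`, `estimate_foe`) and every threshold (`min_median_px`, `tol_px`, `min_len_px`, `n_iters`) are assumptions chosen for illustration and would need tuning on real images. Each motion vector v at point p is treated as the line n · x = n · p, where n is the unit normal to v. RANSAC intersects two randomly chosen lines, counts the lines passing within `tol_px` of the intersection, and a least-squares fit over the inliers refines the point. Very short vectors are dropped first because their orientations are unstable.

```python
import numpy as np


def camera_is_moving(vectors, min_median_px=1.0):
    """Assumed criterion: if most points are stationary in the scene, a
    tiny median flow magnitude suggests the camera did not move."""
    mags = np.linalg.norm(vectors, axis=1)
    return float(np.median(mags)) > min_median_px


def line_normals(points, vectors):
    """Return unit normals n_i and offsets c_i so that line i is n_i . x = c_i."""
    normals = np.stack([-vectors[:, 1], vectors[:, 0]], axis=1)
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    offsets = np.sum(normals * points, axis=1)
    return normals, offsets


def estimate_foe(points, vectors, n_iters=500, tol_px=2.0, min_len_px=1.0, seed=0):
    """RANSAC estimate of the common intersection of the motion lines.

    Returns (foe, inlier_mask); foe is None if no estimate was possible.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros(len(points), dtype=bool)
    # Drop short vectors: their line orientations are unreliable.
    keep = np.flatnonzero(np.linalg.norm(vectors, axis=1) >= min_len_px)
    if len(keep) < 2:
        return None, mask
    normals, offsets = line_normals(points[keep], vectors[keep])

    best = np.zeros(len(keep), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(keep), size=2, replace=False)
        A = normals[[i, j]]
        if abs(np.linalg.det(A)) < 1e-6:  # nearly parallel lines, skip
            continue
        candidate = np.linalg.solve(A, offsets[[i, j]])
        # Perpendicular distance from the candidate point to every line.
        dist = np.abs(normals @ candidate - offsets)
        inliers = dist < tol_px
        if inliers.sum() > best.sum():
            best = inliers

    if best.sum() < 2:
        return None, mask
    # Least-squares refinement over all inlier lines.
    foe, *_ = np.linalg.lstsq(normals[best], offsets[best], rcond=None)
    mask[keep[best]] = True
    return foe, mask
```

Points whose mask entry is False are the candidates for motion-estimation errors or independently moving objects. A fixed pixel tolerance is only one choice; a tolerance that depends on vector length is another option worth comparing in your write-up.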

For each pair of input images, I0 and I1, please generate two output images and diagnostic text output:

• An image with the points, the motion vectors, and the focus of expansion drawn on top of image I1. If there is no motion of the camera, show no focus of expansion, but make sure the fact that there is no motion is documented and justified in your diagnostic output.

• A second image, similar to the first, but this one showing the independently moving objects you detected. For each independently moving object, select a random color and use this to color the points and motion vectors determined to be part of that cluster and to draw a bounding box around the points.
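
The grouping, random color, and bounding box for each independently moving object could be prepared along the following lines. This is a hedged sketch, not a prescribed method: it swaps in a simple connected-components grouping, where two points are linked if they are close in the image and have similar motion vectors. The names `cluster_points` and `describe_clusters` and the thresholds `max_gap_px`, `max_flow_diff_px`, and `min_size` are all assumptions for illustration.

```python
import numpy as np


def cluster_points(points, vectors, max_gap_px=40.0, max_flow_diff_px=3.0, min_size=3):
    """Label each point with a cluster id, or -1 if its group is too small.

    Two points are linked when both their image positions and their
    motion vectors are close (assumed thresholds, in pixels).
    """
    n = len(points)
    labels = np.full(n, -1, dtype=int)
    pos_d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    flow_d = np.linalg.norm(vectors[:, None] - vectors[None, :], axis=2)
    linked = (pos_d <= max_gap_px) & (flow_d <= max_flow_diff_px)

    visited = np.zeros(n, dtype=bool)
    next_label = 0
    for start in range(n):
        if visited[start]:
            continue
        # Flood fill over the link graph to collect one connected group.
        visited[start] = True
        stack, members = [start], []
        while stack:
            k = stack.pop()
            members.append(k)
            for m in np.flatnonzero(linked[k] & ~visited):
                visited[m] = True
                stack.append(int(m))
        if len(members) >= min_size:  # throw out groups that are too small
            labels[members] = next_label
            next_label += 1
    return labels


def describe_clusters(points, labels, seed=0):
    """Pick a random color and an axis-aligned bounding box per cluster."""
    rng = np.random.default_rng(seed)
    clusters = []
    if labels.size == 0:
        return clusters
    for lab in range(int(labels.max()) + 1):
        pts = points[labels == lab]
        lo = np.floor(pts.min(axis=0)).astype(int)
        hi = np.ceil(pts.max(axis=0)).astype(int)
        color = tuple(int(c) for c in rng.integers(0, 256, size=3))
        box = (int(lo[0]), int(lo[1]), int(hi[0]), int(hi[1]))  # x0, y0, x1, y1
        clusters.append({"label": lab, "color": color, "box": box})
    return clusters
```

Each returned color and box can then be passed to drawing calls such as OpenCV's `cv2.rectangle` and `cv2.arrowedLine` when rendering the second output image. The input here would be only the points already judged to be independently moving.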

In addition to the image output, please show brief but clear diagnostic output from your program.

## What to Submit

Submit just two documents. The first is your Python program. The second is a description of your algorithm and results. This should (a) explain your design decisions and any trade-offs involved, (b) demonstrate your results, and (c) evaluate the strengths and weaknesses of your algorithm as highlighted by these results. How many results will be needed? My answer is: enough to make each of your points without being redundant. How well does each step work? When and why might it fail (or at least produce lower quality results)?

## Evaluation

We will use the following rubric in grading your submission, so be sure your submission addresses each item:

• Selection of points to estimate motion
• Estimation of apparent motion
• Estimation of the focus of expansion, or deciding that the camera did not move
• Clustering of independently moving objects
• Quality of code
• Clarity of explanation
• Highlight of strengths and weaknesses
• Selection of illustrative examples
