## Description

Written Problem

(20 points) Explore the Hough transform and probabilistic Hough transform code in OpenCV.

Find images for which these produce good results and images for which they produce poor results.

Try to explain why. Submit a pdf containing images, results and your explanations. Include the

code you wrote (embedded in the file).

Programming Problems

1. (30 points) In class we started to implement an edge detector in the Jupyter notebook

edge demo.ipynb, including Gaussian smoothing and the derivative and gradient computations. The code is posted on the Piazza site under Lectures.

In this problem, you will implement the non-maximum suppression step and then a thresholding step, one that is simpler than the thresholding method we discussed in class. Here are

more details:

• For non-maximum suppression, a pixel should be marked as a maximum if its gradient

magnitude is greater than or equal to those of its two neighbors along the gradient

direction, one “ahead” of it, and one “behind” it. (Note that by saying “greater than or

equal”, edges that have ties will be two (or more) pixels wide — not the right solution

in general, but good enough for now.) As examples, if the gradient direction at pixel

location (x, y) is π/5 radians (36◦ degrees) then the ahead neighbor is (x + 1, y + 1) and

the behind neighbor is (x − 1, y − 1), whereas if the gradient direction is 9π/10 (162◦

)

then the ahead neighbor is (x − 1, y) and the behind neighbor is (x + 1, y).

• For thresholding, compute the mean, µ, and the standard-deviation, s, of the gradient

magnitude of the non-suppressed pixels whose gradient magnitudes are at least

0.1. The threshold will be the minimum of µ + s and 10/σ, the former value because in

most images, most edges are noise, and the latter value to accommodate clean images

1

with no noise. Dividing by σ is because Gaussian smooth reduces the gradient magnitude

by a factor of σ.

The command-line should be

python p1_edge.py sigma in_img

where

• sigma is the value of σ used in Gaussian smoothing, and

• in_img is the input image.

The text output from the program will be:

• The number of pixels that remain as possible edges after the non-maximum suppression

step.

• µ, s and the threshold, each on a separate line and accurate to 2 decimal places.

• The number of pixels that remain after the thresholding step.

Three image outputs will be generated, with file names created by adding a four character

string to the file name prefix of the input image. Examples below assume that the image is

named foo.png. Here are the three images:

• The gradient directions of all pixels in the image encoded to the following five colors:

red (255, 0, 0) for pixels whose gradient direction is primarily east/west; green (0, 255, 0)

for pixels whose gradient direction is primarily northwest/southeast; blue (0, 0, 255) for

pixels whose gradient direction is primarily north-south; white (255, 255, 255) for pixels

whose gradient direction is primarily northeast/southwest; and black (0, 0, 0) for any

pixel on the image border (first or last row or column) and for any pixel, regardless of

gradient direction, whose gradient magnitudent is below 1.0. The file name should be

foo_dir.png.

• The gradient magnitude before non-maximum suppression and before thresholding, with

the maximum gradient mapping to the intensity 255. The file name should be foo_grd.png.

• The gradient magnitude after non-maximum suppression and after thresholding, with

the maximum gradient mapping to the intensity 255. The file name should be foo_thr.png.

Notes:

• Very important: be sure that your image is of type float32 (or float64) before Gaussian

smoothing.

• At first it will seem a bit challenging — or at least tedious — to convert the initial gradient direction, which is measured in radians in the range [−π, π], into a decision as to whether the gradient magnitude is primarily west/east, northwest/southest,

north/south, or northeast/southwest. For example, the ranges from [−π, −7π/8], [−π/8, π/8],

and [7π/8, π] are all east-west. You could write an extended conditional to assign these

directions, or you write one or two expressions, using Numpy’s capabilty for floatingpoint modular arithmetic, to simultaneously assign 0 to locations that are west/east, 1

to locations that are northwest/southeast, etc. Think about it!

2

• This problem is a bit harder than previous problems to solve without writing Python for

loops that range over the pixels, but good solutions do exist. Full credit will be given

for a solution that does not require for loops, while up to 27 of 30 will be given for a

solution that requires for loops. In other words, we’ve provided mild incentive for you to

figure out how to work solely within Python (and numpy) without for loops. Examples

that have been given in class and even worked through on homework can help. You’ll

have to consider each direction (somewhat) separately.

2. (30 points) How consistent are results of Harris keypoint detection and SIFT keypoint

detection with each other? One way to check is to extract the keypoints with the highest

value of the Harris measure (as implemented in class, using κ = 0.04) and the highest value

of the SIFT measure (see the response member variable of the KeyPoint class) and compare

them.

The only free parameter is the value of σ for the Harris code we wrote in class. For the

Harris detector, you will need to avoid the thresholding operation and instead go to nonmaximum suppressions. (Note that when using the OpenCV compare function, the positive

values returned are 255, so you’ll have to be careful with how you handle the results — the

code from class is not quite right!) The SIFT keypoints are at all different scales (see the

size member variable of the KeyPoint class). Please remove from consideration any SIFT

keypoints whose size is more than 3σ. Also, remove repeated SIFT keypoints at the same

location (due to multiple orientations), since image position is our concern here.

The command-line should be simply

python p2_compare.py sigma img

where sigma is the value of σ to be used for Gaussian smoothing of the original image in the

Harris measure, and img is the input image.

For each of the Harris and SIFT keypoints, output the five keypoints with the strongest

responses. For each keypoint, include the x,y position, and the response, all on one line. The

response values should be accurate to 4 decimal places, while the positions should be accurate

to 2.

Now for each of the top 100 keypoints of the Harris measure find the closest keypoint (in

image position) among the top 200 SIFT keypoints. (If a detector returns fewer than these

numbers, just use what it returns.) Then calculate:

• The average and median image distance.

• The average and median difference in rank positions. (To be clear, for the i-th highest

Harris keypoint, if the closest SIFT keypoint is the j-th highest among SIFT keypoints,

then the difference in rank positions is | i − j |.)

Repeat the process calculation, but comparing the top 100 SIFT keypoints against the top

200 Harris keypoints.

See the example provided for output details.

Finally output the top 200 keypoints themselves, drawn on top of the original grayscale (not

color) images. Add _harris and _sift to the file name prefixes to generate the files.

3

3. (30 points) Implement the keypoint gradient direction estimation technique based on the

Lecture 09 notes and in particular the detailed discussion in class on October 12. This includes weighted voting, peak estimation, and subpeak interpolation. The σv used in Gaussian

smoothing for the orientation voting should be 2 times the σ used in Gaussian smoothing prior

to computation of the image derivatives. The region over which the orientation calculation is

made should be 2w + 1 pixels wide, where w = rnd(2 ∗ σv).

Be sure to include the final smoothing step applied to the histogram. In particular, replace

each histogram value with the weighted average of it and half the magnitudes of its two closest

neighbors. For example, if a histogram entry is 10 and its two neighbors have values of 8 and

7, then the new histogram value with be (10 + (8 + 7)/2)/2 = 8.75.

The command-line for your program should be

python p3_orientation.py sigma img points

where sigma is the value of σ for Gaussian smoothing of the image prior to derivative computation, img is the input image, and points is a file containing integer-valued row/column

pixel locations at which to compute orientation.

The output from your program is a detailed output from the keypoint point locations provided.

For each, output

(a) the histogram before and after smoothing: in each line, output the min and max orientations (in integer degrees) covered by that bin, the histogram value before smoothing

and after smoothing (accurate to one decimal).

(b) all peaks locations (in degrees in the range -180 to 180), and the peak values, ordered

by decreasing peak height (interpolated), and

(c) the number of orientation histogram peaks that are either the maximum (interpolated)

value or within 80% of the maximum value (after interpolated).

You may want to draw for yourself the histogram as a pyplot figure, but we will not use this

in grading. Finally, you may assume that the points are sufficiently far from the image border

that the orientation computation region is entirely inside the image.

4