CSci 4270 and 6270 Computational Vision, Homework 3 solved




Written Problem
(20 points) Explore the Hough transform and probabilistic Hough transform code in OpenCV.
Find images for which these produce good results and images for which they produce poor results.
Try to explain why. Submit a pdf containing images, results and your explanations. Include the
code you wrote (embedded in the file).
Programming Problems
1. (30 points) In class we started to implement an edge detector in the Jupyter notebook
edge demo.ipynb, including Gaussian smoothing and the derivative and gradient computations. The code is posted on the Piazza site under Lectures.
In this problem, you will implement the non-maximum suppression step and then a thresholding step, one that is simpler than the thresholding method we discussed in class. Here are
more details:
• For non-maximum suppression, a pixel should be marked as a maximum if its gradient
magnitude is greater than or equal to those of its two neighbors along the gradient
direction, one “ahead” of it, and one “behind” it. (Note that by saying “greater than or
equal”, edges that have ties will be two (or more) pixels wide — not the right solution
in general, but good enough for now.) As examples, if the gradient direction at pixel
location (x, y) is π/5 radians (36◦ degrees) then the ahead neighbor is (x + 1, y + 1) and
the behind neighbor is (x − 1, y − 1), whereas if the gradient direction is 9π/10 (162◦
then the ahead neighbor is (x − 1, y) and the behind neighbor is (x + 1, y).
• For thresholding, compute the mean, µ, and the standard-deviation, s, of the gradient
magnitude of the non-suppressed pixels whose gradient magnitudes are at least
0.1. The threshold will be the minimum of µ + s and 10/σ, the former value because in
most images, most edges are noise, and the latter value to accommodate clean images
with no noise. Dividing by σ is because Gaussian smooth reduces the gradient magnitude
by a factor of σ.
The command-line should be
python sigma in_img
• sigma is the value of σ used in Gaussian smoothing, and
• in_img is the input image.
The text output from the program will be:
• The number of pixels that remain as possible edges after the non-maximum suppression
• µ, s and the threshold, each on a separate line and accurate to 2 decimal places.
• The number of pixels that remain after the thresholding step.
Three image outputs will be generated, with file names created by adding a four character
string to the file name prefix of the input image. Examples below assume that the image is
named foo.png. Here are the three images:
• The gradient directions of all pixels in the image encoded to the following five colors:
red (255, 0, 0) for pixels whose gradient direction is primarily east/west; green (0, 255, 0)
for pixels whose gradient direction is primarily northwest/southeast; blue (0, 0, 255) for
pixels whose gradient direction is primarily north-south; white (255, 255, 255) for pixels
whose gradient direction is primarily northeast/southwest; and black (0, 0, 0) for any
pixel on the image border (first or last row or column) and for any pixel, regardless of
gradient direction, whose gradient magnitudent is below 1.0. The file name should be
• The gradient magnitude before non-maximum suppression and before thresholding, with
the maximum gradient mapping to the intensity 255. The file name should be foo_grd.png.
• The gradient magnitude after non-maximum suppression and after thresholding, with
the maximum gradient mapping to the intensity 255. The file name should be foo_thr.png.
• Very important: be sure that your image is of type float32 (or float64) before Gaussian
• At first it will seem a bit challenging — or at least tedious — to convert the initial gradient direction, which is measured in radians in the range [−π, π], into a decision as to whether the gradient magnitude is primarily west/east, northwest/southest,
north/south, or northeast/southwest. For example, the ranges from [−π, −7π/8], [−π/8, π/8],
and [7π/8, π] are all east-west. You could write an extended conditional to assign these
directions, or you write one or two expressions, using Numpy’s capabilty for floatingpoint modular arithmetic, to simultaneously assign 0 to locations that are west/east, 1
to locations that are northwest/southeast, etc. Think about it!
• This problem is a bit harder than previous problems to solve without writing Python for
loops that range over the pixels, but good solutions do exist. Full credit will be given
for a solution that does not require for loops, while up to 27 of 30 will be given for a
solution that requires for loops. In other words, we’ve provided mild incentive for you to
figure out how to work solely within Python (and numpy) without for loops. Examples
that have been given in class and even worked through on homework can help. You’ll
have to consider each direction (somewhat) separately.
2. (30 points) How consistent are results of Harris keypoint detection and SIFT keypoint
detection with each other? One way to check is to extract the keypoints with the highest
value of the Harris measure (as implemented in class, using κ = 0.04) and the highest value
of the SIFT measure (see the response member variable of the KeyPoint class) and compare
The only free parameter is the value of σ for the Harris code we wrote in class. For the
Harris detector, you will need to avoid the thresholding operation and instead go to nonmaximum suppressions. (Note that when using the OpenCV compare function, the positive
values returned are 255, so you’ll have to be careful with how you handle the results — the
code from class is not quite right!) The SIFT keypoints are at all different scales (see the
size member variable of the KeyPoint class). Please remove from consideration any SIFT
keypoints whose size is more than 3σ. Also, remove repeated SIFT keypoints at the same
location (due to multiple orientations), since image position is our concern here.
The command-line should be simply
python sigma img
where sigma is the value of σ to be used for Gaussian smoothing of the original image in the
Harris measure, and img is the input image.
For each of the Harris and SIFT keypoints, output the five keypoints with the strongest
responses. For each keypoint, include the x,y position, and the response, all on one line. The
response values should be accurate to 4 decimal places, while the positions should be accurate
to 2.
Now for each of the top 100 keypoints of the Harris measure find the closest keypoint (in
image position) among the top 200 SIFT keypoints. (If a detector returns fewer than these
numbers, just use what it returns.) Then calculate:
• The average and median image distance.
• The average and median difference in rank positions. (To be clear, for the i-th highest
Harris keypoint, if the closest SIFT keypoint is the j-th highest among SIFT keypoints,
then the difference in rank positions is | i − j |.)
Repeat the process calculation, but comparing the top 100 SIFT keypoints against the top
200 Harris keypoints.
See the example provided for output details.
Finally output the top 200 keypoints themselves, drawn on top of the original grayscale (not
color) images. Add _harris and _sift to the file name prefixes to generate the files.
3. (30 points) Implement the keypoint gradient direction estimation technique based on the
Lecture 09 notes and in particular the detailed discussion in class on October 12. This includes weighted voting, peak estimation, and subpeak interpolation. The σv used in Gaussian
smoothing for the orientation voting should be 2 times the σ used in Gaussian smoothing prior
to computation of the image derivatives. The region over which the orientation calculation is
made should be 2w + 1 pixels wide, where w = rnd(2 ∗ σv).
Be sure to include the final smoothing step applied to the histogram. In particular, replace
each histogram value with the weighted average of it and half the magnitudes of its two closest
neighbors. For example, if a histogram entry is 10 and its two neighbors have values of 8 and
7, then the new histogram value with be (10 + (8 + 7)/2)/2 = 8.75.
The command-line for your program should be
python sigma img points
where sigma is the value of σ for Gaussian smoothing of the image prior to derivative computation, img is the input image, and points is a file containing integer-valued row/column
pixel locations at which to compute orientation.
The output from your program is a detailed output from the keypoint point locations provided.
For each, output
(a) the histogram before and after smoothing: in each line, output the min and max orientations (in integer degrees) covered by that bin, the histogram value before smoothing
and after smoothing (accurate to one decimal).
(b) all peaks locations (in degrees in the range -180 to 180), and the peak values, ordered
by decreasing peak height (interpolated), and
(c) the number of orientation histogram peaks that are either the maximum (interpolated)
value or within 80% of the maximum value (after interpolated).
You may want to draw for yourself the histogram as a pyplot figure, but we will not use this
in grading. Finally, you may assume that the points are sufficiently far from the image border
that the orientation computation region is entirely inside the image.