CSci 4270 and 6270 Computational Vision, Homework 5



1. (20 points) Explore the GrabCut function in OpenCV. See
In particular, demonstrate an example image of your own choosing where GrabCut works
well and an example image where it works poorly. (These images must be ones that you
find or take and not images that someone else has worked on with GrabCuts.) A big part of
your effort is determining the rectangle bounding the foreground and, within this rectangle,
which pixels are definitely foreground and which are definitely background. To do this, please
examine the image interactively (see the function show_with_pixel_values from Lecture 2). Then, record one or more rectangular regions within the object you’d
like to be part of the foreground and one or more you’d like to be part of the background after
the GrabCut segmentation. Provide these to the GrabCuts function as part of the mask. Be
sure to discuss why you think GrabCuts succeeded or failed in each case. Also, be sure to
resize your image to a reasonable working dimension — say 500 to 600 pixels as the max
dimension — to make this practical.
Your Python code should take the original input image as an argument and a file containing
pixel coordinates defining rectangles in the image. Each line should contain four pixels defining
a rectangle. The first rectangle — the “outer rectangle” — should be the bounding rectangular
mask on the object. The remaining rectangles — the “inner rectangles” — should all be
enclosed in this rectangle and should bound image locations that should definitely be inside
or definitely outside the segmented object. (You will need some way to distinguish.) I suggest,
for simplicity, that you find the coordinates of these rectangles by hand. I realize that this
is a bit clunky, but I don’t want you to spend your time writing any fancy user interaction.
Your write-up should show the original image and the image with the rectangles drawn on top, with the
outer rectangle in one color, inner rectangles to be included in another, and inner rectangles
to be excluded in a third. Also show the resulting segmented image. Be sure to explain your
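As a rough illustration of the mask setup in Problem 1, the sketch below turns a rectangle file into a GrabCut mask. The file layout (a fifth 0/1 field on each inner rectangle), the (x0, y0, x1, y1) coordinate order, and the helper names parse_rects, build_grabcut_mask and run_grabcut are assumptions made for this example; the assignment leaves these choices to you.

```python
import numpy as np

# Mask label values; these mirror OpenCV's cv2.GC_BGD, cv2.GC_FGD,
# cv2.GC_PR_BGD and cv2.GC_PR_FGD constants.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3

def parse_rects(text):
    """Assumed format: first line is the outer rectangle 'x0 y0 x1 y1';
    later lines are 'x0 y0 x1 y1 flag' with flag 1 = inside, 0 = outside."""
    rows = [[int(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    outer = tuple(rows[0][:4])
    fg = [tuple(r[:4]) for r in rows[1:] if r[4] == 1]
    bg = [tuple(r[:4]) for r in rows[1:] if r[4] == 0]
    return outer, fg, bg

def build_grabcut_mask(shape, outer, fg_rects=(), bg_rects=()):
    """Build a uint8 mask for an image of the given shape."""
    mask = np.full(shape[:2], GC_BGD, dtype=np.uint8)  # outside outer: background
    x0, y0, x1, y1 = outer
    mask[y0:y1, x0:x1] = GC_PR_FGD                     # inside outer: probably foreground
    for x0, y0, x1, y1 in fg_rects:
        mask[y0:y1, x0:x1] = GC_FGD                    # definitely foreground
    for x0, y0, x1, y1 in bg_rects:
        mask[y0:y1, x0:x1] = GC_BGD                    # definitely background
    return mask

def run_grabcut(img, mask, iters=5):
    """Run GrabCut initialised from the mask (modified in place);
    returns a boolean foreground map. img must be 8-bit, 3-channel."""
    import cv2  # imported here so the helpers above work without OpenCV
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(img, mask, None, bgd_model, fgd_model, iters,
                cv2.GC_INIT_WITH_MASK)
    return (mask == GC_FGD) | (mask == GC_PR_FGD)
```

Multiplying the image by the returned boolean map (broadcast over the color channels) gives one possible segmented image for the write-up.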
2. (20 points) Apply k-means clustering to attempt to segment the grass in the images provided
on Piazza. The “data” vectors input to k-means should at least be a concatenation of the
pixel location values and the RGB values at each pixel. You might include other measures
such as the standard deviation of the R, G and B values over a pixel’s neighborhood so that
you capture some notion of the variation in intensity (e.g. solid green regions aren’t very
grass-like). You might have to scale your position, color or other measurements to give them
more or less influence during k-means. Vary the value of k to see the effect — perhaps several
clusters together cover the grass.
Note that I don’t expect perfect segmentations. More sophisticated algorithms are needed
for this. Instead, I want you to develop an understanding of the use of k-means and of the
difficulty of the segmentation problem.
In addition to your code, please include in your write-up a small selection of images from
the ones I provided demonstrating good and bad results. Include each original image, and
separately show the clusters drawn on the images. Include example images that indicate the
effect of varying k and, perhaps, that demonstrate the effect of including measures other than
color and position.
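One possible way to assemble the data vectors for Problem 2 is sketched below. The normalization to [0, 1], the pos_scale and color_scale weights, and the function names are illustrative assumptions. The small Lloyd-style kmeans loop is only a stand-in so the example runs with NumPy alone; a library k-means routine would normally take its place.

```python
import numpy as np

def pixel_features(img, pos_scale=1.0, color_scale=1.0):
    """Stack (x, y, R, G, B) per pixel into an (H*W, 5) float array.
    img is assumed to be an (H, W, 3) uint8 array."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Positions and colors are both mapped to [0, 1] before weighting,
    # so the two scale factors directly control their relative influence.
    pos = np.stack([xs / max(w - 1, 1), ys / max(h - 1, 1)], axis=-1) * pos_scale
    col = img.astype(np.float64) / 255.0 * color_scale
    return np.concatenate([pos, col], axis=-1).reshape(-1, 5)

def kmeans(data, k, iters=20, seed=0):
    """Minimal Lloyd iteration; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every center: shape (N, k)
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):          # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Reshaping the labels to (H, W) gives a cluster map that can be drawn over the image. Raising color_scale relative to pos_scale makes clusters follow color more than location.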
3. (60 points) For this problem and for HW 6 we are going to consider the problem of determining the dominant “background” class of a scene. The five possibilities we will consider in
this example are grass, wheat field, road, ocean and red carpet. Some of these are relatively
easy, but others are hard. A link to the images will be provided on the Piazza site.
In our lecture on detection we focused on the “HoG” — histogram of oriented gradients —
descriptor. Beyond this, many different types of descriptors have been invented. For this
problem you are going to implement a descriptor that combines location and color and uses
no gradient information whatsoever. One descriptor will be computed for the entirety of each
image. A series of SVM classifiers that you train, one for each desired class, will then be
applied to the descriptor vector to make a decision.
Let’s start with the descriptor. Imagine that an image’s three color channels (R, G, B) are
represented as a 3D cube, where the x axis is the red color channel, the y axis is the green
color channel, and the z axis is the blue color channel. In this representation, a particular
color pixel (R, G, B) lives somewhere inside that [0-255] x [0-255] x [0-255] 3D cube. For
example, the true black pixel (0, 0, 0) will occupy the origin of the coordinate system whereas
the true white pixel will occupy the opposite, furthest corner of the cube at (255, 255, 255).
In order to form a histogram of this $256^3$ space, we need to break the cube into smaller
subcubes. See Figure 1 (b) below. The RGB cube will be broken into $t^3$ equal-sized cubes; these
will be the bins used to make the color histogram. For example, if t = 4 each color dimension
will be broken into four equal parts and the entire cube will be divided into a 4 × 4 × 4 = 64
component histogram. The histogram will need to be flattened from a 3D matrix into a 1D
vector to be passed into the SVM for training; the dimension flattening order does not matter
as long as it is consistent across all images.
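The binning just described can be sketched in a few NumPy calls. The function name color_histogram and the R, G, B flattening order are choices made for this example; any consistent order works.

```python
import numpy as np

def color_histogram(pixels, t=4):
    """Return a length t**3 histogram of an (N, 3) uint8 RGB array."""
    # Per-channel bin index in [0, t-1]: values 0..255 split into t equal parts
    bins = (pixels.astype(np.int64) * t) // 256
    # Flatten the 3D bin index to 1D in a fixed order (R, then G, then B)
    flat = (bins[:, 0] * t + bins[:, 1]) * t + bins[:, 2]
    return np.bincount(flat, minlength=t ** 3)
```

For an (H, W, 3) image block, pass block.reshape(-1, 3). With t = 4, black pixels land in bin 0 and white pixels in bin 63.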
The image will be divided into overlapping blocks and one color histogram of size $t^3$ will be
computed in each block. These will be concatenated to form the final histogram. Let $b_w$ be
the number of blocks across the image and let $b_h$ be the number of blocks going down the
image. This will produce $b_w \cdot b_h$ blocks overall, and a final descriptor vector of size
$t^3 \cdot b_w \cdot b_h$.
To compute the blocks in an image of $H \times W$ pixels, let
$$\Delta_w = \frac{W}{b_w + 1} \quad \text{and} \quad \Delta_h = \frac{H}{b_h + 1}.$$
[Figure 1: Color histogram bins, panels (a) and (b).]
[Figure 2: Image block tiling for $b_w = 4$ and $b_h = 2$. Panels: original image with $\Delta_w$- and $\Delta_h$-spaced lines; blocks of pixels over which histograms are formed.]
The blocks will each cover $2\Delta_w$ columns and $2\Delta_h$ rows of pixels. The image pixel positions
of the upper left corners of the blocks will be
$(m\Delta_w, n\Delta_h)$, for $m = 0, \ldots, b_w - 1$, $n = 0, \ldots, b_h - 1$.
Note that some pixels will contribute to only one histogram, some will contribute to two, and
others will contribute to four. The same is true of the HoG descriptor. Figure 2 illustrates
the formation of blocks.
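The block layout can be checked with a short sketch. Rounding the block spacing down with integer division and the function names block_corners and coverage are assumptions of this example; the assignment does not say how to handle image sizes that do not divide evenly.

```python
import numpy as np

def block_corners(H, W, bw, bh):
    """Return (x0, y0, x1, y1) for each block, row by row."""
    dw, dh = W // (bw + 1), H // (bh + 1)   # block spacing (assumed rounded down)
    return [(m * dw, n * dh, (m + 2) * dw, (n + 2) * dh)
            for n in range(bh) for m in range(bw)]

def coverage(H, W, bw, bh):
    """Count how many blocks each pixel falls into."""
    cover = np.zeros((H, W), dtype=int)
    for x0, y0, x1, y1 in block_corners(H, W, bw, bh):
        cover[y0:y1, x0:x1] += 1
    return cover
```

For a 300 x 500 image with bw = 4 and bh = 2, this gives 8 blocks of 200 x 200 pixels, and the coverage counts take only the values 1, 2 and 4, matching the note above.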
Now for the SVM classifier. In order to train the SVM classifiers, one for each class, you will
be given a set of 4000 training images, $\{I_i\}$, with class labels $y_i \in \{1, \ldots, k\}$ (for us, $k = 5$).
To train classifier $C_j$, images with label $y_i = j$ are treated as $y_i = +1$ in linear SVM training
and images with label $y_i \neq j$ are treated as $y_i = -1$. This will be repeated for each of the $k$
classifiers. The descriptor is computed for each training image $I_i$ to form the data vectors $x_i$.
Each resulting classifier $C_j$ will have a weight vector $w_j$ and offset $b_j$. The score for classifier
$j$ for a test image with descriptor vector $x$ is
$$d_j = \frac{1}{\|w_j\|} \left( w_j^\top x + b_j \right).$$
(The $1/\|w_j\|$ ensures that $d_j$ is a signed distance.) The classification for the test image $I$ is
the class associated with the value of $j$ that gives the maximum $d_j$ score, even if none are positive.
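The decision rule can be written compactly once the k weight vectors are stacked into a matrix. In this sketch the function name classify and the stacked layout are assumptions; with sklearn.svm.LinearSVC, each trained classifier's coef_ and intercept_ attributes would supply one row of W and one entry of b. Note that indices here are 0-based, whereas the class labels in the text run from 1 to k.

```python
import numpy as np

def classify(x, W, b):
    """W: (k, d) weight vectors, b: (k,) offsets, x: (d,) descriptor.
    Returns (index of best class, array of signed distances)."""
    d = (W @ x + b) / np.linalg.norm(W, axis=1)  # divide each score by its ||w_j||
    return int(np.argmax(d)), d
```

For example, with W = [[3, 4], [0, 1]], b = [0, -1] and x = [-2, 0.5], the scores are -0.8 and -0.5, so the second classifier wins even though both scores are negative.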
Once training is complete, you will test your classifiers with the set of 1000 test images. Each
will be run as described above and the label will be compared to the known correct label. You
will output the percentage correct for each category, followed by a k × k confusion matrix.
The confusion matrix entry at row r and column c shows the number of times when r was the
correct class label and c was the chosen class label. The confusion matrix would have only
0’s in the off-diagonal entries when the SVM classifiers are operating at 100% accuracy.
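A minimal NumPy sketch of the confusion matrix and per-category percentages follows. The function names are hypothetical, labels are assumed to be 0-based integers, and every class is assumed to appear at least once among the true labels (otherwise the division yields NaN for that row).

```python
import numpy as np

def confusion_counts(y_true, y_pred, k):
    """Entry [r, c] counts test images with correct label r and chosen label c."""
    cm = np.zeros((k, k), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_accuracy(cm):
    """Percentage correct for each true class (diagonal over row sums)."""
    return 100.0 * np.diag(cm) / cm.sum(axis=1)
```

With true labels [0, 0, 1, 2] and chosen labels [0, 1, 1, 2], the first row of the matrix is [1, 1, 0] and the per-class percentages are 50, 100 and 100.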
Some Details
(a) Make your implementation as efficient as possible by using succinct NumPy calls
wherever possible and avoiding nested Python for loops.
(b) For your SVM implementation, we suggest using sklearn.svm.LinearSVC. To use the
scikit-learn (sklearn) Python module, you will need to install the package using the
terminal. If you are using Anaconda, as I suggested, you can simply call
conda install scikit-learn
(c) When training your model, do not use the test images except for testing your model.
The LinearSVC object can use all of the default settings except for the error tuning
parameter C. Play with different values of C to optimize your SVM’s performance for
each class. The expected values of C will fall in the range [0.1, 10.0] with performances
varying across classes.
(d) The confusion matrix can be made using Matplotlib or sklearn.metrics.confusion_matrix.
(e) The feature extraction process might still be time consuming even with efficient Numpy
use. We suggest that you develop and debug your program using a subset of the training
image set before running your final version on the full training and test sets.
(f) We suggest using at a minimum t = 4 and bw = bh = 4. This will give a descriptor of
size 1,024.
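Details (b) and (c) might be combined along the following lines. This is only a sketch: holding out a fraction of the training set to compare values of C, the particular C values tried, and the function names are all assumptions, and X is assumed to be an (N, d) NumPy array of descriptors with y the matching class labels. The test images are never touched here.

```python
import numpy as np

def one_vs_rest_targets(y, j):
    """Map labels to +1 for class j and -1 for every other class."""
    return np.where(np.asarray(y) == j, 1, -1)

def train_one_vs_rest(X, y, classes, C_values=(0.1, 1.0, 10.0),
                      holdout=0.2, seed=0):
    """Train one LinearSVC per class; returns {class: (accuracy, C, model)}."""
    from sklearn.svm import LinearSVC  # requires scikit-learn to be installed
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = int(holdout * len(X))
    val, trn = order[:n_val], order[n_val:]   # split of the TRAINING set only
    models = {}
    for j in classes:
        t = one_vs_rest_targets(y, j)
        best = None
        for C in C_values:
            clf = LinearSVC(C=C).fit(X[trn], t[trn])
            acc = clf.score(X[val], t[val])   # held-out accuracy for this C
            if best is None or acc > best[0]:
                best = (acc, C, clf)
        models[j] = best
    return models
```

Because C is chosen separately for each class, different classifiers may end up with different values, which is consistent with performance varying across classes.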
Submit your code, your output from your final test run, and a write-up. This should contain
a description of any design choices and parameter settings, along with a summary discussion
of when your classifier works well, when it works poorly, and why.