CS 6320 Computer Vision Homework 1

This assignment consists of some basic image processing operations. We also provide some
sample outputs and pseudocode examples since this is your first homework.
1. Use Canny edge detection (try ‘help edge’ in Matlab) to extract edges (i.e., 2D points denoting changes
in intensity values) from an image. Fit line segments to the edges using RANSAC, a popular technique
for robust model fitting in computer vision applications. From two 2D points (x1, y1) and (x2, y2), we
can obtain a line equation of the form ax+by+c = 0. The basic idea of line fitting is simple: we randomly
sample two edge pixels and compute the line equation that passes through these two points. For the
chosen line equation, we count the total number of edge pixels that lie within a small threshold
distance (say 1 pixel) of the line. Based on the number of points that lie on the chosen line, we decide
whether to treat it as an image line. We iteratively find all the line segments in the image based on some
threshold (i.e., each line should have at least 50 edge pixels). After finding each line, we remove all
the edge points lying on it before searching for the next line. For this assignment, show the results on
the three provided images. Please note that research papers are still being published on line
detection; this is a naïve approach and will not work robustly for all images. For example, we show an
image and the detected lines below:
Left: Original image. Right: A few detected line segments shown in different random colors.
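To make the pseudocode below concrete, here is a minimal Matlab sketch of the two geometric operations it relies on: fitting a normalized line ax+by+c = 0 through two points and measuring point-to-line distances. The function names (fitLinePQ, pointLineDist) are our own illustration, not a required interface.

% Fit a normalized line ax + by + c = 0 through two distinct
% points P = [x1 y1] and Q = [x2 y2] (assumes P ~= Q).
function [a, b, c] = fitLinePQ(P, Q)
    a = Q(2) - P(2);             % a = y2 - y1
    b = P(1) - Q(1);             % b = x1 - x2
    c = -(a*P(1) + b*P(2));      % line passes through P
    n = hypot(a, b);             % normalize so |ax+by+c| is a pixel distance
    a = a/n; b = b/n; c = c/n;
end

% Distances from edge points (an N x 2 matrix of [x y] rows) to the line.
function d = pointLineDist(pts, a, b, c)
    d = abs(a*pts(:,1) + b*pts(:,2) + c);   % valid since a^2 + b^2 = 1
end

One RANSAC iteration then reduces to sampling two edge pixels P and Q, fitting the line, and keeping inliers = pts(pointLineDist(pts, a, b, c) < 1, :).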
Pseudocode
Input: Image
Output: Image with different line segments marked in different colors
1. Initialize LINE_SET = {}, ITERS = 0, TOTAL_NO_ITERS = 10000, MAX_PAIR_DISTANCE = 100,
MIN_POINTLINE_DISTANCE = 1, MIN_LINE_PIXEL_NUM = 50.
2. Extract Canny edge pixels from the given input image I. Let us denote the edge pixels by a
set of 2D coordinates given by EDGE_SET = {(x1,y1),(x2,y2),…,(xn,yn)}.
3. ITERS = ITERS + 1; Select a random edge pixel (i.e. P(xi,yi)) from the EDGE_SET. Select
another random edge pixel Q(xj,yj) that is within MAX_PAIR_DISTANCE from the point P. Fit
a 2D line PQ passing through the points P and Q in the form ax+by+c = 0.
4. For all the points in the EDGE_SET, identify the edge points that are within
MIN_POINTLINE_DISTANCE from the line PQ. Construct a set INLIER and insert all the edge
points that satisfy the distance constraint. If INLIER has a sufficient number of edge pixels
(say MIN_LINE_PIXEL_NUM), then you treat this as a line. Let the inlier set be given as
follows: INLIER = {(xa,ya),(xb,yb),…,(xm,ym)}. Now add INLIER to the set LINE_SET. You
can think of the LINE_SET as a “set of sets” consisting of several INLIER sets denoting
individual lines. Now remove the INLIER from the EDGE_SET.
5. If (ITERS < TOTAL_NO_ITERS) GOTO step 3.
6. Display the line segments by plotting the points in each INLIER set from the LINE_SET using a different random color.
[30 Points]

2. Write a simple sky segmentation using six thresholds specifying the upper and lower values for each of the three RGB components. If a pixel’s color values lie within the specified ranges, it is considered a sky pixel. Note that this is a simple segmentation routine and the results are not expected to be perfect. In the following example, we define the range for Red to be between 0 and 100, for Green between 1 and 150, and for Blue between 100 and 255. Please show the sky segmentation for the provided images. Some of the images come from the Middlebury dataset: http://vision.middlebury.edu/stereo/data/
Left: Original image. Right: Segmentation mask for the sky (white corresponds to sky, and black corresponds to the non-sky region).
Pseudocode
Input: Original image
Output: Segmented image showing sky and non-sky regions
1. Let the input image be given by I. Initialize the color range for each component: R_MIN = 0, R_MAX = 100, G_MIN = 1, G_MAX = 150, B_MIN = 100, and B_MAX = 255.
2. Create a new output image SEGMENT that has the same dimensions as the input image.
3. Loop over every pixel in the image; if the three components (R, G, B) all lie within the specified ranges, treat the pixel as sky and assign white (255,255,255). Otherwise, treat it as non-sky and assign black (0,0,0).
4. Output the SEGMENT image.
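A vectorized Matlab sketch of this thresholding, assuming an 8-bit RGB input; the filename ‘sky.jpg’ is a placeholder, and the thresholds match the example ranges above.

% Simple sky segmentation by per-channel RGB thresholding.
I = imread('sky.jpg');                 % placeholder filename
R = I(:,:,1); G = I(:,:,2); B = I(:,:,3);
mask = (R >= 0)   & (R <= 100) & ...   % Red range
       (G >= 1)   & (G <= 150) & ...   % Green range
       (B >= 100) & (B <= 255);        % Blue range
SEGMENT = uint8(mask) * 255;           % 255 (white) = sky, 0 (black) = non-sky
imshow(SEGMENT);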
[20 Points]

3. Write a simple stereo matching program to compute the disparity map. The two images in a stereo pair are usually referred to as the “left” and “right” images. The images are said to be rectified if every pixel in the left image has a matching pixel in the right image at the same height. In a typical stereo pair, a pixel p(x,y) in the left image has a matching pixel q(x’,y) in the right image. Note that the ‘y’ coordinate is the same in both the left and right images. Every pixel in the left image gets shifted by a small value ‘d’ in the right image in the following manner: x’ = x – d. Here ‘d’ is the positive disparity value for the pixel p(x,y) in the left image. Note that a pixel that is farther from the camera will have a small disparity value, while a pixel that is very close to the camera will have a high disparity value. In other words, the depth “D” of a pixel is inversely proportional to the disparity ‘d’.
To match a pixel p(x,y) with another pixel q(x’,y) we use normalized cross-correlation (NCC). The expression for NCC is given below for two square matrices A and B with elements aᵢⱼ and bᵢⱼ:
NCC(A, B) = (Σᵢⱼ aᵢⱼ bᵢⱼ) / (√(Σᵢⱼ aᵢⱼ²) √(Σᵢⱼ bᵢⱼ²))
Two pixels p(x,y) and q(x’,y) can be considered a perfect match if the NCC score is 1. If the NCC score is 0, the two pixels form the worst possible match. For every pixel p(x,y) we take a window of size, say, 5x5 centered at p(x,y) and treat it as the matrix A. For every candidate pixel q(x’,y) we do the same and obtain a window centered at q, which we call B. We iterate over different values of x’ = x – d, where d = {1,2,3,…,50}, and pick the d that gives the highest NCC score for matrices A and B. The goal is to compute a disparity image that holds the best disparity ‘d’ value for every pixel in the left image. You need to skip the disparity computation at the boundary so that your matrices A and B do not cross the image boundaries. Show the stereo reconstruction for the stereo pairs that are provided.
Note that this is a simple and naïve stereo matching method and the results are not expected to be perfect.
Left: Left image. Middle: Right image. Right: Disparity map (white corresponds to large disparity, and darker gray regions correspond to small disparity values). We searched for disparity values ranging from 1 to 50 for each pixel.
Pseudocode
Input: Left and right images
Output: Disparity image
1. DISPARITY_RANGE = 50, WIN_SIZE = 5, EXTEND = (WIN_SIZE-1)/2;
2. Let left and right be the rectified input images with the same dimensions. Ny = y-dimension of the image, Nx = x-dimension of the image.
3. Convert the two images to grayscale images: Ileft = rgb2gray(left), Iright = rgb2gray(right).
4. Initialize a disparity image DISPARITY that has the same dimensions as Ileft or Iright.
5. For y = EXTEND+1 to Ny-EXTEND {
       For x = EXTEND+DISPARITY_RANGE+1 to Nx-EXTEND {
           bestDisparity = 0; bestNCC = 0; // lowest NCC score
           For disp = 1 to DISPARITY_RANGE {
               Patch1 = Ileft(y-EXTEND:y+EXTEND, x-EXTEND:x+EXTEND)
               Patch2 = Iright(y-EXTEND:y+EXTEND, x-disp-EXTEND:x-disp+EXTEND)
               currNCC = NCC(Patch1, Patch2)
               if (currNCC > bestNCC) {
                   bestNCC = currNCC;
                   bestDisparity = disp;
               }
           }
           DISPARITY(y,x) = bestDisparity;
       }
   }
6. Output the disparity image DISPARITY.
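For reference, a direct Matlab translation of the pseudocode above, assuming the stereo pair is stored as ‘left.png’ and ‘right.png’ (placeholder filenames). The patches are cast to double so the NCC sums do not saturate in uint8 arithmetic.

% Naive window-based stereo matching with NCC, following the pseudocode.
left = imread('left.png'); right = imread('right.png');  % placeholder filenames
Ileft = double(rgb2gray(left));
Iright = double(rgb2gray(right));
[Ny, Nx] = size(Ileft);
DISPARITY_RANGE = 50; WIN_SIZE = 5; EXTEND = (WIN_SIZE-1)/2;
DISPARITY = zeros(Ny, Nx);
% NCC of two equal-sized patches; eps guards against all-zero patches.
NCC = @(A, B) sum(A(:).*B(:)) / (sqrt(sum(A(:).^2)) * sqrt(sum(B(:).^2)) + eps);
for y = EXTEND+1 : Ny-EXTEND
    for x = EXTEND+DISPARITY_RANGE+1 : Nx-EXTEND
        bestDisparity = 0; bestNCC = 0;
        Patch1 = Ileft(y-EXTEND:y+EXTEND, x-EXTEND:x+EXTEND);
        for disp = 1:DISPARITY_RANGE
            Patch2 = Iright(y-EXTEND:y+EXTEND, x-disp-EXTEND:x-disp+EXTEND);
            currNCC = NCC(Patch1, Patch2);
            if currNCC > bestNCC
                bestNCC = currNCC;
                bestDisparity = disp;
            end
        end
        DISPARITY(y, x) = bestDisparity;
    end
end
imshow(DISPARITY / DISPARITY_RANGE);   % white = large disparity (near)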
[35 Points]
4. Download the VisualSFM software (http://ccwu.me/vsfm/). Create a 3D model using at least 25
images. The images can be obtained from the following sources:
• The images can be captured using your phone or any digital camera.
• The images can also be downloaded from the web (Flickr, Google Images).
You can search for “visualsfm models” on YouTube to see some examples of 3D models. Try to be
creative and submit models of interesting objects. Submit the 3D model in PLY format, which can be
visualized using MeshLab (http://www.meshlab.net/).
[15 Points]