# Intro to Image Understanding (CSC420) Assignment 3 solved

\$35.00

## Description

5/5 - (1 vote)

1. [2 points] A robber left his/her shoe behind. The police took a picture of it (see
shoe.jpg). Estimate the width and length in centimeters of the shoe from the picture
as accurately as possible! Show your work.
2. You’ve joined a CSI unit and our suspect mugShot.jpg has been implicated in a
grisly crime. Raiding his apartment, we recovered a photograph of him sitting with
his accomplice, but he knew we were on to him and shredded the photograph! We
need you to implement RANSAC and re-assemble it. You may also use a downloaded
implementation of SIFT and any code you wrote for Assignment 2. Python users check
opencv-contrib.
(a) [2 points] Create a controlled test case between two affine transformed images,
and develop your RANSAC algorithm to calculate the affine transformation via
least-squares. Recall that the minimum number of correspondences necessary to
solve for a 2D affine transform is 3. Visualize the best transformation for the best
matching image just like you did for Assignment 2, exercise 2(d). You can use
any tutorial code to help create your test-case image, or draw one, or take some
pictures.
(b) [2 points] The shredded directory contains the shredded picture pieces. Using
the mugShot, try reassembling the image in random permutations and keep the
one that best fits your model. Rank your models by the mean residual SSD.
You could alternatively try a greedy or dynamic programming approach. Show
the re-constructed image. Display your final, best reassembled image. Hint: For
speed use down-sampling.
3. For this question you do not need to write code, but you do need to write the equations
and show your work (i.e. how you derived them). If you need to calculate the location
of any points or lines in the image plane, state those as givens; but use the minimal
1
set. You can assume that the ground plane is orthogonal to the image plane. Hint:
Draw the scenes on paper from the side.
(a) [2 points] Examine image tracks.jpg. Assume you are given K, R, t. Are you
able to estimate the distance between adjacent railway ties in world coordinates?
If so, write the necessary equations as a function of the pixel locations, camera
intrinsic and/or extrinsic matrices.
(b) [2 points] Examine image man.jpg. The camera centre is 95 centimetres off
the ground. You can assume the ground is planar. Are you able to estimate the
height of the man in centimetres without K? If so, derive and write the necessary
equations as a function of the pixel locations in the image, and estimate the height.
4. Attached is an image um 000038.png recorded with a camera mounted on a car. The
focal length of the camera is 721.5, and the principal point is (609.6, 172.9). We know
that the camera was attached to the car at a distance of 1.7 meters above ground.
(a) [0.5 points] Write down the internal camera parameter matrix K.
(b) [0.5 points] Write the equation of the ground plane in the camera’s coordinate
system. You can assume that the camera’s image plane is orthogonal to the
ground.
(c) [1 point] How would you compute the 3D location of a 2D point (x, y) in the
image by assuming that the point lies on the ground? Express your answer in
the camera’s coordinate system. You can assume that the camera’s image plane
is orthogonal to the ground. No need to write code, math is fine.
Extra Credit (Easy)
You are given an image with depth captured with Microsoft Kinect. The file rgbd.mat
contains a variable im which is the RGB image and depth that contains depth information for each pixel (such an image is typically called an RGB-D image). Depth is
nothing else but the Z coordinate in the camera’s coordinate system. To get familiar
with it, you can plot it with e.g., imagesc(depth) (in Matlab). In this plot, pixels
that are red are far away, blue ones are close to the camera, the rest are somewhere
in between. Further, you can find a function camera params.m which contains the
camera’s parameters.
(a) [1 point] Compute a 3D coordinate for each pixel (with non-zero depth) in camera
coordinate system. Plot the computed point cloud (all 3D points). You can use
the function plot3 (in Matlab). For visually more appealing plots you could also
use the function surf. Include the plot in your solution document.
(b) [2 points] The file rgbd.mat also contains a variable called labels. This variable
encodes four objects of interest. For example, imagesc(labels==1) will visualize
the first object of interest and imagesc(labels==4) the fourth one. Thus, all
pixels in labels that have value 1 belong to the first object, all pixels that have
2
value 2 belong to the second object, etc. To get the x and y coordinates of all
pixels that belong to the first object, you can do: [y,x] = find(labels==1);.
For each object, compute the 3D location for all of its pixels. Now compute the
geometric center of each object by simply averaging its computed 3D coordinates.
Write code that finds the object (among the labeled four) that is farthest from
the camera (i.e. its distance to camera center is the largest). Write also code that
finds the object that is the highest above floor. Here you can assume that the
image plane is orthogonal to the floor.
Sources
• man.jpg: Chris Ford, https://www.flickr.com/photos/chrisschoenbohm/4817576839/
• tracks.jpg: Ryan Voetsch, https://www.flickr.com/photos/voetshy/
3