## Description

1. (a) [0.5 point] Using the concept of a 1D impulse δ(t) (centered at the origin), how would you extend this notation to write a 2D impulse that is a function of two variables u and v and is centered at the 2D origin? (u and v represent the 2D spatial coordinates.)

(b) [0.5 point] The above impulse was located at the origin. Using the concept of shifting, how would you write a 2D impulse located at u = m, v = n?

(c) [2 points] In the language of your choice, build a 100×200 image (i.e., height = 100, width = 200) that has impulses at the following coordinates: (10,20), (20,40), (30,60), (40,80), (50,100), (60,80), (70,60), (80,40), (90,20). The first coordinate is the row index, the second the column index. After building the matrix, visualize the image in 3D (in MATLAB, see the function ‘surf’; in Python, see the ‘mplot3d’ toolkit). Include both the code and the plot in the assignment PDF.
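As one possible sketch for part (c) in Python with NumPy and matplotlib (the filename `impulses.png` is just an illustrative choice, not part of the assignment):

```python
# Sketch (Python/NumPy): build the 100x200 impulse image, then view it in 3D.
import numpy as np

coords = [(10, 20), (20, 40), (30, 60), (40, 80), (50, 100),
          (60, 80), (70, 60), (80, 40), (90, 20)]

img = np.zeros((100, 200))      # height = 100, width = 200
for r, c in coords:
    img[r, c] = 1.0             # unit impulse at (row, col)

# 3D surface plot -- the Python analogue of MATLAB's surf lives in
# matplotlib's mplot3d toolkit.
try:
    import matplotlib
    matplotlib.use("Agg")       # headless backend; drop for interactive use
    import matplotlib.pyplot as plt

    Y, X = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    ax = plt.figure().add_subplot(projection="3d")
    ax.plot_surface(X, Y, img)
    plt.savefig("impulses.png")
except ImportError:
    pass                        # plotting is optional; the array alone answers (c)
```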

(d) [1 point] We have seen in class that a 1D function f(t) of length L can be written as a sum of weighted (and shifted) impulses, i.e., f(t) = Σ_{l=1}^{L} f(t_l) δ(t − t_l). How would you write a 2D image as a weighted sum of shifted 2D impulses? Assume the size of the image to be M × N.

2. (a) [0.5 point] Given an n×n image I and an m×m filter h, what is the computational cost of computing h ∗ I (the convolution)?

(b) [0.5 point] What is the computational cost if h is a separable filter?
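The cost difference behind (a) and (b) can be checked numerically: convolving with a separable kernel h = v·hᵀ as two 1D passes produces exactly the same result as one full 2D pass, but with O(m) instead of O(m²) multiplies per pixel. A minimal NumPy sketch (the toy image, filters, and helper `conv2_full` are illustrative, not part of the assignment):

```python
# Sketch: a separable 2D kernel can be applied as two 1D passes.
# Naive 2D convolution costs O(n^2 m^2) multiplies; the separable
# route costs O(n^2 * 2m) -- one horizontal and one vertical pass.
import numpy as np

rng = np.random.default_rng(0)
I = rng.random((16, 16))            # toy n x n image
v = np.array([1.0, 2.0, 1.0])       # vertical 1D filter
h = np.array([1.0, 0.0, -1.0])      # horizontal 1D filter
H = np.outer(v, h)                  # the equivalent rank-1 2D kernel

def conv2_full(img, ker):
    """Naive full 2D convolution, O(n^2 m^2)."""
    kh, kw = ker.shape
    out = np.zeros((img.shape[0] + kh - 1, img.shape[1] + kw - 1))
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i:i + kh, j:j + kw] += img[i, j] * ker
    return out

# Separable route: 1D convolution along rows, then along columns.
rows = np.apply_along_axis(np.convolve, 1, I, h)      # horizontal pass
sep = np.apply_along_axis(np.convolve, 0, rows, v)    # vertical pass

assert np.allclose(conv2_full(I, H), sep)
```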

(c) [1 point] Figure out whether the filters specified by the following kernels are separable or not. If separable, write down the constituent horizontal and vertical filters.

    F1 = | 10  40   8 |      F2 = | 6  3  6 |
         |  5   3   5 |           | 2  1  2 |
         | 12   5  12 |           | 6  3  6 |
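Not required by the problem, but one way to sanity-check a separability answer numerically: a 2D kernel is separable exactly when its matrix rank is 1, and for a rank-1 kernel the SVD yields the constituent 1D filters (up to scale). A NumPy sketch:

```python
# Sketch: separability test via matrix rank, factors via SVD.
import numpy as np

F1 = np.array([[10, 40,  8],
               [ 5,  3,  5],
               [12,  5, 12]], dtype=float)
F2 = np.array([[ 6,  3,  6],
               [ 2,  1,  2],
               [ 6,  3,  6]], dtype=float)

for name, F in (("F1", F1), ("F2", F2)):
    r = np.linalg.matrix_rank(F)
    print(name, "rank:", r, "-> separable" if r == 1 else "-> not separable")

# For a rank-1 kernel, recover vertical/horizontal factors from the SVD:
U, S, Vt = np.linalg.svd(F2)
v = U[:, 0] * np.sqrt(S[0])     # vertical (column) filter
h = Vt[0, :] * np.sqrt(S[0])    # horizontal (row) filter
assert np.allclose(np.outer(v, h), F2)
```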


3. (a) [Problem Statement] Temperature sensing is a central component of modern processing plants in many industries. With the proliferation of Japanese electronics in the latter part of the last century, many plants installed thousands of temperature sensors with (at the time) state-of-the-art seven-segment readouts (many of you are perhaps not old enough to remember such calculators). You are in charge of renovating one such plant, which processes milk to feed orphaned baby elephants! The plant is in tip-top working condition: all the sensors, being rugged, still work, and the readouts continue to provide accurate measurements.

The problem is that, without dismantling the legacy sensors and displays, you are required to post the data to the cloud for a data analytics engine. You, being a seasoned computer vision expert, bring to the table an outrageously affordable solution using a sub-$150 quadcopter drone equipped with a camera. The drone comes pre-programmed to periodically visit all the displays, take a picture, and then tweet it. You are given one such picture (thermometer.png). You are required to adapt the findWaldo function to get a list of digits in the display.

Please note that digits appear in 3 sizes in the display. For your convenience, we have provided templates for all the digits at 3 different scales in the directory ‘DIGITS’, which has three sub-directories; there are 30 templates/filters in all. It will be easiest if MATLAB is used. It will be helpful if you read the steps very carefully. Several implementation tips are given in the following instructions. Good luck!

The following are the steps you need to perform:

(b) [0.25 point] Read in the templates. Code to read in the templates is provided as the function readInTemplates, which returns a cell array containing 30 templates along with their dimensions.

(c) [1.5 points] Compute the normalized correlation (using normxcorr2) of the display image (thermometer.png) with each of the templates, and store it in an (M×N×30) array (this will greatly simplify the later steps); M × N is the size of the input image. Let’s name this array ‘corrArray’. Advice for this step follows; please read carefully.

i. Don’t forget to convert the image and the templates to grayscale and cast them to double.

ii. The output of the function normxcorr2 is bigger than the input image by offsets on both sides, in both directions. If tH and tW are the height and width of the respective template, while M and N are the height and width of the image, extract the central portion of the output as follows. Define two offset variables offSetX = round(tW/2) and offSetY = round(tH/2), then extract the portion (offSetY : offSetY + M − 1, offSetX : offSetX + N − 1). This has to be done for the output of the correlation with every template, to ensure the output is always the same size even though the sizes of the templates differ. It is for this reason that we stored the template dimensions separately while reading the templates.
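For Python users, the cropping arithmetic above can be sketched with a stand-in array (the shapes are what matter; the template size here is made up). One pitfall worth flagging: Python's built-in round() uses banker's rounding, so (t + 1) // 2 is used below to mimic MATLAB's round(t/2) for integer t:

```python
# Sketch of the cropping step only: a normxcorr2-style "full" output has size
# (M + tH - 1) x (N + tW - 1); extract the central M x N portion so that every
# template's correlation map ends up the same size.
import numpy as np

M, N = 100, 200            # image height, width
tH, tW = 13, 9             # height/width of one template (varies per template)

full = np.random.default_rng(0).random((M + tH - 1, N + tW - 1))

# MATLAB (1-based): (offSetY : offSetY+M-1, offSetX : offSetX+N-1),
# with offSetY = round(tH/2), offSetX = round(tW/2).
offY = (tH + 1) // 2       # MATLAB round(tH/2) for positive integer tH
offX = (tW + 1) // 2       # Python's round(4.5) == 4 would be off by one here!

# Convert the 1-based MATLAB slice to 0-based Python indexing:
crop = full[offY - 1 : offY - 1 + M, offX - 1 : offX - 1 + N]
assert crop.shape == (M, N)
```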

(d) [0.25 point] The result of every correlation is an image itself. At every pixel, check which template gave the maximum correlation. A naive implementation of this step will take ages. It is for this reason that you were advised to keep the outputs of the correlation in a multidimensional array, so that you can make good use of operations along a specific dimension of the array. For instance, if you stored the results in corrArray, you need to find the maximum along the third dimension, i.e. use [maxCorr, maxIdx] = max(corrArray, [], 3). Python users: please look up the NumPy function ‘amax’.
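In NumPy the same per-pixel reduction is a pair of axis operations (random values stand in for the real correlation maps here):

```python
# Sketch: per-pixel winner across the 30 correlation maps, vectorized.
import numpy as np

M, N, K = 100, 200, 30
corrArray = np.random.default_rng(1).random((M, N, K))

maxCorr = corrArray.max(axis=2)      # best correlation value at each pixel
maxIdx = corrArray.argmax(axis=2)    # which template produced it (0-based)

# Equivalent to MATLAB's [maxCorr, maxIdx] = max(corrArray, [], 3),
# except that maxIdx here is 0-based rather than 1-based.
assert maxCorr.shape == (M, N) and maxIdx.shape == (M, N)
```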

(e) [0.25 point] By this time, you know which template gave the maximum correlation at every pixel. But it still might not be a digit, because obviously not every pixel represents a digit. Find the pixels for which maxCorr exceeds a threshold T (use a threshold between 0 and 1; more on this later).
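In NumPy this thresholding is one call (a random stand-in replaces step (d)'s real maxCorr, and T = 0.9 is just a placeholder to be tuned in step f.v):

```python
# Sketch: candidate pixels where the best correlation clears a threshold T.
import numpy as np

maxCorr = np.random.default_rng(2).random((100, 200))  # stand-in for step (d)
T = 0.9                                                # placeholder; tune it

candY, candX = np.nonzero(maxCorr > T)   # row (y) and column (x) indices
assert np.all(maxCorr[candY, candX] > T)
```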

(f) These pixels are candidate digit locations. But there would still be multiple locations around a digit that exceed the threshold, so reject those pixels that are not a local maximum in terms of correlation. Specifically,

i. [0.25 point] Let’s call the coordinates of the candidate pixels (the output of Step e) candX and candY. Loop through candX and candY. For every (candX, candY), check which template produced the maximum correlation. This information can be retrieved from maxIdx(candY(i), candX(i)), where i is the i-th candidate pixel. Note that the order of x and y is reversed. Now you have the templateIndex for the template that gave the maximum correlation at the candidate location.

ii. [0.25 point] Extract the correlation matrix for the above templateIndex, i.e. thisCorr = corrArray(:, :, templateIndex).

iii. [2 points] Then, within the ‘thisCorr’ matrix, check whether (candX(i), candY(i)) is a local maximum in a 3×3 window, i.e. write a function isLocalMaximum(x, y, thisCorr) that checks whether thisCorr(y, x) is equal to the maximum value in the window centered on (x, y).
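A Python sketch of such a function (0-based indexing; the border clipping is one reasonable choice the problem leaves open):

```python
# Sketch of isLocalMaximum: is thisCorr[y, x] the largest value in the
# 3x3 window centered on (x, y)?  The window is clipped at the image border.
import numpy as np

def is_local_maximum(x, y, thisCorr):
    h, w = thisCorr.shape
    window = thisCorr[max(y - 1, 0):min(y + 2, h),
                      max(x - 1, 0):min(x + 2, w)]
    return thisCorr[y, x] == window.max()

# Tiny check on a toy array: the 5 at (row=1, col=1) dominates its window.
A = np.array([[1., 2., 1.],
              [2., 5., 2.],
              [1., 2., 1.]])
assert is_local_maximum(1, 1, A)       # the peak
assert not is_local_maximum(0, 0, A)   # a neighbour of the peak
```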

iv. [0.25 point] If candX(i), candY(i) pass the above test, draw a bounding box around (x, y) equal to the size of the template corresponding to the template index. Code for this is provided as drawAndLabelBox(x, y, templateIndex, dimensions). Because you will be calling this function in a loop, it is good practice to call the function ‘drawnow’ to refresh the display after every iteration; otherwise one gets the impression that the screen is frozen!

v. [1 point] Try out various values for the threshold T. You won’t get a perfect result, but your goal should be to set T such that (i) no non-digit is detected as a digit, (ii) no digit is labeled incorrectly, and (iii) it is okay to miss a digit. Report your threshold and a screenshot of the detected bounding boxes. Report the number of correctly labeled digits.

vi. [Extra Credit: 3 points] Getting 1 of these 3 points is easy; read carefully. Can you improve the digit detection by using templates from the input image? I.e., try to crop your own templates for some of the digits from the image, and then re-run the code with the new templates. Why is this expected to improve the performance? You will get 1 point if you correctly answer (in a sentence or two) why the performance is expected to improve, even if you don’t implement it!
