## Description

1 Part 1: Python and numpy

For each of the problems below, write a method (e.g., problem1) that returns the answer for the corresponding

problem. Put all your methods in one file called homework1 WPIUSERNAME.py

(e.g., homework1 jrwhitehill.py, or homework1 jrwhitehill lleshin.py for a team). See the starter file

homework1 template.py. In all problems, you may assume that the the dimensions of the matrices and/or

vectors are compatible for the requested mathematical operations.

Note 1: In mathematical notation we usually start indices with j = 1. However, in numpy (and many

other programming settings), it is more natural to use 0-based array indexing. When answering the questions

below, do not worry about “translating” from 1-based to 0-based indexes. For example, if the (i, j)th element

of some matrix is requested, (where i, j ≥), you can simply write A[i,j].

Note 2: To represent and manipulate vectors and matrices, please use numpy’s array class (not the

matrix class).

Note 3: While the difference between a row vector and a column vector is important when doing math,

numpy does not care about this difference. Hence, please use a 1-dimensional numpy array for both cases.

1. Given matrices A and B, compute and return an expression for A + B. [ 2 pts ]

Answer (freebie!): While it is completely valid to use np.add(A, B), this is unnecessarily verbose; you

really should make use of the “syntactic sugar” provided by Python’s/numpy’s operator overloading and

just write: A + B. Similarly, you should use the more compact (and arguably more elegant) notation

for the rest of the questions as well.

2. Given matrices A, B, and C, compute and return AB − C (i.e., right-multiply matrix A by matrix

B, and then subtract C). Use dot or np.dot. [ 2 pts ]

3. Given matrices A, B, and C, return A B + C>, where represents the element-wise (Hadamard)

product and > represents matrix transpose. In numpy, the element-wise product is obtained simply

with *. [ 2 pts ]

4. Given column vectors x and y, compute the inner product of x and y (i.e., x

>y). [ 2 pts ]

5. Given matrix A, return a matrix with the same dimensions as A but that contains all zeros. Use

np.zeros. [ 2 pts ]

6. Given matrix A, return a vector with the same number of rows as A but that contains all ones. Use

np.ones. [ 2 pts ]

7. Given square matrix A and (scalar) α, compute A + αI, where I is the identity matrix with the same

dimensions as A. Use np.eye. [ 2 pts ]

8. Given matrix A and integers i, j, return the jth column of the ith row of A, i.e., Aij . [ 2 pts ]

9. Given matrix A and integer i, return the sum of all the entries in the ith row, i.e., P

j Aij . Do not

use a loop, which in Python is very slow. Instead use the np.sum function. [ 4 pts ]

10. Given matrix A and scalars c, d, compute the arithmetic mean over all entries of A that are between

c and d (inclusive). In other words, if S = {(i, j) : c ≤ Aij ≤ d}, then compute 1

|S|

P

(i,j)∈S Aij . Use

np.nonzero along with np.mean. [ 5 pts ]

1

11. Given an (n × n) matrix A and integer k, return an (n × k) matrix containing the right-eigenvectors

of A corresponding to the k largest eigenvalues of A. Use np.linalg.eig to compute eigenvectors. [

5 pts ]

12. Given square matrix A and column vector x, use np.linalg.solve to compute A−1x. Do not use

np.linalg.inv or ** -1 to compute the inverse explicitly; this is numerically unstable and can, in

some situations, give incorrect results. [ 5 pts ]

13. Given square matrix A and row vector x, use np.linalg.solve to compute xA−1

. Hint: AB =

(B>A>)

>. Do not use np.linalg.inv or ** -1 to compute the inverse explicitly. [ 5 pts ]

2 Part 2: Step-wise Classification

For the tasks below, write your code in a file called homework1 smile WPIUSERNAME.py, and put your experimental results in homework1 smile WPIUSERNAME.pdf.

In this part of the assignment you will train a very simple smile classifier that analyzes a grayscale image

x ∈ R

24×24 and outputs a prediction ˆy ∈ {0, 1} indicating whether the image is smiling (1) or not (0).

The classifier will make its decision based on very simple features of the input image consisting of binary

comparisons between pixel values. Each feature is computed as

I[xr1,c1 > xr2,c2

]

where ri

, ci ∈ {0, 1, 2, . . . , 23} are the row and column indices, respectively, and I[·] is an indicator function

whose value is 1 if the condition is true and 0 otherwise. In general, these features are not very good, but

nonetheless they will enable the classifier to achieve an accuracy (fPC) much better than just guessing or

just choosing the dominant class. Based on these features, you should train an ensemble smile classifier

using step-wise classification for m = 5 features. The output of the ensemble (1 if it thinks the image is

smiling, 0 otherwise) is determined by the average prediction across all m members of the ensemble. If more

than half of the m ensemble predictors g

(1), . . . , g(m)

think that the image is smiling, then the ensemble says

it’s a smile; otherwise, the ensemble says it’s not smiling.

Step-wise classification/regression is a greedy algorithm: at each round j, choose the jth feature

(r1, c1, r2, c2) such that – when it is added to the set of j − 1 features that have already been selected – the

accuracy (fPC) of the overall classifier on the training set is maximized. More specifically, at every round j,

consider every possible tuple of pixel locations (r1, c1, r2, c2): if you construct an ensemble with j predictors

(the j −1 you’ve already chosen, plus the current “candidate” (r1, c1, r2, c2)), is the resulting ensemble more

accurate (in terms of PC on training data) than any other tuple of pixel locations during this round? If the

current tuple is the best yet (for round j), then save it as your “best seen yet” for round j. If not, ignore it.

Move on to the next possible tuple of pixel locations, and repeat until you’ve searched all of them. At the

end of round j, you will have selected the best feature for that round, and you add it to your set of selected

features. Once added, it stays in the set forever – it can never be removed. (Otherwise, it wouldn’t be a

greedy algorithm at all.) Now you move on to round j + 1 until you’ve completed m = 5 rounds.

To measure the ensemble’s accuracy (fPC), you should run it on all the images in the training set, and

then compare the output of the ensemble to the corresponding ground-truth labels. At the end of the entire

training procedure, you should estimate how well your “machine” (ensemble smile classifier) works on a set

of images not used for training, i.e., the test set.

Skeleton code: While how you write your code is up to you (subject to the vectorization constraint

and also basic readability); however, to get you started, we sketched in a few functions:

• fPC (y, yhat): this takes in a vector of ground-truth labels and corresponding vector of guesses, and

then computes the accuracy (PC). The implementation (in vectorized form) should only take 1-line.

• measureAccuracyOfPredictors (predictors, X, y): this takes in a set of predictors, a set of images

to run it on, as well as the ground-truth labels of that set. For each image in the image set, it runs the

2

ensemble to obtain a prediction. Then, it computes and returns the accuracy (PC) of the predictions

w.r.t. the ground-truth labels.

• stepwiseRegression (trainingFaces, trainingLabels, testingFaces, testingLabels): I’ve included some visualization code, but otherwise it’s empty. You need to implement the step-wise classification described above.

Tasks:

1. Download from Canvas the starter Python file homework1 smile.py as well as the following data files:

trainingFaces.npy,trainingLabels.npy,testingFaces.npy,testingLabels.npy.

2. Write code to train a step-wise classifier for m = 5 features of the binary comparison type described

above; the greedy procedure should maximize fPC (which you will also need to implement). At each

round, the code should examine every possible feature (r1, c1, r2, c2).. Make sure your code is vectorized to improve run-time performance (wall-clock time) [ 25 pts ].

3. Write code to analyze how training/testing accuracy changes as a function of number of examples

n ∈ {400, 800, 1200, 1600, 2000} (implement this in a for-loop)

(a) Run your step-wise classifier training code only on the first n examples of the training set.

(b) Measure and record the training accuracy of the trained classifier on the n examples.

(c) Measure and record the testing accuracy of the classifier on the (entire) test set.

Important: you must write code (a simple loop) to do this – do not just do it manually for each value

of n. This is good experimental practice in general and is especially important in machine learning to

ensure reproducibility of results. Using the entire training set, you should achieve a test accuracy of

at least 75%. [ 8 pts ].

4. In a PDF document (you can use whatever tool you like – LaTeX, Word, Google Docs, etc. – but make

sure you export to PDF), show the training accuracy and testing accuracy as a function of n. Please

use the following simple format:

n trainingAccuracy testingAccuracy

400 … …

800 … …

1200 … …

1600 … …

2000 … …

Moreover, characterize in words how the training accuracy and testing accuracy changes as a function

of n, and how the two curves relate to each other. What trends do you observe? [4 pts]

5. Visualizing the learned features: It is very important in empirical machine learning work to

visualize what was actually learned during training. This can be useful for debugging to make sure

that your training code is working as it should, and also to make sure your training and testing sets

are selected wisely. For n = 2000, visualize the m = 5 features that were learned by (a) displaying any

face image from the test set; and (b) drawing a square around the specific pixel locations ((r1, c1) and

(r2, c2)) that are examined by the feature. You can use the example code in the homework1 smile.py

template to render the image. Insert the graphic (just one showing all 5 features) into your PDF file.

[ 3 pts].

3

Here’s an example that shows just one feature:

Tip on vectorization: Implement your training algorithm so that, for any particular feature (r1, c1, r2, c2),

all the feature values (over all the n training images) are extracted at once using numpy – do not use a loop.

Even after vectorizing my own code, running the experiments required in this assignment took about 1 hour

(on a single CPU of a MacBook 2016 laptop).

4