Homework 1 – Machine Learning CS4342 solved


Category: You will receive a download link of the .ZIP file upon Payment


5/5 - (1 vote)

1 Part 1: Python and numpy
For each of the problems below, write a method (e.g., problem1) that returns the answer for the corresponding
problem. Put all your methods in one file called homework1 WPIUSERNAME.py
(e.g., homework1 jrwhitehill.py, or homework1 jrwhitehill lleshin.py for a team). See the starter file
homework1 template.py. In all problems, you may assume that the the dimensions of the matrices and/or
vectors are compatible for the requested mathematical operations.
Note 1: In mathematical notation we usually start indices with j = 1. However, in numpy (and many
other programming settings), it is more natural to use 0-based array indexing. When answering the questions
below, do not worry about “translating” from 1-based to 0-based indexes. For example, if the (i, j)th element
of some matrix is requested, (where i, j ≥), you can simply write A[i,j].
Note 2: To represent and manipulate vectors and matrices, please use numpy’s array class (not the
matrix class).
Note 3: While the difference between a row vector and a column vector is important when doing math,
numpy does not care about this difference. Hence, please use a 1-dimensional numpy array for both cases.
1. Given matrices A and B, compute and return an expression for A + B. [ 2 pts ]
Answer (freebie!): While it is completely valid to use np.add(A, B), this is unnecessarily verbose; you
really should make use of the “syntactic sugar” provided by Python’s/numpy’s operator overloading and
just write: A + B. Similarly, you should use the more compact (and arguably more elegant) notation
for the rest of the questions as well.
2. Given matrices A, B, and C, compute and return AB − C (i.e., right-multiply matrix A by matrix
B, and then subtract C). Use dot or np.dot. [ 2 pts ]
3. Given matrices A, B, and C, return A B + C>, where represents the element-wise (Hadamard)
product and > represents matrix transpose. In numpy, the element-wise product is obtained simply
with *. [ 2 pts ]
4. Given column vectors x and y, compute the inner product of x and y (i.e., x
>y). [ 2 pts ]
5. Given matrix A, return a matrix with the same dimensions as A but that contains all zeros. Use
np.zeros. [ 2 pts ]
6. Given matrix A, return a vector with the same number of rows as A but that contains all ones. Use
np.ones. [ 2 pts ]
7. Given square matrix A and (scalar) α, compute A + αI, where I is the identity matrix with the same
dimensions as A. Use np.eye. [ 2 pts ]
8. Given matrix A and integers i, j, return the jth column of the ith row of A, i.e., Aij . [ 2 pts ]
9. Given matrix A and integer i, return the sum of all the entries in the ith row, i.e., P
j Aij . Do not
use a loop, which in Python is very slow. Instead use the np.sum function. [ 4 pts ]
10. Given matrix A and scalars c, d, compute the arithmetic mean over all entries of A that are between
c and d (inclusive). In other words, if S = {(i, j) : c ≤ Aij ≤ d}, then compute 1
(i,j)∈S Aij . Use
np.nonzero along with np.mean. [ 5 pts ]
11. Given an (n × n) matrix A and integer k, return an (n × k) matrix containing the right-eigenvectors
of A corresponding to the k largest eigenvalues of A. Use np.linalg.eig to compute eigenvectors. [
5 pts ]
12. Given square matrix A and column vector x, use np.linalg.solve to compute A−1x. Do not use
np.linalg.inv or ** -1 to compute the inverse explicitly; this is numerically unstable and can, in
some situations, give incorrect results. [ 5 pts ]
13. Given square matrix A and row vector x, use np.linalg.solve to compute xA−1
. Hint: AB =
>. Do not use np.linalg.inv or ** -1 to compute the inverse explicitly. [ 5 pts ]
2 Part 2: Step-wise Classification
For the tasks below, write your code in a file called homework1 smile WPIUSERNAME.py, and put your experimental results in homework1 smile WPIUSERNAME.pdf.
In this part of the assignment you will train a very simple smile classifier that analyzes a grayscale image
x ∈ R
24×24 and outputs a prediction ˆy ∈ {0, 1} indicating whether the image is smiling (1) or not (0).
The classifier will make its decision based on very simple features of the input image consisting of binary
comparisons between pixel values. Each feature is computed as
I[xr1,c1 > xr2,c2
where ri
, ci ∈ {0, 1, 2, . . . , 23} are the row and column indices, respectively, and I[·] is an indicator function
whose value is 1 if the condition is true and 0 otherwise. In general, these features are not very good, but
nonetheless they will enable the classifier to achieve an accuracy (fPC) much better than just guessing or
just choosing the dominant class. Based on these features, you should train an ensemble smile classifier
using step-wise classification for m = 5 features. The output of the ensemble (1 if it thinks the image is
smiling, 0 otherwise) is determined by the average prediction across all m members of the ensemble. If more
than half of the m ensemble predictors g
(1), . . . , g(m)
think that the image is smiling, then the ensemble says
it’s a smile; otherwise, the ensemble says it’s not smiling.
Step-wise classification/regression is a greedy algorithm: at each round j, choose the jth feature
(r1, c1, r2, c2) such that – when it is added to the set of j − 1 features that have already been selected – the
accuracy (fPC) of the overall classifier on the training set is maximized. More specifically, at every round j,
consider every possible tuple of pixel locations (r1, c1, r2, c2): if you construct an ensemble with j predictors
(the j −1 you’ve already chosen, plus the current “candidate” (r1, c1, r2, c2)), is the resulting ensemble more
accurate (in terms of PC on training data) than any other tuple of pixel locations during this round? If the
current tuple is the best yet (for round j), then save it as your “best seen yet” for round j. If not, ignore it.
Move on to the next possible tuple of pixel locations, and repeat until you’ve searched all of them. At the
end of round j, you will have selected the best feature for that round, and you add it to your set of selected
features. Once added, it stays in the set forever – it can never be removed. (Otherwise, it wouldn’t be a
greedy algorithm at all.) Now you move on to round j + 1 until you’ve completed m = 5 rounds.
To measure the ensemble’s accuracy (fPC), you should run it on all the images in the training set, and
then compare the output of the ensemble to the corresponding ground-truth labels. At the end of the entire
training procedure, you should estimate how well your “machine” (ensemble smile classifier) works on a set
of images not used for training, i.e., the test set.
Skeleton code: While how you write your code is up to you (subject to the vectorization constraint
and also basic readability); however, to get you started, we sketched in a few functions:
• fPC (y, yhat): this takes in a vector of ground-truth labels and corresponding vector of guesses, and
then computes the accuracy (PC). The implementation (in vectorized form) should only take 1-line.
• measureAccuracyOfPredictors (predictors, X, y): this takes in a set of predictors, a set of images
to run it on, as well as the ground-truth labels of that set. For each image in the image set, it runs the
ensemble to obtain a prediction. Then, it computes and returns the accuracy (PC) of the predictions
w.r.t. the ground-truth labels.
• stepwiseRegression (trainingFaces, trainingLabels, testingFaces, testingLabels): I’ve included some visualization code, but otherwise it’s empty. You need to implement the step-wise classification described above.
1. Download from Canvas the starter Python file homework1 smile.py as well as the following data files:
2. Write code to train a step-wise classifier for m = 5 features of the binary comparison type described
above; the greedy procedure should maximize fPC (which you will also need to implement). At each
round, the code should examine every possible feature (r1, c1, r2, c2).. Make sure your code is vectorized to improve run-time performance (wall-clock time) [ 25 pts ].
3. Write code to analyze how training/testing accuracy changes as a function of number of examples
n ∈ {400, 800, 1200, 1600, 2000} (implement this in a for-loop)
(a) Run your step-wise classifier training code only on the first n examples of the training set.
(b) Measure and record the training accuracy of the trained classifier on the n examples.
(c) Measure and record the testing accuracy of the classifier on the (entire) test set.
Important: you must write code (a simple loop) to do this – do not just do it manually for each value
of n. This is good experimental practice in general and is especially important in machine learning to
ensure reproducibility of results. Using the entire training set, you should achieve a test accuracy of
at least 75%. [ 8 pts ].
4. In a PDF document (you can use whatever tool you like – LaTeX, Word, Google Docs, etc. – but make
sure you export to PDF), show the training accuracy and testing accuracy as a function of n. Please
use the following simple format:
n trainingAccuracy testingAccuracy
400 … …
800 … …
1200 … …
1600 … …
2000 … …
Moreover, characterize in words how the training accuracy and testing accuracy changes as a function
of n, and how the two curves relate to each other. What trends do you observe? [4 pts]
5. Visualizing the learned features: It is very important in empirical machine learning work to
visualize what was actually learned during training. This can be useful for debugging to make sure
that your training code is working as it should, and also to make sure your training and testing sets
are selected wisely. For n = 2000, visualize the m = 5 features that were learned by (a) displaying any
face image from the test set; and (b) drawing a square around the specific pixel locations ((r1, c1) and
(r2, c2)) that are examined by the feature. You can use the example code in the homework1 smile.py
template to render the image. Insert the graphic (just one showing all 5 features) into your PDF file.
[ 3 pts].
Here’s an example that shows just one feature:
Tip on vectorization: Implement your training algorithm so that, for any particular feature (r1, c1, r2, c2),
all the feature values (over all the n training images) are extracted at once using numpy – do not use a loop.
Even after vectorizing my own code, running the experiments required in this assignment took about 1 hour
(on a single CPU of a MacBook 2016 laptop).