Introduction to ML Assignment 4




Part 1. Building a CNN. The starter code provides you with a data loading procedure
and a base convolutional network model (Base Model). Study the code carefully and answer
the following questions.
1. (1 point) Your data has been split into training, validation and test sets. Examine the
ratio of the split and number of examples in each set. Suppose you were to train on
batches of 32 examples each. That is, in each step of gradient descent, you randomly
select 32 examples from the training set, compute your average loss on these examples,
and then compute the gradient of this average loss with respect to the model parameters.
How many iterations will it take to go through the entire training set given the number
of training examples yielded by the data split? How many iterations are there in 30
epochs? Recall that one epoch consists of the iterations needed to pass over the
entire training set once.
2. (2 points) Fill in the code for your custom convolution filter and show that it returns
the same output as Objax’s own convolution routine.
3. (1 point) Fill in the code for your linear layer, and show that it returns the same output
as passing through Objax’s own linear layer.
4. (1 point) Explain in a short paragraph what the difference is between the training and
validation sets.
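The counting in question 1 can be sketched as below. This is only an illustration: n_train = 1000 is a made-up number, so read the real value from the data split in the starter code. The sketch also assumes that a final, smaller batch still counts as one iteration, which is why it rounds up.

```python
import math


def iterations_per_epoch(n_train, batch_size):
    """Number of gradient steps needed to see n_train examples once.

    Assumes the last, possibly smaller, batch still counts as a step,
    hence the ceiling rather than the floor.
    """
    return math.ceil(n_train / batch_size)


n_train = 1000   # hypothetical; use the size of your own training set
batch_size = 32
epochs = 30

steps = iterations_per_epoch(n_train, batch_size)
total_steps = steps * epochs
print(steps, total_steps)
```

If your training loop drops the incomplete batch instead, replace math.ceil with integer division.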
Part 2. Training and Tuning a CNN. For this part, you have been given starter
code. Navigate to the portion where you define your optimizer.
1. (1 point) Complete the optimizer by using the definition of (stochastic) gradient descent:
$w_{k+1} = w_k - \eta \nabla L(w_k)$, where $\eta$ is the learning rate. Note that you need to update params.value, which are the values
of the trainable variables of your model.
2. (1 point) Complete the batch sampling code in the train function by specifying a batch
of examples. You should make use of the lists train_indices and val_indices.
Fall 2022 – Introduction to ML Assignment 4 – Page 3 of 5 Nov 24
3. (1 point) Train the model for a few epochs, and observe the training/validation loss and
training/validation accuracy plots. Include these plots within the PDF you hand in.
You should observe that the validation accuracy is low and stagnates after a few epochs.
Next we will go through a rudimentary way of adjusting the hyperparameters of the
model which we created.
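The update rule from item 1 and the batch sampling from item 2 can be sketched in plain Python as below. This is not the starter code: Var is a hypothetical stand-in for a trainable variable that exposes a mutable .value, and the toy loss and index list are made up for illustration.

```python
import random


class Var:
    """Hypothetical stand-in for a trainable variable with a mutable .value."""

    def __init__(self, value):
        self.value = value


def sgd_step(params, grads, lr):
    """Apply w <- w - lr * grad to every variable, in place."""
    for p, g in zip(params, grads):
        p.value = p.value - lr * g


def sample_batch(indices, batch_size, rng):
    """Draw batch_size distinct example indices uniformly at random."""
    return rng.sample(indices, batch_size)


# Toy problem: minimise L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = Var(0.0)
for _ in range(100):
    grad = 2.0 * (w.value - 3.0)
    sgd_step([w], [grad], lr=0.1)

train_indices = list(range(1000))  # hypothetical training-set indices
batch = sample_batch(train_indices, 32, random.Random(0))
```

In the actual assignment the gradients come from the model's loss on the sampled batch rather than from a closed-form expression.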
4. (1 point) In one sentence, define the meaning of a “hyperparameter”. Explain in a
short paragraph why it is important not to evaluate the accuracy on the test set until
all hyperparameters have been tuned.
5. (2 points) Select 4 hyperparameters associated with your network, one of which must involve your CNN architecture, and come up with two different sets of values for them.
For example, suppose my set of hyperparameters is defined as (yours might be different)
H = {batch size, learning rate, number of outputs of conv layer 1, number of conv layers},
where a conv layer is defined as the composition of a filter, an activation and a pooling operation, then
two sets of hyperparameters may be,
H1 = {32, 0.001, 16, 2} H2 = {64, 0.0001, 32, 3}
The hyperparameters that you tune do not need to be specified as numerical values.
For instance, you can specify the optimizer you are using to train the network, or the activation function you are using. You may wish to consult: https://objax.readthedocs.
6. (3 points) Create two additional networks M1 and M2, using the hyperparameter sets
H1 and H2, respectively, that you selected above. Train each model. Report the best validation
accuracy as well as the corresponding epoch for which this occurs for the Base Model
and your two additional models. For example,
Base model: 20% at epoch 22 M1: 30% at epoch 18 M2: 50% at epoch 24
Which model performs the best in terms of validation accuracy? These new models
do not need to outperform the base model, however, if you are unsatisfied with the
validation accuracy, adjust your hyperparameters in the previous part and train until
you are satisfied. Don’t forget to report your hyperparameters.
7. (2 points) Based on your answer, which model should you pick as your final model and
why? Then evaluate your model on the test set and report final test accuracy.
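The bookkeeping asked for in items 6 and 7 can be sketched as follows. The per-epoch accuracies are invented for illustration and the helper name best_epoch is hypothetical. The sketch assumes you record one validation accuracy per epoch for each model.

```python
def best_epoch(val_acc):
    """Return (epoch, accuracy) for the highest validation accuracy.

    Epochs are reported 1-indexed; ties go to the earliest epoch.
    """
    i = max(range(len(val_acc)), key=lambda k: val_acc[k])
    return i + 1, val_acc[i]


# Made-up validation accuracy per epoch for each model.
history = {
    "Base model": [0.10, 0.18, 0.20, 0.19],
    "M1": [0.15, 0.30, 0.28, 0.27],
    "M2": [0.20, 0.35, 0.50, 0.48],
}

summary = {name: best_epoch(acc) for name, acc in history.items()}
final_model = max(summary, key=lambda name: summary[name][1])

for name, (epoch, acc) in summary.items():
    print(f"{name}: {acc:.0%} at epoch {epoch}")
print("Selected:", final_model)
```

Only the selected model is then evaluated on the test set, once, so the reported test accuracy is not influenced by the tuning.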
Part 3. Trying Out a New Dataset. To solidify your knowledge of training and tuning
of a CNN, you will run another set of experiments on a new dataset. It should closely follow
what you have done in Part 2. You will develop your own network to classify the data
included in the dataset you picked. You may re-use any part of the starter code (e.g., for
Pick a dataset of your own choosing from the following link (under “Image classification”):
1. (1 point) Import and partition your data.
2. (1 point) Create a base model to start out with.
3. (3 points) Pick several hyperparameters you would like to tune and train a model until
its validation accuracy is 5-10% better than the base model. Provide a succinct discussion
on your design procedure: which hyperparameters you tuned, what is the new validation
4. (1 point) Select your final model and report test accuracy.
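A minimal sketch of the partitioning in item 1, assuming an 80/10/10 split and a dataset of 1000 examples. Both numbers are illustrative choices and the assignment prescribes neither. The function name split_indices is hypothetical.

```python
import random


def split_indices(n, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle the indices 0..n-1 and split them into train/val/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed keeps the split reproducible
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The three lists are disjoint, so no example used for tuning or final testing is seen during training.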
Part 4. Open-Ended Exploration. In this final part, you will explore one of several
questions introducing ways to improve your classification performance through tuning. You
may build your experiment on top of either Part 2 or Part 3. Pick ONE of the questions
below, perform a small set of NEW experiments that address that question, and provide a
1. Additional hyperparameter tuning Come up with one or several hyperparameters
that you have not tried in the previous parts (e.g., explore arguments of the method
objax.nn.Conv2D), and tune them on the validation set until you see at least a 5-10%
increase in the validation accuracy as compared to a base model. Discuss whether it
performs well on the test set as compared to your base model.
2. How do hyperparameters interact? Can you find at least two hyperparameters that each independently improve the performance, but that fail to improve it,
or even worsen it, when used together? Alternatively,
can you find at least two hyperparameters that each independently improve the performance and also improve it when used together? You
may interpret “performance” as performance on the validation set. Report the final
performance on the test set.
3. How do optimizers compare? Try out at least two optimizers (other than the SGD
routine you have implemented in Part 2), clearly state their optimization routine and
the rationale behind their implementations, and all hyperparameters associated with
each optimizer. Implement them in Objax, then compare their performances on the
validation set after some tuning. (You may find objax.optimizer useful.) Report the
final performance on the test set.
4. Is it better just to grab an off-the-shelf model? Import at least one pretrained model, explain its architecture, adapt the model(s) to your application and
compare the performance of the model(s) to your own CNN model. (You may find
objax.zoo useful. Some well-known models include VGG19 and Wide Residual Network.) Report the final performance on the test set.
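As an illustration of what "clearly state their optimization routine" might look like for question 3, here is a plain-Python sketch of gradient descent with momentum in one common formulation. It is not Objax code. The function name momentum_step, the toy loss, and the values lr = 0.05 and mu = 0.9 are assumptions made for the example.

```python
def momentum_step(w, v, grad, lr, mu):
    """One step of gradient descent with momentum.

    v accumulates an exponentially decaying sum of past gradients;
    mu = 0 recovers plain gradient descent.
    """
    v_new = mu * v + grad
    w_new = w - lr * v_new
    return w_new, v_new


# Toy problem: minimise L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(500):
    grad = 2.0 * (w - 3.0)
    w, v = momentum_step(w, v, grad, lr=0.05, mu=0.9)

print(w)
```

The optimizer's hyperparameters here are the learning rate lr and the momentum coefficient mu, both of which would be tuned on the validation set.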
Your grade will be allocated based on the quality of the discussion, which involves the following:
1. (4 points) A succinct discussion of your problem that addresses the following: What was
the question? Which hyperparameter(s) did you tune? How many models did you try
and what are their performances in terms of validation accuracy? How did you choose
your final model?
Your short discussion should be written in a way that can be understood by a fellow
classmate. This means you should define technical terms used in your discussion when
relevant, and possibly introduce equations to explain concepts clearly. You do not need
to tune hyperparameter(s) extensively and only need to include the results associated
with your final tuned value. You may wish to include tables and figures to support your
2. (1 point) Select your final model and report its test accuracy.
Make sure to clearly indicate which of the questions above you are trying to answer. You
are only expected to answer one of the questions, but multiple experiments are welcome!
