Description
Part A: Object Recognition
The project uses a sample of the CIFAR-10 dataset:
https://www.cs.toronto.edu/~kriz/cifar.html
The dataset contains RGB colour images of size 32 X 32 and their labels from 0 to 9. You will be
using the batch-1 of the dataset, which contains 10,000 training samples. Testing is done on 2,000
test samples. The training data and testing data are provided in files ‘data_batch_1’ and
‘test_batch_trim’ files, respectively. Sample code is given in file start_project_2a.py
Design a convolutional neural network consisting of:
• An Input layer of 3x32x32 dimensions
• A convolution layer 𝐶1 of 50 filters of window size 9×9, VALID padding, and ReLU neurons. A
max pooling layer 𝑆1 with a pooling window of size 2×2, with stride = 2 and padding = ‘VALID’.
• A convolution layer 𝐶2 of 60 filters of window size 5×5, VALID padding, and ReLU neurons. A
max pooling layer 𝑆2with a pooling window of size 2×2, with stride = 2 and padding = ‘VALID’.
• A fully connected layer 𝐹3 of size 300.
• A softmax layer 𝐹4of size 10.
1. Train the network by using mini-batch gradient descent learning. Set batch size =128, and
learning rate 𝛼 = 0.001. Images should be scaled.
a. Plot the training cost and the test accuracy against learning epochs.
b. For any two test patterns, plot the feature maps at both convolution layers (𝐶1 and 𝐶2
)
and pooling layers (𝑆1 and 𝑆2
) along with the test patterns.
(18 marks)
2. Using a grid search, find the optimal numbers of feature maps for part (1) at the convolution
layers. Use the test accuracy to determine the optimal number of feature maps.
(10 marks)
3. Using the optimal number of filters found in part (2), train the network by:
a. Adding the momentum term with momentum 𝛾 = 0.1.
b. Using RMSProp algorithm for learning
c. Using Adam optimizer for learning
d. Adding dropout to the layers
Plot the training costs and test accuracies against epochs for each case.
(12 marks)
4. Compare the accuracies of all the models from parts (1) – (3) and discuss their performances.
(10 marks)
Part B: Text classification
The dataset used in this project contains the first paragraphs collected from Wikipage entries and
the corresponding labels about their category. You will implement CNN and RNN layers at the
word and character levels for the classification of texts in the paragraphs. The output layer of the
networks is a softmax layer.
The training and test datasets will be read from ‘train_medium.csv’ and ‘test_medium.csv’ files.
The training dataset contains 5600 entries and test dataset contains 700 entries. The label of an
entry is one of the 15 categories such as people, company, schools, etc.
The input data is in text, which should be converted to character/word IDs to feed to the
networks by using ‘tf.contrib.learn.preprocessing’. Restrict the maximum length of the
characters/word inputs to 100. Use the Adam optimizer for training with a batch size = 128 and
learning rate = 0.01. Assume there are 256 different characters.
1. Design a Character CNN Classifier that receives character ids and classifies the input. The
CNN has two convolution and pooling layers:
• A convolution layer 𝐶1
of 10 filters of window size 20×256, VALID padding, and ReLU
neurons. A max pooling layer 𝑆1 with a pooling window of size 4×4, with stride = 2, and
padding = ‘SAME’.
• A convolution layer 𝐶2 of 10 filters of window size 20×1, VALID padding, and ReLU
neurons. A max pooling layer 𝑆2with a pooling window of size 4×4, with stride = 2 and
padding = ‘SAME’.
Plot the entropy cost on the training data and the accuracy on the testing data against training
epochs.
(9 marks)
2. Design a Word CNN Classifier that receives word ids and classifies the input. Pass the inputs
through an embedding layer of size 20 before feeding to the CNN. The CNN has two
convolution and pooling layers with the following characteristics:
• A convolution layer 𝐶1
of 10 filters of window size 20×20, VALID padding, and ReLU
neurons. A max pooling layer 𝑆1 with a pooling window of size 4×4, with stride = 2 and
padding = ‘SAME’.
• A convolution layer 𝐶2 of 10 filters of window size 20×1, , VALID padding, and ReLU
neurons. A max pooling layer 𝑆2with a pooling window of size 4×4, with stride = 2 and
padding = ‘SAME’.
Plot the entropy cost on training data and the accuracy on testing data against training
epochs.
(9 marks)
3. Design a Character RNN Classifier that receives character ids and classify the input. The RNN
is GRU layer and has a hidden-layer size of 20.
Plot the entropy cost on training data and the accuracy on testing data against training
epochs.
(9 marks)
4. Design a word RNN classifier that receives word ids and classify the input. The RNN is GRU
layer and has a hidden-layer size of 20. Pass the inputs through an embedding layer of size
20 before feeding to the RNN.
Plot the entropy on training data and the accuracy on testing data versus training epochs.
(9 marks)
5. Compare the test accuracies and the running times of the networks implemented in parts (1)
– (4).
Experiment with adding dropout to the layers of networks in parts (1) – (4), and report the
test accuracies. Compare and comment on the accuracies of the networks with/without
dropout.
(8 marks)
6. For RNN networks implemented in (3) and (4), perform the following experiments with the
aim of improving performances, compare the accuracies and report your findings:
a. Replace the GRU layer with (i) a vanilla RNN layer and (ii) a LSTM layer
b. Increase the number of RNN layers to 2 layers
c. Add gradient clipping to RNN training with clipping threshold = 2.
(8 marks)
Hint: Sample code is given in file start_project_2b1.py and start_project_2b2.py