## Description

1. (10 points) Consider a non-convolutional neural network that starts from an RGB input image of size N × N. It has L layers with h nodes per hidden layer, and n nodes in the output layer. Suppose each layer is fully-connected to the previous layer and there is a separate bias term at each node of each layer. Derive a formula to describe the number of learnable parameters in the network.
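One way to sanity-check a derived formula is to count parameters by brute force for small values and compare. The sketch below is only an illustration: `count_dense_parameters` is a hypothetical helper, not part of the handout, and it assumes you pass in the node count of every layer explicitly (input first, output last), so it takes no position on how the L layers are counted.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a fully-connected network.

    layer_sizes lists the number of nodes in every layer, input first,
    e.g. [3 * N * N, h, h, n]. (Hypothetical helper, not from the handout.)
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # one weight per connection between the two layers
        total += fan_out           # one bias per node in the receiving layer
    return total


# Tiny example: a 2 x 2 RGB image (12 inputs), two hidden layers of 5 nodes, 3 outputs.
print(count_dense_parameters([3 * 2 * 2, 5, 5, 3]))  # 113
```

Evaluating your closed-form expression at the same small values should give the same number.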

2. (10 points) Consider a convolutional neural network applied to an RGB input image of size N × N where, for simplicity of analysis, N is a power of 2. Suppose that

• the convolutions each cover k × k pixels,

• there are d different convolutions per convolution layer,

• padding is used so that convolutions do not result in image shrinkage, and

• after each convolution layer there is a max pooling layer applied over non-overlapping 2 × 2 pixel regions.

Suppose also that there are L convolution layers, followed by F fully-connected layers with h nodes per layer, and $n^{(o)}$ nodes in the output layer. Derive an expression for the number of learnable convolution parameters and the number of learnable parameters in the fully-connected and output layers.
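As with the first problem, a brute-force count on small values can be used to check a derived expression. The sketch below rests on assumptions the problem does not spell out: each of the d filters spans all channels of its input (3 for the first layer, d afterwards), each filter has a single bias, and max pooling has no learnable parameters. The function name `conv_stack_summary` is hypothetical.

```python
def conv_stack_summary(N, k, d, L, in_channels=3):
    """Return (conv_parameters, flattened_size) after L conv + 2x2 max-pool stages.

    Assumptions (not stated in the problem): every filter spans all input
    channels, every filter has one bias, and pooling has no parameters.
    """
    params = 0
    channels, size = in_channels, N
    for _ in range(L):
        params += d * (k * k * channels + 1)  # d filters: k*k*channels weights + 1 bias each
        channels = d   # the next layer sees d feature maps
        size //= 2     # padding preserves the size; 2x2 pooling halves it
    # flattened_size is the number of values fed to the first fully-connected layer
    return params, channels * size * size


# Tiny example: 8 x 8 image, 3 x 3 convolutions, 4 filters per layer, 2 conv layers.
print(conv_stack_summary(N=8, k=3, d=4, L=2))  # (260, 16)
```

The second returned value can be used as the input width when counting the fully-connected parameters.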

3. (20 points) So far, in the output layer of our networks we have used the same activation function as we did in the hidden layers and then applied a mean square error loss (cost) function to evaluate the output. More commonly, however, the output layer uses a special activation function called the softmax that forces the output to be a probability distribution (non-negative values that sum to 1), and this is combined with the cross entropy loss function to generate the error signals that start backpropagation. To be specific, let

$$z_j^{(L)} = \sum_{k=1}^{n^{(L-1)}} w_{jk}^{(L)} a_k^{(L-1)} + b_j^{(L)}$$

be the input to node j at the last layer, layer (L). Notationally, we drop the (L) superscript and just write $z_j$. Also, we use the notation $p_j$ as the output from node j in the output layer instead of $a_j^{(L)}$, both to simplify and to indicate that the output is, mathematically, a probability. Using these two notational changes, the softmax activation output is defined as

$$p_j = \frac{e^{z_j}}{\sum_{k=1}^{n} e^{z_k}}$$

(where we’ve adopted the shorthand n for $n^{(L)}$), and the cross-entropy loss function for the expected binary output vector y is

$$L(p, y) = -\sum_{i} y_i \log p_i.$$

(a) Show that the activations across the output layer truly form a probability distribution.

(b) Show that the derivative of $p_i$ with respect to $z_k$ is

$$\frac{\partial p_i}{\partial z_k} = p_i(\delta_{ik} - p_k),$$

where $\delta_{ik}$ is the Kronecker delta function.

(c) Use this to show that the derivative of L with respect to the input at node i of the output layer has the amazingly simple, elegant form

$$\frac{\partial L}{\partial z_i} = p_i - y_i.$$
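Before or after working through the derivations, it can help to confirm the claims numerically. The following is a minimal sketch in plain Python, not a proof: it checks on one made-up example that the softmax outputs are positive and sum to 1, and that a central-difference estimate of the gradient of the loss with respect to each $z_i$ matches $p_i - y_i$. The function names and the example values of `z` and `y` are assumptions chosen for illustration.

```python
import math


def softmax(z):
    m = max(z)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, y):
    return -sum(yi * math.log(pi) for pi, yi in zip(p, y))


def numerical_gradient(z, y, eps=1e-6):
    """Central-difference estimate of dL/dz_i for every i."""
    grad = []
    for i in range(len(z)):
        z_plus = z[:i] + [z[i] + eps] + z[i + 1:]
        z_minus = z[:i] + [z[i] - eps] + z[i + 1:]
        diff = cross_entropy(softmax(z_plus), y) - cross_entropy(softmax(z_minus), y)
        grad.append(diff / (2 * eps))
    return grad


z = [1.0, -0.5, 2.0]   # made-up inputs to the output layer
y = [0.0, 1.0, 0.0]    # one-hot target
p = softmax(z)
analytic = [pi - yi for pi, yi in zip(p, y)]
numeric = numerical_gradient(z, y)
max_error = max(abs(a - b) for a, b in zip(analytic, numeric))
print(sum(p), max_error)
```

The printed sum should be 1 up to rounding, and the maximum error should be tiny compared with the gradient entries themselves.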

4. (60 points) We return to the HW 5 problem of developing a classifier to determine the dominant background class of the scene in an image, but this time using neural networks. Specifically, you will use PyTorch to implement two neural networks, one using only fully-connected layers, and the other using convolutional layers in addition to fully-connected layers. The networks will each start directly with the input images, so you will not need to write or use any code to do manual preprocessing or descriptor formation. Therefore, once you understand how to use PyTorch, the code you actually write will in fact be quite short. Your output should be the same as the output from HW 5.

Make sure that your write-up includes a discussion of the design of your networks and why you made those choices.

To help you get started I’ve provided a Jupyter notebook (on the Piazza site) that illustrates some of the main concepts of PyTorch, starting with Tensors and Variables and proceeding to networks, loss functions, and optimizers. This also includes pointers to tutorials on pytorch.org. Class on November 27 covers this notebook. If you already know TensorFlow, you’ll find PyTorch quite straightforward.

Two side notes:

(a) PyTorch includes some excellent tools for uploading and transforming images into Tensor objects. For this assignment, you will not need to use these since you’ve already written code for image input for HW 5 that gathers images into numpy arrays, and it is trivial to transform these objects into PyTorch tensors.

(b) Because of the size of the images, you might find it quite computationally expensive and tedious to use a fully-connected network on the full-sized images, especially if you don’t have a usable GPU on your computer. Therefore, you are welcome to resize the images before input to your first network. In fact, we strongly suggest that you start with significantly down-sized images first and then see how far you can scale up!
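The two side notes can be combined into one short step. The sketch below assumes, purely for illustration, that the HW 5 loader yields a uint8 numpy array shaped (count, height, width, 3); your own arrays may be laid out differently. The helper name `images_to_tensor` and the random stand-in images are hypothetical. It uses `torch.from_numpy` for the conversion and `torch.nn.functional.interpolate` for the resize, which is one of several ways to down-size images.

```python
import numpy as np
import torch
import torch.nn.functional as F


def images_to_tensor(images, out_size=None):
    """Convert a (count, height, width, 3) uint8 array to a float tensor.

    The input layout is an assumption about the HW 5 arrays. The result is
    shaped (count, 3, height, width), the layout PyTorch conv layers expect.
    """
    x = torch.from_numpy(images).permute(0, 3, 1, 2)  # NHWC -> NCHW
    x = x.float() / 255.0                             # scale pixel values to [0, 1]
    if out_size is not None:
        # Bilinear resize to out_size x out_size to cut computation.
        x = F.interpolate(x, size=(out_size, out_size),
                          mode="bilinear", align_corners=False)
    return x


# Random stand-in images; replace with the arrays from your HW 5 loader.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)
small = images_to_tensor(fake_images, out_size=16)
print(small.shape)  # torch.Size([4, 3, 16, 16])
```

For the fully-connected network, the result can be flattened with `small.flatten(start_dim=1)` so that each image becomes one row vector.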
