CSci 4270 and 6270 Computational Vision, Homework 6 solved


1. (10 points) Consider a non-convolutional neural network that starts from an RGB input
image of size N × N. It has L layers with h nodes per hidden layer, and n nodes in the
output layer. Suppose each layer is fully-connected to the previous layer and there is a
separate bias term at each node of each layer. Derive a formula to describe the number of
learnable parameters in the network.
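A derived formula can be sanity-checked by brute force. The sketch below is not part of the assignment: `count_fc_params` is a hypothetical helper that takes the width of every layer explicitly (input first, output last), so it makes no assumption about whether L includes the output layer. It assumes one weight per connection and one bias per non-input node, as stated in the problem.

```python
def count_fc_params(layer_sizes):
    """Count weights and biases in a fully-connected network.

    layer_sizes lists the number of nodes in every layer, input first.
    (Hypothetical helper for checking a hand-derived formula.)
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # one weight per connection
        total += fan_out           # one bias per node in the receiving layer
    return total


# Example: a 4 x 4 RGB input (3 * 4 * 4 = 48 values),
# two hidden layers of 5 nodes each, and 2 output nodes.
example_count = count_fc_params([3 * 4 * 4, 5, 5, 2])
```

Evaluating a candidate formula at a few small values of N, L, h, and n and comparing it against this count is a quick way to catch off-by-one errors in the number of layers.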
2. (10 points) Consider a convolutional neural network applied to an RGB input image of size
N × N where, for simplicity of analysis, N is a power of 2. Suppose that
• the convolutions each cover k × k pixels,
• there are d different convolutions per convolution layer,
• padding is used so that convolutions do not result in image shrinkage, and
• after each convolution layer there is a max pooling layer applied over non-overlapping
2 × 2 pixel regions.
Suppose also that there are L convolution layers, followed by F fully-connected layers with h
nodes per layer, and $n^{(o)}$ nodes in the output layer. Derive an expression for the number
of learnable convolution parameters and the number of learnable parameters in the fully-connected and output layers.
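The two building blocks of this count can be sketched in code as a way to check a derivation. Both helpers below are hypothetical and rest on assumptions the problem does not state outright: each convolution spans all channels of its input, and each convolution has a single bias. Max pooling has no learnable parameters; it only halves the image side length.

```python
def conv_layer_params(k, c_in, c_out, bias=True):
    """Parameters in one convolution layer.

    Assumes each of the c_out filters covers k x k pixels across all
    c_in input channels, with one optional bias per filter.
    """
    weights = k * k * c_in * c_out
    return weights + (c_out if bias else 0)


def flattened_size(n, d, num_pools):
    """Values fed to the first fully-connected layer.

    Each 2 x 2 non-overlapping max pooling halves the side length;
    n is assumed to be a power of 2 that is at least 2 ** num_pools.
    """
    side = n // (2 ** num_pools)
    return d * side * side


first_layer = conv_layer_params(k=3, c_in=3, c_out=8)   # RGB input
later_layer = conv_layer_params(k=3, c_in=8, c_out=8)   # input is d feature maps
features = flattened_size(n=32, d=8, num_pools=2)
```

The fully-connected part can then be counted as in any dense network, with `features` playing the role of the input width.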
3. (20 points) So far, in the output layer of our networks we have used the same activation
function as we did in the hidden layers and then applied a mean square error loss (cost)
function to evaluate the output. More commonly, however, the output layer uses a special
activation function called the softmax that forces the output to be a probability distribution
(non-negative values that sum to 1), and this is combined with the cross entropy loss function
to generate the error signals that start backpropagation. To be specific, let
$$z_j^{(L)} = \sum_{k=1}^{n^{(L-1)}} w_{jk}^{(L)} a_k^{(L-1)} + b_j^{(L)}$$
be the input to node j at the last layer, layer (L). Notationally, we drop the (L) superscript
and just write $z_j$. Also, we use the notation $p_j$ as the output from node j in the output
layer instead of $a_j^{(L)}$, both to simplify and to indicate that the output is, mathematically, a
probability. Using these two notational changes, the softmax activation output is defined as
$$p_j = \frac{e^{z_j}}{\sum_{k=1}^{n} e^{z_k}}$$
(where we’ve adopted the shorthand $n$ for $n^{(L)}$) and the cross-entropy loss function for
expected binary output vector $y$ is
$$L(p, y) = -\sum_i y_i \log p_i.$$
(a) Show that the activations across the output layer truly form a probability distribution.
(b) Show that the derivative of $p_i$ with respect to $z_k$ is
$$\frac{\partial p_i}{\partial z_k} = p_i(\delta_{ik} - p_k),$$
where $\delta_{ik}$ is the Kronecker delta function.
(c) Use this to show that the derivative of L with respect to the input at node i of the
output layer has the amazingly simple, elegant form
$$\frac{\partial L}{\partial z_i} = p_i - y_i.$$
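The claims in this problem can be checked numerically before or after proving them. The sketch below is an illustration, not a proof: it computes the softmax and cross-entropy as defined above, then compares the stated gradient $p_i - y_i$ against a central finite-difference estimate. The helper names, the test vectors, and the step size `eps` are all assumptions made for the example. Subtracting the maximum of z before exponentiating is an added numerical-stability step; it does not change the result because the common factor cancels in the ratio.

```python
import math


def softmax(z):
    """Softmax of a list of floats, shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(p, y):
    """Cross-entropy loss -sum_i y_i log p_i."""
    return -sum(yi * math.log(pi) for pi, yi in zip(p, y))


def numeric_grad(z, y, eps=1e-6):
    """Central finite-difference estimate of dL/dz_i for each i."""
    grad = []
    for i in range(len(z)):
        up = list(z)
        down = list(z)
        up[i] += eps
        down[i] -= eps
        diff = cross_entropy(softmax(up), y) - cross_entropy(softmax(down), y)
        grad.append(diff / (2 * eps))
    return grad


z = [1.0, -0.5, 2.0]   # arbitrary inputs to the output layer
y = [0.0, 0.0, 1.0]    # one-hot expected output
p = softmax(z)
analytic = [pi - yi for pi, yi in zip(p, y)]   # the form claimed in part (c)
numeric = numeric_grad(z, y)
max_gap = max(abs(a - b) for a, b in zip(analytic, numeric))
```

A small `max_gap` is consistent with the claimed derivative, but only the algebra in parts (b) and (c) establishes it for all inputs.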
4. (60 points) We return to the HW 5 problem of developing a classifier to determine the
dominant background class of the scene in an image, but this time using neural networks.
Specifically, you will use pytorch to implement two neural networks, one using only fully-connected layers, and the other using convolutional layers in addition to fully-connected
layers. The networks will each start directly with the input images so you will not need to
write or use any code to do manual preprocessing or descriptor formation. Therefore, once
you understand how to use pytorch the code you actually write will in fact be quite short.
Your output should be the same as the output from HW 5.
Make sure that your write-up includes a discussion of the design of your networks and why
To help you get started I’ve provided a Jupyter notebook (on the Piazza site) that illustrates
some of the main concepts of pytorch, starting with Tensors and Variables and proceeding to networks, loss functions, and optimizers. This also includes pointers to tutorials on
pytorch.org. Class on November 27 covers this notebook. If you already know TensorFlow,
you’ll find pytorch quite straightforward.
Two side notes:
(a) PyTorch includes some excellent tools for uploading and transforming images into Tensor
objects. For this assignment, you will not need to use these since you’ve already written
code for image input for HW 5 that gathers images into numpy arrays and it is trivial
to transform these objects into pytorch tensors.
(b) Because of the size of the images, you might find it quite computationally expensive
and tedious to use a fully-connected network on the full-sized images, especially if you
don’t have a usable GPU on your computer. Therefore, you are welcome to resize the
images before input to your first network. In fact, we strongly suggest that you start
with significantly down-sized images first and then see how far you can scale up!
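As a rough illustration of the two side notes, the sketch below shrinks a batch of images held in a numpy array and rearranges it into the layout pytorch convolution layers expect. It makes several assumptions: images are stored as `uint8` with shape (count, height, width, 3), the hypothetical helper `downsample_and_prepare` uses simple block averaging (one of many possible resizing methods, not one the assignment prescribes), and pixel values are scaled to [0, 1].

```python
import numpy as np


def downsample_and_prepare(images, factor):
    """Shrink images by block averaging and return a float32 NCHW array.

    images: uint8 array of shape (count, height, width, channels).
    factor: integer shrink factor applied to height and width.
    """
    n, h, w, c = images.shape
    # Crop so that height and width divide evenly by the factor.
    h_crop, w_crop = h - h % factor, w - w % factor
    cropped = images[:, :h_crop, :w_crop, :].astype(np.float32)
    # Split each image into factor x factor blocks and average each block.
    blocks = cropped.reshape(
        n, h_crop // factor, factor, w_crop // factor, factor, c
    )
    small = blocks.mean(axis=(2, 4))
    scaled = small / 255.0  # assumed normalization to [0, 1]
    # Move channels ahead of height and width: (N, H, W, C) -> (N, C, H, W).
    return np.ascontiguousarray(scaled.transpose(0, 3, 1, 2))


# Stand-in data: two all-white 9 x 8 RGB images.
fake_images = np.full((2, 9, 8, 3), 255, dtype=np.uint8)
prepared = downsample_and_prepare(fake_images, factor=4)
```

An array in this form can be passed to `torch.from_numpy`, which shares memory with the numpy array rather than copying it. For the fully-connected network, each image would additionally be flattened into a single vector.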