## Description

1) Understanding of feedforward-designed convolutional neural networks (FF-CNNs) (15%)

An FF-CNN consists of two modules in cascade: 1) the construction of conv layers using the Saab (Subspace approximation with adjusted bias) transforms; and 2) the construction of fully-connected (FC) layers using the multi-stage linear least-squares regressor (LSR).
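To make the Saab step concrete, here is a minimal NumPy sketch of fitting a single Saab stage on flattened patches. It assumes the common construction: one constant DC kernel, AC kernels taken from the leading principal components of the DC-removed patches, and one shared bias large enough to keep all responses non-negative. The names `fit_saab` and `saab_forward` are hypothetical, and the feature-mean removal and bias choice may differ in detail from [1] and [3].

```python
import numpy as np

def fit_saab(patches, num_kernels):
    """Fit one Saab stage on flattened patches of shape (n_samples, dim).

    Returns (kernels, bias, feature_mean). Row 0 of `kernels` is the DC
    kernel; rows 1.. are AC kernels. Assumes 1 <= num_kernels <= dim and
    n_samples >= dim so the PCA directions are well defined.
    """
    patches = np.asarray(patches, dtype=np.float64)
    dim = patches.shape[1]
    # DC kernel: constant unit vector, so its response is proportional to the patch mean
    dc = np.ones(dim) / np.sqrt(dim)
    # Remove the per-feature mean over samples (standard PCA centering; an assumption)
    feature_mean = patches.mean(axis=0)
    centered = patches - feature_mean
    # Project out the DC direction; AC kernels live in its orthogonal complement
    ac_part = centered - np.outer(centered @ dc, dc)
    # PCA via SVD: rows of vt are principal directions, largest variance first
    _, _, vt = np.linalg.svd(ac_part, full_matrices=False)
    kernels = np.vstack([dc, vt[: num_kernels - 1]])
    # One shared bias >= max ||x||: by Cauchy-Schwarz, a.x + b >= 0 for
    # every unit-norm kernel a and every training patch x
    bias = np.linalg.norm(centered, axis=1).max()
    return kernels, bias, feature_mean

def saab_forward(patches, kernels, bias, feature_mean):
    """Apply a fitted Saab stage: y = (x - mu) @ A.T + b."""
    return (np.asarray(patches, dtype=np.float64) - feature_mean) @ kernels.T + bias
```

The bias bound is the key design choice: for a unit-norm kernel a, |aᵀx| ≤ ‖x‖, so adding b = max‖x‖ makes every response on the training patches non-negative, and a following ReLU therefore discards nothing.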

• Summarize FF-CNNs with a flow chart and explain it in your own words.

• Explain the similarities and differences between FF-CNNs and backpropagation-designed CNNs (BP-CNNs).

Do not copy any sentences directly from [1] or other papers; doing so is plagiarism. Your score will depend on the degree of understanding you demonstrate.

2) Image reconstructions from Saab coefficients (35%)

Apply Saab transforms to images in the MNIST dataset [2].

• Compute the Saab coefficients of the four handwritten digit images shown in Figure 1 (you can use the online source code [3] or implement it yourself), and implement the reconstruction algorithm (write your own code) to transform the Saab coefficients back to images.

• To evaluate the reconstruction results, show the reconstructed images and compute the PSNR between each original image and its reconstruction.
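The reconstruction can be sketched as the transpose of the forward map. Assuming a Saab stage computed y = (x − μ)Aᵀ + b with orthonormal kernel rows in A, a per-feature mean μ, and a scalar bias b, the inverse is x̂ = (y − b)A + μ. It is exact when A is square and becomes an orthogonal projection onto the kernel span when fewer kernels are kept. The helper names `saab_inverse` and `psnr` are hypothetical, and the PSNR assumes 8-bit intensities (peak value 255).

```python
import numpy as np

def saab_inverse(coeffs, kernels, bias, feature_mean):
    """Map Saab coefficients (n, K) back to flattened patches (n, dim).

    Assumes y = (x - feature_mean) @ kernels.T + bias with orthonormal kernel
    rows. If K < dim, the result is the orthogonal projection of each patch
    onto the kernel span (the lossy case).
    """
    return (np.asarray(coeffs, dtype=np.float64) - bias) @ kernels + feature_mean

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two arrays on a [0, peak] intensity scale."""
    diff = np.asarray(original, dtype=np.float64) - np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

For the two-stage setting, the inverse runs in reverse order: invert stage 2 at each of the 2×2 positions, place the recovered blocks back on the stage-1 grid, invert stage 1 for every 4×4 patch, and reassemble the image before computing PSNR.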

Architecture setting:

In this problem, use two-stage Saab transforms where the spatial size of the transform kernels is 4×4 and the stride of each stage is 4 (non-overlapping). Thus, at the output, the Saab coefficients of an image should have dimension 2×2×N, where N is the number of transform kernels in the second stage. Evaluate four different settings (different numbers of kernels in each stage) and discuss your results.
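The 2×2×N claim can be checked with a small shape walkthrough. A 2×2×N output requires a 32×32 input (32/4 = 8, then 8/4 = 2). A raw 28×28 MNIST digit would give a 7×7 grid after stage 1, which 4×4 stride-4 windows cannot tile evenly, so the sketch below assumes the digits are zero-padded to 32×32. The helpers `extract_patches` and `fold_patches` are hypothetical names for non-overlapping patch extraction and its inverse, and the stage-1 kernel count is a placeholder.

```python
import numpy as np

def extract_patches(images, k=4):
    """Split (n, H, W, C) into non-overlapping k x k patches.

    Returns (n, H//k, W//k, k*k*C). Assumes H and W are multiples of k.
    """
    n, h, w, c = images.shape
    assert h % k == 0 and w % k == 0, "spatial size must be a multiple of k"
    x = images.reshape(n, h // k, k, w // k, k, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (n, H/k, W/k, k, k, C)
    return x.reshape(n, h // k, w // k, k * k * c)

def fold_patches(patches, k=4):
    """Inverse of extract_patches: (n, h, w, k*k*C) -> (n, h*k, w*k, C)."""
    n, h, w, d = patches.shape
    c = d // (k * k)
    x = patches.reshape(n, h, w, k, k, c).transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(n, h * k, w * k, c)

# Shape walkthrough for the two-stage setting (kernel count is hypothetical)
digits = np.zeros((4, 32, 32, 1))        # four digits, assumed padded to 32x32
stage1_in = extract_patches(digits)      # (4, 8, 8, 16): one 16-D vector per 4x4 patch
K1 = 8                                   # hypothetical stage-1 kernel count
stage1_out = np.zeros(stage1_in.shape[:3] + (K1,))  # stand-in for stage-1 Saab outputs
stage2_in = extract_patches(stage1_out)  # (4, 2, 2, 16 * K1)
```

A stage-2 Saab transform then maps each (16·K1)-D vector to N coefficients, giving the 2×2×N output; `fold_patches` is the step that places recovered patches back on the grid during reconstruction.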

EE 569 Digital Image Processing: Homework #6

Professor C.-C. Jay Kuo

Figure 1: The four handwritten digit images used in Problem 2.

3) Handwritten digits recognition using ensembles of feedforward design (50%)

In this problem, you will apply an FF-CNN to handwritten digit recognition. Train an FF-CNN using the 60,000 training images from the MNIST dataset. Adopt a LeNet-5-like architecture in which the first and second conv layers have 6 and 16 filters, and the first and second FC layers have 120 and 80 units, respectively. The spatial size of the transform kernels is 5×5, and the stride is 1 for each conv layer. Max-pooling layers are used to reduce the spatial dimension.
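The FC layers of an FF-CNN are built without backpropagation. A minimal sketch, loosely following the pseudo-label idea in [1]: split each class into k-means clusters, treat every (class, cluster) pair as a one-hot target, solve a linear least-squares problem with a bias column, and apply ReLU. With 10 classes, 12 clusters per class gives the 120-unit first FC layer and 8 per class gives the 80-unit second one; a final LSR then maps to the 10 digit classes. Assuming a 32×32 input and 2×2 max-pooling as in LeNet-5, the flattened conv output is 5×5×16 = 400-D (32 → 28 → 14 → 10 → 5). All function names are hypothetical, the plain Lloyd k-means stands in for whatever clustering [3] uses, and the exact target encoding in [1] may differ.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (a stand-in); returns a cluster index per row."""
    x = np.asarray(x, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared distances via |x|^2 - 2 x.c + |c|^2 to avoid an (n, k, d) array
        dists = (x ** 2).sum(1)[:, None] - 2.0 * x @ centers.T + (centers ** 2).sum(1)[None, :]
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def lsr_fc_layer(features, class_labels, clusters_per_class, num_classes=10):
    """Build one FC layer by least-squares regression to pseudo-labels.

    Each class is split into `clusters_per_class` k-means clusters; every
    (class, cluster) pair becomes one one-hot target dimension.
    Returns (weights, outputs) with outputs = ReLU([features, 1] @ weights).
    """
    features = np.asarray(features, dtype=np.float64)
    n = len(features)
    out_dim = num_classes * clusters_per_class
    pseudo = np.zeros(n, dtype=int)
    for c in range(num_classes):
        idx = np.flatnonzero(class_labels == c)
        pseudo[idx] = c * clusters_per_class + kmeans(features[idx], clusters_per_class)
    targets = np.eye(out_dim)[pseudo]
    # Append a constant column so the regressor also learns a bias term
    x_aug = np.hstack([features, np.ones((n, 1))])
    weights, *_ = np.linalg.lstsq(x_aug, targets, rcond=None)
    outputs = np.maximum(x_aug @ weights, 0.0)  # ReLU
    return weights, outputs
```

Stacking two calls (12 and then 8 clusters per class) on the 400-D conv features would reproduce the 120 → 80 FC widths stated above.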

• Report the training and testing classification accuracy of an individual FF-CNN on the MNIST dataset.

• One way to improve performance is to build ensemble systems of FF-CNNs. Train ten different FF-CNNs and ensemble their results following the method in [4]. Diversity is the key to successful ensembles, and [4] introduces three strategies for increasing diversity in an ensemble of FF-CNNs, which you can refer to. Explain and justify your strategies for generating diverse FF-CNNs in an ensemble, and report the training and testing classification accuracy of your ensemble system.

• Error analysis: Compare the classification error cases of BP-CNNs (use your best result from HW#5) and FF-CNNs. What percentage of errors are the same? What percentage are different? Explain your observations. Also, propose ideas to improve BP-CNNs, FF-CNNs, or both, and justify your proposal. There is no need to implement the proposed ideas.
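Two pieces of bookkeeping recur in these tasks: fusing the outputs of several models and comparing two models' error sets. The sketch below uses plain score averaging as a simple stand-in fusion (not necessarily the fusion scheme of [4], which should be followed for the assignment) and reports error-overlap percentages relative to the union of all misclassified test samples. The function names and the choice of denominator are assumptions; reporting relative to each model's own error count is an equally reasonable convention.

```python
import numpy as np

def soft_vote(score_list):
    """Average per-model class scores (each (n, num_classes)) and take the argmax.

    FF-CNN LSR outputs are regression scores, not probabilities, so this is
    only a simple baseline fusion.
    """
    return np.mean(np.stack(score_list), axis=0).argmax(axis=1)

def error_overlap(y_true, pred_bp, pred_ff):
    """Compare the misclassified samples of two classifiers.

    Returns percentages relative to the union of all error cases.
    """
    err_bp = set(np.flatnonzero(pred_bp != y_true))
    err_ff = set(np.flatnonzero(pred_ff != y_true))
    union = err_bp | err_ff
    if not union:
        return {"both": 0.0, "both_same_label": 0.0, "bp_only": 0.0, "ff_only": 0.0}
    total = len(union)
    both = err_bp & err_ff
    # Stricter notion of "same error": both wrong AND both predict the same wrong digit
    same_label = sum(1 for i in both if pred_bp[i] == pred_ff[i])
    return {
        "both": 100.0 * len(both) / total,
        "both_same_label": 100.0 * same_label / total,
        "bp_only": 100.0 * len(err_bp - err_ff) / total,
        "ff_only": 100.0 * len(err_ff - err_bp) / total,
    }
```

The `both_same_label` entry gives a stricter view of whether the two designs fail for the same reason, since two models can both miss a digit while confusing it with different classes.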

References

[1] Kuo, C. C. J., Zhang, M., Li, S., Duan, J., & Chen, Y. (2019). Interpretable convolutional neural networks via feedforward design. Journal of Visual Communication and Image Representation.

[2] The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/

[3] https://github.com/davidsonic/Interpretable_CNN

[4] Chen, Y., Yang, Y., Wang, W., & Kuo, C. C. J. (2019). Ensembles of feedforward-designed convolutional neural networks. arXiv preprint arXiv:1901.02154.