## Description

In this homework assignment, you will use SVM classifiers with different kernels and

datasets. You will use a free SVM software package called LIBSVM provided at

http://www.csie.ntu.edu.tw/~cjlin/libsvm/. Please allow time for dealing with any bugs or

version issues that appear when you first try to use it.

Notes for Matlab users:

1.Download LIBSVM and read the documentation first to learn how to use it. Check

the README file inside the main folder libsvm-3.23 as well as the README file

inside the Matlab folder.

2. If you are using Windows, simply add the windows (not the Matlab) folder from

the LIBSVM folder to your path – no need to run the installation steps.

3.If you are using Unix based systems, navigate to the Matlab folder inside the

LIBSVM folder and run make. (I didn’t check this method since I work on

Windows computers)

4. You can ignore any Matlab warnings regarding the svmtrain function. This

warning concerns Matlab’s own svmtrain (built-in) and not the function provided

by LIBSVM.

Note for Python users:

1. Sklearn’s implementation of SVM is based on LIBSVM. Thus, there’s no need to

download LIBSVM.

We will use 3 different datasets as described below:

Problem

No.

Dataset

Name

# Features #

Classes

Linear Separable Has Test Set

1a-c HW8_1 2 2 Yes No

1d-e HW8_2 2 2 No No

2 wine 13 (first 2 features

used)

3 N/A Yes

1. In this problem, you will use the full dataset D for training, and where requested, the

classification accuracy is calculated on the (same) training dataset.

For parts (a)-(c) below, load HW8_1, which is a linearly separable case.

a) Use Linear Kernel and try different values of slack variable parameter C. What

is the meaning of parameter C and how will it impact your classification? Set C

p. 2 of 4

= 1, C = 100, report the accuracy of classification and plot the decision

boundary using plotSVMBoundaries() as provided. Explain your observation.

b) For HW8_1 and C = 100 with Linear Kernel, give the support vectors

numerically and circle them (or clearly identify them in some other way) in your

plot. Provide the weights w0, w1, w2 and the decision boundary equation.

Hint: for how to obtain the weights in MATLAB, see:

https://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html

FAQ: How could I generate the primal variable w of linear SVM?

Hint: python users can pass the support vectors as a parameter to

plotSVMBoundaries to get a new plot where they show.

c) Extra credit. Continuing from part (b) above, calculate �(�) for each support

vector. Are they on the boundary of the margin (corresponding to �(�) = ±1)?

If not, then conjecture as to why not, and test your conjecture computationally.

For parts (d)-(e) below, load HW8_2, which is not linearly separable.

d) Use Gaussian (RBF) Kernel with default gamma parameter. Set C = 50, 5000.

Report the accuracy of classification and plot the decision boundary using

plotSVMBoundaries() as provided. Explain the linearity or nonlinearity of the

decision boundary, and explain the difference in decision regions for the two

values of C.

e) Use Gaussian Kernel again, but now with default C parameter. Set gamma = 10,

50, 500. Report the accuracy of classification and plot the decision boundary

using plotSVMBoundaries() as provided. Explain the difference in decision

regions for the 3 values of gamma. Is there any overfitting problem?

2. In this problem, you will use cross-validation for model (parameter) selection, in

order to more optimally design an SVM classifier. There is also a separate test set

that you can use.

Note that cross-validation is covered in Lecture 18 (Monday 3/30/2020).

In this problem, you are required to code the cross-validation yourself; although you

may use MATLAB or Python functions to divide the dataset into subsets.

In all parts that ask for cross-validation, use 5-fold cross-validation on the “training

set” (this is the set we called in lecture). Do not use the test set until part (d).

Load wine, a 3 class dataset. In all of the following parts, use only the first two

features in training and testing.

Tip 1: The standard deviation can be estimated by:

D′

p. 3 of 4

.

or by using available Matlab or Python functions for unbiased sample

standard deviation.

Tip 2: For partitioning functions:

• Matlab: use cv = cvpartition(params) (to have a proper partitioning, be

sure stratified is set to true), and then idx = training(cv, k).

• Python: look for sklearn’s StratifiedKFold

(a) Use Gaussian (RBF) Kernel. Set , C = 1. Report the average crossvalidation accuracy.

(b) In this part you will use cross validation to find the best parameter set (model

selection).

First, pick some values from the ranges γ ∈ [10!”, 10″] and C ∈ [10!”, 10″]

(hint: use logspace(); at least 50 points for each parameter are recommended)

Next, initialize a 2D array of size (N#, �$) to store average accuracies on

the validation set; let’s call the array ACC. Also set up another 2D array,

DEV, to store the estimated standard deviation of accuracies on the validation

set. You will use these two array types, repeatedly, below.

Then, for each pair of [γ, C] perform a 5-fold cross validation and store the

average accuracy to the corresponding position in ACC, and similarly the

estimated standard deviation of accuracy in DEV. Perform the partitioning

randomly but stratified. Also, make sure that each run of the training

procedure is initialized (restarted with the new values of all variables) each

time.

(i) Visualize ACC. Turn in your visualization (pseudocolor map, plot, or

other) with your HW.

Tip for MATLAB: you can use imagesc(), mesh() or surf(), whichever

is most illustrative

Tip for Python: you can use imshow() and colorbar() from

matplotlib.pyplot

(ii) Is there a clear choice for best values of [γ, C] ? If so, report the pair of

chosen values, as well as its mean cross-validation accuracy and

standard deviation. If not, pick one set of values from the candidate

pairs (e.g., the pair with the lowest standard deviation), and give its

mean cross-validation accuracy and standard deviation.

(c) Repeat the cross validation procedure in (b) T=20 times (runs), storing the

resulting arrays ACC(t) and DEV(t) each time. No need to visualize every

run.

σˆ = 1

M −1 pm ( − p)

2

m=1

M

∑ ⎡

⎣

⎢

⎢

⎤

⎦

⎥

⎥

1

2

= 1

M −1 pm

2 − p

2 ( ) m=1

M

∑ ⎡

⎣

⎢

⎢

⎤

⎦

⎥

⎥

1

2

γ = 1

p

p. 4 of 4

(i) Report on the 20 chosen pairs of [γ, C].

(ii) Accumulate a single overall average ACC array, and a single overall

DEV array, based on all T=20 runs; that is, you will have one average

accuracy and standard deviation for each pair [γ, C]. Is a pick of the best

values for [γ, C] over the T=20 runs more well-defined or more

reproducible now? Justify your answer. Also, report on the final chosen

best values for [γ, C], as well as its mean cross-validation accuracy and

standard deviation.

(d) Use the full training set to train the final classifier using your best pair of

[γ, C] from (c) (ii) above. Then use the test set to estimate the accuracy of

your final classifier on unknowns. Is this estimate within approximately 1

standard deviation of your mean cross-validation accuracy from (c) (ii)? If

not, try to explain why.