CSCI567 Homework 4

1 Boosting

In this problem, you will develop an alternative view of forward stagewise boosting. The overall goal is to derive an algorithm for choosing the best weak learner $h_t$ at each step such that it best approximates the gradient of the loss function with respect to the current prediction of the labels.

In particular, consider a binary classification task of predicting labels $y_i \in \{+1, -1\}$ for instances $x_i \in \mathbb{R}^d$, for $i = 1, \ldots, n$, together with a set of weak learners $\mathcal{H} = \{h_j,\ j = 1, \ldots, M\}$ that we have access to. In this framework, we first choose a loss function $L(y_i, \hat{y}_i)$ in terms of the current prediction $\hat{y}_i$ and the true label $y_i$. Here, we use the logistic loss, namely

$$L(y_i, \hat{y}_i) = \log(1 + \exp(-y_i \hat{y}_i)).$$

We also consider the gradient $g_i$ of the loss $L(y_i, \hat{y}_i)$ with respect to the current prediction $\hat{y}_i$ on each instance, i.e.

$$g_i = \frac{\partial L(y_i, \hat{y}_i)}{\partial \hat{y}_i}.$$

We take the following steps for boosting:

(a) Gradient Calculation. In this step, please calculate the gradients

$$g_i = \frac{\partial L(y_i, \hat{y}_i)}{\partial \hat{y}_i}.$$
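As a sanity check for this step, the following applies the chain rule to the logistic loss defined above (a hint at the shape of the answer, not a substitute for your own derivation):

$$g_i = \frac{\partial}{\partial \hat{y}_i} \log\!\left(1 + e^{-y_i \hat{y}_i}\right) = \frac{-y_i \, e^{-y_i \hat{y}_i}}{1 + e^{-y_i \hat{y}_i}} = \frac{-y_i}{1 + e^{y_i \hat{y}_i}}.$$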

(b) Weak Learner Selection. We then choose the next learner to be the one that can best predict these gradients, i.e. we choose

$$h^* = \arg\min_{h \in \mathcal{H}} \left( \min_{\gamma \in \mathbb{R}} \sum_{i=1}^{n} \big(-g_i - \gamma h(x_i)\big)^2 \right).$$

Please show that the optimal value of the step size $\gamma$ can be computed in closed form in this step, so that the selection rule for $h^*$ can be derived independently of $\gamma$.
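As a hint (assuming $\sum_i h(x_i)^2 > 0$), the inner minimization is a one-dimensional quadratic in $\gamma$, so setting its derivative to zero gives

$$\gamma^* = \frac{-\sum_{i=1}^{n} g_i h(x_i)}{\sum_{i=1}^{n} h(x_i)^2};$$

substituting $\gamma^*$ back into the objective yields a selection criterion for $h^*$ that no longer involves $\gamma$.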

(c) Step Size Selection. We then select the step size $\alpha^*$ that minimizes the loss:

$$\alpha^* = \arg\min_{\alpha \in \mathbb{R}} \sum_{i=1}^{n} L(y_i, \hat{y}_i + \alpha h^*(x_i)).$$

For the logistic loss function, there is no closed-form solution for $\alpha^*$. Instead, we use one step of Newton's method to obtain the step size: that is, we start from $\alpha = 0$ and carry out one Newton step. Finally, we perform the following update:

$$\hat{y}_i \leftarrow \hat{y}_i + \alpha^* h^*(x_i).$$

You are asked to compute the expression for the step size analytically in terms of $y_i$, $\hat{y}_i$, and $h^*$.
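To make the three steps concrete, here is a minimal Matlab sketch of one boosting round. It assumes hypothetical variables y (n-by-1 labels in {-1,+1}), yhat (n-by-1 current predictions), and H (an n-by-M matrix whose j-th column is weak learner h_j evaluated on all instances); none of these names come from the handout.

    g = -y ./ (1 + exp(y .* yhat));          % (a) logistic-loss gradients
    M = size(H, 2);
    err = zeros(1, M);
    for j = 1:M
        hj = H(:, j);
        gamma = -(g' * hj) / (hj' * hj);     % closed-form inner step size
        err(j) = sum((-g - gamma * hj).^2);  % (b) fit to negative gradients
    end
    [~, jstar] = min(err);
    hstar = H(:, jstar);
    % (c) one Newton step from alpha = 0 on F(alpha) = sum_i L(y_i, yhat_i + alpha * hstar_i)
    p     = 1 ./ (1 + exp(y .* yhat));       % sigmoid(-y .* yhat)
    F1    = -sum(y .* hstar .* p);           % F'(0)
    F2    = sum(hstar.^2 .* p .* (1 - p));   % F''(0), using y.^2 = 1
    alpha = -F1 / F2;
    yhat  = yhat + alpha * hstar;            % update current predictions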

2 SVM

This problem focuses on $\varepsilon$-Support Vector Regression ($\varepsilon$-SVR). Suppose we are given training data $\{(x_1, y_1), \ldots, (x_l, y_l)\} \subset \mathcal{X} \times \mathbb{R}$, where $\mathcal{X}$ denotes the space of the input patterns. In $\varepsilon$-SV regression, our goal is to find a function $f(x)$ that has at most $\varepsilon$ deviation from the actually obtained targets $y_i$ for all the training data and, at the same time, is as flat as possible.

In other words, we do not care about errors as long as they are less than $\varepsilon$, but we will not accept any deviation larger than this. This may be important if, for instance, you want to be sure not to lose more than $\varepsilon$ in money when dealing with exchange rates.

(a) Let's start with the linear case, where

$$f(x) = w^\top x + b \quad \text{with } w \in \mathcal{X},\ b \in \mathbb{R}.$$

Flatness in this case means that one seeks a small $\|w\|_2^2$. Write down the primal optimization formulation that maximizes flatness subject to at most $\varepsilon$ deviation for each point.
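For reference, a standard way of writing this hard $\varepsilon$-tube formulation in the SV regression literature is the following; your answer should be equivalent:

$$\min_{w,\, b} \ \frac{1}{2}\|w\|_2^2 \qquad \text{s.t.} \quad |y_i - w^\top x_i - b| \le \varepsilon, \ \forall i.$$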

(b) However, there is not always a solution to the optimization problem derived in (a), because it may not be possible to find a line that guarantees $\varepsilon$ deviation for every point. In this case we need to bring in a (hinge) loss function to penalize the case when the deviation is greater than $\varepsilon$.

Assume that the loss function penalizes linearly when the deviation is above $\varepsilon$ (hinge). Modify the above optimization formulation to incorporate the loss function with weight factor $C > 0$; $C$ determines the trade-off between the flatness of $f$ and the amount up to which deviations larger than $\varepsilon$ are tolerated. Plot the hinge loss function you used in this question.
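As a reference point, the soft-constraint version in the SV regression literature introduces slack variables $\xi_i, \xi_i^*$ for deviations above and below the tube (your notation may differ):

$$\min_{w,\, b,\, \xi,\, \xi^*} \ \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \qquad \text{s.t.} \quad y_i - w^\top x_i - b \le \varepsilon + \xi_i, \quad w^\top x_i + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0,$$

which corresponds to the $\varepsilon$-insensitive loss $\max(0, |u| - \varepsilon)$ applied to each residual $u_i = y_i - f(x_i)$.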

(c) The derived optimization problem is easier to solve in its dual form, which also makes it easier to incorporate nonlinearity. Construct the Lagrange function of the primal that you derived, and then derive the dual form. Show that the weight vector $w$ depends only on the support vectors.
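For checking your derivation: the dual reported in the SV regression literature, with multipliers $\alpha_i, \alpha_i^*$ for the two constraint families, reads

$$\max_{\alpha,\, \alpha^*} \ -\frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, x_i^\top x_j - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*)$$

subject to $\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0$ and $\alpha_i, \alpha_i^* \in [0, C]$, with $w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, x_i$; only points with nonzero multipliers (the support vectors) contribute to $w$.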

(d) In the next step we make the SV regression nonlinear. This, for instance, could be achieved by simply preprocessing the training patterns $x_i$ by a map $\varphi : \mathcal{X} \to \mathcal{F}$ into some feature space $\mathcal{F}$. Incorporate this nonlinear mapping into the derived dual form in terms of the kernel $k(x, x') := \varphi(x)^\top \varphi(x')$ and show how the prediction is derived using the kernel function.
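The expected shape of the answer: since $w = \sum_i (\alpha_i - \alpha_i^*)\, \varphi(x_i)$ in feature space, the prediction requires only kernel evaluations,

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, k(x_i, x) + b.$$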

3 Programming

In this part, you will experiment with SVMs on a real-world dataset. You will implement linear SVM (i.e., SVM using the original features; the nonlinear mapping is the identity mapping) using Matlab. You will also use a widely used SVM toolbox called LIBSVM to experiment with kernel SVMs.

Dataset: We have provided the Phishing Websites Data Set from UCI's machine learning data repository (https://archive.ics.uci.edu/ml/datasets/Phishing+Websites). The dataset is for a binary classification problem: detecting phishing websites. The dimensionality of the input features is 30, and the training and test sets contain 2,000 samples each (they are provided on Blackboard as phishing-train.mat and phishing-test.mat, which can be loaded directly into Matlab).

3.1 Data preprocessing
All the features in the datasets are categorical. You need to preprocess the training and test data to convert features with multiple values into features taking only the values zero or one. If a feature $f_i$ takes values in $\{-1, 0, 1\}$, we create three new features $f_{i,-1}$, $f_{i,0}$ and $f_{i,1}$. Only one of them can take value 1, and $f_{i,x} = 1$ if and only if $f_i = x$. For example, we transform the original feature value 1 into $[0, 0, 1]$. In the given dataset, features 2, 7, 8, 14, 15, 16, 26, 29 (index starting from 1) take the three values $\{-1, 0, 1\}$. You need to transform each such feature into three 0/1 features.
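A minimal sketch of this expansion, assuming the raw features sit in an n-by-30 matrix X (a hypothetical name):

    ternary = [2 7 8 14 15 16 26 29];  % three-valued columns per the handout
    Xnew = [];
    for j = 1:size(X, 2)
        if ismember(j, ternary)
            % indicator columns for the values -1, 0 and 1
            Xnew = [Xnew, double(X(:,j) == -1), double(X(:,j) == 0), double(X(:,j) == 1)];
        else
            Xnew = [Xnew, X(:,j)];     % other features pass through unchanged
        end
    end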

3.2 Implement linear SVM
Please write Matlab functions trainsvm (as trainsvm.m) and testsvm (as testsvm.m). The inputs of trainsvm are the training feature vectors and labels, as well as the tradeoff parameter C. The outputs of trainsvm are the SVM parameters (weight vector and bias). In your implementation, you need to solve the SVM in its primal form:

$$\min_{w,\, b,\, \xi} \ \frac{1}{2}\|w\|_2^2 + C \sum_i \xi_i \qquad \text{s.t.} \quad y_i(w^\top x_i + b) \ge 1 - \xi_i, \ \forall i, \qquad \xi_i \ge 0, \ \forall i.$$

Please use the quadprog function in Matlab to solve the above quadratic program. For testsvm, the inputs are the test feature vectors and labels, as well as the SVM parameters; the output is the test accuracy.
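A minimal sketch of how this primal maps onto quadprog's standard form min_z (1/2) z'Hz + f'z subject to Az <= b and lb <= z, stacking the variables as z = [w; b; xi]; here X and y are hypothetical names for the n-by-d feature matrix and n-by-1 label vector:

    [n, d] = size(X);
    H   = blkdiag(eye(d), 0, zeros(n));       % only w enters the quadratic term
    f   = [zeros(d + 1, 1); C * ones(n, 1)];  % linear term: C * sum(xi)
    % y_i (w'x_i + b) >= 1 - xi_i  <=>  -y_i (w'x_i + b) - xi_i <= -1
    A   = [-bsxfun(@times, y, X), -y, -eye(n)];
    rhs = -ones(n, 1);
    lb  = [-inf(d + 1, 1); zeros(n, 1)];      % w, b free; xi >= 0
    z   = quadprog(H, f, A, rhs, [], [], lb, []);
    w   = z(1:d);
    bias = z(d + 1);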

3.3 Cross validation for linear SVM
Use 3-fold cross validation to select the optimal C for your implementation of linear SVM.
(a) Report the 3-fold cross-validation accuracy (accuracy averaged over the validation sets) and the average training time (averaged over the training subsets) for each C taken from $\{4^{-6}, 4^{-5}, \ldots, 4, 4^2\}$. How does the value of C affect the cross validation accuracy and the average training time? Explain your observation. (One possible shape of the cross-validation loop is sketched after part (c).)

(b) Which C do you choose based on the cross validation results?

(c) You need to write a script own linear.m using your trainsvm, testsvm, and the optimal C selected from cross validation. Your script should train a linear SVM with the selected C and output the accuracy on the test set.
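A minimal sketch of the 3-fold loop, assuming trainsvm returns [w, bias] and testsvm returns the accuracy as specified above (the fold assignment, variable names, and timing via tic/toc are assumptions, not handout requirements):

    Cs = 4 .^ (-6:2);
    folds = mod(0:size(Xtr, 1) - 1, 3) + 1;  % assumed fold assignment
    for c = 1:numel(Cs)
        accs = zeros(1, 3);
        times = zeros(1, 3);
        for k = 1:3
            tr = folds ~= k;                 % training subset for this fold
            va = folds == k;                 % validation subset
            tic;
            [w, bias] = trainsvm(Xtr(tr, :), ytr(tr), Cs(c));
            times(k) = toc;
            accs(k) = testsvm(Xtr(va, :), ytr(va), w, bias);
        end
        fprintf('C = %g: CV acc %.4f, avg train time %.3fs\n', Cs(c), mean(accs), mean(times));
    end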

3.4 Use linear SVM in LIBSVM

LIBSVM is a widely used toolbox for SVM and has a Matlab interface. Download LIBSVM from https://www.csie.ntu.edu.tw/~cjlin/libsvm/ and install it according to the README file (make sure to use the Matlab interface provided in the LIBSVM toolbox). For each C from $\{4^{-6}, 4^{-5}, \ldots, 4, 4^2\}$, apply 3-fold cross validation (use the -v option in LIBSVM) and report the cross validation accuracy and average training time. (One possible shape of these calls is sketched below.)
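A minimal sketch of the LIBSVM calls, using the toolbox's Matlab wrapper svmtrain (-t 0 selects the linear kernel, -c sets C, and with -v 3 svmtrain returns the 3-fold cross validation accuracy instead of a model); Xtr and ytr are hypothetical names for the training data:

    Cs = 4 .^ (-6:2);
    for c = 1:numel(Cs)
        opts = sprintf('-t 0 -c %g -v 3', Cs(c));
        tic;
        acc = svmtrain(ytr, Xtr, opts);  % with -v, returns CV accuracy (%)
        t = toc;
        fprintf('C = %g: CV acc %.2f%%, time %.3fs\n', Cs(c), acc, t);
    end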

(a) Is the cross validation accuracy the same as that in 3.3? Note that LIBSVM solves linear SVM in
dual form while your implementation does it in primal form.

(b) How much faster is LIBSVM compared to your implementation, in terms of training time?

3.5 Use kernel SVM in LIBSVM
LIBSVM supports a number of kernel types. Here you need to experiment with the polynomial kernel and
RBF (Radial Basis Function) kernel.

(a) Polynomial kernel. Please tune C and the degree of the kernel. For each combination of (C, degree), where $C \in \{4^{-3}, 4^{-2}, \ldots, 4^{6}, 4^{7}\}$ and degree $\in \{1, 2, 3\}$, report the 3-fold cross validation accuracy and average training time.

(b) RBF kernel. Please tune C and gamma in the kernel. For each combination of (C, gamma), where $C \in \{4^{-3}, 4^{-2}, \ldots, 4^{6}, 4^{7}\}$ and gamma $\in \{4^{-7}, 4^{-6}, \ldots, 4^{-2}, 4^{-1}\}$, report the 3-fold cross validation accuracy and average training time.

Based on the cross validation results for the polynomial and RBF kernels, which kernel type and kernel parameters will you choose? You need to write a script libsvm.m using the best kernel and the parameters selected from cross-validation. Your script should train an SVM using LIBSVM with the selected kernel and parameters and output the accuracy on the test set. (A sketch of the grid search follows below.)
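A minimal sketch of the two grid searches with LIBSVM (-t 1 selects the polynomial kernel and -d its degree; -t 2 selects the RBF kernel and -g its gamma; variable names are placeholders as before):

    Cs = 4 .^ (-3:7);
    for C = Cs
        for deg = 1:3                        % polynomial kernel
            opts = sprintf('-t 1 -d %d -c %g -v 3', deg, C);
            acc = svmtrain(ytr, Xtr, opts);
            fprintf('poly: C = %g, degree = %d, CV acc %.2f%%\n', C, deg, acc);
        end
        for g = 4 .^ (-7:-1)                 % RBF kernel
            opts = sprintf('-t 2 -g %g -c %g -v 3', g, C);
            acc = svmtrain(ytr, Xtr, opts);
            fprintf('rbf:  C = %g, gamma = %g, CV acc %.2f%%\n', C, g, acc);
        end
    end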

4 Submission Instructions
You need to provide the following:
• Provide your answers to Problems 1-3 in a PDF file named CSCI567 hw4 fall15.pdf. You need to submit the homework both in hard copy (at Locker #19 at PHE, in the box labeled CSCI567-homework, by 6pm on the deadline date) and electronically as a PDF file on Blackboard. If you choose handwriting instead of typing all the answers, 40% of the points will be deducted.

• Submit ALL the code and the report via Blackboard by 6pm on the deadline date. The only acceptable language is MATLAB. You MUST include a main function called CSCI567 hw4 fall15.m in the root of your folder. After running this main file, your program should generate all of the results needed for this programming assignment, either as plots or console outputs.

Moreover, you SHOULD include the scripts own linear.m and libsvm.m in the root of your folder. After running these two scripts, your program should generate the results described above. You can have multiple files (i.e., your sub-functions); the requirement is that once we unzip your folder and execute your main file or the two scripts, your program executes correctly. Please double-check your program before submitting. You should submit only one .zip file; no other formats are allowed. Please name it [lastname] [firstname] hw4 fall15.zip.