Name: CS 5350/6350: Machine Learning Homework 2 solved
SKU: 3032
Price: 35.00 USD
Availability: InStock

Description


1 Boolean Functions
In this problem, you will be asked to write Boolean functions and linear threshold functions
based on given labeled data.
1. [3 points] Table 1 shows several data points (the x’s) along with corresponding labels
(y). (That is, each row is an example with a label.) Write down three different Boolean
functions all of which can produce the label y when given the inputs x.
y x1 x2 x3 x4
0 1 0 0 0
0 1 1 0 0
1 1 0 1 1
Table 1: Original Table
2. [5 points] Next, we expand Table 1 to Table 2 by adding more data points. How many
errors will each of your functions from the previous question make on the full data
set?
3. [7 points] Write down a linear threshold function for the data in Table 2.
y x1 x2 x3 x4
0 1 0 0 0
0 1 1 0 0
1 1 0 1 1
0 0 1 0 0
0 0 1 1 0
0 1 1 1 0
1 0 1 1 1
Table 2: Expanded Table
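One way to check a candidate Boolean function against Table 2 is to count, row by row, how often it disagrees with the label. The sketch below is illustrative only: `count_errors` and `candidate` are hypothetical names, and the candidate shown (predict y = x3) is an arbitrary example rather than an intended answer.

```python
# Rows of Table 2 as (y, x1, x2, x3, x4).
TABLE_2 = [
    (0, 1, 0, 0, 0),
    (0, 1, 1, 0, 0),
    (1, 1, 0, 1, 1),
    (0, 0, 1, 0, 0),
    (0, 0, 1, 1, 0),
    (0, 1, 1, 1, 0),
    (1, 0, 1, 1, 1),
]


def count_errors(f, rows=TABLE_2):
    """Count the rows on which the Boolean function f disagrees with y."""
    return sum(1 for y, *x in rows if int(bool(f(*x))) != y)


# Arbitrary example candidate: predict y = x3.
candidate = lambda x1, x2, x3, x4: x3
print(count_errors(candidate))
```

The same helper can be reused for each function written in the first question by passing a different `f`.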
2 Mistake Bound Model of Learning
Consider an instance space consisting of integer points on the two dimensional plane
(x1, x2) with −128 ≤ x1, x2 ≤ 128. Let C be a concept class defined on this instance space.
Each function fr in C is defined by an integer radius r (with 1 ≤ r ≤ 128) as follows:
$$
f_r(x_1, x_2) =
\begin{cases}
+1 & \text{if } x_1^2 + x_2^2 \le r^2; \\
-1 & \text{otherwise}
\end{cases}
\tag{1}
$$
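As a concrete reading of Equation 1, the hypothesis with radius r labels a point +1 exactly when it lies inside or on the circle of radius r centered at the origin. A minimal sketch, assuming integer inputs (the function name `f_r` simply mirrors the notation above):

```python
def f_r(x1: int, x2: int, r: int) -> int:
    """Return +1 if (x1, x2) lies inside or on the circle of radius r, else -1."""
    return 1 if x1 * x1 + x2 * x2 <= r * r else -1


print(f_r(3, 4, 5))  # (3, 4) is at distance exactly 5 from the origin
print(f_r(3, 4, 4))  # the same point checked against radius 4
```

Comparing squared quantities keeps everything in integer arithmetic, so no square roots or floating-point comparisons are needed.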
Our goal is to come up with an error-driven algorithm that will learn the function
f ∈ C that correctly classifies a dataset.
Side notes
1. Recall that a concept class is the set of functions from which the true target function
is drawn and the hypothesis space is the set of functions that the learning algorithm
searches over. In this question, both these are the same set.
2. Assume that there is no noise. That is, assume that the data is separable using the
hypothesis class.
Questions
1. [5 points] Determine |C|, the size of the concept class.
2. [5 points] To design an error-driven learning algorithm, we should first be able to write
down what it means to make a mistake. Suppose our current guess for the function is
$f_r$, defined as in Equation 1 above. Say we get an input point $(x_1^t, x_2^t)$ along with its
label $y^t$. Write down an expression (an equality or an inequality) in terms of $x_1^t$, $x_2^t$, $y^t$
and r that checks whether the current hypothesis $f_r$ has made a mistake.
3. [10 points] Next, we need to specify how we will update a hypothesis if there is an
error. Since fr is completely defined in terms of r, we only need to update r. How
will you update r if there is an error? Consider errors for both positive and negative
examples.
4. [20 points] Use the answers from the previous two steps to write a mistake-driven
learning algorithm to learn the function. Please write the algorithm concisely in the
form of pseudocode. What is the maximum number of mistakes that this algorithm
can make on any dataset?
5. (For 6350 students)[15 points total] We have seen the Halving algorithm in class. The
Halving algorithm will maintain a set of hypotheses consistent with all the examples
seen so far and predict using the most frequent label among this set. Upon making a
mistake, the algorithm prunes at least half of this set. In this question, you will design
and analyze a Halving algorithm for this particular concept space.
a. [5 points] The set of hypotheses consistent with all examples seen so far can be
defined by storing only two integers. How would you do this?
b. [5 points] How would you check if there is an error for an example $(x_1^t, x_2^t)$ that
has the label $y^t$?
c. [5 points] Write the full Halving algorithm for this specific concept space. (Do
not write the same Halving algorithm we saw in class. You need to tailor it to
this problem.) What is its mistake bound?
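For reference, the generic Halving algorithm described in question 5 can be sketched with an explicit list of hypotheses. This is the untailored version, not the two-integer variant the question asks for, and it filters the hypothesis set after every example rather than only after mistakes. The names `halving`, `target`, and `points`, the tie-breaking rule, and the example stream are all assumptions made for illustration.

```python
def f_r(x1, x2, r):
    """Equation 1: +1 inside or on the circle of radius r, -1 otherwise."""
    return 1 if x1 * x1 + x2 * x2 <= r * r else -1


def halving(stream, radii):
    """Generic Halving over an explicit list of candidate radii.

    stream yields ((x1, x2), y) pairs; returns (mistakes, surviving radii).
    """
    version_space = list(radii)
    mistakes = 0
    for (x1, x2), y in stream:
        # Predict with the majority vote of all hypotheses still consistent.
        votes = sum(f_r(x1, x2, r) for r in version_space)
        prediction = 1 if votes >= 0 else -1  # ties broken toward +1
        if prediction != y:
            mistakes += 1
        # Keep only the hypotheses that agree with the revealed label.
        version_space = [r for r in version_space if f_r(x1, x2, r) == y]
    return mistakes, version_space


# Hypothetical target radius and a small deterministic stream of points.
target = 37
points = [(a, b) for a in range(0, 129, 7) for b in range(0, 129, 11)]
stream = [((a, b), f_r(a, b, target)) for a, b in points]
mistakes, survivors = halving(stream, range(1, 129))
print(mistakes, survivors)
```

Because the data is noise-free, the target radius always survives the filtering step, and every mistake removes at least half of the remaining hypotheses.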
3 The Perceptron Algorithm and Its Variants
3.1 The Task and Data
Imagine you have access to information about people such as age, gender and level of education. Now, you want to predict whether a person makes over $50K a year or not using
these features.
We will use the Adult data set from the UCI Machine Learning repository.¹ The original
Adult data set has 14 features, among which 6 are continuous and 8 are categorical. In order
to make it easier to use, we will use a pre-processed version (and subset) of the original Adult
data set, created by the makers of the popular LIBSVM tool. From the LIBSVM website:
“In this data set, the continuous features are discretized into quantiles, and each quantile is
represented by a binary feature. Also, a categorical feature with m categories is converted
to m binary features.”
Use the training/test files called ‘a1a.train’ and ‘a1a.test’, available on the assignments page of the class website.² This data is in the LIBSVM format, where each row is a
single training example. The format of each row in the data is
<label> <index1>:<value1> <index2>:<value2> …
Here <label> denotes the label for that example. The rest of the elements of the row
form a sparse vector denoting the feature vector. For example, if the original feature vector is
[0, 0, 1, 2, 0, 3], this would be represented as 3:1 4:2 6:3. That is, only the non-zero entries
of the feature vector are stored.
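The sparse row format can be expanded back into a dense vector with a few lines of code. The sketch below is one possible reader, not a prescribed one: `parse_libsvm_line` and its `num_features` parameter are hypothetical names, and the caller is assumed to know the total number of features in advance.

```python
def parse_libsvm_line(line: str, num_features: int):
    """Parse '<label> <index>:<value> ...' into (label, dense feature list)."""
    tokens = line.split()
    label = int(float(tokens[0]))  # accepts '+1', '-1', or '1.0'
    x = [0.0] * num_features
    for token in tokens[1:]:
        index, value = token.split(":")
        x[int(index) - 1] = float(value)  # indices in the file start at 1
    return label, x


# The example from the text, with a made-up label of +1.
label, x = parse_libsvm_line("+1 3:1 4:2 6:3", num_features=6)
print(label, x)
```

Keeping the examples sparse (for instance as dictionaries from index to value) is an equally reasonable choice and makes the dot products cheaper on this data.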
¹ Look for information about the Adult data set at https://archive.ics.uci.edu/ml/datasets/Adult
² These are the same as a1a and a1a.t available at http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html
3.2 Algorithms
You will implement two variants of the Perceptron algorithm. Note that each variant has
different hyper-parameters, as described below.
• Perceptron: This is the simple version of Perceptron as described in class. An
update will be performed on an example (x, y) if $y(w^T x + b) \le 0$.
Hyper-parameters: The learning rate r
Two things bear additional explanation.
First, note that in the formulation above, the bias term b is explicitly mentioned. This
is because the features in the Adult data do not include a bias feature. Of course, you
could choose to add an additional constant feature to each example and not have the
explicit extra b during learning. (See the class lectures for more information.) However,
here, we will see the version of Perceptron that explicitly has the bias term.
Second, if w and b are initialized with zero, then the learning rate will have no effect.
To see this, recall the Perceptron update:
$w_{new} \leftarrow w_{old} + r\,y\,x$
$b_{new} \leftarrow b_{old} + r\,y$.
Now, if w and b are initialized with zeroes and a learning rate r is used, then we can
show that the final parameters will be equivalent to having a learning rate 1. The final
weight vector and the bias term will be scaled by r compared to the unit learning rate
case.
For this assignment, you should initialize the weight vector and the bias randomly and
tune the learning rate parameter. We recommend trying small values no greater than one
(e.g., 1, 0.1, 0.01).
• Margin Perceptron: This variant of Perceptron will perform an update on an example (x, y) if $y(w^T x + b) \le \mu$, where µ is an additional positive hyper-parameter,
specified by the user. Note that because µ is positive, this algorithm can update the
weight vector even when the current weight vector does not make a mistake on the
current example.
Hyper-parameters: Learning rate r and the margin µ.
We recommend setting the value of µ between 0 and 5.0.
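Both variants share the same update and differ only in the threshold used to trigger it, so a single routine with a margin parameter can cover them. The following is a minimal sketch under that assumption; `perceptron_pass`, `dot`, and the toy data are hypothetical and are not part of the assignment. The usage lines illustrate the earlier remark that, with zero initialization, changing the learning rate only rescales the result.

```python
def dot(w, x):
    """Dot product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_pass(data, w, b, r=1.0, mu=0.0):
    """One pass over data, a list of (x, y) with y in {-1, +1}.

    mu = 0 gives the simple Perceptron; mu > 0 gives the margin Perceptron.
    Returns the updated (w, b) and the number of updates made.
    """
    w = list(w)
    updates = 0
    for x, y in data:
        if y * (dot(w, x) + b) <= mu:
            w = [wi + r * y * xi for wi, xi in zip(w, x)]
            b += r * y
            updates += 1
    return w, b, updates


# Tiny made-up data set, used only to illustrate the learning-rate remark.
toy = [([1, 0], 1), ([0, 1], -1), ([1, 1], 1), ([0, 0], -1)]
w1, b1, n1 = perceptron_pass(toy, [0.0, 0.0], 0.0, r=1.0)
w2, b2, n2 = perceptron_pass(toy, [0.0, 0.0], 0.0, r=0.5)
print(w1, b1, n1)
print(w2, b2, n2)
```

Starting from zeros, the two runs make the same updates and the second result is the first scaled by 0.5, which is why the assignment asks for random initialization when tuning r.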
As mentioned in the previous homework, you may use any programming language for your implementation.
However, the graders should be able to execute your code on the CADE machines.
3.3 Experiments
1. [Sanity check, 10 points] Run the simple Perceptron algorithm on the data in Table 2
(one pass only) and report the weight vector that the algorithm returns. How many
mistakes does it make?
You may choose whatever learning rate you like, but we suggest that you informally
experiment with several values before submitting the results.
2. [Online setting, 15 points] Run both the Perceptron algorithm and the margin Perceptron on the Adult data for one pass.
Report the number of updates (or equivalently mistakes) made by each algorithm and
the accuracy of the final weight vector on both the training and the test set.
Once again, you will need to experiment with the algorithm hyper-parameters. You
will see that the hyper-parameters make a difference, so try out different
values. You may even write some code to run the algorithms with different sets of
hyper-parameters.
3. [Using online algorithms in the batch setting, 20 points] The third experiment is to evaluate the algorithms in a more realistic setting, where the algorithms perform multiple
passes over the training data. This means that there is an additional hyper-parameter:
the number of epochs.
Run the algorithms for three and five epochs and report the number of updates made,
and the accuracies of the final weight vectors on the training and test data.
It may be important to shuffle the training data before starting each epoch. Report the
results of the above experiments when you shuffle. Briefly explain your results.
4. (For 6350 Students) [Aggressive Perceptron with Margin, 10 points] Implement
an extension of the margin Perceptron that performs an aggressive update as follows:
If $y(w^T x + b) \le \mu$, then update
(a) $w_{new} \leftarrow w_{old} + \eta\,y\,x$
(b) $b_{new} \leftarrow b_{old} + \eta\,y$,
Unlike the standard Perceptron algorithm, here the learning rate η is given by
$$
\eta = \frac{\mu - y(w^T x + b)}{x^T x + 1}
$$
As with the margin perceptron, there is an additional positive parameter µ.
We call this the aggressive update because the update can be derived from the following
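optimization problem. When we see that $y(w^T x + b) \le \mu$, we try to find new values
of w and b such that $y(w^T x + b) = \mu$ using
$$
\min_{w_{new},\, b_{new}} \; \frac{1}{2}\,\|w_{new} - w_{old}\|^2 + \frac{1}{2}\,(b_{new} - b_{old})^2
\quad \text{s.t.} \quad y(w_{new}^T x + b_{new}) = \mu.
$$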
That is, the goal is to find the smallest change in the weights so that the current
example is on the right side of the weight vector. By substituting (a) and (b) from
above into this optimization problem, we will get a single variable optimization problem
whose solution gives us the η defined above. You can think of this algorithm as trying
to tune the weight vector so that the current example is correctly classified right after
the update.
Repeat the batch experiments with the aggressive update. You should report two sets
of results (one with shuffling and one without).
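The aggressive update and the multi-epoch loop with optional shuffling can be sketched together. This is one possible arrangement under stated assumptions: the names `aggressive_update` and `train`, the zero initialization, and the fixed random seed are illustrative choices, not requirements of the assignment.

```python
import random


def dot(w, x):
    """Dot product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def aggressive_update(w, b, x, y, mu):
    """Apply one aggressive margin update if y(w.x + b) <= mu."""
    margin = y * (dot(w, x) + b)
    if margin > mu:
        return w, b, False
    # Step size chosen so that the updated margin equals mu.
    eta = (mu - margin) / (dot(x, x) + 1.0)
    w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w, b + eta * y, True


def train(data, num_features, mu, epochs, shuffle, seed=0):
    """Run the aggressive update for several epochs, optionally shuffling."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is never reordered
    w, b, updates = [0.0] * num_features, 0.0, 0
    for _ in range(epochs):
        if shuffle:
            rng.shuffle(data)
        for x, y in data:
            w, b, changed = aggressive_update(w, b, x, y, mu)
            updates += changed
    return w, b, updates


# Single-step check on a made-up example: the margin after the update is mu.
w, b, changed = aggressive_update([0.0, 0.0], 0.0, [1.0, 2.0], 1, mu=1.0)
print(1 * (dot(w, [1.0, 2.0]) + b))
```

Calling `train` once with `shuffle=True` and once with `shuffle=False` gives the two sets of results requested above.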
What To Submit
1. The report should detail your experiments. For each step, explain in no more than
a paragraph or so how your implementation works. You may provide the results for
the final step as a table or a graph. Describe what you did. Comment on the design
choices in your implementation. For your experiments, what algorithm parameters did
you use? Try to analyze this and give your observations.
2. Your report should be in the form of a PDF file; LaTeX is recommended.
3. Your code should run on the CADE machines. You should include a shell script,
run.sh, that will execute your code in the CADE environment. Your code should
produce similar output to what you include in your report.
You are responsible for ensuring that the grader can execute the code using only the
included script. If you are using an esoteric programming language, you should make
sure that its runtime is available on CADE.
4. Put your project code in a single directory, ideally as a compressed
tar/zip file containing the code and the script used to run it. Please do not hand in binary files.
5. Please look up the late policy on the course website.