## Description

1 Boolean Functions

In this problem, you will be asked to write Boolean functions and linear threshold functions

based on given labeled data.

1. [3 points] Table 1 shows several data points (the x’s) along with corresponding labels

(y). (That is, each row is an example with a label.) Write down three different Boolean

functions all of which can produce the label y when given the inputs x.

y x1 x2 x3 x4

0 1 0 0 0

0 1 1 0 0

1 1 0 1 1

Table 1: Original Table

2. [5 points] Next, we expand Table 1 to Table 2 by adding more data points. How many

errors will each of your functions from the previous questions make on the full data

set.

3. [7 points] Write down the linear threshold function for the data in Table 2.

1

y x1 x2 x3 x4

0 1 0 0 0

0 1 1 0 0

1 1 0 1 1

0 0 1 0 0

0 0 1 1 0

0 1 1 1 0

1 0 1 1 1

Table 2: Expanded Table

2 Mistake Bound Model of Learning

Consider an instance space consisting of integer points on the two dimensional plane

(x1, x2) with −128 ≤ x1, x2 ≤ 128. Let C be a concept class defined on this instance space.

Each function fr in C is defined by an integer radius r (with 1 ≤ r ≤ 128) as follows:

fr(x1, x2) =

+1 x

2

1 + x

2

2 ≤ r

2

;

−1 otherwise (1)

Our goal is to come up with a error-driven algorithm that will learn the correct function

f ∈ C that correctly classifies a dataset.

Side notes

1. Recall that a concept class is the set of functions from which the true target function

is drawn and the hypothesis space is the set of functions that the learning algorithm

searches over. In this question, both these are the same set.

2. Assume that there is no noise. That is, assume that the data is separable using the

hypothesis class.

Questions

1. [5 points] Determine |C|, the size of concept class.

2. [5 points] To design an error driven learning algorithm, we should be able to first write

down what it means to make a mistake. Suppose our current guess for the function is

fr defined as in Equation 1 above. Say we get an input point (x

t

1

, xt

2

) along with its

label y

t

. Write down an expression (an equality or an inequality) in terms of x

t

1

, x

t

2

, y

t

and r that checks whether the current hypothesis fr has made a mistake.

3. [10 points] Next, we need to specify how we will update a hypothesis if there is an

error. Since fr is completely defined in terms of r, we only need to update r. How

will you update r if there is an error? Consider errors for both positive and negative

examples.

2

4. [20 points] Use the answers from the previous two steps to write a mistake-driven

learning algorithm to learn the function. Please write the algorithm concisely in the

form of pseudocode. What is the maximum number of mistakes that this algorithm

can make on any dataset?

5. (For 6350 students)[15 points total] We have seen the Halving algorithm in class. The

Halving algorithm will maintain a set of hypotheses consistent with all the examples

seen so far and predict using the most frequent label among this set. Upon making a

mistake, the algorithm prune at least half of this set. In this question, you will design

and analyze a Halving algorithm for this particular concept space.

a. [5 points] The set of hypotheses consistent with all examples seen so far can be

defined storing only two integers. How would you do this?

b. [5 points] How would you check if there is an error for an example (x

t

1

, xt

2

) that

has the label y

t

?

c. [5 points] Write the full Halving algorithm for this specific concept space. (Do

not write the same Halving algorithm we saw in class. You need to tailor it to

this problem.) What is its mistake bound?

3 The Perceptron Algorithm and Its Variants

3.1 The Task and Data

Imagine you have access to information about people such as age, gender and level of education. Now, you want to predict whether a person makes over $50K a year or not using

these features.

We will use Adult data set from the UCI Machine Learning repository1

. The original

Adult data set has 14 features, among which 6 are continuous and 8 are categorical. In order

to make it easier to use, we will use a pre-processed version (and subset) of the original Adult

data set, created by the makers of the popular LIBSVM tool. From the LIBSVM website:

“In this data set, the continuous features are discretized into quantiles, and each quantile is

represented by a binary feature. Also, a categorical feature with m categories is converted

to m binary features.”

Use the training/test files called ‘a1a.train’ and ‘a1a.test’, available on the assignments page of the class website.2 This data is in the LIBSVM format, where each row is a

single training example. The format of the each row in the data is