CS 5350/6350: Machine Learning Homework 1


1 Decision Trees
Several questions below ask you to write down decision trees. Report decision trees as a series of if-then
statements. For example, you could write a tree as follows:
if feature 0 = x:
    if feature 1 = y:
        if feature 2 = z:
            class = +
        if feature 2 != z:
            class = -
    if feature 1 != y:
        class = +
if feature 0 != x:
    if feature 1 = r:
        class = +
    if feature 1 != r:
        class = -
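Such an if-then tree can also be written down directly as a data structure, which may be useful once you reach the implementation questions later in this assignment. Here is a minimal Python sketch of the example tree above (the nested-dict encoding and the classify helper are one possible choice, not something the assignment prescribes):

```python
# A decision tree encoded as nested dicts: an internal node tests one feature
# against one value and has an "eq" and a "neq" branch; leaves are bare class
# labels. The features (0, 1, 2) and test values (x, y, z, r) mirror the
# if-then example above.
tree = {
    "feature": 0, "value": "x",
    "eq": {                      # feature 0 = x
        "feature": 1, "value": "y",
        "eq": {                  # feature 1 = y
            "feature": 2, "value": "z",
            "eq": "+",           # feature 2 = z  -> class +
            "neq": "-",          # feature 2 != z -> class -
        },
        "neq": "+",              # feature 1 != y -> class +
    },
    "neq": {                     # feature 0 != x
        "feature": 1, "value": "r",
        "eq": "+",               # feature 1 = r  -> class +
        "neq": "-",              # feature 1 != r -> class -
    },
}

def classify(node, example):
    """Walk the tree until a leaf (a bare class label) is reached."""
    while isinstance(node, dict):
        branch = "eq" if example[node["feature"]] == node["value"] else "neq"
        node = node[branch]
    return node
```

For instance, classify(tree, ["x", "y", "z"]) follows the leftmost path of the if-then example and returns "+".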
1. [Boolean Functions, 9 points] Recall from class that Boolean functions can be represented as decision trees. Represent the following Boolean functions as decision trees:
(a) (A ∨ B) ∧ C [3 points]
(b) A ⊕ B [3 points]
(c) A ∧ ¬B ∧ ¬C ∧ D [3 points]
2. [Inducing Decision Trees, 21 points] For this question, you will use the Balloons dataset from the UCI Machine Learning repository [1]. The data consists of four attributes (Color, Size, Action and Age) and a binary label. The entire dataset is shown in Table 1 below.
Label Color Size Action Age
T YELLOW SMALL STRETCH ADULT
T YELLOW SMALL STRETCH CHILD
T YELLOW SMALL DIP ADULT
F YELLOW SMALL DIP CHILD
F YELLOW SMALL DIP CHILD
T YELLOW LARGE STRETCH ADULT
T YELLOW LARGE STRETCH CHILD
T YELLOW LARGE DIP ADULT
F YELLOW LARGE DIP CHILD
F YELLOW LARGE DIP CHILD
T PURPLE SMALL STRETCH ADULT
T PURPLE SMALL STRETCH CHILD
T PURPLE SMALL DIP ADULT
F PURPLE SMALL DIP CHILD
F PURPLE SMALL DIP CHILD
T PURPLE LARGE STRETCH ADULT
T PURPLE LARGE STRETCH CHILD
T PURPLE LARGE DIP ADULT
F PURPLE LARGE DIP CHILD
F PURPLE LARGE DIP CHILD
Table 1: Balloon Data Set
(a) Find the entropy of the balloon dataset. [2 points]
(b) Find the information gain of the Action feature. [2 points]
(c) Use the ID3 heuristic we saw in class to manually construct a decision tree that correctly classifies the data. [7 points]
(d) We will now develop another heuristic for learning decision trees instead of ID3. If, at some node, we stop growing the tree and assign the majority label of the remaining examples at that node, then the empirical error on the training set at that node will be

MajorityError = 1 − max(p, 1 − p)

where p is the fraction of examples labeled T and, hence, 1 − p is the fraction labeled F. Notice that MajorityError can be thought of as a measure of impurity, just like entropy.
Redefine information gain using MajorityError as the measure of impurity. What is the redefined information gain of the Action feature? [3 points]
(e) Using the MajorityError-based impurity measure, construct a decision tree for the balloons data. [7 points]

[1] You can learn more about the UCI repository at http://archive.ics.uci.edu/ml/. Look for information about the balloons data set at http://archive.ics.uci.edu/ml/datasets/Balloons.
3. (For CS 6350 students, 15 points) Use the first twelve examples to create decision trees using both heuristics. How well do the trees perform on the remaining examples? Report the error rates (i.e., the ratio of the number of errors to the number of examples) of the two algorithms.
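The impurity computations in parts (a), (b) and (d) above can be checked by scripting them. Here is a minimal Python sketch (the rows are transcribed from Table 1; the function names are my own, not part of the assignment):

```python
from math import log2

# (label, color, size, action, age) rows transcribed from Table 1.
data = [
    ("T","YELLOW","SMALL","STRETCH","ADULT"), ("T","YELLOW","SMALL","STRETCH","CHILD"),
    ("T","YELLOW","SMALL","DIP","ADULT"),     ("F","YELLOW","SMALL","DIP","CHILD"),
    ("F","YELLOW","SMALL","DIP","CHILD"),     ("T","YELLOW","LARGE","STRETCH","ADULT"),
    ("T","YELLOW","LARGE","STRETCH","CHILD"), ("T","YELLOW","LARGE","DIP","ADULT"),
    ("F","YELLOW","LARGE","DIP","CHILD"),     ("F","YELLOW","LARGE","DIP","CHILD"),
    ("T","PURPLE","SMALL","STRETCH","ADULT"), ("T","PURPLE","SMALL","STRETCH","CHILD"),
    ("T","PURPLE","SMALL","DIP","ADULT"),     ("F","PURPLE","SMALL","DIP","CHILD"),
    ("F","PURPLE","SMALL","DIP","CHILD"),     ("T","PURPLE","LARGE","STRETCH","ADULT"),
    ("T","PURPLE","LARGE","STRETCH","CHILD"), ("T","PURPLE","LARGE","DIP","ADULT"),
    ("F","PURPLE","LARGE","DIP","CHILD"),     ("F","PURPLE","LARGE","DIP","CHILD"),
]

def entropy(rows):
    """Binary entropy of the label distribution over the given rows."""
    p = sum(r[0] == "T" for r in rows) / len(rows)
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def majority_error(rows):
    """MajorityError = 1 - max(p, 1 - p), as defined in part (d)."""
    p = sum(r[0] == "T" for r in rows) / len(rows)
    return 1 - max(p, 1 - p)

def gain(rows, col, impurity):
    """Gain of the feature in column `col` under the given impurity measure."""
    total = impurity(rows)
    for v in set(r[col] for r in rows):
        subset = [r for r in rows if r[col] == v]
        total -= len(subset) / len(rows) * impurity(subset)
    return total

ACTION = 3  # column index of the Action attribute
print(round(entropy(data), 3))                       # part (a)
print(round(gain(data, ACTION, entropy), 3))         # part (b)
print(round(gain(data, ACTION, majority_error), 3))  # part (d)
```

Swapping the `impurity` argument between `entropy` and `majority_error` is what parts (b) and (d) ask you to compare.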
2 Nearest Neighbors
The nearest neighbors algorithm partitions the space of examples into regions corresponding
to different labels. In two dimensions, the decision boundaries can be represented as a
Voronoi diagram, which shows regions of the plane associated with each label.
For this part, you will be drawing the decision boundaries for simple datasets.
1. [5 points] Using the Euclidean distance measure between points, show a Voronoi map
corresponding to the nearest neighbor classification of the following four points. (That
is, draw a diagram that shows how the nearest neighbor classification of the following
four points partitions the two dimensional plane.)
Label x y
A 1 1
A 1 -1
B -1 -1
C 2 -2
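A quick way to sanity-check a hand-drawn Voronoi map is to classify sample query points with 1-NN directly. Here is a minimal Python sketch using the four points above (the helper names are my own; question 2 below swaps in the city-block metric):

```python
from math import dist  # Euclidean distance, available in Python 3.8+

# (label, (x, y)) training points from the table in question 1.
points = [("A", (1, 1)), ("A", (1, -1)), ("B", (-1, -1)), ("C", (2, -2))]

def manhattan(a, b):
    """City-block (L1) distance, the metric asked for in question 2."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def nn_label(q, pts=points, metric=dist):
    """Label of the training point nearest to query q under the given metric."""
    return min(pts, key=lambda p: metric(p[1], q))[0]
```

Evaluating nn_label over a grid of query points traces out the Voronoi regions; for question 2, pass the three-point set from that question and metric=manhattan.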
2. [5 points] Using the city-block distance measure, show a Voronoi map corresponding
to the nearest neighbor classification of the following three points.
(Recall that the city-block measure/Manhattan distance/L1 distance/taxicab metric between two points x, y in the n-dimensional space R^n is defined as Σ_{i=1}^{n} |x_i − y_i|.)

Label x y
A 1 1
B -1 -1
C 2 -2

3. [5 points] In the simple 1-nearest neighbor algorithm, the label for a point is the same as the label of its nearest neighbor in the training set. We will define a variant of this method, which we will call the weighted nearest neighbor algorithm. If there are M training points, all of them contribute to the score of the final label. Formally, say the training points are represented by x_i with corresponding labels y_i, and let x be a test point. Then a label, say A, is associated with a score that is a function of the distance of x from all the training points labeled A:

Score(A) = Σ_{i = 1..M : y_i = A} 1 / d(x_i, x)

Here, d denotes the distance between points. For the purpose of this question, we will assume that d is the Euclidean distance. The final classification is the label with the highest score.
Using the weighted nearest neighbor algorithm with all four points from question 1 above, what would the classification of a test point at (4, 0) be? Show how you get the label.

3 The Badges Game (again)

In this question, you will implement decision tree learners and the K-nearest neighbors algorithm. This problem uses a variant of the badges data shown in the first lecture. You can find the data on the assignments page of the class website in a file called badges.tar.gz. It consists of two files, badges-train.txt and badges-test.txt, which you will use for training and testing respectively.
Each file consists of a label (+/-) followed by a name. All the names in the files have been lower-cased, and new labels were generated using a new hidden function that is a composition of mathematical functions and categorical features extracted from the names.
Your goal is to use various learning algorithms on the training data to train a predictor and see how well it does on the test data. You may use any programming language for your implementation.
However, the graders should be able to execute your code on the CADE machines.

1. An important step in creating machine learning applications is to devise useful features. For example, a feature function that selects the second letter would have been a great feature to use on the old badges data. E.g., if this function were called φ, then φ("George Washington") = 'e'. (And it is unfortunately not so great for this data!) Devise four feature functions and use them to extract features from the new badges data. Describe each feature function in no more than one line. At the end, you should have transformed each name in the training and test set into a set of four feature values.
2. Implement a decision tree data structure. (Remember that the decision tree need not be a binary tree!)
3. Implement the ID3 learning algorithm for your decision tree implementation. For debugging your implementation, you can use the toy examples from earlier in the homework, like the balloons data from Table 1.
4. (For 6350 students) Implement the MajorityError-based learning algorithm that you saw in the first question.
5. Implement a K-nearest neighbors classifier for K ∈ {1, ..., 5}. Note that your features are only categorical, so you have to make choices about how to measure distances between them. For example, you could consider using the Hamming distance between the feature representations.
6. For each algorithm you implemented, show the accuracy on the testing data. Rank your algorithms in decreasing order of performance.

What to hand in for this problem:

1. The report should detail your experiments. For each step, explain in no more than a paragraph or so how your implementation works. You may provide the results for the final step as a table or a graph.
2. Your code should run on the CADE machines. You should include a shell script, run.sh, that will execute your code in the CADE environment. Your code should produce similar output to what you include in your report.
You are responsible for ensuring that the grader can execute the code using only the included script. If you are using an esoteric programming language, you should make sure that its runtime is available on CADE.
3. Please do not hand in binary files!

Grading:
• Features: 8 points
• Decision tree implementation: 2 points
• ID3: 10 points
• (For 6350 students) MajorityError: 10 points
• K-nearest neighbors: 15 points
• Evaluation and report: 20 points

Not graded: Take a guess at how the labels were generated.