# CS 5350/6350: Machine Learining Homework 3 solved

\$35.00

## Description

5/5 - (1 vote)

1 Warm up: Feature expansion
[10 points] Recall problem 2 in Homework 2, where we saw the concept class C consisting of
functions function fr defined by an integer radius r (with 1 ≤ r ≤ 128) as follows:
fr(x1, x2) = 
+1 x
2
1 + x
2
2 ≤ r
2
;
−1 otherwise (1)
Clearly the hypothesis class is not linearly separable in < 2 . Construct a function φ(x1, x2) that maps examples to a new space, such that the positive and negative examples are linearly separable in that space. (Note that φ should not depend on r). 2 PAC Learning 1. [20 points total] A factory assembles a product that consist of different parts. Suppose a robot was invented to recognize whether a product contains all the right parts. The rules of making products are very simple: 1) you are free to combine any of the parts 1 as they are 2) you may also cut any of the parts into two distinct pieces before using them. You wonder how much effort a robot would need to figure out the what parts are used in the product. (a) [5 points] Suppose that a naive robot has to recognize products made using only rule 1. Given N available parts and each product made out of these constitutes a distinct hypothesis. How large would the hypothesis space be? Brief explain your answer. (b) [5 points] Suppose that an experienced worker follows both rules when making a product. How large is the hypothesis space now? Explain. (c) [10 points] An experienced worker decides to train the naive robot to discern the makeup of a product by showing you the product samples he has assembled. There are 6 available parts. If the robot would like to learn any product at 0.01 error with probability 99%, how many examples would the robot have to see? 2. [20 points, from Tom Mitchell’s book] We have learned an expression for the number of training examples sufficient to ensure that every hypothesis will have true error no worse than  plus its observed training error errorD(h). In particular, we used Hoeffding bounds to derive m ≥ 1 2 2 (ln(|H|) + ln(1/δ)). Derive an alternative expression for the number of training examples sufficient to ensure that every hypothesis will have true error no worse than (1 + γ)errorD(h). You can use general Chernoff bounds to derive such a result. Chernoff bounds: Suppose X1, · · · , Xm are the outcomes of m independent coin flips (Bernoulli trials), where the probability of heads on any single trail is P r[Xi = 1] = p and the probability of tails is P r[Xi = 0] = 1 − p. Define S = X1 + X2 + · · · + Xm to be the sum of these m trials. The expected value of S/m is E[S/m] = p. The Chernoff bounds govern the probabilty that S/m will differ from p by some factor 0 ≤ γ ≤ 1. P r[S/m > (1 + γ)p] ≤ e
−mpγ2/3
P r[S/m < (1 − γ)p] ≤ e −mpγ2/2 (2) 3 VC Dimension In this problem, we investigate a few properties of the Vapnik-Chervonenkis dimension. 1. [Shattering, 10 points] Recall the definition of shattering from class: a concept class shatters a set of points if for any labeling of those points, there is some function in the class that correctly labels it. Let C be the set of all conjunctions of n Boolean variables. Find a set S ⊆ {0, 1} n consisting of exactly n examples that can be shattered by C. Prove the correctness of your answer. 2 2. [10 points] Show that a finite concept class C has VC dimension at most log |C|. Hint: You can prove this by contradiction. 3. [15 points] We have a learning problem where each example is a point in < 2 . The concept class H is defined as follows: A function h ∈ H is specified by two parameters a and b. An example x = {x1, x2} in < 2 is labeled as + if and only if x1 ≥ a and x2 ≤ b and is labeled − otherwise. For example, if we set a = 1, b = 4, the grey region in figure 1 is the region of x = {x1, x2} that has label +1. What is the VC dimension of this class? x1 x2 x1 = 1 x2 = 4 Figure 1: An example with a = 1, b = 4. All points in the gray region (extending infinitely) shows the region that will be labeled as positive. 4. [15 points] Determine the VC dimension of concept class consisting of the union of 2 intervals on the real line (that is, points within either of the intervals will be labeled as positive). 5. [For 6350 Students, 10 points] Generalize the result of the above question. Find the VC dimension of the union of k intervals on the real line. 6. [For 6350 Students, 15 points] Let two hypothesis classes H1 and H2 satisfy H1 ⊆ H2. Prove: V C(H1) ≤ V C(H2). What To Submit 1. Your assignment should be no more than 10 pages. X points will be deducted if your submission is X pages beyond the page limit. 3 2. Your report should be in the form of a pdf file, LATEX is recommended. 3. Please look up the late policy on the course website. 4