## Description

1. In discussion we derived an expression for the signed distance $d$ between an arbitrary point $\mathbf{x}$ (or $\mathbf{p}$) and a hyperplane $H$ given by $g(\mathbf{x}) = w_0 + \mathbf{w}^T\mathbf{x} = 0$, all in nonaugmented feature space. This question explores this topic further.

(a) Prove that the weight vector $\mathbf{w}$ is normal to $H$.

Hint: For any two points $\mathbf{x}_1$ and $\mathbf{x}_2$ on $H$, what is $g(\mathbf{x}_1) - g(\mathbf{x}_2)$? How can you interpret the vector $(\mathbf{x}_1 - \mathbf{x}_2)$?

(b) Show that the vector $\mathbf{w}$ points to the positive side of $H$. (Positive side of $H$ means the $d > 0$ side.)

Hint: What sign does the distance $d$ from $H$ to $\mathbf{x} = (\mathbf{x}_1 + a\mathbf{w})$ have, in which $\mathbf{x}_1$ is a point on $H$?

(c) Derive, or state and justify, an expression for the signed distance $r$ between an arbitrary point $\mathbf{x}^{(+)}$ and a hyperplane $g(\mathbf{x}^{(+)}) = \mathbf{w}^{(+)T}\mathbf{x}^{(+)} = 0$ in augmented feature space. Set up the sign of your distance so that $\mathbf{w}$ points to the positive-distance side of $H$.

(d) In weight space, using augmented quantities, derive an expression for the signed distance between an arbitrary point $\mathbf{w}^{(+)}$ and a hyperplane $g(\mathbf{x}^{(+)}) = \mathbf{w}^{(+)T}\mathbf{x}^{(+)} = 0$, in which the vector $\mathbf{x}^{(+)}$ defines the positive side of the hyperplane.
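As a numerical companion to question 1 (not part of the assignment), the sketch below checks the claims in (a) and (b) for an example hyperplane of my own choosing, using the conventional signed-distance expression $d = g(\mathbf{x})/\|\mathbf{w}\|$:

```python
import numpy as np

# Assumed example hyperplane in 2D (w0 and w chosen arbitrarily for illustration):
# g(x) = w0 + w.T @ x = 0
w0 = -1.0
w = np.array([3.0, 4.0])          # ||w|| = 5

def g(x):
    return w0 + w @ x

# Two points on H (both satisfy g = 0), found along each axis.
x1 = np.array([1.0 / 3.0, 0.0])
x2 = np.array([0.0, 0.25])

# (a) w is normal to H: w is orthogonal to the in-plane vector (x1 - x2).
print(w @ (x1 - x2))              # ≈ 0

# (b) stepping from x1 in the direction +w lands on the d > 0 side,
# consistent with the signed distance d = g(x) / ||w||.
x = x1 + 0.1 * w
d = g(x) / np.linalg.norm(w)
print(d)                          # ≈ 0.5, i.e. positive
```

Repeating the last step with a negative step size $a$ lands on the $d < 0$ side, which is the geometric content of the hint for (b).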

2. For a 2-class learning problem with one feature, you are given four training data points (in augmented space):

$$\mathbf{x}_1^{(1)} = (1, -3);\quad \mathbf{x}_2^{(1)} = (1, -5);\quad \mathbf{x}_3^{(2)} = (1, 1);\quad \mathbf{x}_4^{(2)} = (1, -1)$$

(a) Plot the data points in 2D feature space. Draw a linear decision boundary $H$ that correctly classifies them, showing which side is positive.

(b) Plot the reflected data points in 2D feature space. Draw the same decision boundary; does it still classify them correctly?

(c) Plot the reflected data points, as lines in 2D weight space, showing the positive side of each. Show the solution region.

(d) Also, plot the weight vector $\mathbf{w}$ of $H$ from part (a) as a point in weight space. Is $\mathbf{w}$ in the solution region?
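A small sanity check for question 2. The candidate weight vector $\mathbf{w} = (2, 1)^T$ and the convention that class 2 is the positive class are my own assumptions, not given in the problem; after reflecting the non-positive class's points, a separating weight vector must give a positive dot product with every reflected point:

```python
import numpy as np

# The four augmented training points from the problem, with their class labels.
pts = [
    (np.array([1.0, -3.0]), 1),
    (np.array([1.0, -5.0]), 1),
    (np.array([1.0,  1.0]), 2),
    (np.array([1.0, -1.0]), 2),
]

# Candidate augmented weight vector (my choice; many others work):
# g(x) = w @ x, with class 2 taken as the positive side.
w = np.array([2.0, 1.0])

# Reflection: negate the points of class 1; a separating w must then
# satisfy w @ y > 0 for every reflected point y.
reflected = [x if c == 2 else -x for x, c in pts]
print(all(w @ y > 0 for y in reflected))   # True
```

In weight space, each reflected point $\mathbf{y}$ is the normal of a line through the origin, and the solution region of part (c) is exactly the set of $\mathbf{w}$ for which all four dot products above are positive.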

3. (a) Let $p(\mathbf{x})$ be a scalar function of a $D$-dimensional vector $\mathbf{x}$, and $f(p)$ be a scalar function of $p$. Prove that:


$$\nabla_{\mathbf{x}} \big[ f\big(p(\mathbf{x})\big) \big] = \left[ \frac{d}{dp} f(p) \right] \nabla_{\mathbf{x}}\, p(\mathbf{x}),$$

i.e., prove that the chain rule applies in this way. [Hint: you can show it for the $i$th component of the gradient vector, for any $i$. It can be done in a couple lines.]

(b) Use relation (18) of DHS A.2.4 to find $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{x})$.

(c) Prove your result of $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{x})$ in part (b) by, instead, writing out the components.

(d) Use (a) and (b) to find $\nabla_{\mathbf{x}}\big[(\mathbf{x}^T\mathbf{x})^3\big]$ in terms of $\mathbf{x}$.
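Once you have closed-form answers for 3(b) and 3(d), a central-difference gradient gives an easy numerical cross-check. The candidates below, $2\mathbf{x}$ and $6(\mathbf{x}^T\mathbf{x})^2\mathbf{x}$, are my own working of the chain rule from (a), and the helper `num_grad` is a sketch, not from the assignment:

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([0.5, -1.0, 2.0])

# (b) candidate: grad of x.T x is 2x
print(np.allclose(num_grad(lambda v: v @ v, x), 2 * x, atol=1e-4))        # True

# (d) chain rule with p = x.T x and f(p) = p**3:
# grad = [df/dp] * grad p = 3 * (x @ x)**2 * (2 * x)
pred = 6.0 * (x @ x) ** 2 * x
print(np.allclose(num_grad(lambda v: (v @ v) ** 3, x), pred, atol=1e-2))  # True
```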

4. (a) Use relations above to find $\nabla_{\mathbf{w}} \|\mathbf{w}\|_2$. Express your answer in terms of $\|\mathbf{w}\|_2$ where possible. Hint: let $p = \mathbf{w}^T\mathbf{w}$; what is $f$?

(b) Find: $\nabla_{\mathbf{w}} \|M\mathbf{w} - \mathbf{b}\|_2$. Express your result in simplest form. Hint: first choose $p$ (remember it must be a scalar).
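The same finite-difference idea can vet candidate answers for question 4, assuming the norms are Euclidean. The closed forms checked below, $\mathbf{w}/\|\mathbf{w}\|_2$ and $M^T(M\mathbf{w}-\mathbf{b})/\|M\mathbf{w}-\mathbf{b}\|_2$, are my own derivation via the hint $p = \mathbf{w}^T\mathbf{w}$, not given in the assignment:

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Arbitrary test data (my choice, nondegenerate).
w = np.array([0.5, -1.0, 2.0, 1.5])
M = np.array([[1.0,  0.0, 2.0, -1.0],
              [0.0,  3.0, 1.0,  0.5],
              [2.0, -1.0, 0.0,  1.0]])
b = np.array([1.0, -2.0, 0.5])

# 4(a) candidate: grad_w ||w||_2 = w / ||w||_2   (p = w.T w, f(p) = sqrt(p))
print(np.allclose(num_grad(np.linalg.norm, w),
                  w / np.linalg.norm(w), atol=1e-5))                 # True

# 4(b) candidate: grad_w ||Mw - b||_2 = M.T (Mw - b) / ||Mw - b||_2
r = M @ w - b
print(np.allclose(num_grad(lambda v: np.linalg.norm(M @ v - b), w),
                  M.T @ r / np.linalg.norm(r), atol=1e-5))           # True
```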

5. [Extra credit] For $C > 2$, show that total linear separability implies linear separability, and show that linear separability doesn't necessarily imply total linear separability. For the latter, a counterexample will suffice.