Description
1. In discussion we derived an expression for the signed distance $d$ between an arbitrary point $\mathbf{x}$ (or $\mathbf{p}$) and a hyperplane $H$ given by $g(\mathbf{x}) = w_0 + \mathbf{w}^T \mathbf{x} = 0$, all in nonaugmented feature space. This question explores this topic further.
(a) Prove that the weight vector $\mathbf{w}$ is normal to $H$.
Hint: For any two points $\mathbf{x}_1$ and $\mathbf{x}_2$ on $H$, what is $g(\mathbf{x}_1) - g(\mathbf{x}_2)$? How can you interpret the vector $(\mathbf{x}_1 - \mathbf{x}_2)$?
(b) Show that the vector $\mathbf{w}$ points to the positive side of $H$. (Positive side of $H$ means the $d > 0$ side.)
Hint: What sign does the distance $d$ from $H$ to $\mathbf{x} = (\mathbf{x}_1 + a\mathbf{w})$ have, in which $\mathbf{x}_1$ is a point on $H$?
(c) Derive, or state and justify, an expression for the signed distance $r$ between an arbitrary point $\mathbf{x}^{(+)}$ and a hyperplane $g(\mathbf{x}^{(+)}) = \mathbf{w}^{(+)T} \mathbf{x}^{(+)} = 0$ in augmented feature space. Set up the sign of your distance so that $\mathbf{w}^{(+)}$ points to the positive-distance side of $H$.
(d) In weight space, using augmented quantities, derive an expression for the signed distance between an arbitrary point $\mathbf{w}^{(+)}$ and a hyperplane $g(\mathbf{x}^{(+)}) = \mathbf{w}^{(+)T} \mathbf{x}^{(+)} = 0$, in which the vector $\mathbf{x}^{(+)}$ defines the positive side of the hyperplane.
2. For a 2-class learning problem with one feature, you are given four training data points (in augmented space), with the parenthesized superscript denoting the class label:
$\mathbf{x}_1^{(1)} = (1,\,-3);\quad \mathbf{x}_2^{(1)} = (1,\,-5);\quad \mathbf{x}_3^{(2)} = (1,\,1);\quad \mathbf{x}_4^{(2)} = (1,\,-1)$
(a) Plot the data points in 2D feature space. Draw a linear decision boundary H that
correctly classifies them, showing which side is positive.
(b) Plot the reflected data points in 2D feature space. Draw the same decision boundary;
does it still classify them correctly?
(c) Plot the reflected data points, as lines in 2D weight space, showing the positive side of
each. Show the solution region.
(d) Also, plot the weight vector $\mathbf{w}$ of $H$ from part (a) as a point in weight space. Is $\mathbf{w}$ in the solution region?
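A quick way to test whether a candidate weight vector lies in the solution region of problem 2 is to reflect (negate) the class-2 points and check that every inner product with the candidate is positive. The sketch below does this for one hypothetical choice $\mathbf{w} = (-2, -1)$, which is only an example, not the intended answer:

```python
import numpy as np

# Augmented training points from problem 2 (first component is the bias feature 1).
X = np.array([[1, -3],   # x1, class 1
              [1, -5],   # x2, class 1
              [1,  1],   # x3, class 2
              [1, -1]])  # x4, class 2
labels = np.array([1, 1, 2, 2])

# Reflect (negate) the class-2 points so a correct w satisfies w^T z > 0 for every z.
Z = np.where((labels == 2)[:, None], -X, X)

# Hypothetical candidate weight vector; substitute the w from your own part (a).
w = np.array([-2.0, -1.0])

print(Z @ w)                 # all entries positive for this candidate
print(np.all(Z @ w > 0))     # True -> this w lies in the solution region
```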
3. (a) Let $p(\mathbf{x})$ be a scalar function of a $D$-dimensional vector $\mathbf{x}$, and $f(p)$ be a scalar function of $p$. Prove that:
$$\nabla_{\mathbf{x}}\, f\big(p(\mathbf{x})\big) = \left[\frac{d}{dp} f(p)\right] \nabla_{\mathbf{x}}\, p(\mathbf{x})$$
i.e., prove that the chain rule applies in this way. [Hint: you can show it for the $i$th component of the gradient vector, for any $i$. It can be done in a couple of lines.]
(b) Use relation (18) of DHS A.2.4 to find $\nabla_{\mathbf{x}}\left(\mathbf{x}^T \mathbf{x}\right)$.
(c) Prove your result for $\nabla_{\mathbf{x}}\left(\mathbf{x}^T \mathbf{x}\right)$ in part (b) by, instead, writing out the components.
(d) Use (a) and (b) to find $\nabla_{\mathbf{x}}\left[\left(\mathbf{x}^T \mathbf{x}\right)^3\right]$ in terms of $\mathbf{x}$.
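The identity in 3(a) can be sanity-checked numerically before proving it. The sketch below is only a finite-difference check; the choices $p(\mathbf{x}) = \mathbf{x}^T\mathbf{x}$, $f(p) = p^3$, and the test point are arbitrary illustrations, and both gradients are estimated numerically rather than from the closed forms asked for in (b) and (d):

```python
import numpy as np

def num_grad(fun, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function of x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2 * h)
    return g

# Example choices matching the later parts: p(x) = x^T x and f(p) = p^3.
p = lambda x: x @ x
f = lambda p_val: p_val ** 3
df_dp = lambda p_val: 3 * p_val ** 2     # ordinary scalar derivative of f

x0 = np.array([0.5, -1.2, 2.0])

lhs = num_grad(lambda x: f(p(x)), x0)    # grad_x f(p(x)), estimated numerically
rhs = df_dp(p(x0)) * num_grad(p, x0)     # [d f / d p] * grad_x p(x), also numerical
print(np.allclose(lhs, rhs, rtol=1e-4))  # True: the two sides agree
```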
4. (a) Use relations above to find $\nabla_{\mathbf{w}} \lVert\mathbf{w}\rVert_2$. Express your answer in terms of $\lVert\mathbf{w}\rVert_2$ where possible. Hint: let $p = \mathbf{w}^T \mathbf{w}$; what is $f$?
(b) Find: $\nabla_{\mathbf{w}} \lVert M\mathbf{w} - \mathbf{b}\rVert_2$. Express your result in simplest form. Hint: first choose $p$ (remember it must be a scalar).
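For both parts of problem 4, the same finite-difference idea as in the sketch after problem 3 can validate whatever closed form you derive. The matrix $M$, vector $\mathbf{b}$, and test point below are made-up values used only for illustration:

```python
import numpy as np

# Compare a hand-derived gradient with a central-difference estimate.
M = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]])
b = np.array([1.0, -2.0, 0.5])
w0 = np.array([0.4, -1.3])

f = lambda w: np.linalg.norm(M @ w - b)          # the scalar in 4(b)
h = 1e-6
num = np.array([(f(w0 + h * e) - f(w0 - h * e)) / (2 * h) for e in np.eye(2)])
print(num)   # compare this vector with the expression you derive in 4(b)
```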
5. [Extra credit] For $C > 2$ classes, show that total linear separability implies linear separability, and show that linear separability doesn't necessarily imply total linear separability. For the latter, a counterexample will suffice.