## Description

Exercise 1: Bayes’ rule (6 points). Suppose that 5% of competitive athletes use performanceenhancing drugs and that a particular drug test has a 2% false positive rate and a 1.5% false negative

rate.

1. (3 points) Athlete A tests positive for drug use. What is the probability that Athlete A is using

drugs?

2. (3 points) Athlete B tests negative for drug use. What is the probability that Athlete B is not

using drugs?

Exercise 2: Bayesian decision theory: losses and risks (11 points). Consider a classification

problem with K classes, using a loss λik ≥ 0 if we choose class i when the input actually belongs to

class k, for i, k ∈ {1, . . . , K}.

1. (2 points) Write the expression for the expected risk Ri(x) for choosing class i as the class for a

pattern x, and the rule for choosing the class for x.

Consider a two-class problem with losses given by the matrix λik =

0 1

λ21 0

.

2. (3 points) Give the optimal decision rule in the form “p(C1|x) > . . . ” as a function of λ21.

3. (3 points) Imagine we consider both misclassification errors as equally costly. When is class 1

chosen (for what values of p(C1|x))?

4. (3 points) Imagine we want to be very conservative when choosing class 2 and we seek a rule of

the form “p(C2|x) > 0.99” (i.e., choose class 2 when its posterior probability exceeds 99%). What

should λ21 be?

Exercise 3: association rules (6 points). Given the following data of transactions at a supermarket, calculate the support and confidence values of the following association rules: meat → avocado,

avocado → meat, yogurt → avocado, avocado → yogurt, meat → yogurt, yogurt → meat. What is the

best rule to use in practice?

transaction # items in basket

1 meat, avocado

2 yogurt, avocado

3 meat

4 yogurt, meat

5 avocado, meat, yogurt

6 meat, avocad

Exercise 4: true- and false-positive rates (10 points). Consider the following table, where xn is

a pattern, yn its ground-truth label (1 = positive class, 2 = negative class) and p(C1|xn) the posterior

probability produced by some probabilistic classification algorithm:

n 1 2 3 4 5

yn 1 2 2 1 2

p(C1|xn) 0.6 0.7 0.5 0.9 0.2

We use a classification rule of the form “p(C1|x) > θ” where θ ∈ [0, 1] is a threshold.

1. (8 points) Give, for all possible values of θ ∈ [0, 1], the predicted labels and the corresponding

confusion matrix and classification error.

2. (2 points) Plot the corresponding pairs (fp,tp) as an ROC curve.

Exercise 5: ROC curves (8 points). Imagine we have a classifier A that has false-positive and

true-positive rates fpA,tpA ∈ [0, 1] such that fpA > tpA (that is, this classifier is below the diagonal

on the ROC space). Now consider a classifier B that negates the decision of A, that is, whenever A

predicts the positive class then B predicts the negative class and vice versa. Compute the false-positive

and true-positive rates fpB,tpB for classifier B. Where is this point in the ROC space?

Exercise 6: least-squares regression (14 points). Consider the following model, with parameters

Θ = {θ1, θ2, θ3} ⊂ R and an input x ∈ R:

h(x; Θ) = θ1 + θ2 sin 2x + θ3 sin 4x ∈ R.

1. (2 points) Write the general expression of the least-squares error function of a model h(x; Θ) with

parameters Θ given a sample {(xn, yn)}

N

n=1.

2. (2 points) Apply it to the above model, simplifying it as much as possible.

3. (6 points) Find the least-squares estimate for the parameters.

4. (4 points) Assume the values {xn}

N

n=1 are uniformly distributed in the interval [0, 2π]. Can you find

a simpler, approximate way to find the least-squares estimate ? Hint: approximate 1

N

PN

n=1 f(xn)

by an integral.

Exercise 7: maximum likelihood estimate (15 points). A discrete random variable x ∈ {0, 1, 2 . . . }

follows a Poisson distribution if it has the following probability mass function:

p(x; θ) = e

−θ

θ

x

x!

where the parameter is θ > 0.

1. (2 points) Verify that P∞

x=0 p(x) = 1.

2. (2 points) Write the general expression of the log-likelihood of a probability mass function p(x; Θ)

with parameters Θ for an iid sample x1, . . . , xN .

3. (5 points) Apply it to the above distribution, simplifying it as much as possible.

4. (6 points) Find the maximum likelihood estimate for the parameter θ.

Exercise 8: multivariate Bernoulli distribution (20 points). Consider a multivariate Bernoulli

distribution where θ ∈ [0, 1]D are the parameters and x ∈ {0, 1}

D the binary random vector:

p(x; θ) = Y

D

d=1

θ

xd

d

(1 − θd)

1−xd

.

1. (5 points) Compute the maximum likelihood estimate for θ given a sample X = {x1, . . . , xN }.

Let us do document classification using a D-word dictionary (element d in xn is 1 if word d is in document

n and 0 otherwise) using a multivariate Bernoulli model for each class. Assume we have K document

classes for which we have already obtained the values of the optimal parameters θk = (θk1, . . . , θkD)

T

and prior distribution p(Ck) = πk, for k = 1, . . . , K, by maximum likelihood.

2. (2 points) Write the discriminant function gk(x) for a probabilistic classifier in general (not necessarily Bernoulli), and the rule to make a decision.

3. (5 points) Apply it to the multivariate Bernoulli case with K classes. Show that gk(x) is linear

on x, i.e., it can be written as gk(x) = wT

k x + wk0 and give the expression for wk and wk0.

4. (3 points) Consider K = 2 classes. Show the decision rule can be written as “if wT x + w0 > 0

then choose class 1”, and give the expression for w and w0.

5. (5 points) Compute the numerical values of w and w0 for a two-word dictionary where π1 = 0.7,

θ1 = ( 0.2

0.8

) and θ2 = ( 0.3

0.6

). Plot in 2D all the possible values of x ∈ {0, 1}

D and the boundary

corresponding to this classifier.

Exercise 9: Gaussian classifiers (10 points). Consider a binary classification problem for x ∈ R

D

where we use Gaussian class-conditional probabilities p(x|C1) ∼ N (µ, σ2

1

I) and p(x|C2) ∼ N (µ, σ2

2

I).

That is, they have the same mean and the covariance matrices are isotropic but different. Compute the

expression for the class boundary. What shape is it?