Name: CSE176 Introduction to Machine Learning Homework set #1 solved
SKU: 966
Price: 35.00 USD
Availability: InStock

Description

5/5 - (1 vote)

Exercise 1: Bayes’ rule (6 points). Suppose that 5% of competitive athletes use performanceenhancing drugs and that a particular drug test has a 2% false positive rate and a 1.5% false negative
rate.
1. (3 points) Athlete A tests positive for drug use. What is the probability that Athlete A is using
drugs?
2. (3 points) Athlete B tests negative for drug use. What is the probability that Athlete B is not
using drugs?
Exercise 2: Bayesian decision theory: losses and risks (11 points). Consider a classification
problem with K classes, using a loss λik ≥ 0 if we choose class i when the input actually belongs to
class k, for i, k ∈ {1, . . . , K}.
1. (2 points) Write the expression for the expected risk Ri(x) for choosing class i as the class for a
pattern x, and the rule for choosing the class for x.
Consider a two-class problem with losses given by the matrix λik =

0 1
λ21 0

.
2. (3 points) Give the optimal decision rule in the form “p(C1|x) > . . . ” as a function of λ21.
3. (3 points) Imagine we consider both misclassification errors as equally costly. When is class 1
chosen (for what values of p(C1|x))?
4. (3 points) Imagine we want to be very conservative when choosing class 2 and we seek a rule of
the form “p(C2|x) > 0.99” (i.e., choose class 2 when its posterior probability exceeds 99%). What
should λ21 be?
Exercise 3: association rules (6 points). Given the following data of transactions at a supermarket, calculate the support and confidence values of the following association rules: meat → avocado,
avocado → meat, yogurt → avocado, avocado → yogurt, meat → yogurt, yogurt → meat. What is the
best rule to use in practice?
transaction # items in basket
1 meat, avocado
2 yogurt, avocado
3 meat
4 yogurt, meat
5 avocado, meat, yogurt
6 meat, avocad
Exercise 4: true- and false-positive rates (10 points). Consider the following table, where xn is
a pattern, yn its ground-truth label (1 = positive class, 2 = negative class) and p(C1|xn) the posterior
probability produced by some probabilistic classification algorithm:
n 1 2 3 4 5
yn 1 2 2 1 2
p(C1|xn) 0.6 0.7 0.5 0.9 0.2
We use a classification rule of the form “p(C1|x) > θ” where θ ∈ [0, 1] is a threshold.
1. (8 points) Give, for all possible values of θ ∈ [0, 1], the predicted labels and the corresponding
confusion matrix and classification error.
2. (2 points) Plot the corresponding pairs (fp,tp) as an ROC curve.
Exercise 5: ROC curves (8 points). Imagine we have a classifier A that has false-positive and
true-positive rates fpA,tpA ∈ [0, 1] such that fpA > tpA (that is, this classifier is below the diagonal
on the ROC space). Now consider a classifier B that negates the decision of A, that is, whenever A
predicts the positive class then B predicts the negative class and vice versa. Compute the false-positive
and true-positive rates fpB,tpB for classifier B. Where is this point in the ROC space?
Exercise 6: least-squares regression (14 points). Consider the following model, with parameters
Θ = {θ1, θ2, θ3} ⊂ R and an input x ∈ R:
h(x; Θ) = θ1 + θ2 sin 2x + θ3 sin 4x ∈ R.
1. (2 points) Write the general expression of the least-squares error function of a model h(x; Θ) with
parameters Θ given a sample {(xn, yn)}
N
n=1.
2. (2 points) Apply it to the above model, simplifying it as much as possible.
3. (6 points) Find the least-squares estimate for the parameters.
4. (4 points) Assume the values {xn}
N
n=1 are uniformly distributed in the interval [0, 2π]. Can you find
a simpler, approximate way to find the least-squares estimate ? Hint: approximate 1
N
PN
n=1 f(xn)
by an integral.
Exercise 7: maximum likelihood estimate (15 points). A discrete random variable x ∈ {0, 1, 2 . . . }
follows a Poisson distribution if it has the following probability mass function:
p(x; θ) = e
−θ
θ
x
x!
where the parameter is θ > 0.
1. (2 points) Verify that P∞
x=0 p(x) = 1.
2. (2 points) Write the general expression of the log-likelihood of a probability mass function p(x; Θ)
with parameters Θ for an iid sample x1, . . . , xN .
3. (5 points) Apply it to the above distribution, simplifying it as much as possible.
4. (6 points) Find the maximum likelihood estimate for the parameter θ.
Exercise 8: multivariate Bernoulli distribution (20 points). Consider a multivariate Bernoulli
distribution where θ ∈ [0, 1]D are the parameters and x ∈ {0, 1}
D the binary random vector:
p(x; θ) = Y
D
d=1
θ
xd
d
(1 − θd)
1−xd
.
1. (5 points) Compute the maximum likelihood estimate for θ given a sample X = {x1, . . . , xN }.
Let us do document classification using a D-word dictionary (element d in xn is 1 if word d is in document
n and 0 otherwise) using a multivariate Bernoulli model for each class. Assume we have K document
classes for which we have already obtained the values of the optimal parameters θk = (θk1, . . . , θkD)
T
and prior distribution p(Ck) = πk, for k = 1, . . . , K, by maximum likelihood.
2. (2 points) Write the discriminant function gk(x) for a probabilistic classifier in general (not necessarily Bernoulli), and the rule to make a decision.
3. (5 points) Apply it to the multivariate Bernoulli case with K classes. Show that gk(x) is linear
on x, i.e., it can be written as gk(x) = wT
k x + wk0 and give the expression for wk and wk0.
4. (3 points) Consider K = 2 classes. Show the decision rule can be written as “if wT x + w0 > 0
then choose class 1”, and give the expression for w and w0.
5. (5 points) Compute the numerical values of w and w0 for a two-word dictionary where π1 = 0.7,
θ1 = ( 0.2
0.8
) and θ2 = ( 0.3
0.6
). Plot in 2D all the possible values of x ∈ {0, 1}
D and the boundary
corresponding to this classifier.
Exercise 9: Gaussian classifiers (10 points). Consider a binary classification problem for x ∈ R
D
where we use Gaussian class-conditional probabilities p(x|C1) ∼ N (µ, σ2
1
I) and p(x|C2) ∼ N (µ, σ2
2
I).
That is, they have the same mean and the covariance matrices are isotropic but different. Compute the
expression for the class boundary. What shape is it?

CSE176 Introduction to Machine Learning Homework set #1 solved

Description

Related products

CSE176 Introduction to Machine Learning Homework set #2 solved