EL9123 Homework 2: Multiple Linear Regression

1. An online retailer like Amazon wants to determine which products to promote based on
reviews. They only want to promote products that are likely to sell. For each product, they
have past sales as well as reviews. The reviews have both a numeric score (from 1 to 5) and
text.
(a) To formulate this as a machine learning problem, suggest a target variable that the online
retailer could use.
(b) For the predictors of the target variable, a data scientist suggests combining the numeric
score with the frequency of occurrence of words that convey judgement, like “bad”, “good”,
and “doesn’t work.” Describe a possible linear model for this relation.
(c) Now, suppose that some reviews have a numeric score from 1 to 5 and others have a score
from 1 to 10. How would you change your features?
(d) Now suppose the reviews have either (a) a score from 1 to 5; (b) a rating that is simply
good or bad; or (c) no numeric rating at all. How would you change your features?
(e) For the frequency of occurrence of a word such as “good”, which variable would you
suggest using as a predictor: (a) the total number of reviews with the word “good”; or (b)
the fraction of reviews with the word “good”?
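As a rough illustration of the feature questions in parts (b) through (e), the sketch below builds one possible feature vector per product. It is only one design among many: the function name `product_features`, the review dictionary keys (`text`, `score`, `scale_max`), and the choice to rescale scores to [0, 1] are all assumptions made for this example, not part of the problem statement.

```python
# Phrases quoted in the problem statement (straight apostrophe used here).
JUDGEMENT_PHRASES = ("bad", "good", "doesn't work")


def product_features(reviews, phrases=JUDGEMENT_PHRASES):
    """Build a feature vector for one product (hypothetical layout).

    reviews: list of dicts with a 'text' key and, optionally,
             'score' and 'scale_max' keys (e.g. scale_max=5 or 10).
             A good/bad rating could be encoded as score 1 or 2 with
             scale_max=2; that encoding is an assumption of this sketch.
    Returns [mean rescaled score, fraction of reviews with a score,
             fraction of reviews containing each phrase...].
    """
    n = len(reviews)
    scored = [r for r in reviews if r.get("score") is not None]

    # Rescale every score to [0, 1] so that 1-5 and 1-10 scales are comparable.
    rescaled = [(r["score"] - 1) / (r["scale_max"] - 1) for r in scored]
    mean_score = sum(rescaled) / len(rescaled) if rescaled else 0.0

    # Indicator-style feature: how many reviews carry a numeric rating at all.
    frac_scored = len(scored) / n if n else 0.0

    # Fraction (not raw count) of reviews mentioning each phrase.
    phrase_fracs = [
        sum(p in r["text"].lower() for r in reviews) / n if n else 0.0
        for p in phrases
    ]
    return [mean_score, frac_scored] + phrase_fracs


example_reviews = [
    {"text": "Good value", "score": 5, "scale_max": 5},
    {"text": "bad, doesn't work", "score": 1, "scale_max": 10},
    {"text": "good"},  # no numeric rating
]
features = product_features(example_reviews)
```

A linear model would then predict the target as an intercept plus a weighted sum of these features. Using fractions rather than raw counts keeps the features from simply tracking how many reviews a product has.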
2. Suppose we are given data:
    x_i1 | 0  0  1  1
    x_i2 | 0  1  0  1
    y_i  | 1  4  3  7
(a) Write an equation for a linear model for y in terms of x1 and x2.
(b) Given the data, compute the least-squares estimate for the parameters in the model.
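For a numerical check of Problem 2, the following minimal sketch assumes the model includes an intercept, y ≈ β0 + β1 x1 + β2 x2, and fits it with NumPy. The variable names are chosen for this example only.

```python
import numpy as np

# Data from the table in Problem 2.
x1 = np.array([0.0, 0.0, 1.0, 1.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0])
y = np.array([1.0, 4.0, 3.0, 7.0])

# Design matrix with a column of ones for the intercept (an assumption).
A = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution of min ||y - A beta||^2.
beta, _, rank, _ = np.linalg.lstsq(A, y, rcond=None)

# Same estimate from the normal equations (A^T A) beta = A^T y.
beta_normal = np.linalg.solve(A.T @ A, A.T @ y)

print("beta (lstsq):           ", beta)
print("beta (normal equations):", beta_normal)
```

Solving the normal equations by hand under the same assumption gives β0 = 0.75, β1 = 2.5, β2 = 3.5, which both computations should reproduce up to rounding.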
3. An automobile engineer wants to model the relation between the accelerator control and
the velocity of the car. The relation may not be simple since there is a lag between depressing
the accelerator and the car actually accelerating. To determine the relation, the engineer
measures the accelerator control input x_k and the velocity of the car y_k at time instants k =
0, 1, ..., T − 1. The measurements are made at some sampling rate, say once every 10 ms.
The engineer then wants to fit a model of the form
$$ y_k = \sum_{j=1}^{M} a_j y_{k-j} + \sum_{j=0}^{N} b_j x_{k-j} + \epsilon_k, \qquad (1) $$
for coefficients a_j and b_j. In engineering, this relation is called a linear filter, and in statistics
it is called an auto-regressive moving average (ARMA) model.
(a) Describe a vector β with the unknown parameters. How many unknown parameters are
there?
(b) Describe the matrix A and target vector y so that we can rewrite the model (1) in matrix
form,
$$ y = A\beta + \epsilon. $$
Your matrix A will have entries of y_k and x_k in it.
(c) (Graduate students only) Show that, for T ≫ N and T ≫ M, the coefficients of
(1/T)A^T A and (1/T)A^T y can be approximately computed from the so-called autocorrelation functions
$$ R_{xy}(\ell) = \frac{1}{T}\sum_{k=0}^{T-1} x_k y_{k+\ell}, \quad R_{yy}(\ell) = \frac{1}{T}\sum_{k=0}^{T-1} y_k y_{k+\ell}, \quad R_{xx}(\ell) = \frac{1}{T}\sum_{k=0}^{T-1} x_k x_{k+\ell}. $$
In the sum, we take xk = 0 or yk = 0 whenever k < 0 or k ≥ T.
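To make the matrix form in Problem 3 concrete, here is a minimal sketch of one way to assemble A and the target vector with NumPy. The helper name `build_arma_design`, the column ordering (lagged outputs first, then lagged inputs), and the choice to drop the first max(M, N) samples instead of zero-padding are assumptions of this example. The simulated data and coefficient values are made up purely to check that the construction recovers known parameters.

```python
import numpy as np


def build_arma_design(x, y, M, N):
    """Return (A, target) for y_k = sum_j a_j y_{k-j} + sum_j b_j x_{k-j}.

    Columns of A: y_{k-1}, ..., y_{k-M}, x_k, x_{k-1}, ..., x_{k-N}.
    Rows cover k = max(M, N), ..., T-1 so every lagged sample exists.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ks = np.arange(max(M, N), len(y))
    y_lags = [y[ks - j] for j in range(1, M + 1)]
    x_lags = [x[ks - j] for j in range(0, N + 1)]
    A = np.column_stack(y_lags + x_lags)
    return A, y[ks]


# Synthetic, noise-free test signal with made-up coefficients.
rng = np.random.default_rng(0)
T, M, N = 200, 2, 1
a_true = np.array([0.5, -0.2])  # a_1, a_2
b_true = np.array([1.0, 0.3])   # b_0, b_1
x = rng.standard_normal(T)
y = np.zeros(T)
for k in range(T):
    for j in range(1, M + 1):
        if k - j >= 0:
            y[k] += a_true[j - 1] * y[k - j]
    for j in range(N + 1):
        if k - j >= 0:
            y[k] += b_true[j] * x[k - j]

A, target = build_arma_design(x, y, M, N)
beta_hat, *_ = np.linalg.lstsq(A, target, rcond=None)
print("estimated [a_1, a_2, b_0, b_1]:", beta_hat)
```

With this layout the parameter vector is β = (a_1, ..., a_M, b_0, ..., b_N), so it has M + N + 1 entries. Because the synthetic data contain no noise, the estimate should match the made-up coefficients up to numerical precision.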
4. In audio processing, one often wants to find tonal sounds in segments of the recordings. This
can be formulated as follows: We are given samples of an audio segment, xk, k = 0, . . . , N −1,
and wish to fit a model of the form,
$$ x_k \approx \sum_{\ell=1}^{L} \left[ a_\ell \cos(\Omega_\ell k) + b_\ell \sin(\Omega_\ell k) \right], \qquad (2) $$
where L is the number of tones present in the audio segment, Ω_ℓ are the tonal frequencies, and
a_ℓ and b_ℓ are the coefficients.
(a) Show that if the frequencies Ω_ℓ are given, we can solve for the coefficients a_ℓ and b_ℓ using
linear regression. Specifically, rewrite the model (2) as x ≈ Aβ for appropriate x, A and
β. Then describe exactly how we obtain the coefficients a_ℓ and b_ℓ from this model.
(b) Now suppose the frequencies Ω_ℓ were not known. If we had to solve for the parameters
a_ℓ, b_ℓ and Ω_ℓ, would the problem be a linear regression problem?
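For the known-frequency case in part (a), the sketch below shows one way the fit could be set up in NumPy. The function name `fit_tones`, the column ordering (all cosine columns, then all sine columns), and the example frequencies and coefficients are assumptions made for illustration.

```python
import numpy as np


def fit_tones(x, omegas):
    """Least-squares fit of x_k ~ sum_l a_l cos(W_l k) + b_l sin(W_l k).

    x:      samples x_0, ..., x_{N-1}
    omegas: known tonal frequencies W_1, ..., W_L (radians per sample)
    Returns (a, b), each of length L.
    """
    x = np.asarray(x, dtype=float)
    omegas = np.asarray(omegas, dtype=float)
    k = np.arange(len(x))
    phase = np.outer(k, omegas)                    # shape (N, L)
    A = np.hstack([np.cos(phase), np.sin(phase)])  # shape (N, 2L)
    beta, *_ = np.linalg.lstsq(A, x, rcond=None)   # beta = (a_1..a_L, b_1..b_L)
    L = len(omegas)
    return beta[:L], beta[L:]


# Made-up two-tone signal used only to exercise the function.
omegas = np.array([0.3, 1.1])
a_true = np.array([2.0, -1.0])
b_true = np.array([0.5, 3.0])
k = np.arange(64)
x = sum(
    a * np.cos(w * k) + b * np.sin(w * k)
    for a, b, w in zip(a_true, b_true, omegas)
)

a_hat, b_hat = fit_tones(x, omegas)
print("a:", a_hat, "b:", b_hat)
```

The model is linear in a_ℓ and b_ℓ only because every entry of A can be computed before fitting. That no longer holds once the frequencies Ω_ℓ themselves are unknown.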