## Description

Problem 1 (Values). Name one or two of your own personal, academic, or career values, and

explain how you hope machine learning can be of service to those values.

1

Problem 2 (Stuff you must know). The course website http://www.cs.columbia.edu/~djhsu/

coms4771-f16/ has information about the course prerequisites, course requirements, academic rules

of conduct, and other information. You are required to understand this information and abide by

the rules of conduct, regardless of whether or not you can solve the following problems.

(a) True or false: I may share my homework write-up or code with another student as long as

(1) the write-up only contains solutions for at most half of the problems, (2) the code is at

most five lines, and (3) we list each other as discussion partners on the submitted write-up.

(b) True or false: I may use any outside reference material to help me solve the homework

problems as long as I appropriately acknowledge these materials in the submitted write-up.

2

Problem 3 (More stuff you should know). We’ll use the notation f : X → Y to declare a function

f whose domain is the set X , and whose range is the set Y. For example, f : R → R declares

a real-valued function over the real line. For a positive integer d, the d-dimensional vector space

called Euclidean space is denoted by R

d

. For positive integers m and n, the space of m×n matrices

over the real field R is denoted by R

m×n

. Every matrix in R

m×n

can be regarded as a linear map

from R

n

to R

m.

Let A, B ∈ R

2×2 be given by

A :=

”

1 2

2 4#

, B :=

”

4 0

0 4#

.

(“:=” is the notation used for “equals by definition”.) Also let u := (2, 1) and v := (1, 2), which

are vectors in R

2

. Note that when we refer to vectors from Euclidean spaces in the context of

matrix-vector products, we always regard vectors (like u) as column vectors, and their transposes

(like u

>) as row vectors:

u =

”

2

1

#

, u

> =

h

2 1i

.

(a) What is the rank of A?

(b) What is Au + Bv?

(c) What is u

>Av?

(d) The (Euclidean) norm (or length) of a vector x = (x1, x2, . . . , xd) ∈ R

d

is denoted by kxk2,

and is equal to q

x

2

1 + x

2

2 + · · · + x

2

d

. What is kuk2?

(e) Let f : R

2 → R be the function defined by

f(x) := x

>

(A + B)x .

The gradient of a real-valued function g : R

d → R at a point z ∈ R

d

, denoted by ∇g(z), is

the vector λ = (λ1, λ2, . . . , λd) where

λi

:=

∂

∂xi

g(x)

x=z

for all i = 1, 2, . . . , d .

What is ∇f(v)?

(f) The unit circle in R

2

is the set of vectors in R

2 with unit length, i.e., {x ∈ R

2

: kxk2 = 1}.

Which vector in the unit circle minimizes f (defined above), and what is the value of f

evaluated at this vector? (Hint: think about eigenvectors.)

3

Problem 4 (Random stuff you should know). A (discrete) probability space is a pair (Ω, P), where

Ω is a (discrete) set called the sample space, and P : Ω → R is a real-valued function on Ω called

the probability distribution, which must satisfy P(ω) ≥ 0 for all ω ∈ Ω, and P

ω∈Ω P(ω) = 1. An

event A is a subset of Ω, and the probability of A, denoted by P(A) (somewhat abusing notation),

is equal to P

ω∈A P(ω).

(a) A fair coin is tossed three times. Consider the three events:

• A: the outcome of the first toss is heads.

• B: the outcome of the second toss is tails.

• C: the outcomes of all three tosses are the same.

• D: exactly one of the outcomes is heads.

Which of the following pairs of events are independent?

• A and B.

• A and C.

• A and D.

• C and D.

(b) A student applies to two schools: Trump University and Columbia University. The student

has a probability of 0.5 of being accepted to Trump, and a probability of 0.3 of being accepted

to Columbia. The probability of being accepted by both is 0.2. What is the probability that

the student is accepted to Columbia, given that the student is accepted at Trump?

A random variable (r.v.) on (Ω, P) is a real-valued function X : Ω → R. The notation X ∼ P

declares the r.v. X and associates it with the probability distribution P. (We’ll often leave the

probability space implicit.) The expected value (a.k.a. expectation or mean) of X, written E(X), is

the average value of X under the distribution P:

E(X) :=

X

ω∈Ω

X(ω) · P(ω).

An equivalent definition of E(X) is E(X) :=

P

x

x · P(X = x), where the summation is taken over

all x in the range of X, and P(X = x) is shorthand for P({ω ∈ Ω : X(ω) = x}).

(c) Consider the sample space Ω = {1, 2, . . . , 6} × {1, 2, . . . , 6}, and let P be the uniform distribution over Ω, i.e., P(a, b) = 1/36 for each (a, b) ∈ Ω. Let X be the random variable defined

by X(a, b) = min{a, b} for each (a, b) ∈ Ω.

For each x ∈ {1, 2, . . . , 6}, what is P(X = x)?

(d) Continuing from (c), what is the expected value of X?

(e) A biased coin with P(heads) = 1/5 is tossed repeatedly until heads comes up. What is the

expected number of tosses?

(f) You create a random sentence of length n by repeatedly picking words at random from the

vocabulary {a, is, not,rose}, with each word being equally likely to be picked. What is the

expected number of times that the phrase “a rose is a rose” will appear in the sentence?

4

Problem 5 (More random stuff you should know). We often encounter probability spaces (Ω, P)

where Ω is not a discrete set. In this class, the only random variables we’ll consider on such spaces

will either have a discrete image (i.e., {X(ω) : ω ∈ Ω} is a discrete set) or have a probability density

function p: R → R, which is a non-negative real-valued function on R such that, for any open

interval (a, b) = {x ∈ R : a < x < b} ⊆ R,
P(X ∈ (a, b)) = P({ω ∈ Ω : X(ω) ∈ (a, b)}) = Z
(a,b)
p(x) dx .
Random variables with probability density functions will be called continuous random variables.
(a) Let X be a continuous random variable with probability density function p given by
p(x) :=
(
0 if x < 0 ,
λe−λx if x ≥ 0 .
Here, λ is a positive number (typically called the rate parameter). If P(X ≤ 1000000) = 0.5,
then what is the value of λ?
(b) Let X be a standard normal random variable, i.e., a continuous random variable whose density
is the standard normal density p(x) := e
−x
2/2/
√
2π for all x ∈ R. Define the random variable
Y on the same probability space as X by Y := X2
, i.e., Y (ω) := X(ω)
2
for all ω ∈ Ω. What
are E(X) and E(Y )?
A collection of continuous random variables X1, X2, . . . , Xd, all defined on the same probability
space, has a (joint) probability density function p: R
d → R if, for any A ⊆ R
d
,
P((X1, X2, . . . , Xd) ∈ A) = Z
A
p(x1, x2, . . . , xd) dx1 dx2 · · · dxd .
We’ll often collect several random variables, such as X1, X2, . . . , Xd, into a random vector X =
(X1, X2, . . . , Xd). So the equation above can be written as P(X ∈ A) = R
A
p(x) dx.
(c) Suppose the pair of random variables (X1, X2) has probability density function p given by
p(x1, x2) :=
(
c if 0 ≤ x1 ≤ 0.5 and 0 ≤ x2 ≤ 1 ,
0 otherwise .
Here, c is a constant (that does not depend on x1 or x2). What should be the value of c so
that p is a valid probability density function?
(d) Continuing from (c), what is the probability that X2 ≥ X1?
(e) Continuing from (c), define another random variable Y on the same probability space as X1
and X2 by
Y :=
(
1 if X1 > 2X2 ,

−1 otherwise .

Are X1 and Y independent? What is the expected value of Y ?

(f) Continuing from (c), define yet another random variable Z on the same probability space as

X1 and X2 by

Z :=

(

1 if X2 > 1/2 ,

−1 otherwise .

Are X1 and Z independent? What is the expected value of X1Z?

5

Problem 6 (Google Cloud; optional but recommended). Set up a virtual machine on Google

Cloud. Figure out how to install some useful Python packages like numpy, scipy, scikit-learn,

etc. Download the OCR image data set ocr.mat from Courseworks, and load it into memory:

from scipy . io import loadmat

ocr = loadmat (‘ocr . mat ‘)

This file contains four different matrices called data, labels, testdata, and testlabels. For

example, data represents a 60000×784 matrix, which you can verify using the following command:

ocr [‘data ‘]. shape

Using the numpy and scipy libraries, write some code to compute the average squared Euclidean

norm of the rows of data. The following functions may be useful:

• numpy.apply_along_axis

• numpy.linalg.norm

• numpy.mean

The result should be around 127.642. You don’t need to submit anything for this problem.

6