Question 1: Designing a Bayesian Network
Marv the cat is having a bad day. His brother Harry ate all the food set out by their owner, Shannon, so
Marv has to find a way to feed himself. He can try to catch a fish in the lake outside, and if he succeeds,
he’ll eat it. If it’s a hot day, Marv will be sluggish, so he’s less likely to catch a fish. Marv can also try to
steal Shannon’s sandwich, which does not depend on the outside temperature. However, even if he succeeds
in stealing the sandwich, he might not get to eat it (for example, Shannon may notice and snatch it back).
Finally, if Marv manages to eat at least something, he might feel content, in spite of everything. However, if
it’s hot out, he is less likely to feel content in general.
Consider the Boolean variables: H (it’s a hot day), C (Marv is content), E (Marv eats at least one item), F
(Marv catches a fish), S (Marv steals the sandwich).
a. Draw a Bayesian network for this domain. Only include the Boolean variables listed above, so your
network should have 5 nodes.
b. Is your network a polytree? Why or why not? (NB: a polytree is a graph that has no directed or
c. Suppose the probability that Marv catches the fish is x when it’s hot, and y when it is not. Give the
conditional probability table associated with F.
d. Suppose that if Marv catches a fish, he will eat it with probability 1, and if he successfully steals the
sandwich, he will eat it with probability 0.5. If he fails at both hunting and stealing, then he will not eat
anything. Give the conditional probability table associated with E.
e. Suppose Marv is content. Write down the expression for the probability that it is a hot day, in terms of
the various conditional probabilities in the network.
Question 2: Inference in Bayesian Networks
Consider the following Bayesian Network
R: rush hour
B: bad weather
T: traffic jam
We denote random variables with capital letters (e.g., R for “rush hour”), and the binary outcomes with
lowercase letters (e.g., r and ¬ r for “it is rush hour” and “it is not rush hour,” respectively).
The network has the following parameters:
P(t|r, b,a) = 0.98
P(t|r, ¬ b,a) = 0.9
P(t|r,b, ¬ a) = 0.88
P(t|r, ¬ b, ¬ a) = 0.85
P(t| ¬ r,b,a) = 0.5
P(t| ¬ r,b, ¬ a) = 0.4
P(t| ¬ r, ¬ b,a) = 0.6
P(t| ¬ r, ¬ b, ¬ a) = 0.05
P(s|a) = 0.95
P(s| ¬ a) = 0.2
P(a|b) = 0.7
P(a| ¬ b) = 0.2
Compute the following terms using basic axioms of probability and the conditional independence properties
encoded in the above graph.
a. P(a, ¬ r)
b. P(b, a)
For the query P(b|a):
c. Use Bayes Ball to determine the set of nodes that can be pruned from the graph.
d. Compute P(b|a) using the simplifications determined in Part c.
Question 3: Variable Elimination
For the graph above, compute the MAP result of querying P(T|b) using variable elimination with the
following order: S, A, R, T.
Clearly explain each step. For each of the intermediate factors created, explain what probabilistic function it
Question 4: Learning with Bayesian Networks
Consider the following Bayesian network. Assume that the variables are distributed according to Bernoulli
a. We are given the following dataset with 144 samples, from which we will estimate the parameters of the
A B C D # Instances
0 0 0 0 2
0 0 0 1 4
0 0 1 0 18
0 0 1 1 3
0 1 0 0 14
0 1 0 1 2
0 1 1 0 31
0 1 1 1 10
1 0 0 0 1
1 0 0 1 0
1 0 1 0 3
1 0 1 1 23
1 1 0 0 0
1 1 0 1 9
1 1 1 0 14
1 1 1 1 10
i. Enumerate the parameters that must be learned. Specify the parameter name and the probability
that it represents (i.e., for each parameter, write something in the form, θ r . X = P (X)
ii. Give the maximum likelihood estimate for each parameter.
iii. Give the maximum a posteriori (MAP) estimate for each parameter after applying Laplace
b. Assume that in addition to the data in the table above, you are given the following incomplete data
A B C D
S1 1 ? 1 ?
S2 1 1 0 0
We will apply the (soft) EM algorithm on these instances. Initialize the model using your parameter
estimates from Part a., Subpart ii. (i.e., use the MLE).
i. Show the computation of the first E-step, providing the weights for each possible assignment of
the incomplete data for each sample.
ii. What are the parameters obtained for the first M-step? Weight each of the samples from the
original dataset and the two new samples equally (i.e., you now have 146 samples).
iii. Show the computation of the second E-step.