Project 6 – Bayesian Networks


Implementation (70 points)

To be done in Python, Java, C or C++

1. (10 points) In your chosen language, implement a data structure (or structures) that represents a Bayesian network. This structure will need to represent the nodes, edges, and the conditional probability table (CPT) for each node. You will also need a way to represent whether each node is a query variable, an evidence variable, or unknown.
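As one illustration only (a sketch, not a required design), a minimal Python representation might look like the following, with each node's status stored as one of 't', 'f', '?', or '-' (the codes used in problem 3):

```python
# Sketch of one possible Bayesian-network data structure.
# Status codes: 't'/'f' = observed evidence, '?' = query, '-' = unknown.

class Node:
    def __init__(self, name, parents=None, cpt=None):
        self.name = name              # node identifier
        self.parents = parents or []  # list of parent Node objects (the edges)
        self.cpt = cpt or []          # conditional probability table values
        self.status = '-'             # 't', 'f', '?', or '-'

class BayesNet:
    def __init__(self):
        self.nodes = {}               # name -> Node, kept in insertion order

    def add_node(self, node):
        self.nodes[node.name] = node
```

Storing nodes in a dict keyed by name makes parent lookup during file parsing straightforward, and Python dicts preserve insertion order, which matters when matching the query file's statuses to nodes.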

2. (20 points + optional 5 extra credit points) Options A and B differ only in whether you allow more than 2 parents.

Option A (20 points)

Write a function that takes in a filename as an argument and constructs a Bayesian network using the data structure(s) you defined in the previous problem.  The file will contain several lines with the following format:

node_name: [parent1 parent2] [cpt_1_F cpt_1_T cpt_2_F cpt_2_T …]

You may assume a maximum of 2 parents per node, that the file will not contain a cyclic reference (i.e. if you parse the file properly, you will have a proper Bayesian network), and that the parents are also contained in the file.  If a node has no parents, the line will appear as follows:

node_name: [] [cpt_F, cpt_T]

The conditional probability table (CPT) for each node is given by the third part of each line ([cpt_1_F …]).  These values are presented in the row order they would occur in the table.  If there are no parents, the CPT will be:

P(node_name = F)   P(node_name = T)
cpt_1_F            cpt_1_T

If there is one parent:

Parent   P(node_name = F)   P(node_name = T)
F        cpt_1_F            cpt_1_T
T        cpt_2_F            cpt_2_T

If there are two parents:

Parent1   Parent2   P(node_name = F)   P(node_name = T)
F         F         cpt_1_F            cpt_1_T
F         T         cpt_2_F            cpt_2_T
T         F         cpt_3_F            cpt_3_T
T         T         cpt_4_F            cpt_4_T
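One way to read a line of this format might be sketched as follows (an illustration only; it assumes values inside the brackets may be separated by spaces or commas, since both appear in the examples above):

```python
import re

def parse_network_line(line):
    """Parse one line of the form:
        node_name: [parent1 parent2] [v1 v2 ...]
    Returns (name, parent_names, cpt_values).
    Sketch only: tolerates both space- and comma-separated values."""
    name, rest = line.split(":", 1)
    brackets = re.findall(r"\[(.*?)\]", rest)       # the two bracketed groups
    parents = brackets[0].replace(",", " ").split() # [] parses to an empty list
    cpt = [float(v) for v in brackets[1].replace(",", " ").split()]
    return name.strip(), parents, cpt
```

For example, `parse_network_line("A: [] [0.7, 0.3]")` yields a node with no parents and a two-entry CPT, while a two-parent line yields the eight CPT entries in the row order shown above.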

Option B (20 points and 5 points of extra credit)

Write a function that takes in a filename as an argument and constructs a Bayesian network using the data structure(s) you defined in the previous problem.  The file will contain several lines with the following format:

node_name: [parent1, parent2, …] [cpt_0, cpt_1, cpt_2, …]

As in Option A, the parents of the node are listed in the first set of square brackets on each line.  You may assume that the file contains no cyclic references and that the parents of each node are also contained in the file.  There is no upper bound on the number of parents a node may have.

Each value in the second set of square brackets (i.e. cpt_0, cpt_1, …) represents the probability that the node is true given the assignment of truth values to the parents encoded by the binary representation of the index, where parent 1 is the least significant bit of the binary number and the last parent is the most significant bit.

Explicitly, for three parents (the first six values provided in the square brackets):

CPT value index   Binary representation   Parent 1   Parent 2   Parent 3
0                 000                     F          F          F
1                 001                     T          F          F
2                 010                     F          T          F
3                 011                     T          T          F
4                 100                     F          F          T
5                 101                     T          F          T
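The index arithmetic can be sketched as a small helper that treats parent 1 as the least significant bit:

```python
def cpt_index(parent_values):
    """Map a list of parent truth values (parent 1 first) to the Option B
    CPT index, with parent 1 as the least significant bit."""
    index = 0
    for bit, value in enumerate(parent_values):
        if value:
            index |= 1 << bit
    return index
```

For instance, parents (T, F, T) give binary 101 = index 5, matching the table above.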

3.  (10 points) Write a function that takes in a filename as an argument and assigns the status of each node in your network based on the following format:

t,-,f,?, … (An older version of this project said t,-,f,q, … but as you can see from the provided text files we are using “?” instead of “q” for the query variable.) Also, for the purposes of this project there will always be exactly one query variable, so you need not worry about a query like (Node2 ^ Node1 | Node5).

The order of elements on the 1st line corresponds to the order of nodes read in from the previous file.  You may assume there are exactly as many elements on the 1st line as the number of nodes in the previous file.  The values of each element are described below. Note that the “…” indicates that there are more elements in the sequence; the query file will never contain “…”.

‘t’ means this variable is evidence and has been observed as true

‘f’ means this variable is evidence and has been observed as false

‘?’ means this variable is a query variable, we want to know the probability of this variable being true given the evidence: i.e. P(? | E). An earlier version of this said “q” instead of “?”.

‘-’ means this variable is neither the query nor evidence
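Parsing the single status line might be sketched as follows (assuming comma separators as in the example above; the function name is illustrative, not required):

```python
def parse_query_line(line):
    """Split the query file's single line, e.g. 't,-,f,?', into a list of
    statuses, one per node in the order nodes were read from the network
    file. Sketch only; tolerates surrounding whitespace around tokens."""
    statuses = [token.strip() for token in line.strip().split(",")]
    assert all(s in ("t", "f", "?", "-") for s in statuses), "bad status code"
    return statuses
```

Each returned status can then be assigned to the corresponding node of the network in file order.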

4. (10 Points) Write a function that takes in the number of samples as an argument, performs rejection sampling on your Bayesian network, and returns the resulting probability estimate.

5. (10 Points) Write a function that takes in the number of samples as an argument, performs likelihood-weighting sampling on your Bayesian network, and returns the resulting probability estimate.

6. (10 Points) Write a main method that parses the two filenames and the number of samples from the command line, constructs a Bayesian network data structure based on the first file, assigns the status of each node based on the second file, and reports (by printing) the probability calculated by each of rejection sampling and likelihood-weighting sampling with the provided number of samples.  You may assume that the program will be run in the following manner (for Python); Java, C, or C++ will follow the same general pattern.

python your_program.py network_file query_file num_samples

or

python3 your_program.py network_file query_file num_samples
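Argument handling for this invocation might be sketched as follows (the function name `parse_args` is illustrative, not required):

```python
def parse_args(argv):
    """Unpack the three expected command-line arguments:
        your_program.py network_file query_file num_samples
    Sketch only; the network construction and sampling calls themselves
    come from problems 2-5."""
    if len(argv) != 4:
        raise SystemExit("usage: python your_program.py network_file query_file num_samples")
    return argv[1], argv[2], int(argv[3])
```

In a real main method you would pass `sys.argv` to this and feed the results into your file-parsing and sampling functions.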

Write-up (30 points)

1. (15 points) Run both of your functions on the provided network file (that corresponds to the option you chose in part 2 of the implementation) and two querying files. Perform trials for each query file and sampling method with the following sequence of total samples: 200, 400, 600, 800 and 1000.  Run each sampling method 10 times for each trial (make sure you seed your random number generator differently for every trial!), and report the mean and variance of each trial.  Plot the means of each sampling method against each other for each query file. Plot the variances of each sampling method against each other for each query file.

2. (10 points) Did either method converge to a probability?  Was there any difference in the convergence rate? If so, for each case, state which algorithm converged faster and explain why.

3. (5 points) Write a readme file that documents any external libraries or design choices you made in your project, as well as the language you used and compilation instructions if necessary.  Include any information that may be helpful for the TA while grading.  Make sure you document whether you chose option A or option B for part 2 of the implementation!

Extra Credit

(5 points)  In the write-up portion above, we asked you to determine the probability of each query with a predetermined number of samples.  This will not always guarantee that the query probability converges to the true value within the number of samples used.  Come up with a convergence test for each sampling method (the test may be the same for both).  Re-run the experiments for part 1 of the write-up until each sampling method converges, and generate a plot of the mean probability for each number of samples.  Identify another query (separate from query 2) that leads likelihood weighting to converge significantly faster.  Plot these results as well.

(5 points) For this portion of the extra credit, you should use the provided network to try to determine the conditional probability tables.  You will have to write a function that takes in a number and produces that number of samples of the network using prior sampling.  Using these samples, perform MLE (maximum likelihood estimation) to estimate the conditional probability table for each node.  Warning! This is not trivial in the general case.  You only have to do this for the network file corresponding to the option you chose for part 2 of the implementation.  Pick a sequence of sample counts (similar to part 1 of the write-up).  For each number of samples, estimate the conditional probability tables for every node in the network.  For every number of samples you tried, and for every node, plot the percent error versus the true probability for every value in that node's conditional probability table corresponding to the node being true.  Include these plots in your write-up.  A starting resource for MLE is http://www.cse.ust.hk/bnbook/pdf/l06.h.pdf .  Keep in mind that this is a very technical resource: slides 30-38 describe MLE for a general Bayesian network, and slides 8-16 explain what MLE is.

Hint from Prof Heffernan: Russell and Norvig 20.2.1 makes MLE sound hard to do. They tell you that you have to 1) write down the likelihood, 2) take the derivative of the log of the likelihood, and 3) find the parameter values at which the derivatives are zero. That reading makes it seem scarier than it is: for this assignment you can just count and divide! What you are being asked to do is MLE with complete data. If you really want to impress me and earn another 10 extra credit points, implement your own version of EM that will work even when some of the data is randomly missing. I suggest you make it so that 10% or 30% of each of your data points are missing. Then try to do something like what the Neilson reading (the better one!) does on page 346.
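The count-and-divide idea from the hint can be sketched as follows, assuming complete samples represented as dicts mapping node names to truth values (an assumed representation, not a requirement):

```python
def mle_cpt(samples, node, parents):
    """Count-and-divide MLE from complete data: estimate
    P(node = True | parent values) for each combination of parent values
    actually observed in the samples. Each sample is assumed to be a
    dict mapping node name -> bool. Returns a dict keyed by tuples of
    parent truth values."""
    counts, trues = {}, {}
    for s in samples:
        key = tuple(s[p] for p in parents)       # which CPT row this sample hits
        counts[key] = counts.get(key, 0) + 1     # times this row was observed
        trues[key] = trues.get(key, 0) + s[node] # times node was True in this row
    return {key: trues[key] / counts[key] for key in counts}
```

Note that rows never observed in the samples get no estimate at all, which is one reason the general case (and especially the EM variant with missing data) is less trivial than this sketch.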



Hints, Tips and Caveats

– It is entirely possible to do this project without external libraries.

– Be careful not to rely solely on external libraries for your implementation.  Using an existing graph or tree data structure rather than implementing your own is permitted (e.g., if you are using Python, the NetworkX package, which provides a graph structure, may be helpful).  Anything more than that is not.

– Problem 2 is best done in two passes over the file: one to initialize each node in the network, and a second to add the parents of each node and fill out the probability tables.  If you plan your data structures well, you will be able to do it in one pass.

– Clean, readable code is significantly better than efficient code. This project is designed to help you understand how sampling Bayesian networks works, not how to write really efficient querying routines.

– The files to use have been provided with Unix line endings.  If this isn't readable on your system, please post on Piazza and we'll upload the same files with different line endings.

Python Primer / Additional Resources

Sampling a Bayesian Network

To sample the values of all nodes in a Bayesian network, you iterate through each node in the network.  Each node starts out without a value assigned.  If a node doesn't have a value assigned, you recurse through its parents until you reach a node without parents, to which you assign a value by drawing a random number between 0 and 1 and comparing it to that node's CPT.  Using these assigned values, you work your way back through the recursion, assigning a value to each node by using the assigned values of its parents to identify which row of its CPT to use.
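The recursion described above might be sketched like this, assuming a hypothetical network representation that maps each node name to (parent names, cpt), where cpt maps a tuple of parent truth values to P(node = True):

```python
import random

def prior_sample(network, rng=random.random):
    """Draw one full sample of the network via the recursion described
    above. Sketch only: `network` is an assumed representation, a dict
    name -> (parent_names, cpt), with cpt a dict from tuples of parent
    truth values to P(node = True)."""
    values = {}

    def assign(name):
        if name in values:                          # already sampled
            return values[name]
        parents, cpt = network[name]
        parent_values = tuple(assign(p) for p in parents)  # recurse to parents first
        values[name] = rng() < cpt[parent_values]   # compare random draw to CPT
        return values[name]

    for name in network:
        assign(name)
    return values
```

Passing a deterministic `rng` makes the sketch easy to test; in real use you would seed `random` differently for every trial, as the write-up requires.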

Rejection Sampling

The general idea of rejection sampling is to sample the Bayesian network using the prior-sampling method discussed in the textbook.  Any samples that don't agree with the evidence variables of the query are discarded.  The probability of the query variable being true (for this assignment we will only ask you to deal with a single query variable) is then the number of non-discarded samples in which the query variable is true, divided by the total number of non-discarded samples.

In pseudocode:
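A Python sketch of this idea, assuming a hypothetical network representation mapping each node name to (parent names, cpt), where cpt maps tuples of parent truth values to P(node = True):

```python
import random

def rejection_sample(network, evidence, query, n, rng=random.random):
    """Rejection sampling sketch: draw n prior samples, discard those
    inconsistent with `evidence` (a dict name -> bool), and return the
    fraction of kept samples in which `query` is true. The network
    representation is an assumption of this sketch, not a requirement."""

    def prior_sample():
        values = {}
        def assign(name):
            if name not in values:
                parents, cpt = network[name]
                pv = tuple(assign(p) for p in parents)
                values[name] = rng() < cpt[pv]
            return values[name]
        for name in network:
            assign(name)
        return values

    kept = query_true = 0
    for _ in range(n):
        sample = prior_sample()
        if all(sample[e] == v for e, v in evidence.items()):
            kept += 1                 # sample agrees with all evidence
            query_true += sample[query]
    return query_true / kept if kept else float("nan")
```

When the evidence is unlikely, `kept` can be very small (or zero), which is exactly the weakness that motivates weighted likelihood sampling below.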

Weighted Likelihood Sampling

The general idea of weighted likelihood sampling is similar to rejection sampling.  However, instead of randomly sampling the value of every node related to the query, whenever weighted likelihood sampling encounters an evidence variable it fixes that node to its observed value and multiplies the weight of the sample by the probability of that observation.  This allows weighted likelihood sampling to estimate queries where the probability of the evidence occurring is very small.

In pseudocode:
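A Python sketch, again assuming a hypothetical network representation mapping each node name to (parent names, cpt), where cpt maps tuples of parent truth values to P(node = True):

```python
import random

def likelihood_weighting(network, evidence, query, n, rng=random.random):
    """Weighted likelihood sampling sketch: evidence nodes are fixed to
    their observed values and the sample weight is multiplied by the
    probability of that observation; all other nodes are sampled as in
    prior sampling. Returns the weighted fraction of samples in which
    `query` is true. The network representation is an assumption of
    this sketch, not a requirement."""
    total = query_weight = 0.0
    for _ in range(n):
        values, weight = {}, 1.0

        def assign(name):
            nonlocal weight
            if name in values:
                return values[name]
            parents, cpt = network[name]
            pv = tuple(assign(p) for p in parents)
            p_true = cpt[pv]
            if name in evidence:
                values[name] = evidence[name]          # fix to observed value
                weight *= p_true if evidence[name] else 1.0 - p_true
            else:
                values[name] = rng() < p_true          # sample as usual
            return values[name]

        for name in network:
            assign(name)
        total += weight
        if values[query]:
            query_weight += weight
    return query_weight / total if total else float("nan")
```

Every sample contributes here (none are discarded), which is why this estimator behaves better than rejection sampling when the evidence is improbable.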

Professor Heffernan: Note to self: Give out a few smaller networks and give out correct values for some queries so the students can test their code. Make sure some of them have very small values in the CPT so students are more likely to see differences.