Description
Overview
The purpose of this project is to practice multithreaded programming using Pthreads, Win32 Thread and
OpenMP. There are three independent parts of this project. Please read instructions carefully.
You must submit your project in one zip file (named as firstname_lastname_proj1) to the Project Turn In
Site (https://hwupload.cs.txstate.edu/), which should include all your source code for part1, part2, part 3
and the according doc file for part 2 and part3 questions. You only need to submit a hard copy of the doc
file in class on Oct. 23. Do not print out the code.
The submission system will automatically record your submission time, which will be used to decide
whether or not you miss the deadline. There will be a penalty of 20% per day (including weekend and
holiday) for late submission. The late submission after two days (including weekend and holiday) will not
be accepted. Therefore, DO NOT DELAY. Start writing the program from Day 1. If you wait until the
night before the due date, you will have a miserable night and it is less likely you could complete the
project.
You are encouraged to discuss with other students but you have to DO YOUR OWN WORK. You may
get ZERO for this project or FAIL the class if you copy code from others or allow others to copy your
code.
Part 1: Implement Fibonacci Sequence using Pthreads and Win32 Thread (50 points).
The Fibonacci sequence is the series of numbers 0, 1, 1, 2, 3, 5, 8, …. Formally, it can be expressed as:
That is, after two starting values, each number is the sum of the two preceding numbers. The Fibonacci
numbers for n = 0, 1, 2, … ,20 are:
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15 F16 F17 F18 F19 F20
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
Part 1a (25 points): Write a multithreaded program that generates the Fibonacci sequence using
Pthreads. This program should work as follows: The user will enter on the command line the number of
Fibonacci numbers that the program is to generate. The program will then create a separate thread that
will generate the Fibonacci numbers, placing the sequence in data that can be shared by the threads (an
array is probably the most convenient data structure). When the thread finishes execution, the parent
thread will output the sequence generated by the child thread. Because the parent thread cannot begin
outputting the Fibonacci sequence until the child thread finishes, this will require having the parent thread
wait for the child thread to finish, using the thread join techniques described in class.
Part 1b (25 points): Write a multithreaded program that does the same Fibonacci calculate as stated in
Part 1a using Win32 thread.
Part 2 a: Implement Matrix Multiplication using Pthreads (30 points)
Background of Matrix Multiplication
Given two matrices, A and B, where matrix A contains M rows and K columns and matrix B contains K
rows and N columns, the matrix product of A and B is matrix C, where C contains M rows and N
columns. The entry in matrix C for row i, column j, denoted as Ci,j, is the sum of the products of the
elements for row i in matrix A and column j in matrix B. That is,
∑
For example, if A is a 2-by-3 matrix and B is a 3-by-2 matrix, the result matrix will be a 2-by-2 matrix.
Passing Parameters to Each Thread
The parent thread will create M ×N worker threads, passing each worker the values of row i and column j
that it is to use in calculating the matrix product. This requires passing two parameters to each thread. The
easiest approach with Pthreads is to create a data structure using a struct. The members of this structure
are i and j, and the structure appears as follows:
/* structure for passing data to threads */
Struct v
{
int i ; /* row */
int j ; /* column */
};
The Pthreads program will create the worker threads using a strategy similar to that shown below:
/* We have to create M ×N worker threads */
for ( i = 0; i < M, i++)
for ( j = 0; j < N; j++ ) {
struct v *data = (struct v *) malloc (sizeof (struct v)) ;
data->i = i
data->j = j
/* Now create the thread passing its data as a parameter */
}
}
The data pointer will be passed to the pthread_create (), which in turn will pass it as a parameter to the
function that is to run as a separate thread.
Waiting for Threads to Complete
Once all worker threads have completed, the main thread will output the product contained in matrix C.
This requires the main thread to wait for all worker threads to finish before it can output the value of the
matrix product. Several different strategies can be used to enable a thread to wait for other threads to
finish. In this example, the main thread has to wait for multiple threads to finish before it can continue its
execution.
The following code shows how you could join on ten threads using the Pthreads:
#define NUM_THREADS 10
pthread_t workers[NUM_THREADS];
for (int i = 0; i < NUM_THREADS; i++)
pthread_join(workers[i], NULL);
For this project, you need to calculate each element Ci,j in a separate worker thread. This will involve
creating M × N worker threads. The main (or parent) thread will initialize the matrices A and B and
allocate sufficient memory for matrix C, which will hold the product of matrices A and B. These matrices
will be declared as global data so that each worker thread has access to A, B, and C.
Matrices A and B can be initialized statically, as shown below:
#define M 3
#define K 2
#define N 3
int A [M] [K] = {{1,4}, {2,5}, {3,6}};
int B [K] [N] = {{8,7,6}, {5,4,3}};
int C [M] [N] ;
Your code should be able to calculate the C = A× B and print out the correct matrix C. I may
change the elements value of A and B to test your code but you can assume I will not change the
value of M, K, and N.
Part 2b: Answer the following questions (20 points)
Now you have generated a multithreaded program which can multiply a 3-by-2 matrix and a 2-by-3
matrix. Please answer the following questions:
1. We can also write a program to do matrix multiplication without using multithreading. What are
the advantages and disadvantages of using multithreading? (5 points)
2. Is your code still applicable for multiplying two 100-by-100 matrices? If yes, why? If no, how
will you change your current code to make it applicable? You only need to write down the key
changes and you can use pseudo code. (5 points)
3. Is your code efficient in the scenario of multiplying giant matrices, say one million by one million
matrices? If yes, why? If not, what are the potential performance problems? (Hint: think about the
overhead of creating thread and data locality issue). How to solve these problems? (10 points)
Part 3: How many 99s (Pthreads and OpenMP)? (50 points).
The following sequential code will count how many 99s in an array.
#include
#include
#include
using namespace std;
int main()
{
int count = 0, i;
int const size = 100000;
int myArray [size];
double start_time, end_time;
//initialize random number generator
srand((unsigned)time(NULL));
// Initialize the array using random numbers
for (i = 0; i < size; i++)
myArray[i] = rand() % 100;
//Serial Code
start_time = omp_get_wtime();
for (i = 0; i < size; i++)
if (myArray[i] == 99)
count ++;
end_time = omp_get_wtime();
printf ("The serial code indicates that there are %d 99s in the array.\n\n", count);
printf ("The serial code used %f seconds to complete the execution.\n\n", end_time - start_time);
return 0;
}
Part 3a (20 points): Write the correct and efficient multithreading code using OpenMP. Record the
execution time of your parallel code and calculate the speedup when using 2 threads, 4 threads and 8
threads. Your score will be based on the correctness and speedup of your parallel code.
Part 3b (30 points): Write the correct and efficient multithreading code using Pthreads. Record the
execution time of your parallel code and calculate the speedup when using 2 threads, 4 threads and 8
threads. Your score will be based on the correctness and speedup of your parallel code.