# B561 Assignment 7 solved

\$35.00

## Description

5/5 - (1 vote)

1 Key-value Stores (MapReduce and Spark)
1. Consider the document “MapReduce and the New Software Stack” available in the module on MapReduce.1
In that document, you can, in Sections 2.3.3-2.3.7, find descriptions of algorithms to implement relational
algebra operations in MapReduce. (In particular, look at the mapper and
reducer functions for various RA operators.)
In the following problems, you are asked to write MapReduce programs
that implement some RA expressions in PostgreSQL. In addition, you
need to add the code which permits the PostgreSQL simulations for these
MapReduce programs. A crucial aspect of solving these problem is to
develop an appropriate data representation for the input to these problems. Recall that, in MapReduce, the input is a single binary relation of
(key,value) pairs. Take for example Problem 1c. To solve this problem,
you need to find a representation that puts both the data from R and
from S in the same binary key-value relation. And, for problem 1d, you
need to have a representation that put all the data from R, S, and T in
the same binary key-value relation.
(a) Write, in PostgreSQL, a basic MapReduce program, i.e., a mapper
function and a reducer function, as well as a 3-phases simulation that
implements the projection πA,B(R) where R(A, B, C) is a relation.
You can assume that the domains of A, B, C are integer. (Recall
that in a projection, duplicates are eliminated.)
(b) Write, in PostgreSQL, a basic MapReduce program, i.e., a mapper
function and a reducer function, as well as a 3-phases simulation that
implements the set difference of two unary relations R(A) and S(A),
i.e., the relation R − S. You can assume that the domain of A is
integer.
(c) Write, in PostgreSQL, a basic MapReduce program, i.e., a mapper
function and a reducer function, as well as a 3-phases simulation that
implements the semijoin RnS of two relations R(A, B) and S(B, C).
You can assume that the domains of A, B, and C are integer.
(d) Let R(A), S(A), and T(A) be unary relations that store integers.
Write, in PostgreSQL, a MapReduce program that implements the
RA expression R − (S ∪T). Also provide a simulation that evaluates
this program.
2. Write, in PostgreSQL, a basic MapReduce program, i.e., a mapper function and a reducer function, as well as a 3-phases simulation that implements the SQL query
1This is Chapter 2 in Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman,
and Jeffrey D. Ullman.
1
SELECT r.A, array_agg(r.B), cardinality(array_agg(r.B))
FROM R r
GROUP BY (r.A)
HAVING COUNT(r.B) >= 2;
Here R is a relation with schema (A, B). You can assume that the domains
of A and B are integers.
3. Let R(K, V ) and S(K, W) be two binary key-value pair relations. Consider the cogroup transformation R.cogroup(S) introduced in the lecture
on Spark. You can assume that the domains of K, V , and W are integers.
(a) Define a PostgreSQL view that represent the co-group transformation
R.cogroup(S). Show that this view works.
(b) Write a PostgreSQL query that use this cogroup view to compute
RnS.
(c) Write a PostgreSQL query that uses this cogroup view to implement
the SQL query
SELECT distinct r.K as rK, s.K as sK
FROM R r, S s
WHERE ARRAY(SELECT r1.V
FROM R r1