Project 4: Advanced Relational Algebra Operators (Distinct, Sum, GroupBy)
This project requires the implementation of the remaining three single-table relational algebra operators
(DuplicateRemoval, Sum, and GroupBy).
Implement the GetNext method of the DuplicateRemoval relational operator. Use a set-like data structure
to track the records that have already been returned. Whenever the child operator produces a record, check
whether it is already in the set. If it is not, add it to the set and return it to the caller operator. If it
is, request another record from the child operator. Repeat until a record can be returned or the child has
no more records. The most complicated part is comparing two records to determine whether they are
identical. Class OrderMaker already implements this functionality in method Run. Remember that the
DuplicateRemoval operator appears near the top of the tree, above Project and below WriteOut.
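The DuplicateRemoval loop described above can be sketched as follows. This is only an illustration under simplifying assumptions: Row and VectorScan are hypothetical stand-ins for the project's Record and child operator, and RowLess stands in for the comparison that OrderMaker::Run provides. The real class interfaces may differ.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's Record: a flat list of values.
using Row = std::vector<long>;

// Hypothetical stand-in for a child operator: hands out rows one at a time.
class VectorScan {
public:
    explicit VectorScan(std::vector<Row> rows) : rows_(std::move(rows)) {}

    // Returns true and fills `out` while rows remain, false at end of input.
    bool GetNext(Row& out) {
        if (next_ >= rows_.size()) return false;
        out = rows_[next_++];
        return true;
    }

private:
    std::vector<Row> rows_;
    std::size_t next_ = 0;
};

// Assumed replacement for the OrderMaker comparison: a strict ordering over
// whole rows. Two rows are duplicates when neither orders before the other.
struct RowLess {
    bool operator()(const Row& a, const Row& b) const { return a < b; }
};

class DuplicateRemoval {
public:
    explicit DuplicateRemoval(VectorScan& child) : child_(child) {}

    // Returns the next row not seen before, or false when the child is drained.
    bool GetNext(Row& out) {
        Row candidate;
        while (child_.GetNext(candidate)) {
            // insert() reports in .second whether the row was new to the set.
            if (seen_.insert(candidate).second) {
                out = candidate;
                return true;
            }
            // Duplicate: loop around and ask the child for another row.
        }
        return false;
    }

private:
    VectorScan& child_;
    std::set<Row, RowLess> seen_;
};
```

An ordered set is used here only because it needs nothing more than a comparison; a hash-based container would serve equally well if a hash over records is available.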
Implement the GetNext method of the Sum relational operator. Apply the Function to every record
produced by the child operator and keep a running sum that is updated with each result. Once all
records have been processed, create a result record containing only the sum and pass it to the parent
operator. Method Apply of class Function does all the work of evaluating the expression on a record.
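A minimal sketch of the Sum logic described above, under stated assumptions: Row and VectorScan are hypothetical stand-ins for the project's Record and child operator, and a std::function plays the role of Function::Apply. The flag that stops the operator after its single result record is also an assumption of this sketch.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's Record: a flat list of values.
using Row = std::vector<long>;

// Hypothetical stand-in for a child operator: hands out rows one at a time.
class VectorScan {
public:
    explicit VectorScan(std::vector<Row> rows) : rows_(std::move(rows)) {}

    // Returns true and fills `out` while rows remain, false at end of input.
    bool GetNext(Row& out) {
        if (next_ >= rows_.size()) return false;
        out = rows_[next_++];
        return true;
    }

private:
    std::vector<Row> rows_;
    std::size_t next_ = 0;
};

class Sum {
public:
    // `fn` stands in for Function::Apply: it maps one row to one number.
    Sum(VectorScan& child, std::function<long(const Row&)> fn)
        : child_(child), fn_(std::move(fn)) {}

    // Produces exactly one row holding the total, then reports end of input.
    bool GetNext(Row& out) {
        if (done_) return false;
        long total = 0;  // an empty input yields 0 in this sketch
        Row r;
        while (child_.GetNext(r)) {
            total += fn_(r);  // update the running sum
        }
        out = Row{total};  // result row contains only the sum
        done_ = true;
        return true;
    }

private:
    VectorScan& child_;
    std::function<long(const Row&)> fn_;
    bool done_ = false;
};
```

The whole child input is consumed inside the first call, so the second call has nothing left to do except signal that no more records exist.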
Implement the GetNext method of the GroupBy relational operator. This is a combination of the
DuplicateRemoval and Sum operators. Replace the set-like data structure in DuplicateRemoval with a
Map that has the grouping attributes as key and the running sum as value. An OrderMaker over the grouping
attributes allows you to compare records. For every record produced by the child operator,
check whether its grouping attributes appear in the map. If they do, compute the function on the aggregate
attributes and add the result to the running sum. If they do not, add the new grouping attributes to the map
and initialize the running sum with the result of Function applied to the record. Remember that GroupBy
produces records only after all the records from the child operator have been processed. The order in which
you generate the records is not important. However, remember that only a single record is returned at a time.
The sum aggregate appears in the first position of the created record, followed by the grouping attributes.
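The GroupBy steps above can be sketched as a two-phase operator: build the map on the first call, then hand out one group per call. This is an illustration only. Row and VectorScan are hypothetical stand-ins for the project's Record and child operator, a std::function replaces Function::Apply, and the default ordering of std::map keys replaces the OrderMaker over the grouping attributes.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's Record: a flat list of values.
using Row = std::vector<long>;

// Hypothetical stand-in for a child operator: hands out rows one at a time.
class VectorScan {
public:
    explicit VectorScan(std::vector<Row> rows) : rows_(std::move(rows)) {}

    // Returns true and fills `out` while rows remain, false at end of input.
    bool GetNext(Row& out) {
        if (next_ >= rows_.size()) return false;
        out = rows_[next_++];
        return true;
    }

private:
    std::vector<Row> rows_;
    std::size_t next_ = 0;
};

class GroupBy {
public:
    // `groupCols` lists the positions of the grouping attributes;
    // `fn` stands in for Function::Apply on the aggregate attributes.
    GroupBy(VectorScan& child, std::vector<std::size_t> groupCols,
            std::function<long(const Row&)> fn)
        : child_(child), groupCols_(std::move(groupCols)), fn_(std::move(fn)) {}

    // Returns one group per call: the sum first, then the grouping attributes.
    bool GetNext(Row& out) {
        if (!built_) {
            Build();  // consume the whole child before emitting anything
            cursor_ = sums_.begin();
            built_ = true;
        }
        if (cursor_ == sums_.end()) return false;
        out.clear();
        out.push_back(cursor_->second);
        out.insert(out.end(), cursor_->first.begin(), cursor_->first.end());
        ++cursor_;
        return true;
    }

private:
    void Build() {
        Row r;
        while (child_.GetNext(r)) {
            Row key;
            for (std::size_t c : groupCols_) key.push_back(r[c]);
            // operator[] creates a zero sum for an unseen key, so adding the
            // first value is the same as initializing the sum with it.
            sums_[key] += fn_(r);
        }
    }

    VectorScan& child_;
    std::vector<std::size_t> groupCols_;
    std::function<long(const Row&)> fn_;
    std::map<Row, long> sums_;
    std::map<Row, long>::iterator cursor_;
    bool built_ = false;
};
```

With an ordered map the groups happen to come out sorted by key. Since the output order is not important, nothing should rely on that.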
1. Implement GetNext for the operators presented above. There are no modifications required to main.cc
from Project 3 in order to support the new operators.
2. Execute the queries provided with this stage of the project over the TPC-H data you generated and
loaded in Phase 3 of the project.
3. For correctness and performance analysis, compare the results you obtain with the results produced
by another database system, e.g., SQLite.