## Description

Your task

For this project you will write several classes to simulate searching through a Twitter-like data

base. Make sure you implement all required methods according to the instructions given below.

In addition to the required methods, you are free to add as many other private methods as you

want. Note that null keys are not allowed in the hash table. Remember that in most of scenarios

objects comparison does not use ‘==’. We highly suggest you read through the entire instructions

before you start coding. You do not necessarily have to implement the methods in the order in

which they are presented. Note also that this project is meant for you to practice first hand how

to implement a hash table as well as learning when to use it and how to exploit its properties. For

this project you want to focus on optimizing the time efficiency of your code. We do not care about

space efficiency. Enjoy coding this last project!

[70 points] The class MyHashTable has the following fields:

• An int for the number of entries stored inside the table.

• An int for the number of buckets the table has (Note that this value is initialized by the

constructor, but could change later on if the number of entries increases).

• A final double representing the maximum load factor for the hash table.

• An ArrayList of buckets used to store the entries to the table. Where each bucket is a

LinkedList of HashPairs.

Implement the following public methods in the MyHashTable class:

• The constructor MyHashTable() which takes an int as input representing the initial

capacity of the table.1 Using the input, the constructor initializes all the fields.

• A put() method that takes a key and a value as input. The method adds a HashPair

of the key and value to the hash table. If a HashPair with the key already exists in the

hash table, then you should overwrite the old value associated with the key with the new

one. This method should be O(1) on average. If in this hash table there was a previous

value associated to the given key, then the method overwrites it with the new value and

returns the old one. If there was no value associated to the given key, then the method

returns null. Remember that the load factor of the table should never be greater than

the maximum load factor stored in the appropriate field.

• A get() method which takes a key as input and returns the value associated with it. If

there is no such key in the hash table, then the method should return null. This method

should run in O(1) on average.

• A remove() method that takes a key as input and removes from the table the entry (i.e.

the HashPair) associated to this key. The method should return the value associated to

the key. If the key is not found, then the method returns null. This method should run

in O(1) on average.

1The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the

time the hash table is created.

2

• A rehash() method which takes no input and modifies the table so that it contains

double the number of buckets. This method should be O(m) where m is the number of

buckets in the table.2

• A keys() method which takes no input and returns an ArrayList containing all the keys

in the table. The keys in the returned ArrayList may be in any order. This method

should be O(m) where m is the number of buckets in the table.

• A values() method which takes no input and returns an ArrayList containing all the

unique values in the table. The returned ArrayList of unique values may be in any

order. This method should be O(m) where m is the number of buckets in the table.

Notice that inside this class you can also see two static methods. The method called

slowSort() is already implemented. It takes as input an object of type MyHashTable where

the values are Comparable. The method returns an ArrayList containing all the keys in the

table, sorted in descending order based on the values they map to. The time complexity of

slowSort is O(n

2

), where n is the number of pairs in the table. Your should implement:

• a method called fastSort which performs the same task as slowSort but with a time

complexity of O(n · log(n)). It is up to you to decide which sorting algorithm you’d like

to implement.

Finally, implement the following methods from the private nested class MyHashIterator:

• The constructor which should be O(m), where m is the number of buckets in the table.

• A hasNext() method which should be O(1) and returns true if the hash table has a

next HashPair.

• A next() method which is also O(1) and returns the next HashPair.

[30 points] Implement the following public methods inside the Twitter class. Note that you are

allowed to add as many private methods and fields as you see fit.

• The constructor Twitter() which takes as input an ArrayList of Tweets and an ArrayList

of Strings denoting the stop words. A stop word is a commonly used word that is ignored by search engines. We suggest you first look at what you need to implement in the

rest of the class, before deciding how to implement the constructor. The time complexity

of this method should be O(n+m) where n is the number of tweets and m is the number

of stop words.

• A method addTweet() which takes a Tweet as input and adds it to the Twitter. This

method should be O(1).

• A method latestTweetByAuthor() which takes a String as input and returns the latest

Tweet the given author has posted. If the author has not posted any tweet, then the

method returns null. This method should be O(1).

2

In the slides we mention that these methods run in O(n + m) where n is the number of entries, and m is the

number of buckets. Note that if you have a good hash function and a max load factor of 0.75, then this is equivalent

to say that the method runs in O(m).

3

• A method tweetsByDate() which takes a String as input representing a date in this

format YYYY-MM-DD. The method returns an ArrayList of Tweets containing all the

tweets posted on that given date. If there are no tweets on the given date, then the

method returns null. This method should be O(1).

• A method trendingTopics() which takes no input and returns an ArrayList of Strings

containing all the words (which are not stop words!) that appear in all the tweets of this

Twitter. The words should be ordered from most frequent to least frequent by counting

in how many tweet messages the words appear. Note that if a word appears more than

once in the same tweet, it should be counted only once. This method should be

O(n ∗ log(n)) where n is the number of tweets. Note that the idea here is that

the method would build a hash table in O(n), and then call the fastSort from

the MyHashTable class which runs in O(n ∗ log(n)). Note that tweet messages have

a limit on the number of words, so iterating through the words in a message increases

the time complexity of the method by a constant. More over this implies that the

number of unique words in a set of tweets is bounded by the number of tweets

times a constant, which means that the function that describes the maximum

number of unique words in n tweets is O(n). Here a couple of hints:

– You can find in the template an helper method called getWords which you can use

to extract an array list of words out of a message. Note that the method separate the

words based on apostrophes and space characters. Characters that are not letters

from the English alphabet are ignored. So, for instance ”@pple!” becomes ”pple”

and ”user521dg” becomes ”userdg”.

– When comparing strings make sure to ignore the case of the characters. You want

to be sure for instance not to miss stop words typed in either upper or lower case.

Words like ”HELP”, ”Help”, ”help”, and ”hElP” should all be considered as the

same word.

If you test trendingTopics with an object of type Twitter created with the list of tweets

from the HashTableTester.java and an empty list as stop words, then these are the

top four trending words: “spirit” with 16 occurrence, “a” with 15 occurrences, “that”

with 8 occurrences, and “time” with 7 occurrences.

4

Here is a table with the breakdown of the points for all the methods you have to implement in this

project:

Method name Correctness Time complexity Total

MyHashTable() 5 – 5

put() 3 7 10

get() 2 6 8

remove() 2 6 8

rehash() 4 6 10

keys() 2 3 5

values() 4 6 10

fastSort() 1 5 6

MyHashIterator class 5 3 8

Twitter() 5 – 5

addTweet() 1 4 5

latestTweetByAuthor() 1 4 5

tweetsByDate() 1 4 5

trendingTopics() 4 6 10

Total 40 60 100

5