CE/CZ4073: Data Science for Business, Assignment 3

Problem 1 [10 points]
Background: Consider the attached dataset assign3_CuisineData.json, which stores the
recipes of multiple dishes in JSON format, where the first element of each recipe is a unique
identifier (“id”) and the second element is the list of ingredients (“ingredients”) of the dish:
{ "id": 10259,
  "ingredients": [ "romaine lettuce", "black olives", "grape tomatoes", "garlic",
                   "pepper", "purple onion", "seasoning", "garbanzo beans" ] }
The goal is to find the optimal number of clusters (cuisines) present in the dataset.
Task: Import the JSON dataset, convert it into a suitable Document-Term Matrix (DTM),
and perform clustering with an appropriate choice of algorithm and distance measure to identify
the optimal number of clusters in assign3_CuisineData.json. Briefly comment (within
the code) on the choices you make in the process of finding the optimal number of clusters.
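As an illustration of the first two steps of the task (import, then DTM), here is a minimal sketch using only the Python standard library. It assumes the file is a JSON array of records shaped like the example above. The inline sample is hypothetical apart from its first record, and the helper names build_dtm and jaccard_distance, as well as the choice of a binary DTM with Jaccard distance, are assumptions made for this sketch rather than anything the assignment prescribes.

```python
import json

# Tiny inline sample in the same shape as the assignment file. The first
# record is the example from the problem statement; the other two are made up.
SAMPLE = """[
  {"id": 10259, "ingredients": ["romaine lettuce", "black olives", "grape tomatoes",
                                "garlic", "pepper", "purple onion", "seasoning",
                                "garbanzo beans"]},
  {"id": 1, "ingredients": ["garlic", "pepper", "olive oil", "grape tomatoes"]},
  {"id": 2, "ingredients": ["soy sauce", "ginger", "garlic", "rice"]}
]"""


def build_dtm(recipes):
    """Return (ids, vocab, rows); rows[i][j] is 1 if recipe i uses ingredient j."""
    # Each whole ingredient string is treated as one term (an assumption).
    vocab = sorted({ing.strip().lower() for r in recipes for ing in r["ingredients"]})
    index = {term: j for j, term in enumerate(vocab)}
    ids, rows = [], []
    for r in recipes:
        row = [0] * len(vocab)
        for ing in r["ingredients"]:
            row[index[ing.strip().lower()]] = 1
        ids.append(r["id"])
        rows.append(row)
    return ids, vocab, rows


def jaccard_distance(a, b):
    """1 - |A and B| / |A or B| for two binary rows of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0


# For the real data, replace json.loads(SAMPLE) with json.load() on the file.
recipes = json.loads(SAMPLE)
ids, vocab, dtm = build_dtm(recipes)
dist = [[jaccard_distance(a, b) for b in dtm] for a in dtm]
```

Jaccard distance is used here because the rows are binary and sparse, so shared absences should not count as similarity. If SciPy is available, a square matrix like dist can be converted with scipy.spatial.distance.squareform and passed to scipy.cluster.hierarchy.linkage, after which the number of clusters can be judged from the dendrogram or a silhouette-style score.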
Problem 2 [10 points]
Background: Every individual has a unique preference for music, movies, hobbies, and interests,
sometimes related to their health habits, phobias, personality, lifestyle, spending, and even
opinions. The Kaggle dataset https://www.kaggle.com/miroslavsabo/young-people-survey
documents the responses from a survey, connecting individual preferences to demographics.
Task: You are required to find the optimal number of clusters that you can identify in the Kaggle
dataset. Briefly comment (within the code) on the choices you make in the process of finding
the optimal number of clusters. Based on the clusters you observe, find the strongest clustering
parameters in this set of people. Do you think that “gender” plays a major role in clustering?
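One possible way to approach the gender question, once cluster labels exist, is to measure how strongly cluster membership is associated with the gender column. The sketch below computes Cramér's V from a contingency table using only the standard library. The cluster labels and gender values shown are hypothetical, and cramers_v is a helper name introduced for this illustration; the same function could be applied to any other categorical survey column to compare their association with the clusters.

```python
from collections import Counter
from math import sqrt


def cramers_v(labels, groups):
    """Cramér's V between two categorical sequences (0 = none, 1 = perfect)."""
    n = len(labels)
    table = Counter(zip(labels, groups))   # observed cell counts
    row_tot = Counter(labels)              # cluster sizes
    col_tot = Counter(groups)              # group sizes
    chi2 = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (table[(r, c)] - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical cluster assignments and gender values for eight respondents.
clusters = [0, 0, 0, 1, 1, 1, 0, 1]
gender = ["female", "female", "female", "male", "male", "male", "male", "female"]
v = cramers_v(clusters, gender)
```

A value near 0 would suggest that the clusters are largely independent of gender, while a value near 1 would suggest that gender almost determines cluster membership. With real data, rows with a missing gender value would need to be dropped or handled before calling the function.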
This is an individual assignment. Properly acknowledge every source of information that you
refer to, including discussions with your fellow students, if any. Verbatim copying from any source
is strongly discouraged, and plagiarism will be heavily penalized. It is strongly recommended that
you write the code completely on your own. Feel free to write the code in Python if you want.