EE599 Assignment 4: Deep Learning for Classification in Computer Vision

The goal of this assignment is to apply deep learning to computer vision. In particular,
you will work on classification problems using a fashion compatibility dataset called Polyvore.
Your goals are to set up a category classifier and a fashion compatibility classifier. Turn
in your code and report as described in Section 5.
The starter code for this project can be found at
CV-Project, but feel free to explore different hyper-parameters and model structures. Follow
the instructions in the README file to set up the codebase.
1 Dataset Description
Polyvore Outfits [1] is a real-world dataset built from users’ preferences for outfit
configurations on an online website. Items within outfits that receive
high ratings are considered compatible, and vice versa. The dataset contains a total of 365,054 items
and 68,306 outfits. The maximum number of items per outfit is 19. A visualization of an
outfit is shown in Figure 1.
Figure 1: A visualization of a partial outfit in the dataset. The number at the bottom of
each image is the ID of this item.
2 Category Classification
• The starter code provides the following files with blanks in them; they can be read in
this order. First, you need to set your dataset location (Config['root_path']).
1. train_category.py: training scripts
2. CNN classification models
3. dataset preparation
4. utility functions and config
• Training takes place in the train_model function of train_*.py. In each iteration, the model
takes in batches of data provided by the dataloader. Record your training
accuracy progress here; it will be used for plotting the learning curve.
• You’re expected to do both finetuning and training from scratch.
1. Finetune a model pretrained on ImageNet (e.g., ResNet50). Current frameworks
provide easy access to such models; refer to their online documentation.
2. Construct a model of your own and train it from scratch.
3. Compare these two models and record the results. What is the advantage of using
a finetuned model? How do the learning rates differ between
these two learning strategies (i.e., finetuning vs. training from scratch)?
Note: images are located in the images folder, and each image is named by its ID. Information
about each item is stored in polyvore_item_metadata.json.
• Modify the dataset preparation file to create data pairs (image, category label). Normalization is defined
in the get_data_transforms function.
• Hold out at least 10% of the data for validating your final model. The test set is test_category_hw.txt.
• Tips:
1. Over-fitting is expected. Experiment with the model structure, hyper-parameters, and regularization to reduce over-fitting. You can design any model structure you like.
2. To speed up training, you can set the “use_cuda” flag to true and increase
the batch size.
3. You can restrict the size of the dataset for quick debugging by setting debug=True. You do not necessarily need to use 20 epochs and the entire training set.
4. It may take several epochs or more before the performance plateaus, depending
on the network structure you use and the learning rate.
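As a minimal sketch of the data-pair and validation-split steps above, the following plain-Python snippet builds (image path, label) pairs from an item-metadata dictionary and holds out a validation subset. The `category_id` field, the `.jpg` extension, and the helper names `build_category_pairs` and `train_val_split` are assumptions made for illustration, not taken from the starter code; check the actual metadata file before relying on them.

```python
import math
import os
import random


def build_category_pairs(meta, image_dir):
    """Return (image_path, class_index) pairs plus the category-to-index map.

    `meta` maps item id -> info dict; the "category_id" key is an assumption.
    """
    # Map raw category ids to contiguous class indices 0..C-1.
    categories = sorted({str(info["category_id"]) for info in meta.values()})
    cat_to_idx = {cat: idx for idx, cat in enumerate(categories)}
    pairs = [
        # Assumed naming scheme: <image_dir>/<item id>.jpg
        (os.path.join(image_dir, item_id + ".jpg"),
         cat_to_idx[str(info["category_id"])])
        for item_id, info in sorted(meta.items())
    ]
    return pairs, cat_to_idx


def train_val_split(samples, val_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out at least `val_fraction` of samples."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_val = max(1, math.ceil(len(samples) * val_fraction))
    return samples[n_val:], samples[:n_val]


# Toy metadata standing in for the parsed item metadata file.
toy_meta = {str(100 + i): {"category_id": str(10 + i % 3)} for i in range(20)}
pairs, cat_to_idx = build_category_pairs(toy_meta, "images")
train_set, val_set = train_val_split(pairs, val_fraction=0.1)
print(len(train_set), len(val_set), len(cat_to_idx))
```

Fixing the seed keeps the validation subset identical across runs, which makes the finetuned and from-scratch models easier to compare.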
3 Pairwise Compatibility Prediction
The task is to predict the compatibility of an outfit (Figure 2). It’s essentially a binary
classification problem (compatible or incompatible). The difficulty of the task lies
in the input: you must classify a set of items rather than a single item as in the previous section.
One way to handle set classification is to decompose it into pairwise predictions (you’re
encouraged to propose different ideas for the bonus section). Therefore, you’ll first train a
pairwise compatibility classifier.
Figure 2: Examples of a compatible item and an incompatible item.
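The pairwise decomposition described above can be sketched as follows: every unordered pair of items in an outfit inherits the outfit’s label, so an outfit with n items yields n(n-1)/2 pairs. The function name `outfit_to_pairs` and the toy item IDs are hypothetical. This is only one simple labeling scheme, and treating every pair in an incompatible outfit as negative is a noisy assumption.

```python
from itertools import combinations


def outfit_to_pairs(item_ids, label):
    """Expand one outfit into labeled item pairs.

    label is 1 for a compatible outfit and 0 for an incompatible one.
    """
    return [(a, b, label) for a, b in combinations(item_ids, 2)]


# Toy outfits with made-up item ids.
positive_pairs = outfit_to_pairs(["a1", "a2", "a3"], label=1)
negative_pairs = outfit_to_pairs(["b1", "b2", "b3", "b4"], label=0)
print(len(positive_pairs), len(negative_pairs))
```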
• Modify the dataset preparation file to create a new dataloader that takes in a pair of image inputs (a compatible pair or an incompatible pair). For example, you can assume that any pair of items in a
compatible outfit is compatible, whereas an incompatible outfit provides
negative pairs.
• Modify the model file to create a new model that takes in a pair of inputs and outputs a
compatibility probability for this pair.
• Hold out at least 10% of your data for validation. The test set is test_pairwise_compat_hw.txt.
• Bonus: Predict outfit compatibility from the pairwise predictions (i.e., average
the n(n-1)/2 pairwise scores and then apply a threshold for outfit compatibility). The
test set is compatibility_test_hw.txt.
• Tips:
1. Outfit descriptions are located in compatibility_*.txt. Each line gives the compatibility label of an outfit followed by its items (e.g., 1 210750761_1 210750761_2 210750761_3
is a compatible outfit whose outfit ID is 210750761. It has three items, indexed
from 1 to 3).
2. Item IDs and descriptions are located in polyvore_item_metadata.json. The corresponding images are also named by item ID.
3. To associate the items in an outfit with their item IDs, you need to parse train.json/val.json.
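To make the tips above concrete, here is a hedged sketch of parsing one line of a compatibility file, resolving each token to an item ID, and averaging pairwise scores into an outfit-level prediction as in the bonus task. It assumes tokens of the form `<outfit id>_<index>`, and assumes the parsed train.json/val.json is a list of outfits with `set_id` and `items` (each holding `item_id` and `index`). Those field names, the helper names, the constant stand-in scorer, and the 0.5 threshold are illustrative assumptions to be checked against the real files.

```python
from itertools import combinations


def parse_compat_line(line):
    """Parse '<label> <outfit>_<index> ...' into (label, [(outfit_id, index), ...])."""
    label, *tokens = line.split()
    items = []
    for token in tokens:
        outfit_id, index = token.rsplit("_", 1)
        items.append((outfit_id, int(index)))
    return int(label), items


def build_item_lookup(outfits):
    """Map (set_id, index) -> item_id; the JSON field names are assumptions."""
    return {
        (str(outfit["set_id"]), int(item["index"])): str(item["item_id"])
        for outfit in outfits
        for item in outfit["items"]
    }


def outfit_score(item_ids, pair_score):
    """Average pair_score over all n(n-1)/2 unordered item pairs."""
    pairs = list(combinations(item_ids, 2))
    if not pairs:
        raise ValueError("an outfit needs at least two items")
    return sum(pair_score(a, b) for a, b in pairs) / len(pairs)


# Toy data standing in for train.json and a trained pairwise model.
toy_outfits = [{"set_id": "210750761",
                "items": [{"item_id": "x", "index": 1},
                          {"item_id": "y", "index": 2},
                          {"item_id": "z", "index": 3}]}]
lookup = build_item_lookup(toy_outfits)
label, refs = parse_compat_line("1 210750761_1 210750761_2 210750761_3")
item_ids = [lookup[ref] for ref in refs]
score = outfit_score(item_ids, lambda a, b: 0.75)  # constant stand-in scorer
prediction = int(score >= 0.5)  # the threshold is a tunable assumption
```

The threshold is best chosen on the held-out validation outfits rather than fixed in advance.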
4 Extra Bonus
Because this is real-world data and fashion compatibility prediction is an open
problem, you’re encouraged to improve performance by adding various tricks.
• Perform learning rate scheduling
• Perform data augmentation to increase robustness
• Perform hard-negative mining for pairwise compatibility prediction
• Learn a permutation invariant feature for the entire set for compatibility prediction
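As one illustration of the hard-negative mining idea listed above, the sketch below keeps the negative pairs that the current model scores as most compatible, since those are the ones it gets most wrong. The function `mine_hard_negatives`, the toy pairs, and the score table are hypothetical. In practice the scores would come from your pairwise classifier, and how often to re-mine is a design choice.

```python
import heapq


def mine_hard_negatives(negative_pairs, score_fn, k):
    """Return the k negative pairs with the highest predicted compatibility."""
    return heapq.nlargest(k, negative_pairs, key=lambda pair: score_fn(*pair))


# Toy scores standing in for the model's predicted compatibility probabilities.
toy_scores = {("a", "b"): 0.9, ("a", "c"): 0.2, ("b", "c"): 0.6, ("c", "d"): 0.4}
hard = mine_hard_negatives(list(toy_scores), lambda x, y: toy_scores[(x, y)], k=2)
print(hard)
```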
5 Turning In
• Submitting the PDF: Make a PDF report containing answers to the questions
posed in these instructions and a table of results for categorization accuracy and pairwise
compatibility accuracy (Extra Bonus: outfit compatibility accuracy).
• Submitting the code: The folder with all Python files.
1. Plot of the model structure you designed (category-model.png/compatibility-model.png).
2. Learning curves from training the category classifier and the compatibility classifier.
3. A category.txt file containing two columns: item ID and predicted category.
4. A pair_compatibility.txt file containing three columns: item1, item2, and prediction score.
5. Extra Bonus: An outfit_compatibility.txt file containing two columns: outfit ID for
the test set and predicted compatibility.
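A minimal sketch of writing a prediction file like those listed above, assuming space-separated columns with one row per line (the instructions do not specify a delimiter, so adjust as required). The helper `write_columns` and the toy rows are hypothetical; the same helper would also serve for the three-column file.

```python
import os
import tempfile


def write_columns(path, rows):
    """Write each row as space-separated columns, one row per line."""
    with open(path, "w") as f:
        for row in rows:
            f.write(" ".join(str(value) for value in row) + "\n")


# Toy predictions with made-up item ids; real rows come from your trained model.
out_dir = tempfile.mkdtemp()
category_path = os.path.join(out_dir, "category.txt")
write_columns(category_path, [("154249722", 17), ("188425631", 4)])
```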
[1] Mariya I Vasileva, Bryan A Plummer, Krishna Dusad, Shreya Rajpal, Ranjitha Kumar, and
David Forsyth. Learning type-aware embeddings for fashion compatibility. In Proceedings of
the European Conference on Computer Vision (ECCV), pages 390–405, 2018.