Question

For this assignment, you will carry out a binary classification task, and write a report on this.
The data come from photos, and your task is to devise a machine learning method that classifies each photo according to whether its content is happy or sad. Each photo is described by 3456 features: 3072 were extracted from a deep Convolutional Neural Network (CNN) [1], and the remaining 384 are gist features [2]. (You are given all these features as a 1-dimensional array, so you will not be performing any feature extraction on raw images.)
There are two files of training data. The first contains 400 samples with all the data present (no missing or null values). The second contains 2750 samples, which have some missing data, as indicated by a NaN (not a number). The training data have class labels, 1 for happy, and 0 for sad. In addition, there is also a confidence label for each sample. The class labels were assigned based on decisions from 3 people viewing the photos. When they all agreed, the class label could be considered certain, and a confidence of 1 was written down. If they didn't all agree, then the classification decided on by the majority was assigned, but with a confidence of only 0.66.
There is one file of test data, containing 1000 samples. You must generate predictions for the class labels of these data. (Note that, as with the second training set, the samples in the test data set contain some missing features.)
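One possible way to prepare data of this shape is sketched below: stack the complete and incomplete training sets, replace each NaN with the mean of the observed values in that feature, and reuse the confidence values as per-sample weights. This is only an illustration under stated assumptions. The arrays are small synthetic stand-ins (the real file names and formats are not given here), the helper names `column_means_ignoring_nan` and `fill_missing` are hypothetical, and mean imputation is just one of several reasonable choices.

```python
import numpy as np


def column_means_ignoring_nan(X):
    """Per-feature mean computed over the observed (non-NaN) entries only."""
    return np.nanmean(X, axis=0)


def fill_missing(X, fill_values):
    """Return a copy of X with each NaN replaced by that column's fill value."""
    X_filled = X.copy()
    rows, cols = np.where(np.isnan(X_filled))
    X_filled[rows, cols] = fill_values[cols]
    return X_filled


# Synthetic stand-ins (assumption): tiny arrays instead of the real
# 400 x 3456 complete set and 2750 x 3456 incomplete set.
rng = np.random.default_rng(0)
X_complete = rng.normal(size=(40, 6))
X_incomplete = rng.normal(size=(60, 6))
X_incomplete[rng.random(X_incomplete.shape) < 0.1] = np.nan

# Confidence labels as described in the task: 1 (all agreed) or 0.66 (majority).
confidence = rng.choice([1.0, 0.66], size=100)

# Stack both training sets, then impute using statistics from training data only.
X_train = np.vstack([X_complete, X_incomplete])
train_means = column_means_ignoring_nan(X_train)
X_train_filled = fill_missing(X_train, train_means)

# Assumption: use the confidence directly as a per-sample weight, so that
# uncertain labels contribute less to the training loss.
sample_weight = confidence.copy()

# The test set also has missing values; fill it with the *training* means
# so that no information from the test data leaks into preprocessing.
X_test = rng.normal(size=(20, 6))
X_test[rng.random(X_test.shape) < 0.1] = np.nan
X_test_filled = fill_missing(X_test, train_means)
```

Computing the fill values from the training data and applying them unchanged to the test data keeps the preprocessing consistent between the two.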
Your job is to obtain the best predictions you can, and to justify your methods. You should provide reasons for which classifier or combination of classifiers you use, how you do model selection (training-validation split or cross validation), and how you handle the specific issues with these data (large number of features, missing data, the presence of confidence labels for the classes of the training data). We value creative approaches!
You may make use of any classifier, such as: single-layer perceptron, multi-layer perceptron, SVM, random forest, logistic regression, etc. You are not required to code classifiers from scratch, and you can use any machine learning toolbox you like, such as scikit-learn.
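As a starting point, the issues listed in the question (many features, missing data, confidence labels, model selection) could be combined in a single scikit-learn pipeline evaluated with stratified cross-validation. The sketch below is one hedged possibility, not a reference solution. It assumes mean imputation, PCA for dimensionality reduction, and logistic regression weighted by confidence; the data are synthetic stand-ins, and `build_model` and `cross_validate_weighted` are hypothetical helper names.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_model(n_components=10):
    """Impute NaNs, standardise, reduce dimensionality, then classify."""
    return make_pipeline(
        SimpleImputer(strategy="mean"),
        StandardScaler(),
        PCA(n_components=n_components),
        LogisticRegression(max_iter=1000),
    )


def cross_validate_weighted(X, y, confidence, n_splits=5, seed=0):
    """Stratified k-fold accuracy, weighting training samples by confidence."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in cv.split(X, y):
        model = build_model()
        # Preprocessing is refit inside each fold, so the validation fold
        # never influences the imputation, scaling, or PCA statistics.
        model.fit(
            X[train_idx],
            y[train_idx],
            logisticregression__sample_weight=confidence[train_idx],
        )
        scores.append(model.score(X[val_idx], y[val_idx]))
    return np.array(scores)


# Synthetic stand-in data (assumption): 200 samples, 50 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # 1 = happy, 0 = sad
X[rng.random(X.shape) < 0.05] = np.nan             # some missing features
confidence = rng.choice([1.0, 0.66], size=200)     # annotator agreement

scores = cross_validate_weighted(X, y, confidence)
print("fold accuracies:", np.round(scores, 3))

# Refit on all training data and predict labels for (synthetic) test data.
final_model = build_model()
final_model.fit(X, y, logisticregression__sample_weight=confidence)
X_test = rng.normal(size=(30, 50))
X_test[rng.random(X_test.shape) < 0.05] = np.nan
predictions = final_model.predict(X_test)
```

Keeping imputation, scaling, and PCA inside the pipeline means each cross-validation fold fits them on its own training portion only. The number of PCA components, the classifier, and the weighting scheme are all choices that would need to be justified and tuned on the real data.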
