Question

Machine Learning - Project
Objective. In this project you will have to design, implement, evaluate, and optimize
a machine learning model to be applied to Sentiment Classification.
Learning Outcomes. Through this project, the student will get familiar with the experimental
evaluation of machine learning models, pre-processing of data, writing a short technical report,
selecting parameters of a model, combining classifiers, use of appropriate libraries (RapidMiner,
Python), and dealing with a real-world machine learning problem.
Teams. This project can be done individually or in teams of two or three members.
Data. You can find the data uploaded to Moodle. The file includes a set of documents marked with
the following labels: positive, neutral, negative. The data are in raw format (text), so
you have to convert them to a form that the algorithms can process. This is part of the
pre-processing step.
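As a rough illustration of what "a form that the algorithms can process" can mean, the sketch below turns raw text into bag-of-words count vectors using only the Python standard library. The helper names (tokenize, build_vocabulary, vectorize) and the sample sentences are made up for this example; the project does not prescribe a representation, and weighting schemes such as TF-IDF are a common alternative.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(documents):
    """Map every distinct token to a column index (sorted for reproducibility)."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: index for index, tok in enumerate(tokens)}

def vectorize(text, vocabulary):
    """Return a count vector; tokens missing from the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector

# Hypothetical sample documents, not taken from the course data.
docs = ["Great phone, great battery", "Terrible battery"]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]
```

The vocabulary must be built from the training documents only and then reused unchanged for new data, so that train and test vectors share the same columns.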
Goal. You have to train, optimize and evaluate the models on these data in order to get the best
possible predictive performance on new, unknown data.
Experimenting and Selecting your model. You are provided with a file named train.csv. Use it
to select the model that, in your opinion, is best for this type of data. To do that, you need to
experiment with multiple algorithms (that we have seen in class) and parameters (as we have seen in
class) for each algorithm, using a proper evaluation process (train-test, cross-validation, or
whatever you think is best). This experimentation should be presented in your Technical Report.
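One possible way to organize such an experiment in Python is sketched below, assuming scikit-learn is available (the project text only says "Python"). The two classifiers and the parameter grids are placeholders chosen for illustration, not the algorithms covered in class, and compare_models is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

def compare_models(texts, labels, n_splits=5):
    """Return {model name: (best mean CV accuracy, best parameters)}."""
    # Placeholder candidates and grids; swap in the algorithms seen in class.
    candidates = {
        "naive_bayes": (MultinomialNB(), {"clf__alpha": [0.1, 1.0]}),
        "logistic_regression": (
            LogisticRegression(max_iter=1000),
            {"clf__C": [0.1, 1.0, 10.0]},
        ),
    }
    # Stratified folds keep the class proportions similar in every split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for name, (classifier, grid) in candidates.items():
        # The vectorizer sits inside the pipeline, so it is refit on each
        # training fold and never sees the held-out fold.
        pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", classifier)])
        search = GridSearchCV(pipeline, grid, cv=cv, scoring="accuracy")
        search.fit(texts, labels)
        results[name] = (search.best_score_, search.best_params_)
    return results
```

Keeping the pre-processing inside the pipeline avoids leaking information from the validation folds into the features, which would make the cross-validated accuracy look better than it is.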
Preparing your model for evaluation. After you decide on the best model, you need to prepare it to
classify some new, unknown data. The new data will not be provided to you. The instructors should
be able to use your model in order to classify new data. The instructors will have a test.csv file which
will have the same format as the train.csv file, but with no labels (that column will be missing).
Your model (process) should load that data, classify and store the decision in a
predictions.txt file. One prediction per line.
The predictions.txt file should look like this:
The instructors, having:
a) The predictions.txt of your model.
b) The actual (correct) labels of the new data (that you don't have).
will calculate the accuracy of your approach.
Based on the accuracy, your approach will be ranked in comparison to the other teams in the course.
This ranking will affect one part of your project grade (see below).
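For reference, accuracy is the fraction of predictions that match the correct labels. The sketch below shows how it could be computed from a predictions.txt file with one label per line. How the instructors store the correct labels is not stated, so the code simply takes them as a list; read_lines and accuracy are illustrative names.

```python
def read_lines(path):
    """Read one label per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]

def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("number of predictions and labels differ")
    if not actual:
        raise ValueError("no labels to evaluate")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

The length check matters in practice: a predictions.txt with a missing or extra line would otherwise shift every later prediction against the wrong label.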
Deliverables
For this project, you must submit on Moodle (only one member of the team):
Your trained model.
a. RapidMiner
i. The .rmp file of your process (train_test.rmp). This process will be
used by the evaluators. The process should load the train.csv.
Instructors should be able to use this process to load a test.csv (with the
unknown data, as described above) and generate your predictions.txt file.
ii. The .rmp file of the process you used for evaluating the model
(experiments.rmp). If you used more than one process for your experiments,
you can upload multiple files (experiments 1.rmp, experiments 2.rmp, etc.).
b. Python - you should provide the files with all your code. One of your files should be
named main.py. In this file, you should have a function named
train_test(train.csv,test.csv), taking as parameters the train.csv file (that is provided
to you) and the test.csv file (that is not provided to you), so that the evaluators can
easily run it and get results (i.e. the predictions.txt file).
c. RapidMiner and Python:
If you think it is a good idea to give the instructors a file with instructions on how
to run your process/code, please add a readme.txt file to your submission.
Obviously, apart from the files above that are required for the final
evaluation, you will have to write code or create processes that will help you
identify the best model. The code (or the processes) of these
experiments should be submitted separately and named
experiments.py or experiments.rmp.
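For the Python option, a main.py along the following lines would satisfy the train_test interface described above. This is only a sketch under several assumptions: scikit-learn is used, the CSV columns are called "text" and "label" (the real column names in train.csv are not given here), and a TF-IDF plus logistic regression pipeline stands in for whatever model the experiments actually select.

```python
import csv

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

TEXT_COLUMN = "text"    # assumption: adjust to the real header in train.csv
LABEL_COLUMN = "label"  # assumption: adjust to the real header in train.csv

def read_column(path, column):
    """Return the values of one named column from a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [row[column] for row in csv.DictReader(handle)]

def train_test(train_path, test_path, output_path="predictions.txt"):
    """Train on train_path, classify test_path, write one prediction per line."""
    train_texts = read_column(train_path, TEXT_COLUMN)
    train_labels = read_column(train_path, LABEL_COLUMN)
    test_texts = read_column(test_path, TEXT_COLUMN)

    # Placeholder model; replace with the configuration chosen in the experiments.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(train_texts, train_labels)
    predictions = [str(label) for label in model.predict(test_texts)]

    with open(output_path, "w", encoding="utf-8") as handle:
        for label in predictions:
            handle.write(label + "\n")
    return predictions
```

An evaluator could then call train_test("train.csv", "test.csv") from the folder that contains both files and find predictions.txt next to them.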
A technical report
Describe in 4 pages how you decided on the model you selected: what
experiments you did, which models you tried out, your observations, how you did
the evaluation, why you selected that particular setting in the end, and whatever else
you think is necessary. You are advised to present your results professionally, with
tables and plots.
Marks - Overview: Total: 100 points, out of which:
A. 14 points: pre-processing
B. 50 points: model evaluation (experiments, tuning, argumentation, links with theory, etc.)
C. 20 points: quality of technical report (presentation of results, plots, tables, etc.)
D. 16 points: your rank