Machine Learning Project
Objective. In this project you will have to design, implement, evaluate, and optimize
a machine learning model to be applied for Sentiment Classification.
Learning Outcomes. Through this project, the student will get familiar with the experimental
evaluation of machine learning models, preprocessing of data, writing a short technical report,
selecting parameters of a model, combining classifiers, using appropriate libraries
(RapidMiner/Python), and dealing with a real-world machine learning problem.
Teams. This project can be done individually or in teams of two or three members.
Data. You can find the data uploaded to Moodle. The file includes a set of documents marked with
the following labels: positive, neutral, negative. The data are in raw text format, so
you have to convert them to a form that the algorithms can process. This is part of the
preprocessing step.
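As an illustration of the conversion step described above, here is a minimal sketch that turns raw text into bag-of-words count vectors using only the Python standard library. The function names (tokenize, build_vocabulary, vectorize), the min_count parameter, and the two sample sentences are made up for this example and are not part of the assignment; a library vectorizer would normally do the same job with more options.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents, min_count=1):
    """Map each token seen at least min_count times to a column index."""
    counts = Counter(tok for doc in documents for tok in tokenize(doc))
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def vectorize(text, vocabulary):
    """Turn one document into a fixed-length list of token counts."""
    vector = [0] * len(vocabulary)
    for tok in tokenize(text):
        if tok in vocabulary:  # tokens unseen during training are ignored
            vector[vocabulary[tok]] += 1
    return vector


# Made-up documents, only to show the shape of the output.
docs = ["Great movie, great cast!", "Terrible plot."]
vocab = build_vocabulary(docs)
print(vocab)
print(vectorize("great plot", vocab))
```

The vocabulary must be built from the training documents only and then reused unchanged for any new data, so that every document maps to vectors of the same length.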
Goal. You have to train, optimize and evaluate the models on these data in order to get the best
possible predictive performance on new, unknown data.
Experimenting and Selecting your model. You are provided with a file named train.csv. Use this
to select the best model for this type of data, in your opinion. For that, you need to experiment with
multiple algorithms (that we have seen in class) and parameters (as we have seen in class for each
algorithm), with a proper evaluation process (train/test split, cross-validation, or whatever you think is
best). This experimentation should be presented in your Technical Report.
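One possible evaluation process is k-fold cross-validation. The sketch below is a standard-library illustration, not a required implementation: k_fold_indices, cross_validate, and the majority-class baseline are hypothetical names, and the fit/predict callables stand in for whichever algorithm is being compared. It assumes there are at least k samples.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(texts, labels, fit, predict, k=5):
    """Return the mean accuracy over k folds for a fit/predict pair."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(texts), k):
        model = fit([texts[i] for i in train_idx], [labels[i] for i in train_idx])
        preds = predict(model, [texts[i] for i in test_idx])
        correct = sum(p == labels[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Majority-class baseline: a useful floor that any real model should beat.
def fit_majority(texts, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, texts):
    return [model] * len(texts)
```

Running cross_validate with the same folds (same seed) for every candidate algorithm and parameter setting keeps the comparison fair, and the resulting mean accuracies are the kind of numbers that fit naturally into the report's tables.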
Preparing your model for evaluation. After you decide on the best model, you need to prepare it to
classify some new, unknown data. The new data will not be provided to you. The instructors should
be able to use your model in order to classify new data. The instructors will have a test.csv file which
will have an identical format to the train.csv file, but with no labels (that column will be missing).
Your model/process should load that data, classify it, and store the decisions in a
predictions.txt file, one prediction per line.
The predictions.txt file should look like this:
The instructors, having:
a) The predictions.txt of your model,
b) The actual correct labels of the new data (which you don't have),
will calculate the accuracy of your approach.
Based on the accuracy, your approach will be ranked in comparison to the other teams in the course.
This ranking will affect one part of your project grade (see below).
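The accuracy computation described above can be sketched as follows. This is only an assumption about how such a check might be done, not the instructors' actual script; read_labels and accuracy are hypothetical helper names, and the true labels are assumed to be available as one label per line, like predictions.txt.

```python
def read_labels(path):
    """Read one label per line, ignoring surrounding whitespace and blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty label list")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

Because the comparison is positional, the order of lines in predictions.txt has to match the order of rows in test.csv, and the label strings have to match the training labels exactly.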
Deliverables
For this project, you must submit on Moodle (only one member of the team submits):
Your trained model.
a) RapidMiner
   i) The .rmp file of your process (traintest.rmp). This process will be
      used by the evaluators. The process should load the train.csv.
      Instructors should be able to use this process to load a test.csv with the
      unknown data, as described above, and generate your predictions.txt file.
   ii) The .rmp file of the process you used for evaluating the model
      (experiments.rmp). If you have more than one process that you used
      for your experiments, you can upload multiple experiments files.
b) Python: you should provide the files with all your code. One of your files should be
   named main.py. In this file, you should have a function named
   traintest(train.csv, test.csv), taking as parameters the
   train.csv file that is provided to you and the test.csv file that is not
   provided to you, so that the evaluators can easily run it and get results (i.e. the
   predictions.txt file).
c) RapidMiner and Python:
   If you think it is a good idea to add a file with instructions on how to run
   your process/code for the instructors, please add a readme.txt file to your
   submission.
   Obviously, apart from the files above that are required for the final
   evaluation, you will have to write code or create processes that will help you
   identify the best model. The code or the processes of these
   experiments should be submitted separately and named
   experiments.py or experiments.rmp.
A technical report
Describe in pages how you have decided to use the model you have selected. What
experiments you did, which models you tried out, your observations, how you did
the evaluation, why you selected that particular setting in the end and whatever else
you think is necessary. You are advised to professionally present your results with
tables and plots.
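For the Python deliverable, a main.py along the following lines would satisfy the load, classify, and store requirement. It is a rough sketch under several assumptions: the column names "text" and "label" are guesses about the train.csv header, the function name is spelled traintest as it appears in the text, the output_path parameter is an addition for convenience, and the small hand-written Naive Bayes classifier is only a stand-in for whichever model the experiments select.

```python
import csv
import math
import re
from collections import Counter, defaultdict

TEXT_COLUMN = "text"    # assumed column name; adjust to the real train.csv header
LABEL_COLUMN = "label"  # assumed column name; absent from test.csv


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def read_rows(path):
    """Load a CSV file with a header row into a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def fit_naive_bayes(texts, labels):
    """Collect per-class document counts and per-class token counts."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        token_counts[label].update(tokenize(text))
    vocabulary = {tok for counts in token_counts.values() for tok in counts}
    return class_counts, token_counts, vocabulary


def predict_naive_bayes(model, text):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, token_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    tokens = [t for t in tokenize(text) if t in vocabulary]
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        denom = sum(token_counts[label].values()) + len(vocabulary)
        score = math.log(n_docs / total_docs)
        score += sum(math.log((token_counts[label][t] + 1) / denom) for t in tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def traintest(train_csv, test_csv, output_path="predictions.txt"):
    """Train on train_csv, classify test_csv, write one prediction per line."""
    train_rows = read_rows(train_csv)
    model = fit_naive_bayes(
        [row[TEXT_COLUMN] for row in train_rows],
        [row[LABEL_COLUMN] for row in train_rows],
    )
    predictions = [
        predict_naive_bayes(model, row[TEXT_COLUMN]) for row in read_rows(test_csv)
    ]
    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(predictions) + "\n")
    return predictions
```

An evaluator would then call traintest("train.csv", "test.csv") and find predictions.txt in the working directory. Retraining on the full train.csv inside the function keeps the submission to a single entry point, at the cost of repeating training on every run.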
Marks Overview: Total: points, out of which:
A points: preprocessing
B points: model evaluation (experiments, tuning, argumentation, links with theory, etc.)
C points: quality of technical report (presentation of results, plots, tables, etc.)
D points: your rank