Answered step by step
Verified Expert Solution
Question
1 Approved Answer
This assignment expects you to make a use of multiple machine learning algorithms to make predictions from the following datasets. Refer to our lecture notes
This assignment expects you to make a use of multiple machine learning algorithms
to make predictions from the following datasets. Refer to our lecture notes and
practicals to use appropriate algorithms as per the outline defined.
Datasets
The dataset filmcollectiondataset.csv contains information about movies and
their marketing, production expense, budget of the movie, length of the movie, critic
rating etc and the money earned.
The dataset loandataset.csv contains peoples personal information and a
classification field loanstatus states whether or not their request to loan was
approved based on their education, income and credit score.
The dataset marketingcampaigndataset.csv contains data about peoples
education, marital status, income, number of kids in the household etc and their
preferences to multiple products and their binary response acceptancerejection to
multiple offers made in campaigns from columns AcceptedCmp to AcceptedCmp
and the response column The dataset also contains information about the amount of
money spent on products such as Gold, Fruits, Meat, Fish, Sweets and Wines in the
last two years.
Outline
Create optimum trainingtesting split to form appropriate machine learning
models for both classification and regression problems and also make a use of
cross validation methods to avoid model overfitting problems.
Achieve necessary data preprocessing steps including outlier removals and
appropriate visualisation steps such as pair plots or correlation matrix to better
understand the data distribution.
Create Linear and Multiple regression models to predict the revenue of movies
by proposing unseen input data by keeping in mind the concept of
multicollinearity. Also calculate the coefficient of determination rR
squared Perform the analysis with and without data standardisation to
differentiate the prediction effect.
Make a decision tree model for a regression problem using optimum training
testing split and calculate the Mean Squared Error MSE to check the models
accuracy. Also predict some unseen movies data and compare the models
accuracy against the regression model to find out which model performs better.
Train the Logistic Regression and Decision Tree models with optimum
traintest split for solving classification problems using GridSearch
Hyperparameter tuning to predict whether or not a loan of a certain profiles of
individuals would be approved. Also employ the Random Forest classification
model to predict the class of the same unseen data calculating the accuracy of
the model and compare the results with the Logistic Regression and Decision
Tree models and evaluate your analysis.
Use the same classification models for the Marketing campaign dataset and
predict whether individual profiles with certain characteristics such as marital
status, income or education level is likely to respond to the campaigns made.
Also use the Kmeans clustering algorithm to identify clusters of people with
certain characteristics such as education, marital status or income level and
the money they spent on products like Gold, Fruits, Meat, Fish, Sweets and
Wines etc.
In your report, show appropriate visualisations, confusion matrix and
classification report for each classification model wherever necessary.
Write me the codes for each.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started