Question

You are expected to visit data repositories such as Kaggle, the UCI
Machine Learning Repository, PROMISE, etc. and extract three
different datasets, one each for classification, clustering and
regression. Note that the dataset for classification
should be unbalanced [3 marks].
Preprocess the datasets to address the following issues [5 marks]:
o Missing data values
o Duplicate instances
o Outlier detection
o Influential datapoint detection
o Checking normality of the set of features
o Data transformation
o Feature selection
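
A few of the preprocessing items above (duplicates, missing values, outliers) can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed method: the toy `rows` data, the helper names, median imputation and the 1.5 x IQR rule are all assumptions chosen for the example.

```python
from statistics import median, quantiles

def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def impute_median(rows, col):
    """Replace None in column `col` with the median of the observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = median(observed)
    return [r[:col] + [fill if r[col] is None else r[col]] + r[col + 1:]
            for r in rows]

def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lo or v > hi]

# Hypothetical toy data: one duplicate row, one missing value, one extreme value.
rows = [
    [1.0, 10.0], [2.0, None], [2.0, None], [3.0, 12.0], [4.0, 11.0],
    [5.0, 10.0], [6.0, 13.0], [7.0, 11.0], [8.0, 12.0], [9.0, 95.0],
]
clean = impute_median(drop_duplicates(rows), col=1)
flagged = iqr_outliers([r[1] for r in clean])
print(len(clean), flagged)
```

Flagged points are only candidates; whether to drop, cap or keep them depends on the dataset.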
In the case of the unbalanced dataset, consider an appropriate
approach to balance the dataset. For example, you can use an
oversampling technique such as SMOTE, ADASYN or MAHAKIL
[6 marks].
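
The core idea behind SMOTE-style oversampling is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea only, not the implementation from any library; the function name `smote_like`, the toy points and the parameter defaults are assumptions.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical toy classes: 20 majority points, 4 minority points.
majority = [(float(i), float(i % 5)) for i in range(20)]
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
balanced_minority = minority + smote_like(minority, len(majority) - len(minority))
print(len(majority), len(balanced_minority))
```

Oversampling is normally applied to the training portion only, so that synthetic points do not leak into the validation data.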
Select appropriate learners (at least 2 learners for each dataset)
for the training and validation needs and justify from literature
why those learners are relevant for such datasets [6 marks].
Consider the following approaches for training and validating
the models:
o K-fold cross validation [3 marks]
o Leave-one-out cross validation [3 marks]
o Percentage split, e.g. 70% for training and 30% for
validation [3 marks]
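
The three validation schemes listed above differ only in how sample indices are divided between training and validation. A minimal sketch of the index splits follows, assuming 10 samples and 5 folds purely for illustration; the function names are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

def leave_one_out_indices(n):
    """Leave-one-out is k-fold with k == n: each sample is validated once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def percentage_split(n, train_fraction=0.7, seed=0):
    """Shuffle once and cut at train_fraction, e.g. 70% / 30%."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]

folds = list(k_fold_indices(10, k=5))
loo = list(leave_one_out_indices(10))
train_idx, val_idx = percentage_split(10)
print(len(folds), len(loo), len(train_idx), len(val_idx))
```

Leave-one-out trains one model per sample, so it is usually practical only for small datasets.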
Consider appropriate evaluation measures (across the
approaches for training and validation) to assess the
performance of the models. Select the best model/learner based
on a good justification [6 marks].
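
For the unbalanced classification dataset, accuracy alone can be misleading, so precision, recall and F1 are commonly reported alongside it. The sketch below computes them from a confusion matrix; the function name `binary_metrics` and the toy labels are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels: 8 negatives, 2 positives, one positive missed.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy is high even though half of the minority class is missed, which is why recall and F1 matter when comparing learners on unbalanced data.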
Based on the results obtained from prediction, classification and
clustering across the three datasets, provide the plots needed to
visualize those results [6 marks].
Use a set of hold-out data to predict, classify or cluster into the
right bins and provide good visuals/plots of the results [4
marks].
Discuss your visualized results [5 marks].
