
Question

Task
You are to import and clean the same HealthCareData_2024.csv file that was used in the
previous assignment. Then run, tune and evaluate two supervised ML algorithms (each
with two types of training data) to identify the most accurate way of classifying
malicious events.
Part 1 General data preparation and cleaning
a) Import the HealthCareData_2024.csv into R Studio. This version is the same as
Assignment 1.
b) Write the appropriate code in R Studio to prepare and clean the
HealthCareData_2024 dataset as follows:
i. Clean the whole dataset based on the feedback received for Assignment 1.
ii. For the feature NetworkInteractionType, merge the Regular and
Unknown categories together to form the category Others. Hint: use the
forcats::fct_collapse(.) function.
iii. Select only the complete cases using the na.omit(.) function, and name the
dataset dat.cleaned.
Briefly outline the preparation and cleaning process in your report and why you
believe the above steps were necessary.
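As a rough illustration of steps ii and iii, the sketch below applies forcats::fct_collapse(.) and na.omit(.) to a small made-up data frame. The data frame toy, the level "Anomalous" and the column ResponseTime are assumptions invented for this example; only NetworkInteractionType, Regular, Unknown and Others come from the task.

```r
library(forcats)

# Toy data standing in for the real dataset (values are made up for illustration).
toy <- data.frame(
  NetworkInteractionType = factor(c("Regular", "Unknown", "Anomalous", "Regular", NA)),
  ResponseTime = c(1.2, NA, 3.4, 2.2, 0.9)  # hypothetical numeric feature
)

# Step ii: merge the levels "Regular" and "Unknown" into one level, "Others".
toy$NetworkInteractionType <- fct_collapse(
  toy$NetworkInteractionType,
  Others = c("Regular", "Unknown")
)

# Step iii: keep only rows with no missing value in any column.
toy.cleaned <- na.omit(toy)

print(levels(toy.cleaned$NetworkInteractionType))
print(nrow(toy.cleaned))
```

On the real data the same two calls would be applied to the cleaned data frame, with the result named dat.cleaned as the task requires.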
c) Use the code below to generate two training datasets (one unbalanced
mydata.ub.train and one balanced mydata.b.train) along with the testing set
(mydata.test). Make sure you enter your student ID into the command
set.seed(.).
library(dplyr) # provides %>% and filter()
# Separate samples of normal and malicious events
dat.class0<- dat.cleaned %>% filter(Classification == "Normal") # normal
dat.class1<- dat.cleaned %>% filter(Classification == "Malicious") # malicious
# Randomly select 9600 non-malicious and 400 malicious samples using your student
# ID, then combine them to form a working data set
set.seed(Enter your Student ID)
rows.train0<- sample(1:nrow(dat.class0), size =9600, replace = FALSE)
rows.train1<- sample(1:nrow(dat.class1), size =400, replace = FALSE)
# Your 10000 unbalanced training samples
train.class0<- dat.class0[rows.train0,] # Non-malicious samples
train.class1<- dat.class1[rows.train1,] # Malicious samples
mydata.ub.train <- rbind(train.class0, train.class1)
# Your 19200 balanced training samples, i.e. 9600 normal and malicious samples each.
set.seed(Enter your Student ID)
train.class1_2<- train.class1[sample(1:nrow(train.class1), size =9600,
replace = TRUE),]
mydata.b.train <- rbind(train.class0, train.class1_2)
# Your testing samples
test.class0<- dat.class0[-rows.train0,]
test.class1<- dat.class1[-rows.train1,]
mydata.test <- rbind(test.class0, test.class1)
Note that in the master data set, the percentage of malicious events is
approximately 4%. This distribution is roughly represented by the unbalanced
data. The balanced data is generated based on up-sampling of the minority class
using bootstrapping. The idea here is to ensure the trained model is not biased
towards the majority class, i.e. normal events.
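The effect of up-sampling by bootstrapping can be checked on a small made-up example. The sketch below is only an illustration under stated assumptions: the data frame toy, its column x, the 96/4 split and the seed 1 are invented here and are not part of the assignment data.

```r
# Arbitrary seed for this illustration only (not a student ID).
set.seed(1)

# Toy data with roughly the 96% / 4% class split described above.
toy <- data.frame(
  x = rnorm(100),  # hypothetical feature
  Classification = c(rep("Normal", 96), rep("Malicious", 4))
)

normal    <- toy[toy$Classification == "Normal", ]
malicious <- toy[toy$Classification == "Malicious", ]

# Bootstrap: draw minority rows with replacement until they match the majority count.
boot.rows <- sample(seq_len(nrow(malicious)), size = nrow(normal), replace = TRUE)
balanced  <- rbind(normal, malicious[boot.rows, ])

# Class proportions before and after up-sampling.
print(prop.table(table(toy$Classification)))
print(prop.table(table(balanced$Classification)))
```

Because the extra minority rows are drawn only from rows already in the training sample, the up-sampling adds duplicates but no rows from the testing set.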
Part 2 Compare the performances of different ML algorithms
a) Randomly select two supervised learning modelling algorithms to test against
one another by running the following code. Make sure you enter your student ID
into the command set.seed(.). Your 2 ML approaches are given by myModels.
set.seed(Enter your student ID)
models.list1<- c("Logistic Ridge Regression",
"Logistic LASSO Regression",
"Logistic Elastic-Net Regression")
models.list2<- c("Classification Tree",
"Bagging Tree",
"Random Forest")
myModels <- c(sample(models.list1, size =1),
sample(models.list2, size =1))
myModels %>% data.frame
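The reason for fixing the seed can be seen in a short sketch: with the same seed, the random draw returns the same pair of models every time. The helper pick_models and the seed value 12345 are hypothetical and used only for illustration; the model lists are copied from the code above.

```r
models.list1 <- c("Logistic Ridge Regression",
                  "Logistic LASSO Regression",
                  "Logistic Elastic-Net Regression")
models.list2 <- c("Classification Tree",
                  "Bagging Tree",
                  "Random Forest")

# Hypothetical helper: draw one model from each list after fixing the seed.
pick_models <- function(seed) {
  set.seed(seed)
  c(sample(models.list1, size = 1),
    sample(models.list2, size = 1))
}

first  <- pick_models(12345)  # 12345 is a placeholder, not a real student ID
second <- pick_models(12345)

# Same seed, same selection.
print(identical(first, second))
```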
