Question
Part 1 - General data preparation and cleaning.

a) Import MLDATASET_PartiallyCleaned.xlsx into R Studio. This dataset is a partially cleaned version of MLDATASET-200000-1612938401.xlsx.

b) Write the appropriate code in R Studio to prepare and clean the MLDATASET_PartiallyCleaned dataset as follows:

i. For How.Many.Times.File.Seen, set all values equal to 65535 to NA.

ii. Convert Threads.Started to a factor whose categories are given by
1 = 1 thread started
2 = 2 threads started
3 = 3 threads started
4 = 4 threads started
5 = 5 or more threads started
Hint: Replace all values greater than 5 with 5, then use the factor() function.

iii. Log-transform Characters.in.URL using the log() function, and remove the original Characters.in.URL column from the dataset (unless you have overwritten it with the log-transformed data).

iv. Select only the complete cases using the na.omit() function, and name the dataset MLDATASET.cleaned.

Briefly outline the preparation and cleaning process in your report and why you believe the above steps were necessary.
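A minimal sketch of steps i to iv in base R, offered as one possible approach rather than the expected solution. The toy data frame is hypothetical and only stands in for the real spreadsheet. With the actual file, the import in part a) could be done with something like readxl::read_excel("MLDATASET_PartiallyCleaned.xlsx") followed by as.data.frame(). This assumes the readxl package is installed and that the imported column names match the dotted names used in the question. The column name log.Characters.in.URL is also an assumption made for illustration.

```r
# Hypothetical toy data standing in for the imported spreadsheet.
MLDATASET <- data.frame(
  How.Many.Times.File.Seen = c(3, 65535, 12, 7, 65535),
  Threads.Started          = c(1, 2, 7, 5, 3),
  Characters.in.URL        = c(20, 45, 110, 8, 63)
)

# i. Recode the value 65535 as missing.
#    %in% returns FALSE for existing NAs, so the index is always a clean logical.
is.sentinel <- MLDATASET$How.Many.Times.File.Seen %in% 65535
MLDATASET$How.Many.Times.File.Seen[is.sentinel] <- NA

# ii. Cap Threads.Started at 5, then convert to a labelled factor.
MLDATASET$Threads.Started <- pmin(MLDATASET$Threads.Started, 5)
MLDATASET$Threads.Started <- factor(
  MLDATASET$Threads.Started,
  levels = 1:5,
  labels = c("1 thread started", "2 threads started", "3 threads started",
             "4 threads started", "5 or more threads started")
)

# iii. Log-transform Characters.in.URL (natural log) and drop the original column.
#      The new column name is an assumption, not given in the question.
MLDATASET$log.Characters.in.URL <- log(MLDATASET$Characters.in.URL)
MLDATASET$Characters.in.URL <- NULL

# iv. Keep only rows with no missing values in any column.
MLDATASET.cleaned <- na.omit(MLDATASET)
```

For the report, note that 65535 is the largest value an unsigned 16-bit integer can hold, so it plausibly marks a capped or missing count rather than a real observation, although the question does not confirm this. Collapsing Threads.Started to five levels avoids sparse categories, and the log transform is a common way to reduce right skew in a count-like variable such as URL length.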