Question

Posted on Sep 22, 2024

You answered this question up to part (iv) and commented that I should repost it so you could answer the rest of the question. Thanks.

Question 2. (To be done using R.) For this question, you will have to work with the Pima Indians Diabetes Database data set in R, named 'PimaIndiansDiabetes', from the 'mlbench' library. The complete data set can be seen by simply typing PimaIndiansDiabetes into the console; however, for the sake of this question, we will be working with subsets of this data frame.

As part of your solutions for this question, you will have to take screenshots or save some output, so I suggest submitting the entirety of this question as a Word or LaTeX document (use \begin{verbatim} Your R output \end{verbatim}), separate from the handwritten document you scan and submit for the previous questions.

a) The first step in this question is to remove the columns that we will not be using in the analysis from the original PimaIndiansDiabetes data set, including the 'diabetes' variable, which contains the outcome (positive/negative), so that all 768 observations in the sample can be thought of as coming from the same population, i.e. assume we did not know they were separated by diabetes outcome. To do this, we will create a new data frame, called sample.data, which consists of all rows of the PimaIndiansDiabetes data frame but only columns 1 to 4, using the following code:

> library(mlbench)
> data("PimaIndiansDiabetes")
> sample.data <- PimaIndiansDiabetes[, 1:4]

…

> sample.data2 <- …
> summary(pbg(sample.data2[, 1:4], factor(sample.data2[, 5]), original.names = TRUE, profile.plot = TRUE))

[4 marks]