The Prostate Dataset
The prostate dataset comes from a study on 97 men with prostate cancer who were due to receive a radical prostatectomy.
The data contain the following variables:
- lcavol: log(cancer volume in cm³)
- lweight: log(prostate weight in gm)
- age: age in years
- lbph: log(benign prostatic hyperplasia amount)
- svi: seminal vesicle invasion
- lcp: log(capsular penetration)
- gleason: Gleason score
- pgg45: percentage Gleason scores 4 or 5
- lpsa: log(prostate specific antigen in ng/mL)
Question 1
Validate that the prostate data frame contains 97 observations. Hint: First install the faraway package (if you haven't already) as instructed in Lesson 1, Slide 49. The following R statement will load the prostate data frame:
data("prostate", package = "faraway")
Use the nrow() function to see how many observations (rows) the data frame has. For example, the following statement prints the number of observations in the cars data frame: nrow(cars)
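A minimal sketch for Question 1, assuming the faraway package is already installed. The variable name `n_obs` is illustrative, not part of the assignment.

```r
# Assumes faraway is installed, e.g. via install.packages("faraway")
data("prostate", package = "faraway")

# n_obs is an illustrative name; nrow() counts the rows (observations)
n_obs <- nrow(prostate)
print(n_obs)
```

The printed count should match the 97 men described in the study.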
Question 2
Calculate descriptive statistics of each of the variables. Hint: Use the summary() function. For example: summary(cars).
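A minimal sketch for Question 2, assuming faraway is installed. `prostate_summary` is an illustrative name.

```r
data("prostate", package = "faraway")

# summary() on a data frame reports Min, quartiles, Median and Mean per column;
# prostate_summary is an illustrative name
prostate_summary <- summary(prostate)
print(prostate_summary)
```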
Question 3
Create a new data frame that includes the following variables: lcavol, lweight, age and lpsa. Use this new data frame for all questions below.
Hint: In the following example, we select two variables (agegp and alcgp) from the esoph data frame and name the new data frame esophSubDf:
esophSubDf <- esoph[c("agegp", "alcgp")]
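Following the esoph pattern, a minimal sketch for Question 3, assuming faraway is installed. The name `prostateSub` is a hypothetical choice for the new data frame.

```r
data("prostate", package = "faraway")

# prostateSub is a hypothetical name; single-bracket indexing by column name
# keeps the result as a data frame
prostateSub <- prostate[c("lcavol", "lweight", "age", "lpsa")]
str(prostateSub)
```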
Question 4
Calculate descriptive statistics of each of the variables using the new data frame.
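A minimal sketch for Question 4, assuming faraway is installed; `prostateSub` and `sub_sd` are hypothetical names. Because summary() does not report the standard deviation, the sketch also computes it per column with sapply(), which goes slightly beyond the assignment.

```r
data("prostate", package = "faraway")
# prostateSub is a hypothetical name for the four-variable subset
prostateSub <- prostate[c("lcavol", "lweight", "age", "lpsa")]

print(summary(prostateSub))

# summary() omits the standard deviation; sub_sd (hypothetical name) adds it
sub_sd <- sapply(prostateSub, sd)
print(round(sub_sd, 2))
```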
Question 5
Create a scatter plot matrix for all the variables using the new data frame.
Hint: Use the pairs() function (see Lesson 2, Slide 50).
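A minimal sketch for Question 5, assuming faraway is installed; `prostateSub` is a hypothetical name. When run non-interactively (e.g. with Rscript), the plot is written to a file such as Rplots.pdf instead of a window.

```r
data("prostate", package = "faraway")
# prostateSub is a hypothetical name for the four-variable subset
prostateSub <- prostate[c("lcavol", "lweight", "age", "lpsa")]

# One scatter plot per pair of variables; variable names appear on the diagonal
pairs(prostateSub, main = "Prostate data: scatter plot matrix")
```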
Question 6
Create a (Pearson) correlation matrix for all the variables. Hint: Use the cor() function (see Lesson 2, Slide 48).
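A minimal sketch for Question 6, assuming faraway is installed; `prostateSub` and `cor_mat` are hypothetical names. cor() already uses Pearson correlation by default; the method is written out only to make that explicit.

```r
data("prostate", package = "faraway")
# prostateSub is a hypothetical name for the four-variable subset
prostateSub <- prostate[c("lcavol", "lweight", "age", "lpsa")]

# cor_mat (hypothetical name) is a 4 x 4 symmetric matrix with 1s on the diagonal
cor_mat <- cor(prostateSub, method = "pearson")
print(cor_mat)
```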
Question 7
Show the same matrix again, but round the correlations (use two decimal places).
Hint: Use the round() function. The following example calculates the correlation matrix for the cars data frame and rounds the numbers: round(cor(cars),2)
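Applying the cars example to the new data frame gives a minimal sketch for Question 7, assuming faraway is installed; `prostateSub` and `cor_rounded` are hypothetical names.

```r
data("prostate", package = "faraway")
# prostateSub is a hypothetical name for the four-variable subset
prostateSub <- prostate[c("lcavol", "lweight", "age", "lpsa")]

# Round every entry of the Pearson correlation matrix to two decimal places
cor_rounded <- round(cor(prostateSub), 2)
print(cor_rounded)
```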
Question 8
Create a regression model: The predictor variable (X) should be lpsa. The outcome variable (Y) should be lcavol. Show the summary of the model.
Hint: Use the lm() and summary() functions (see Lesson 2, Slide 51).
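A minimal sketch for Question 8, assuming faraway is installed; `prostateSub` and `model1` are hypothetical names. In an R formula the outcome goes on the left of `~` and the predictor on the right, so lcavol (Y) is regressed on lpsa (X).

```r
data("prostate", package = "faraway")
# prostateSub is a hypothetical name for the four-variable subset
prostateSub <- prostate[c("lcavol", "lweight", "age", "lpsa")]

# outcome ~ predictor: lcavol (Y) on lpsa (X); model1 is a hypothetical name
model1 <- lm(lcavol ~ lpsa, data = prostateSub)
print(summary(model1))
```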
Question 9
Visualize the two variables and the model you just created by doing the following:
Create a scatter plot. Put lcavol on the y-axis and lpsa on the x-axis. Include the regression line and label the axes.
Hint: See Lesson 2, Slide 52.
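A minimal sketch for Question 9, assuming faraway is installed; `prostateSub` and `model1` are hypothetical names, and the axis labels and colour are illustrative choices.

```r
data("prostate", package = "faraway")
# prostateSub and model1 are hypothetical names
prostateSub <- prostate[c("lcavol", "lweight", "age", "lpsa")]
model1 <- lm(lcavol ~ lpsa, data = prostateSub)

# lpsa (predictor) on the x-axis, lcavol (outcome) on the y-axis
plot(lcavol ~ lpsa, data = prostateSub,
     xlab = "lpsa: log(prostate specific antigen)",
     ylab = "lcavol: log(cancer volume)",
     main = "lcavol vs. lpsa")

# abline() accepts an lm object and draws its fitted intercept and slope
abline(model1, col = "red", lwd = 2)
```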
Question 10
Update the regression model by adding a second predictor, age. Show the regression model summary. Hint: See Lesson 2, Slide 53.
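A minimal sketch for Question 10, assuming faraway is installed; `prostateSub`, `model1` and `model2` are hypothetical names. It uses update(), which refits the existing model with a modified formula. Writing lm(lcavol ~ lpsa + age, data = prostateSub) directly fits the same model.

```r
data("prostate", package = "faraway")
# prostateSub, model1 and model2 are hypothetical names
prostateSub <- prostate[c("lcavol", "lweight", "age", "lpsa")]
model1 <- lm(lcavol ~ lpsa, data = prostateSub)

# ". ~ . + age" keeps the existing outcome and predictors and adds age
model2 <- update(model1, . ~ . + age)
print(summary(model2))
```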