Q SCI 381: Introduction to Probability and Statistics
Winter 2022
Laboratory #7 (73 points)
In today's lab, we will be using the file sleep.csv, which is available in Canvas (located in Files\Lab Datasets). The file contains three columns of data for 62 species of mammals:
TotalSleep | LifeSpan | Gestation |
3.3 | 38.6 | 645 |
8.3 | 4.5 | 42 |
12.5 | 14 | 60 |
16.5 | 6 | 25 |
3.9 | 69 | 624 |
9.8 | 27 | 180 |
19.7 | 19 | 35 |
6.2 | 30.4 | 392 |
14.5 | 28 | 63 |
9.7 | 50 | 230 |
12.5 | 7 | 112 |
3.9 | 30 | 281 |
10.3 | 11 | 117 |
3.1 | 40 | 365 |
8.4 | 3.5 | 42 |
8.6 | 50 | 28 |
10.7 | 6 | 42 |
10.7 | 10.4 | 120 |
6.1 | 34 | 202 |
18.1 | 7 | 32 |
6.5 | 28 | 400 |
3.8 | 20 | 148 |
14.4 | 3.9 | 16 |
12 | 39.3 | 252 |
6.2 | 41 | 310 |
13 | 16.2 | 63 |
13.8 | 9 | 28 |
8.2 | 7.6 | 68 |
2.9 | 46 | 336 |
10.8 | 22.4 | 100 |
7.8 | 16.3 | 33 |
9.1 | 2.6 | 21.5 |
19.9 | 24 | 50 |
8 | 100 | 267 |
10.6 | 11 | 30 |
11.2 | 15 | 45 |
13.2 | 3.2 | 19 |
12.8 | 2 | 30 |
19.4 | 5 | 12 |
17.4 | 6.5 | 120 |
5.3 | 23.6 | 440 |
17 | 12 | 140 |
10.9 | 20.2 | 170 |
13.7 | 13 | 17 |
8.4 | 27 | 115 |
8.4 | 18 | 31 |
12.5 | 13.7 | 63 |
13.2 | 4.7 | 21 |
9.8 | 9.8 | 52 |
9.6 | 29 | 164 |
6.6 | 7 | 225 |
5.4 | 6 | 225 |
2.6 | 17 | 150 |
3.8 | 20 | 151 |
11 | 12.7 | 90 |
10.3 | 3.5 | 15 |
13.3 | 4.5 | 60 |
5.4 | 7.5 | 200 |
15.8 | 2.3 | 46 |
10.3 | 24 | 210 |
19.4 | 3 | 14 |
15.3 | 13 | 38 |
https://docs.google.com/spreadsheets/d/1s9rMfCmojB_1AUORLzWgQFnTxjoN9pcCvtPboR3LEp4/edit?usp=sharing
TotalSleep = the number of hours per day spent sleeping
LifeSpan = the maximum life span in years
Gestation = the gestation period in days
Download sleep.csv and import the dataset into R/RStudio using the read.csv() function. Store the data in a data frame object named sleep using:
sleep <- read.csv("sleep.csv")
Recall from Lab 6 that the attach() command adds the data frame to R's search path, so that its columns can be referred to by name (e.g. LifeSpan rather than sleep$LifeSpan):
attach(sleep)
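You may not have sleep.csv at hand, or you may want to confirm that read.csv() imported it correctly. In either case, the following R sketch rebuilds the same data frame by typing in the 62 rows of the table above. It assumes that table matches sleep.csv exactly. Compare str(sleep) with your imported copy before relying on it.

```r
# Rebuild the sleep data frame from the handout table
# (assumption: the table matches sleep.csv row for row).
sleep <- data.frame(
  TotalSleep = c(3.3, 8.3, 12.5, 16.5, 3.9, 9.8, 19.7, 6.2, 14.5, 9.7,
                 12.5, 3.9, 10.3, 3.1, 8.4, 8.6, 10.7, 10.7, 6.1, 18.1,
                 6.5, 3.8, 14.4, 12, 6.2, 13, 13.8, 8.2, 2.9, 10.8,
                 7.8, 9.1, 19.9, 8, 10.6, 11.2, 13.2, 12.8, 19.4, 17.4,
                 5.3, 17, 10.9, 13.7, 8.4, 8.4, 12.5, 13.2, 9.8, 9.6,
                 6.6, 5.4, 2.6, 3.8, 11, 10.3, 13.3, 5.4, 15.8, 10.3,
                 19.4, 15.3),
  LifeSpan = c(38.6, 4.5, 14, 6, 69, 27, 19, 30.4, 28, 50,
               7, 30, 11, 40, 3.5, 50, 6, 10.4, 34, 7,
               28, 20, 3.9, 39.3, 41, 16.2, 9, 7.6, 46, 22.4,
               16.3, 2.6, 24, 100, 11, 15, 3.2, 2, 5, 6.5,
               23.6, 12, 20.2, 13, 27, 18, 13.7, 4.7, 9.8, 29,
               7, 6, 17, 20, 12.7, 3.5, 4.5, 7.5, 2.3, 24,
               3, 13),
  Gestation = c(645, 42, 60, 25, 624, 180, 35, 392, 63, 230,
                112, 281, 117, 365, 42, 28, 42, 120, 202, 32,
                400, 148, 16, 252, 310, 63, 28, 68, 336, 100,
                33, 21.5, 50, 267, 30, 45, 19, 30, 12, 120,
                440, 140, 170, 17, 115, 31, 63, 21, 52, 164,
                225, 225, 150, 151, 90, 15, 60, 200, 46, 210,
                14, 38)
)

# Sanity checks: 62 species, three numeric columns
str(sleep)
summary(sleep)
```

Every column is referenced through the data frame here, so this version does not need attach(). The forms sleep$LifeSpan and with(sleep, ...) work the same way.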
Part 1. Correlation Analysis
In Part 1, we will use R/RStudio to conduct a correlation analysis. Before conducting any analyses, let's explore the dataset by drawing pair-wise scatterplots with the following command:
plot(sleep)
(1a) Paste your pair-wise scatterplot below. (2 points)
(1b) Examine the pair-wise scatterplot in (1a). Which pair of variables, if any, would you expect to be negatively correlated? Which pair of variables, if any, would you expect to be positively correlated? Justify your response. (4 points)
(1c) Consider the population correlation coefficient, ρ, between each possible pair of variables in the sleep dataset; the sample correlation coefficient, r, is its estimate. Write the null and alternative hypotheses for ρ in a correlation analysis. (2 points)
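For reference, R's cor.test() runs a two-sided Pearson test by default, written out below. Here ρ is the population correlation coefficient, r is its sample estimate, and n = 62 is the number of species. This is the standard textbook form of the test; the t value and df in the cor.test() output correspond to these quantities.

```latex
\begin{aligned}
H_0 &: \rho = 0 \quad \text{(no linear association)} \\
H_A &: \rho \neq 0 \\
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n - 2 = 60
\end{aligned}
```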
(1d) Now, conduct a correlation analysis with cor.test() for each possible pair of variables in the sleep dataset. Paste your code and output below for each pair. (6 points)
(1e) Using the output from cor.test() in (1d), what is the estimate of the correlation coefficient, r, for each pair of variables? (3 points)
(1f) Using alpha = 0.01 and the output from (1d), what is your statistical conclusion and interpretation for each pair of variables? (12 points)
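The R sketch below shows one way to run (1d) through (1f) in a single pass. It rebuilds the data from the table above, assuming that table is identical to sleep.csv; if you have the file, use read.csv("sleep.csv") instead. It then prints the correlation matrix as a numeric companion to plot(sleep) and calls cor.test() on each of the three pairs. The names alpha and pairs_to_test are illustrative choices, not part of the lab.

```r
# Data rebuilt from the handout table (assumed identical to sleep.csv)
sleep <- data.frame(
  TotalSleep = c(3.3, 8.3, 12.5, 16.5, 3.9, 9.8, 19.7, 6.2, 14.5, 9.7,
                 12.5, 3.9, 10.3, 3.1, 8.4, 8.6, 10.7, 10.7, 6.1, 18.1,
                 6.5, 3.8, 14.4, 12, 6.2, 13, 13.8, 8.2, 2.9, 10.8,
                 7.8, 9.1, 19.9, 8, 10.6, 11.2, 13.2, 12.8, 19.4, 17.4,
                 5.3, 17, 10.9, 13.7, 8.4, 8.4, 12.5, 13.2, 9.8, 9.6,
                 6.6, 5.4, 2.6, 3.8, 11, 10.3, 13.3, 5.4, 15.8, 10.3,
                 19.4, 15.3),
  LifeSpan = c(38.6, 4.5, 14, 6, 69, 27, 19, 30.4, 28, 50,
               7, 30, 11, 40, 3.5, 50, 6, 10.4, 34, 7,
               28, 20, 3.9, 39.3, 41, 16.2, 9, 7.6, 46, 22.4,
               16.3, 2.6, 24, 100, 11, 15, 3.2, 2, 5, 6.5,
               23.6, 12, 20.2, 13, 27, 18, 13.7, 4.7, 9.8, 29,
               7, 6, 17, 20, 12.7, 3.5, 4.5, 7.5, 2.3, 24,
               3, 13),
  Gestation = c(645, 42, 60, 25, 624, 180, 35, 392, 63, 230,
                112, 281, 117, 365, 42, 28, 42, 120, 202, 32,
                400, 148, 16, 252, 310, 63, 28, 68, 336, 100,
                33, 21.5, 50, 267, 30, 45, 19, 30, 12, 120,
                440, 140, 170, 17, 115, 31, 63, 21, 52, 164,
                225, 225, 150, 151, 90, 15, 60, 200, 46, 210,
                14, 38)
)

# Numeric companion to the pair-wise scatterplots in (1a)/(1b)
round(cor(sleep), 3)

# (1d)-(1f): two-sided Pearson test for every unordered pair of columns
alpha <- 0.01
pairs_to_test <- combn(names(sleep), 2, simplify = FALSE)

for (p in pairs_to_test) {
  res <- cor.test(sleep[[p[1]]], sleep[[p[2]]])  # default: Pearson, two-sided
  decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"
  cat(sprintf("%s vs %s: r = %.3f, t = %.3f, df = %d, p = %.3g -> %s\n",
              p[1], p[2], res$estimate, res$statistic,
              as.integer(res$parameter), res$p.value, decision))
}
```

combn() lists the three unordered pairs, so each pair is tested exactly once. Reversing the order of two variables in cor.test() gives the same r and p-value.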
Part 2. Linear Regression Analysis: Using LifeSpan to predict Gestation
(2a) In Part 2, we will use R/RStudio to conduct a linear regression to determine whether LifeSpan (independent variable) predicts Gestation (dependent variable). Fit a linear regression using lm(). Paste your code and output below. (2 points)
(2b) Using your output from (2a), what is the estimate of the slope of the linear regression? Using alpha = 0.05, what is your statistical conclusion and interpretation of the slope estimate? (6 points)
(2c) Interpret the adjusted R-squared value in your output from (2a). What does this value represent? (4 points)
(2d) Use your output from (2a) to write the regression equation. (2 points)
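Here is a minimal R sketch for (2a) through (2d). The model object name gest.lm is hypothetical; any name works, and it fills the "object name" placeholder used in (2f) and (2g). The data are rebuilt from the table above under the same assumption that it matches sleep.csv. Passing data = sleep to lm() avoids depending on attach().

```r
# Data rebuilt from the handout table (assumed identical to sleep.csv)
sleep <- data.frame(
  TotalSleep = c(3.3, 8.3, 12.5, 16.5, 3.9, 9.8, 19.7, 6.2, 14.5, 9.7,
                 12.5, 3.9, 10.3, 3.1, 8.4, 8.6, 10.7, 10.7, 6.1, 18.1,
                 6.5, 3.8, 14.4, 12, 6.2, 13, 13.8, 8.2, 2.9, 10.8,
                 7.8, 9.1, 19.9, 8, 10.6, 11.2, 13.2, 12.8, 19.4, 17.4,
                 5.3, 17, 10.9, 13.7, 8.4, 8.4, 12.5, 13.2, 9.8, 9.6,
                 6.6, 5.4, 2.6, 3.8, 11, 10.3, 13.3, 5.4, 15.8, 10.3,
                 19.4, 15.3),
  LifeSpan = c(38.6, 4.5, 14, 6, 69, 27, 19, 30.4, 28, 50,
               7, 30, 11, 40, 3.5, 50, 6, 10.4, 34, 7,
               28, 20, 3.9, 39.3, 41, 16.2, 9, 7.6, 46, 22.4,
               16.3, 2.6, 24, 100, 11, 15, 3.2, 2, 5, 6.5,
               23.6, 12, 20.2, 13, 27, 18, 13.7, 4.7, 9.8, 29,
               7, 6, 17, 20, 12.7, 3.5, 4.5, 7.5, 2.3, 24,
               3, 13),
  Gestation = c(645, 42, 60, 25, 624, 180, 35, 392, 63, 230,
                112, 281, 117, 365, 42, 28, 42, 120, 202, 32,
                400, 148, 16, 252, 310, 63, 28, 68, 336, 100,
                33, 21.5, 50, 267, 30, 45, 19, 30, 12, 120,
                440, 140, 170, 17, 115, 31, 63, 21, 52, 164,
                225, 225, 150, 151, 90, 15, 60, 200, 46, 210,
                14, 38)
)

# (2a) Gestation (dependent) regressed on LifeSpan (independent)
gest.lm <- lm(Gestation ~ LifeSpan, data = sleep)  # hypothetical object name
summary(gest.lm)

# (2b) Slope estimate and its two-sided p-value from the coefficient table
coef_table <- summary(gest.lm)$coefficients
slope   <- coef_table["LifeSpan", "Estimate"]
slope_p <- coef_table["LifeSpan", "Pr(>|t|)"]
slope_p < 0.05  # TRUE means reject H0: slope = 0 at alpha = 0.05

# (2c) Adjusted R-squared
summary(gest.lm)$adj.r.squared

# (2d) Regression equation assembled from the fitted coefficients
b <- coef(gest.lm)
cat(sprintf("predicted Gestation = %.3f + %.3f * LifeSpan\n", b[1], b[2]))
```

Checking slope_p < 0.05 applies the same decision rule as reading the Pr(>|t|) column of summary(gest.lm) by eye.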
(2e) Use your regression equation from (2d) to predict the gestation time of mammals with each of the following life spans: (6 points)
3 years
29 years
78 years
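You can check the predictions in (2e) with predict(), as in the R sketch below. It uses the same hypothetical gest.lm name and the same rebuilt data as assumptions. The last line recomputes the values directly from the fitted equation b0 + b1 * LifeSpan, so the two results should agree.

```r
# Data rebuilt from the handout table (assumed identical to sleep.csv)
sleep <- data.frame(
  TotalSleep = c(3.3, 8.3, 12.5, 16.5, 3.9, 9.8, 19.7, 6.2, 14.5, 9.7,
                 12.5, 3.9, 10.3, 3.1, 8.4, 8.6, 10.7, 10.7, 6.1, 18.1,
                 6.5, 3.8, 14.4, 12, 6.2, 13, 13.8, 8.2, 2.9, 10.8,
                 7.8, 9.1, 19.9, 8, 10.6, 11.2, 13.2, 12.8, 19.4, 17.4,
                 5.3, 17, 10.9, 13.7, 8.4, 8.4, 12.5, 13.2, 9.8, 9.6,
                 6.6, 5.4, 2.6, 3.8, 11, 10.3, 13.3, 5.4, 15.8, 10.3,
                 19.4, 15.3),
  LifeSpan = c(38.6, 4.5, 14, 6, 69, 27, 19, 30.4, 28, 50,
               7, 30, 11, 40, 3.5, 50, 6, 10.4, 34, 7,
               28, 20, 3.9, 39.3, 41, 16.2, 9, 7.6, 46, 22.4,
               16.3, 2.6, 24, 100, 11, 15, 3.2, 2, 5, 6.5,
               23.6, 12, 20.2, 13, 27, 18, 13.7, 4.7, 9.8, 29,
               7, 6, 17, 20, 12.7, 3.5, 4.5, 7.5, 2.3, 24,
               3, 13),
  Gestation = c(645, 42, 60, 25, 624, 180, 35, 392, 63, 230,
                112, 281, 117, 365, 42, 28, 42, 120, 202, 32,
                400, 148, 16, 252, 310, 63, 28, 68, 336, 100,
                33, 21.5, 50, 267, 30, 45, 19, 30, 12, 120,
                440, 140, 170, 17, 115, 31, 63, 21, 52, 164,
                225, 225, 150, 151, 90, 15, 60, 200, 46, 210,
                14, 38)
)
gest.lm <- lm(Gestation ~ LifeSpan, data = sleep)  # hypothetical object name

# (2e) Predicted gestation (days) for life spans of 3, 29 and 78 years
new_spans <- data.frame(LifeSpan = c(3, 29, 78))
pred <- predict(gest.lm, newdata = new_spans)
data.frame(new_spans, PredictedGestation = round(pred, 1))

# Same values by hand from the regression equation b0 + b1 * LifeSpan
b <- coef(gest.lm)
b[1] + b[2] * c(3, 29, 78)
```

All three life spans lie inside the observed LifeSpan range (2 to 100 years), so none of these predictions extrapolates beyond the data.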
(2f) Plot the relationship between LifeSpan and Gestation using plot(), with LifeSpan on the x-axis and Gestation on the y-axis. Add appropriate axis labels, a main title, and a color of your choice.
After making this plot, you can add a line of best fit based on your linear regression using the abline() function in R/RStudio:
abline(object_name)
where object_name is the object in which you stored your linear regression model when using lm() in (2a). Paste your plot with your line of best fit below. (10 points)
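A sketch of the (2f) plot in R follows. The colours, point symbol and label wording are arbitrary choices, and gest.lm is a hypothetical object name. The data are rebuilt from the table above on the assumption that it matches sleep.csv. abline() accepts a fitted lm object directly and draws the line from its intercept and slope.

```r
# Data rebuilt from the handout table (assumed identical to sleep.csv)
sleep <- data.frame(
  TotalSleep = c(3.3, 8.3, 12.5, 16.5, 3.9, 9.8, 19.7, 6.2, 14.5, 9.7,
                 12.5, 3.9, 10.3, 3.1, 8.4, 8.6, 10.7, 10.7, 6.1, 18.1,
                 6.5, 3.8, 14.4, 12, 6.2, 13, 13.8, 8.2, 2.9, 10.8,
                 7.8, 9.1, 19.9, 8, 10.6, 11.2, 13.2, 12.8, 19.4, 17.4,
                 5.3, 17, 10.9, 13.7, 8.4, 8.4, 12.5, 13.2, 9.8, 9.6,
                 6.6, 5.4, 2.6, 3.8, 11, 10.3, 13.3, 5.4, 15.8, 10.3,
                 19.4, 15.3),
  LifeSpan = c(38.6, 4.5, 14, 6, 69, 27, 19, 30.4, 28, 50,
               7, 30, 11, 40, 3.5, 50, 6, 10.4, 34, 7,
               28, 20, 3.9, 39.3, 41, 16.2, 9, 7.6, 46, 22.4,
               16.3, 2.6, 24, 100, 11, 15, 3.2, 2, 5, 6.5,
               23.6, 12, 20.2, 13, 27, 18, 13.7, 4.7, 9.8, 29,
               7, 6, 17, 20, 12.7, 3.5, 4.5, 7.5, 2.3, 24,
               3, 13),
  Gestation = c(645, 42, 60, 25, 624, 180, 35, 392, 63, 230,
                112, 281, 117, 365, 42, 28, 42, 120, 202, 32,
                400, 148, 16, 252, 310, 63, 28, 68, 336, 100,
                33, 21.5, 50, 267, 30, 45, 19, 30, 12, 120,
                440, 140, 170, 17, 115, 31, 63, 21, 52, 164,
                225, 225, 150, 151, 90, 15, 60, 200, 46, 210,
                14, 38)
)
gest.lm <- lm(Gestation ~ LifeSpan, data = sleep)  # hypothetical object name

# (2f) Scatterplot with axis labels, a main title and a colour
plot(sleep$LifeSpan, sleep$Gestation,
     xlab = "Maximum life span (years)",
     ylab = "Gestation period (days)",
     main = "Gestation vs. life span in 62 mammal species",
     col = "darkgreen", pch = 19)

# Overlay the least-squares line stored in the lm object
abline(gest.lm, col = "red", lwd = 2)
```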
(2g) The t-tests and p-values reported by a linear regression assume that the model errors are approximately normally distributed, and the residuals are used to check this assumption. To assess them, extract the model residuals and store them in an object called model.res using the following command:
model.res <- residuals(object_name)
where object_name is the object in which you stored your linear regression model when using lm() in (2a).
Plot a histogram of the residuals. Include a title and a color. Paste your plot below. (6 points)
(2h) What are the mean and median of the model residuals? (2 points)
(2i) Based on your answers to (2h) and a visual assessment of your histogram in (2g), do you think the model residuals are normally distributed? Justify your answer. (6 points)
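The R sketch below covers (2g) and (2h) under the same assumptions: rebuilt data and a hypothetical gest.lm. The normal Q-Q plot at the end goes beyond what the lab asks for and is offered only as an extra visual check for (2i).

```r
# Data rebuilt from the handout table (assumed identical to sleep.csv)
sleep <- data.frame(
  TotalSleep = c(3.3, 8.3, 12.5, 16.5, 3.9, 9.8, 19.7, 6.2, 14.5, 9.7,
                 12.5, 3.9, 10.3, 3.1, 8.4, 8.6, 10.7, 10.7, 6.1, 18.1,
                 6.5, 3.8, 14.4, 12, 6.2, 13, 13.8, 8.2, 2.9, 10.8,
                 7.8, 9.1, 19.9, 8, 10.6, 11.2, 13.2, 12.8, 19.4, 17.4,
                 5.3, 17, 10.9, 13.7, 8.4, 8.4, 12.5, 13.2, 9.8, 9.6,
                 6.6, 5.4, 2.6, 3.8, 11, 10.3, 13.3, 5.4, 15.8, 10.3,
                 19.4, 15.3),
  LifeSpan = c(38.6, 4.5, 14, 6, 69, 27, 19, 30.4, 28, 50,
               7, 30, 11, 40, 3.5, 50, 6, 10.4, 34, 7,
               28, 20, 3.9, 39.3, 41, 16.2, 9, 7.6, 46, 22.4,
               16.3, 2.6, 24, 100, 11, 15, 3.2, 2, 5, 6.5,
               23.6, 12, 20.2, 13, 27, 18, 13.7, 4.7, 9.8, 29,
               7, 6, 17, 20, 12.7, 3.5, 4.5, 7.5, 2.3, 24,
               3, 13),
  Gestation = c(645, 42, 60, 25, 624, 180, 35, 392, 63, 230,
                112, 281, 117, 365, 42, 28, 42, 120, 202, 32,
                400, 148, 16, 252, 310, 63, 28, 68, 336, 100,
                33, 21.5, 50, 267, 30, 45, 19, 30, 12, 120,
                440, 140, 170, 17, 115, 31, 63, 21, 52, 164,
                225, 225, 150, 151, 90, 15, 60, 200, 46, 210,
                14, 38)
)
gest.lm <- lm(Gestation ~ LifeSpan, data = sleep)  # hypothetical object name

# (2g) Extract the residuals and draw a histogram with a title and colour
model.res <- residuals(gest.lm)
hist(model.res,
     main = "Residuals of Gestation ~ LifeSpan",
     xlab = "Residual (days)",
     col = "lightblue")

# (2h) Centre of the residual distribution
mean(model.res)    # zero up to rounding error for a least-squares fit with an intercept
median(model.res)

# Optional extra check, not required by the lab: normal Q-Q plot
qqnorm(model.res)
qqline(model.res, col = "red")
```

For a least-squares fit with an intercept, the residuals always average to zero (up to rounding error), so the mean alone says little about normality. The gap between the mean and the median says more about skew, and so does the shape of the histogram.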