Question
Hello,
I need some help with this homework that needs to be resolved using R:
Problem 1
Refer to the crime rate data. A criminologist studying the relationship between level of education and crime rate in medium-sized U.S. counties collected the following data for a random sample of 84 counties; X is the percentage of individuals in the county having at least a high-school diploma, and Y is the crime rate (crimes reported per 100,000 residents) last year.
a-) Set up the ANOVA table for the regression model.
b-) Carry out the test in part a by means of the F test. Show the numerical equivalence of the two test statistics and decision rules. Is the P-value for the F test the same as that for the t test?
c-) By how much is the total variation in crime rate reduced when percentage of high school graduates is introduced into the analysis? Is this a relatively large or small reduction?
d-) State the full and reduced models.
e-) Obtain (1) SSE(F), (2) SSE(R), (3) dfF, (4) dfR, (5) test statistic F* for the general linear test, (6) decision rule.
f-) Are the test statistic F* and the decision rule for the general linear test numerically equivalent to those in part a?
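For anyone picking this up, here is a minimal R sketch of how the pieces of Problem 1 fit together. The crime rate data are not included in this post, so the `crime` data frame below is simulated placeholder data (its coefficients and noise level are made up), and the significance level `alpha` is an assumption because the problem does not state one. Replace both with the real values before using any numbers.

```r
# Placeholder data: the real crime rate data set is not part of this post.
# The 84 rows below are simulated purely so the code runs.
set.seed(1)
crime <- data.frame(X = runif(84, 60, 90))
crime$Y <- 20000 - 170 * crime$X + rnorm(84, sd = 2300)

full    <- lm(Y ~ X, data = crime)  # full model:    Y = beta0 + beta1 * X + error
reduced <- lm(Y ~ 1, data = crime)  # reduced model: Y = beta0 + error

# a-) ANOVA table: SSR and SSE with their df, mean squares, and F*
aov_tab <- anova(full)
print(aov_tab)

# b-) F* from the ANOVA table versus the t statistic for the slope
t_star <- summary(full)$coefficients["X", "t value"]
F_star <- aov_tab["X", "F value"]
p_t    <- summary(full)$coefficients["X", "Pr(>|t|)"]
p_F    <- aov_tab["X", "Pr(>F)"]

# c-) share of total variation in Y removed by introducing X
r2 <- summary(full)$r.squared

# e-) general linear test
sse_f <- deviance(full);    df_f <- df.residual(full)
sse_r <- deviance(reduced); df_r <- df.residual(reduced)
F_glt <- ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

alpha  <- 0.01                          # assumed level, not given in the problem
F_crit <- qf(1 - alpha, df_r - df_f, df_f)
reject <- F_glt > F_crit                # decision rule: conclude beta1 != 0 if TRUE
```

In simple linear regression `F_star` equals `t_star^2` and `F_glt` equals `F_star`, so the three tests lead to the same decision and the same two-sided P-value.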
Problem 2
Five observations on Y are to be taken when X = 4, 8, 12, 16, and 20, respectively. The true regression function is E(Y) = 20 + 4X, and the εi are independent N(0, 25).
a-) Generate five normal random numbers, with mean 0 and variance 25. Consider these random numbers as the error terms for the five Y observations at X = 4, 8, 12, 16, and 20 and calculate Y1, Y2, Y3, Y4, and Y5. Obtain the least squares estimates b0 and b1 when fitting a straight line to the five cases. Also calculate the fitted value Ŷh when Xh = 10 and obtain a 95 percent confidence interval for E(Yh) when Xh = 10.
b-) Repeat part (a) 200 times, generating new random numbers each time.
c-) Make a frequency distribution of the 200 estimates b1. Calculate the mean and standard deviation of the 200 estimates b1. Are the results consistent with theoretical expectations?
d-) What proportion of the 200 confidence intervals for E(Yh) when Xh=10 include E(Yh)? Is this result consistent with theoretical expectations?
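The simulation in Problem 2 could be sketched in R roughly as follows. The seed and the variable names are arbitrary choices made for illustration; note that variance 25 corresponds to `sd = 5` in `rnorm`.

```r
set.seed(2024)                      # arbitrary seed, only for reproducibility
X           <- c(4, 8, 12, 16, 20)
n_sim       <- 200
x_h         <- 10
true_mean_h <- 20 + 4 * x_h         # E(Yh) = 60 under the true regression function

b1     <- numeric(n_sim)            # slope estimate from each replication
covers <- logical(n_sim)            # does the 95% CI contain E(Yh)?

for (i in seq_len(n_sim)) {
  eps <- rnorm(length(X), mean = 0, sd = 5)   # variance 25 means sd 5
  Y   <- 20 + 4 * X + eps
  fit <- lm(Y ~ X)
  b1[i] <- coef(fit)[["X"]]

  # 95% confidence interval for the mean response at Xh = 10
  ci <- predict(fit, newdata = data.frame(X = x_h),
                interval = "confidence", level = 0.95)
  covers[i] <- ci[1, "lwr"] <= true_mean_h && true_mean_h <= ci[1, "upr"]
}

# c-) frequency distribution, mean, and standard deviation of the 200 slopes
print(table(cut(b1, breaks = 8)))
mean_b1 <- mean(b1)
sd_b1   <- sd(b1)

# Theoretical values: E(b1) = 4 and sd(b1) = sqrt(sigma^2 / sum((X - mean(X))^2))
theory_sd <- sqrt(25 / sum((X - mean(X))^2))

# d-) proportion of intervals that include E(Yh); theory says about 0.95
coverage <- mean(covers)
```

Here `sum((X - mean(X))^2)` is 160, so the theoretical standard deviation of b1 is sqrt(25/160), about 0.395. The simulated mean and standard deviation of b1 should land near 4 and 0.395, and `coverage` should be close to 0.95, with some sampling noise since only 200 replications are used.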
Problem 3
Refer to the attached file for the Problem 3 data set.
a-) Create train and test data sets: Obtain a random sample of 400 cases from the 522 cases for the train data set and use the remaining 122 cases for the test data set. (Use set.seed(1023) before selecting the sample.)
b-) Build a regression model to predict Y as a function of X on the train data set. Write down the regression model. Is the regression model significant?
c-) Set up the ANOVA table for the regression model.
d-) State the full and reduced models. Perform the general linear F test.
e-) Plot residuals against X and Y and comment on unequal variances. Calculate semistudentized residuals and graph them. Are there any outliers?
f-) Test the model performances on the test data set.
g-) Plot the residuals obtained in part f-) against X and Y, and then comment on the outliers and unequal variances.
h-) Provide your recommendation regarding this model. Is this model robust?
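A rough R outline of the Problem 3 workflow is below. The attached file is not reproduced in this post, so `dat` is simulated placeholder data with 522 rows and columns named `X` and `Y` (the column names and the generating values are assumptions). The cutoff of 4 for semistudentized residuals is a common rule of thumb for large samples, not something the problem specifies, and plotting against fitted values is offered as one reading of "against Y".

```r
# Placeholder for the attached data set, which is not available here.
set.seed(1)
dat <- data.frame(X = runif(522, 1, 5))
dat$Y <- 100 + 60 * dat$X + rnorm(522, sd = 30)

# a-) 400 training cases, remaining 122 test cases
set.seed(1023)
train_idx <- sample(nrow(dat), 400)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# b-), c-) fit on the training data, inspect significance and the ANOVA table
fit <- lm(Y ~ X, data = train)
print(summary(fit))
print(anova(fit))

# d-) general linear F test: reduced model (intercept only) versus full model
print(anova(lm(Y ~ 1, data = train), fit))

# e-) semistudentized residuals: e_i / sqrt(MSE)
e_star <- resid(fit) / summary(fit)$sigma
plot(train$X, e_star, xlab = "X", ylab = "Semistudentized residual")
plot(fitted(fit), e_star, xlab = "Fitted Y", ylab = "Semistudentized residual")
outliers <- which(abs(e_star) > 4)   # rule-of-thumb cutoff, an assumption

# f-) performance on the test data: mean squared prediction error
pred       <- predict(fit, newdata = test)
test_resid <- test$Y - pred
mspr       <- mean(test_resid^2)
mse_train  <- summary(fit)$sigma^2

# g-) residual plot for the test set
plot(test$X, test_resid, xlab = "X", ylab = "Test residual")
```

If `mspr` is close to `mse_train`, the training-set error is a fair indication of predictive ability on new cases, which is one piece of evidence to cite for part h-).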