Question


Posted on Oct 09, 2024


The divorce rate in the United States during the years 1920-1996 can be modeled with the quantitative variables listed below.

In Problem 1 you will examine an assumed linear model of the divorce rate as a function of socio-economic characteristics (as predictors).

We assume i.i.d. normal errors for the response values, with unknown constant variance. In the following questions, use the data as-is; do not remove any outliers.

The dataset can be found here (copy the link and paste it into your browser): https://drive.google.com/file/d/1CZdb2m_eeY60sw82yGn3MaFNoK7xx5kY/view?usp=sharing

Read the data from `divusa.txt` into R with:

```r
divusa <- read.table("divusa.txt", header = TRUE, sep = ",")
```

The data description is as follows:

- `divorce`: divorces per 1000 women aged 15 or more

- `unemployed`: unemployment rate

- `femlab`: percentage of women aged 16+ participating in the labor force

- `marriage`: marriages per 1000 unmarried women aged 16+

- `birth`: births per 1000 women aged 15-44

- `military`: military personnel per 1000 population

## Part (a)

Provide a numerical summary of the data, and use the function `pairs()` (in base R) to produce a graphical summary. Do you see anything that looks promising for modeling? Do you see anything that may alert you to potential problems? Limit your answer to one or two sentences.
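A minimal R sketch of both summaries follows. So that it runs on its own, it builds a simulated stand-in for `divusa` with only `femlab` and `divorce` (an assumption; the real file has all the columns listed above), so its numbers say nothing about the actual data. The `cor()` table is an optional extra, not something the question asks for.

```r
# Simulated stand-in, NOT the real divusa data, so this sketch runs on its own.
# With the real file, replace the next three lines with:
#   divusa <- read.table("divusa.txt", header = TRUE, sep = ",")
set.seed(1)
divusa <- data.frame(femlab = runif(77, 20, 60))
divusa$divorce <- 2 + 0.3 * divusa$femlab + rnorm(77)

# Numerical summary: min, quartiles, median, mean and max of every column
num_summary <- summary(divusa)
print(num_summary)

# Pairwise correlations as a compact numerical complement to the scatterplots
print(round(cor(divusa), 2))

# Graphical summary: scatterplot matrix of every pair of columns
pairs(divusa, main = "divusa: scatterplot matrix")
```

Roughly linear panels in the `divorce` row point to candidate predictors, while strongly correlated predictor pairs or isolated points are typical warning signs.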

## Part (b)

Fit a linear model to predict the variable `divorce` from the variable `femlab`.
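One way to fit this model in base R is `lm()` with the formula `divorce ~ femlab`. The sketch below again uses a simulated stand-in for `divusa` (an assumption made only so it runs without the file).

```r
# Simulated stand-in, NOT the real divusa data, so this sketch runs on its own.
# With the real file, replace the next three lines with:
#   divusa <- read.table("divusa.txt", header = TRUE, sep = ",")
set.seed(1)
divusa <- data.frame(femlab = runif(77, 20, 60))
divusa$divorce <- 2 + 0.3 * divusa$femlab + rnorm(77)

# Simple linear regression: divorce = beta0 + beta1 * femlab + error
fit <- lm(divorce ~ femlab, data = divusa)

# Coefficient table, residual standard error, R-squared and overall F-test
print(summary(fit))
```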

## Part (c)

What *specific* hypothesis is being tested with the p-value given for the slope coefficient in the output of part (b)? State the null and alternative hypotheses. Do you reject or fail to reject the null hypothesis, and on what basis?
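In simple linear regression, the p-value that `summary()` prints on the `femlab` row comes from a two-sided t-test of $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$, with $t = \hat\beta_1 / \mathrm{SE}(\hat\beta_1)$ on $n - 2$ degrees of freedom. The sketch below extracts that test and recomputes the p-value by hand. The 0.05 significance level is an assumption, since the question does not fix one, and the data are a simulated stand-in.

```r
# Simulated stand-in, NOT the real divusa data, so this sketch runs on its own.
# With the real file, replace the next three lines with:
#   divusa <- read.table("divusa.txt", header = TRUE, sep = ",")
set.seed(1)
divusa <- data.frame(femlab = runif(77, 20, 60))
divusa$divorce <- 2 + 0.3 * divusa$femlab + rnorm(77)

fit <- lm(divorce ~ femlab, data = divusa)

# Row "femlab" of the coefficient table holds the test of H0: beta1 = 0 vs H1: beta1 != 0
coefs   <- summary(fit)$coefficients
t_stat  <- coefs["femlab", "t value"]
p_value <- coefs["femlab", "Pr(>|t|)"]

# Same two-sided p-value from the t distribution with n - 2 residual df
p_manual <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

alpha <- 0.05  # assumed significance level (not given in the question)
cat("t =", t_stat, " p =", p_value, " reject H0:", p_value < alpha, "\n")
```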

## Part (d)

What is the sample size?
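A short sketch for reading off the sample size, on a simulated stand-in for `divusa` (assumed only so it runs alone). `nobs()` counts the rows actually used by `lm()`, which differs from `nrow()` only if some rows contain missing values.

```r
# Simulated stand-in, NOT the real divusa data, so this sketch runs on its own.
# With the real file, replace the next three lines with:
#   divusa <- read.table("divusa.txt", header = TRUE, sep = ",")
set.seed(1)
divusa <- data.frame(femlab = runif(77, 20, 60))
divusa$divorce <- 2 + 0.3 * divusa$femlab + rnorm(77)

fit <- lm(divorce ~ femlab, data = divusa)

n_rows <- nrow(divusa)  # rows in the data frame
n_used <- nobs(fit)     # observations used in the fit (rows with NAs are dropped)
cat("rows:", n_rows, " used in fit:", n_used, " residual df:", df.residual(fit), "\n")
```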

## Part (e)

Does the intercept term have a useful interpretation, in terms of the model? Explain in one or two sentences.
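The intercept is the model's predicted mean divorce rate at `femlab = 0`, so one concrete check is whether 0 lies inside the observed range of `femlab` or would require extrapolation. A minimal sketch, using a simulated stand-in for `divusa` so that it runs without the file:

```r
# Simulated stand-in, NOT the real divusa data, so this sketch runs on its own.
# With the real file, replace the next three lines with:
#   divusa <- read.table("divusa.txt", header = TRUE, sep = ",")
set.seed(1)
divusa <- data.frame(femlab = runif(77, 20, 60))
divusa$divorce <- 2 + 0.3 * divusa$femlab + rnorm(77)

fit <- lm(divorce ~ femlab, data = divusa)
b0 <- coef(fit)[["(Intercept)"]]

# Is femlab = 0 inside the observed range, or would interpreting b0 mean extrapolating?
femlab_range  <- range(divusa$femlab)
zero_observed <- femlab_range[1] <= 0 && femlab_range[2] >= 0
cat("intercept:", b0, " femlab range:", femlab_range, " 0 observed:", zero_observed, "\n")
```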

## Part (f)

What percentage of the variation in the response (`divorce`) is not explained by the model?
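The unexplained share of the variation is $1 - R^2 = \mathrm{RSS}/\mathrm{TSS}$. The sketch below computes it both ways on a simulated stand-in for `divusa` (an assumption made only so it runs on its own).

```r
# Simulated stand-in, NOT the real divusa data, so this sketch runs on its own.
# With the real file, replace the next three lines with:
#   divusa <- read.table("divusa.txt", header = TRUE, sep = ",")
set.seed(1)
divusa <- data.frame(femlab = runif(77, 20, 60))
divusa$divorce <- 2 + 0.3 * divusa$femlab + rnorm(77)

fit <- lm(divorce ~ femlab, data = divusa)

# Percentage not explained, from R-squared
r2 <- summary(fit)$r.squared
pct_unexplained <- 100 * (1 - r2)

# Same quantity: residual sum of squares over total sum of squares
rss <- sum(residuals(fit)^2)
tss <- sum((divusa$divorce - mean(divusa$divorce))^2)
cat("unexplained:", round(pct_unexplained, 2), "%\n")
```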

## Part (g)

Plot the standardized residuals against the response variable and the predictor variable, and produce a Q-Q plot of the standardized residuals. What can we conclude about the normality of the errors, the constancy of the error variance, and the relationship between the errors and the predictor?
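A base-R sketch of the three diagnostic plots, using `rstandard()` for the standardized residuals and `qqnorm()`/`qqline()` for the Q-Q plot. The data are a simulated stand-in for `divusa`, assumed only so the code runs without the file.

```r
# Simulated stand-in, NOT the real divusa data, so this sketch runs on its own.
# With the real file, replace the next three lines with:
#   divusa <- read.table("divusa.txt", header = TRUE, sep = ",")
set.seed(1)
divusa <- data.frame(femlab = runif(77, 20, 60))
divusa$divorce <- 2 + 0.3 * divusa$femlab + rnorm(77)

fit <- lm(divorce ~ femlab, data = divusa)
std_res <- rstandard(fit)  # residuals scaled by their estimated standard deviations

op <- par(mfrow = c(1, 3))  # three panels side by side
plot(divusa$divorce, std_res, xlab = "divorce (response)", ylab = "Standardized residuals")
abline(h = 0, lty = 2)
plot(divusa$femlab, std_res, xlab = "femlab (predictor)", ylab = "Standardized residuals")
abline(h = 0, lty = 2)
qqnorm(std_res, main = "Normal Q-Q plot")
qqline(std_res)  # reference line through the quartiles
par(op)          # restore the previous layout
```

In least squares with an intercept, the residuals are positively correlated with the observed response by construction, so some upward tilt in the first panel is expected even for a well-specified model; many texts plot against fitted values instead.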

## Part (h)

What is the estimated mean divorce rate when `femlab` = 38?
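The estimated mean response is $\hat\beta_0 + 38\,\hat\beta_1$, which `predict()` returns directly. A sketch on a simulated stand-in for `divusa` (assumed so it runs alone):

```r
# Simulated stand-in, NOT the real divusa data, so this sketch runs on its own.
# With the real file, replace the next three lines with:
#   divusa <- read.table("divusa.txt", header = TRUE, sep = ",")
set.seed(1)
divusa <- data.frame(femlab = runif(77, 20, 60))
divusa$divorce <- 2 + 0.3 * divusa$femlab + rnorm(77)

fit <- lm(divorce ~ femlab, data = divusa)

# Estimated mean divorce rate at femlab = 38
mean_hat <- predict(fit, newdata = data.frame(femlab = 38))

# Same value by hand: beta0_hat + 38 * beta1_hat
by_hand <- coef(fit)[["(Intercept)"]] + 38 * coef(fit)[["femlab"]]
print(mean_hat)
```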

## Part (i)

Give a 97% prediction interval around the mean response estimated in part (h).
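In R, `predict(..., interval = "prediction")` gives an interval for a new observation, centered on the estimated mean response; `interval = "confidence"` gives the narrower interval for the mean itself and is shown only for comparison. The data below are a simulated stand-in for `divusa`, an assumption so the sketch runs without the file.

```r
# Simulated stand-in, NOT the real divusa data, so this sketch runs on its own.
# With the real file, replace the next three lines with:
#   divusa <- read.table("divusa.txt", header = TRUE, sep = ",")
set.seed(1)
divusa <- data.frame(femlab = runif(77, 20, 60))
divusa$divorce <- 2 + 0.3 * divusa$femlab + rnorm(77)

fit <- lm(divorce ~ femlab, data = divusa)
new_x <- data.frame(femlab = 38)

# 97% prediction interval for a new observation at femlab = 38
pi97 <- predict(fit, newdata = new_x, interval = "prediction", level = 0.97)

# For comparison: 97% confidence interval for the mean response at the same point
ci97 <- predict(fit, newdata = new_x, interval = "confidence", level = 0.97)
print(pi97)
print(ci97)
```

The prediction interval is wider because it adds the error variance of a single new observation to the uncertainty in the estimated mean.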

## Part (j)

Give a 97% confidence interval for $\beta_1$, the slope coefficient.
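The interval is $\hat\beta_1 \pm t_{0.985,\,n-2}\,\mathrm{SE}(\hat\beta_1)$, which `confint()` computes. The sketch checks it against the formula on a simulated stand-in for `divusa` (assumed only so it runs on its own).

```r
# Simulated stand-in, NOT the real divusa data, so this sketch runs on its own.
# With the real file, replace the next three lines with:
#   divusa <- read.table("divusa.txt", header = TRUE, sep = ",")
set.seed(1)
divusa <- data.frame(femlab = runif(77, 20, 60))
divusa$divorce <- 2 + 0.3 * divusa$femlab + rnorm(77)

fit <- lm(divorce ~ femlab, data = divusa)

# 97% confidence interval for the slope
ci_slope <- confint(fit, parm = "femlab", level = 0.97)

# By hand: beta1_hat +/- t_{0.985, n-2} * SE(beta1_hat)
b1     <- coef(fit)[["femlab"]]
se_b1  <- summary(fit)$coefficients["femlab", "Std. Error"]
t_crit <- qt(1 - 0.03 / 2, df = df.residual(fit))
by_hand <- b1 + c(-1, 1) * t_crit * se_b1
print(ci_slope)
```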

## Part (k)

Suppose that the percentage of female participation in the labor force increased by 13 percentage points from one year to the next. What would be the predicted change in the US divorce rate?
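In a straight-line model, a change of $\Delta$ in the predictor changes the predicted mean response by $\Delta \cdot \hat\beta_1$, whatever the starting value. The sketch reads the 13 as 13 percentage points of `femlab` and uses a simulated stand-in for `divusa` (an assumption so it runs alone); the starting value 40 in the check is arbitrary.

```r
# Simulated stand-in, NOT the real divusa data, so this sketch runs on its own.
# With the real file, replace the next three lines with:
#   divusa <- read.table("divusa.txt", header = TRUE, sep = ",")
set.seed(1)
divusa <- data.frame(femlab = runif(77, 20, 60))
divusa$divorce <- 2 + 0.3 * divusa$femlab + rnorm(77)

fit <- lm(divorce ~ femlab, data = divusa)
delta <- 13

# Predicted change in the mean divorce rate for a change of delta in femlab
pred_change <- delta * coef(fit)[["femlab"]]

# Check: the difference of two predictions delta apart (starting point is arbitrary)
x0 <- 40
diff_pred <- unname(diff(predict(fit, newdata = data.frame(femlab = c(x0, x0 + delta)))))
cat("predicted change:", pred_change, "\n")
```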