Question:

Specifically, we examine the 2004 Survey of Consumer Finances (SCF), a nationally representative sample that contains extensive information on assets, liabilities, income, and demographic characteristics of those sampled (potential U.S. customers). We study a random sample of 500 families with positive incomes. From the sample of 500, we initially consider a subsample of \(n=275\) families that purchased term life insurance.
Consider a linear regression of LNFACE on LNINCOME, EDUCATION, NUMHH, MARSTAT, AGE, and GENDER.

a. Collinearity. Not all of the variables turned out to be statistically significant.
To investigate one possible explanation, calculate variance inflation factors.
a(i). Briefly explain the idea of collinearity and a variance inflation factor.
a(ii). What constitutes a large variance inflation factor?
a(iii). If a large variance inflation factor is detected, what possible courses of action do we have to address this aspect of the data?
a(iv). Supplement the variance inflation factor statistics with a table of correlations of explanatory variables. Given these statistics, is collinearity an issue with this fitted model? Why or why not?
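As a starting point for part (a), the sketch below shows one way variance inflation factors and a correlation table might be computed. It relies on the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing the j-th explanatory variable on the others. The data are synthetic stand-ins, not the SCF sample, and the function name `variance_inflation_factors` is made up for this illustration.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column of X.

    R_j^2 is the coefficient of determination from regressing column j
    on an intercept plus all of the other columns.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic stand-in data (assumption: NOT the SCF sample).
rng = np.random.default_rng(0)
n = 275
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.1 * rng.normal(size=n)  # nearly a copy of x1, so collinear with it
X = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X)
corr = np.corrcoef(X, rowvar=False)  # correlation table of the explanatory variables

print("VIFs:", np.round(vifs, 2))
print("Correlations:\n", np.round(corr, 2))
```

In this toy setup the first and third columns should show large VIFs and a pairwise correlation near 1, while the second column should have a VIF close to 1. With the actual data, the same function would be applied to the matrix of the six explanatory variables.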

b. Unusual Points. Sometimes a poor model fit can be due to unusual points.
b(i). Define the idea of leverage for an observation.
b(ii). For this fitted model, give standard rules of thumb for identifying points with unusual leverage. Identify any unusual points from the attached summary statistics.
b(iii). An analyst is concerned with leverage values for this fitted model and suggests using FACE as the dependent variable instead of LNFACE. Describe how leverage values would change using this alternative dependent variable.
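For part (b), leverages are the diagonal elements of the hat matrix H = X (X'X)^{-1} X'. The sketch below computes them from a QR decomposition and flags points above three times the average leverage, which is one commonly used cutoff (other texts use two times the average). The data are synthetic, the helper name `leverages` is hypothetical, and the design matrix is assumed to have full column rank.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X'.

    X is the design matrix, including the intercept column.
    Uses the reduced QR decomposition X = QR, for which H = Q Q'.
    """
    X = np.asarray(X, dtype=float)
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)

# Synthetic stand-in data (assumption: NOT the SCF sample).
rng = np.random.default_rng(1)
n, k = 275, 6
Z = rng.normal(size=(n, k))
Z[0] = 6.0  # place one observation far from the bulk of the data
X = np.column_stack([np.ones(n), Z])

h = leverages(X)
p = X.shape[1]        # number of regression coefficients, intercept included
cutoff = 3 * p / n    # three times the average leverage p / n
flagged = np.flatnonzero(h > cutoff)

print("average leverage:", round(h.mean(), 4))
print("cutoff:", round(cutoff, 4))
print("flagged rows:", flagged)
```

Note that `leverages` takes only the explanatory variables as input, which is worth keeping in mind when thinking about part b(iii).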

c. Residual Analysis. We can learn how to improve model fits from analyses of residuals.
c(i). Provide a plot of residuals versus fitted values. What do we hope to learn from this type of plot? Does this plot display any model inadequacies?
c(ii). Provide a Q-Q plot of residuals. What do we hope to learn from this type of plot? Does this plot display any model inadequacies?
c(iii). Provide a plot of residuals versus leverages. What do we hope to learn from this type of plot? Does this plot display any model inadequacies?
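The three plots in part (c) are all built from the same few quantities, so it may help to see how those could be computed before plotting. The following sketch fits a least squares model to synthetic data (not the SCF sample) and produces the coordinate pairs for each plot. The choice of standardized residuals and of plotting positions (i - 0.5)/n for the Q-Q plot is an assumption; statistical packages differ slightly on both.

```python
import numpy as np
from statistics import NormalDist

# Synthetic stand-in data (assumption: NOT the SCF sample).
rng = np.random.default_rng(2)
n = 275
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

# Least squares fit, fitted values, and raw residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Residual standard deviation, leverages, and standardized residuals.
p = X.shape[1]
s = np.sqrt(resid @ resid / (n - p))
Q, _ = np.linalg.qr(X)
h = np.sum(Q ** 2, axis=1)
std_resid = resid / (s * np.sqrt(1.0 - h))

# Q-Q plot coordinates: theoretical normal quantiles vs. sorted residuals.
probs = (np.arange(1, n + 1) - 0.5) / n
theo_q = np.array([NormalDist().inv_cdf(q) for q in probs])
sample_q = np.sort(std_resid)

# Coordinate pairs for the three plots:
#   residuals vs. fitted values : (fitted, resid)
#   Q-Q plot                    : (theo_q, sample_q)
#   residuals vs. leverages     : (h, std_resid)
print("Q-Q correlation:", round(np.corrcoef(theo_q, sample_q)[0, 1], 4))
```

Each pair can be passed to any scatter plot routine. Because the synthetic errors here are normal with constant variance, these plots should look unremarkable; departures with the real data are what the exercise asks you to look for.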

d. Stepwise Regression. Run a stepwise regression algorithm. Suppose that this algorithm suggests a model using LNINCOME, EDUCATION, NUMHH, and GENDER as explanatory variables to predict the dependent variable LNFACE.
d(i). What is the purpose of stepwise regression?
d(ii). Describe two important drawbacks of stepwise regression algorithms.
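To make the idea in part (d) concrete, here is a minimal sketch of forward selection, a simplified relative of stepwise regression that only adds variables and never removes them. It is not the algorithm of any particular statistical package. At each step it adds the candidate with the largest partial F statistic (the square of its t-ratio) and stops when no candidate exceeds a threshold. The threshold of 4, the variable names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from a least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def forward_select(X, y, names, f_to_enter=4.0):
    """Greedy forward selection using a partial F-to-enter threshold."""
    n = len(y)
    current = np.ones((n, 1))          # start from the intercept-only model
    rss_cur = rss(current, y)
    remaining = list(range(X.shape[1]))
    chosen = []
    while remaining:
        best = None
        for j in remaining:
            cand = np.column_stack([current, X[:, j]])
            rss_new = rss(cand, y)
            df = n - cand.shape[1]     # residual degrees of freedom
            f_stat = (rss_cur - rss_new) / (rss_new / df)
            if best is None or f_stat > best[0]:
                best = (f_stat, j, cand, rss_new)
        if best[0] < f_to_enter:       # no candidate is worth adding
            break
        _, j, current, rss_cur = best
        chosen.append(names[j])
        remaining.remove(j)
    return chosen

# Synthetic stand-in data (assumption: NOT the SCF sample).
rng = np.random.default_rng(3)
n = 275
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)  # only x0 and x2 matter
names = ["x0", "x1", "x2", "x3"]

selected = forward_select(X, y, names)
print(selected)
```

Because the procedure is greedy and re-uses the same data at every step, a pure-noise variable can occasionally clear the threshold, which connects to the drawbacks asked about in d(ii).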
