Question
I was given this question as an in-class exercise using Stata. The following questions concern the study by Gross et al. (1999) about the relationship between funding by the National Institutes of Health and the burden of 29 diseases. The data are given in a Stata data file called 3.ex.Funding.dta. The variable names and definitions in this file are: disease = condition or disease, id = a numeric disease identification number, dollars = thousands of dollars of NIH research funds per year, incid = disease incidence rate per 1000, preval = disease prevalence rate per 1000, hospdays = thousands of hospital-days, mort = disease mortality rate per 1000, yrslost = thousands of life-years lost, disabil = thousands of disability-adjusted life-years lost.
My question is how to do the residual analysis in these questions, and how to identify the disease with the largest influence in Q3.
- Regress log[dollars] against log[hospdays], log[mort], log[yrslost], and log[disabil]. Calculate the expected log[dollars] and studentized residuals for this regression. What bounds should contain 95% of the studentized residuals under this model? Draw a scatter plot of these residuals against expected log[dollars]. Draw horizontal lines at zero and the 95% bounds for the studentized residuals. What does this graph tell you about the quality of the fit of this model to these data?
- In the model from Question 1, calculate the delta beta influence statistic for log[mort]. List the values of this statistic together with the disease name, studentized residual, and leverage for all diseases for which the absolute value of this delta beta statistic is greater than 0.5. Which disease has the largest influence on the log[mort] parameter estimate?
- Draw scatter plots of log[dollars] against the other covariates in the model from Question 1. Identify the disease in these plots that had the most influence on log[mort] in Question 2. Does it appear to be particularly influential in any of these scatter plots?
- Regress log[dollars] against log[disabil] and log[hospdays]. What is the estimated expected amount of research funds budgeted for a disease that causes a million hospital-days a year and the loss of a million disability-adjusted life-years? Calculate a 95% confidence interval for this expected value. Calculate a 95% prediction interval for the funding that would be provided for a new disease that causes a million hospital-days a year and the loss of a million disability-adjusted life-years.
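One possible way to approach these questions in Stata is sketched below. It is a minimal outline, not a checked solution: it assumes 3.ex.Funding.dta is in the working directory, that every variable being logged is strictly positive, and that disease can be used as a marker label. The generated variable names (logdollars, rstu, lev, dbeta_mort, and so on) are my own and do not come from the exercise. The 95% bounds rest on the fact that each studentized residual follows a t distribution with n - k - 2 degrees of freedom, where k is the number of covariates; if all 29 diseases enter the fit, that is 23 degrees of freedom.

```stata
* Assumption: the data file is in the current working directory.
use "3.ex.Funding.dta", clear

* Log-transform the outcome and covariates (variable names are hypothetical).
generate logdollars  = log(dollars)
generate loghospdays = log(hospdays)
generate logmort     = log(mort)
generate logyrslost  = log(yrslost)
generate logdisabil  = log(disabil)

* Question 1: fit the model, then get expected log[dollars] and
* studentized residuals.
regress logdollars loghospdays logmort logyrslost logdisabil
predict yhat, xb
predict rstu, rstudent

* Studentized residuals have n - k - 2 degrees of freedom,
* which equals e(df_r) - 1 after regress.
scalar tcrit = invttail(e(df_r) - 1, 0.025)
display "95% bounds for studentized residuals: +/- " tcrit

* Residual plot with reference lines at zero and at the 95% bounds.
local lo = -tcrit
local hi = tcrit
scatter rstu yhat, yline(0) yline(`lo' `hi', lpattern(dash))

* Question 2: leverage and the delta-beta statistic for log[mort].
predict lev, leverage
predict dbeta_mort, dfbeta(logmort)
list disease rstu lev dbeta_mort ///
    if abs(dbeta_mort) > 0.5 & !missing(dbeta_mort)

* Question 3: plot log[dollars] against each covariate, labelling points
* by disease so the influential one can be located by eye.
foreach x of varlist loghospdays logmort logyrslost logdisabil {
    scatter logdollars `x', mlabel(disease) name(g_`x', replace)
}

* Final question: two-covariate model, then predict for a new disease.
* hospdays and disabil are recorded in thousands, so one million = 1000.
regress logdollars logdisabil loghospdays
set obs `=_N + 1'
replace logdisabil  = log(1000) if _n == _N
replace loghospdays = log(1000) if _n == _N

predict yhat2, xb
predict se_mean, stdp      // standard error of the expected value
predict se_fore, stdf      // standard error of a forecast for a new disease
scalar t2 = invttail(e(df_r), 0.025)

* Back-transform to thousands of dollars.
generate est     = exp(yhat2)
generate ci_low  = exp(yhat2 - t2 * se_mean)
generate ci_high = exp(yhat2 + t2 * se_mean)
generate pi_low  = exp(yhat2 - t2 * se_fore)
generate pi_high = exp(yhat2 + t2 * se_fore)
list est ci_low ci_high pi_low pi_high if _n == _N
```

In the residual plot, points outside the dashed lines, or any trend or funnel shape in the scatter, would suggest a poor fit. In the list for Question 2, the disease with the largest absolute delta-beta is the one with the most influence on the log[mort] coefficient. The intervals in the last step are computed on the log scale and then exponentiated, so they are reported in thousands of dollars.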