Final Exam
ANA 500
Use gretl and the State Smoking dataset to answer the questions below.
Upload a Word or PDF document that contains your answers.
Upload your gretl script file in addition to your Word or PDF document.
1. Dataset:
a) How many variables are in this dataset?
b) How many observations are in this dataset?
c) What are the elements/entities in this dataset?
2. Descriptive statistics:
a) Calculate descriptive statistics for the variables "consumption" and "cig_price".
b) What types of variables are "consumption" and "cig_price"?
c) Provide a scatterplot between cigarette consumption and cigarette prices (do not include a fit line).
Describe the relationship between the two variables shown in the scatterplot.
d) Provide an estimated density plot for the variable "consumption". Is the variable skewed? If yes,
in which direction?
e) Calculate separate descriptive statistics for each region for the variables "consumption" and
"cig_price".
f) What type of variable is "region"?
g) In which region is the average price of a pack of cigarettes the highest? In which region is per
capita cigarette consumption the highest?
3. Simple linear regression:
Estimate a first-order simple linear regression model where "consumption" is the outcome of interest and
"cig_price" is the predictor.
a) Write the estimated regression equation.
b) Interpret the coefficient on the "cig_price" variable.
c) Is the estimated coefficient on the "cig_price" variable statistically significant? How do you
know?
d) Interpret the model's R-squared.
e) Provide a scatterplot between "consumption" and "cig_price" that includes the estimated
regression equation.
f) Use your estimated model to predict per capita cigarette consumption in a state where the average
price of a pack of cigarettes is $4.
g) Explain whether it is meaningful to interpret the estimated intercept in your model.
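As an illustration of the mechanics behind question 3, the following is a minimal sketch of a simple least-squares fit and a prediction at a price of $4. The data points are hypothetical placeholders, not values from the State Smoking dataset, and the function names are assumptions made for this example; gretl's own output remains the reference for the exam.

```python
# Minimal sketch of simple linear regression by ordinary least squares.
# The data below are HYPOTHETICAL and are not taken from the State Smoking dataset.

def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical prices (dollars per pack) and per capita consumption (packs).
    cig_price = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
    consumption = [80.0, 74.0, 69.0, 66.0, 58.0, 55.0]

    b0, b1 = fit_simple_ols(cig_price, consumption)
    print("intercept:", b0, "slope:", b1)
    print("R-squared:", r_squared(cig_price, consumption, b0, b1))
    print("prediction at $4:", b0 + b1 * 4.0)
```

The slope is read as the change in predicted consumption for a one-dollar increase in price, and the prediction in part f) is the intercept plus four times the slope.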
4. Multiple linear regression:
a) Now include "med_income" as a predictor in the model that you estimated for question 3. Interpret the estimated coefficients.
b) Are the estimated coefficients statistically significant? How do you know?
c) Compare the adjusted R-squared from this model and the model you estimated in question 3. What
can you infer from comparing the adjusted R-squared between the two models?
d) Interpret the results from the F-test for this model.
e) Use your estimated model to predict per capita cigarette consumption in a state where the average
price of a pack of cigarettes is $4 and the median household income is $42,000.
f) Now estimate a model that allows the effect of cigarette prices to depend on the state's median
household income. Interpret the model's estimated coefficients.
g) Include region dummy variables as predictors in the model that you estimated in question 3.
Interpret the estimated coefficients and conduct an F-test to determine if the estimated coefficients
on the region dummies are jointly significant.
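For the interaction model in part f), the effect of price is no longer a single number: it equals the price coefficient plus the interaction coefficient times income. The sketch below shows that arithmetic, along with a prediction at a price of $4 and an income of $42,000. All coefficient values are hypothetical placeholders, not estimates from the State Smoking dataset, and the function names are assumptions for illustration.

```python
# Sketch of prediction and marginal effect in a model with an interaction term:
#   consumption = b0 + b_price*price + b_income*income + b_inter*(price*income)
# All coefficients used below are HYPOTHETICAL, not estimates from the exam data.

def predict(b0, b_price, b_income, b_inter, price, income):
    """Predicted consumption for a given price and median income."""
    return b0 + b_price * price + b_income * income + b_inter * price * income


def price_effect(b_price, b_inter, income):
    """Change in predicted consumption per $1 price increase at a given income."""
    return b_price + b_inter * income


if __name__ == "__main__":
    # Hypothetical coefficient values chosen only to illustrate the arithmetic.
    b0, b_price, b_income, b_inter = 100.0, -20.0, 0.001, 0.0002

    print("prediction:", predict(b0, b_price, b_income, b_inter, 4.0, 42000.0))
    print("price effect at $42,000:", price_effect(b_price, b_inter, 42000.0))
    print("price effect at $60,000:", price_effect(b_price, b_inter, 60000.0))
```

Setting the interaction coefficient to zero recovers the additive model from part a), where the price effect is the same at every income level.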
5. Non-linear functional forms:
a) Estimate a quadratic regression model where the outcome of interest is cigarette consumption and
the predictor variable is the price of a pack of cigarettes.
b) Estimate the effect of an increase in the price of a pack of cigarettes from $4 to $5.
c) Provide a scatterplot that includes the estimated regression equation for the quadratic model.
d) Estimate a linear-log regression model where the outcome of interest is cigarette consumption and
the predictor is the price of a pack of cigarettes.
e) Estimate the effect of an increase in the price of a pack of cigarettes from $4 to $5.
f) Provide a scatterplot that includes the estimated regression equation for the linear-log model.
g) Which model provides a better fit to the data? The quadratic or linear-log model? Which model do
you think is more appropriate from a theoretical perspective?
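In both non-linear models the effect of moving the price from $4 to $5 is the difference between two predicted values, so the intercept cancels out. A minimal sketch of that calculation follows; the coefficients are hypothetical placeholders rather than estimates from the State Smoking dataset, and the function names are assumptions for illustration.

```python
import math

# Effect of a price change under two functional forms.
# Quadratic:   y = b0 + b1*x + b2*x^2
# Linear-log:  y = b0 + b1*ln(x)
# The coefficients used below are HYPOTHETICAL, not estimates from the exam data.

def quadratic_effect(b1, b2, x_from, x_to):
    """Change in predicted y when x moves from x_from to x_to (quadratic model)."""
    return b1 * (x_to - x_from) + b2 * (x_to ** 2 - x_from ** 2)


def linear_log_effect(b1, x_from, x_to):
    """Change in predicted y when x moves from x_from to x_to (linear-log model)."""
    return b1 * math.log(x_to / x_from)


if __name__ == "__main__":
    print("quadratic, $4 to $5:", quadratic_effect(-10.0, 0.5, 4.0, 5.0))
    print("linear-log, $4 to $5:", linear_log_effect(-40.0, 4.0, 5.0))
```

Unlike in the linear model, the same one-dollar increase generally gives a different effect at a different starting price, which is why parts b) and e) fix the starting point at $4.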
6. Binary dependent variable:
a) Estimate a linear probability model where the outcome of interest is whether a state has an above
average smoking rate and the predictor is cigarette taxes. Interpret the estimated slope coefficient.
b) Provide a scatterplot that includes the estimated regression equation.
c) In general, what is the main problem associated with the linear probability model? Is that problem
encountered for the model you estimated?
d) Estimate a logit model using the same variables you used for the linear probability model.
Calculate and interpret the change in the odds of a state having an above average smoking rate
from an increase in cigarette taxes from $1 to $2.
e) Interpret the marginal effect at the mean value of cigarette taxes.
f) Interpret the results from the classification table provided in the output from the logit regression.
Does the model seem to do a good job predicting when a state has an above average smoking rate?
g) Why is the logit model typically preferred over the linear probability model?
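The logit quantities asked about in parts d) and e) follow from the estimated coefficients by short formulas: the odds are multiplied by exp(b1) for each one-unit increase in the predictor, and the marginal effect at a point is b1 times p(1 - p), where p is the predicted probability there. The sketch below illustrates these with hypothetical coefficients, not estimates from the State Smoking dataset. It also shows how a linear probability model can give fitted values outside the 0 to 1 range.

```python
import math

# Logit formulas for P(y = 1) = 1 / (1 + exp(-(b0 + b1*x))).
# All coefficients below are HYPOTHETICAL, not estimates from the exam data.

def logistic(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(b1, x_from, x_to):
    """Factor by which the odds change when x moves from x_from to x_to."""
    return math.exp(b1 * (x_to - x_from))


def marginal_effect_at(b0, b1, x):
    """Change in predicted probability per unit of x, evaluated at x."""
    p = logistic(b0 + b1 * x)
    return b1 * p * (1.0 - p)


def lpm_prediction(a0, a1, x):
    """Fitted value from a linear probability model; not bounded to [0, 1]."""
    return a0 + a1 * x


if __name__ == "__main__":
    b0, b1 = 1.5, -1.2  # hypothetical logit coefficients
    print("odds ratio, tax $1 to $2:", odds_ratio(b1, 1.0, 2.0))
    print("marginal effect at tax = 1.5:", marginal_effect_at(b0, b1, 1.5))
    # Hypothetical LPM coefficients; the fitted value at x = 4 falls below zero.
    print("LPM fitted value at tax = 4:", lpm_prediction(0.9, -0.3, 4.0))
```

An odds ratio below 1 means the odds of the outcome fall as the predictor rises, and one minus the ratio gives the proportional decrease.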