Question

For this question, we are going to create data, and then estimate models on this simulated data. This allows us to effectively know the population parameters that we are trying to estimate. Consequently, we can reason about how well our models are doing.

library(dplyr)  # provides %>% and mutate()

create_homoskedastic_data <- function(n = 100) {
  d <- data.frame(id = 1:n) %>%
    mutate(
      x1 = runif(n = n, min = 0, max = 10),
      x2 = rnorm(n = n, mean = 10, sd = 2),
      x3 = rnorm(n = n, mean = 0, sd = 2),
      y  = .5 + 1*x1 + 0*x2 + .25*x3^2 + rnorm(n = n, mean = 0, sd = 1)
    )
  return(d)
}

d <- create_homoskedastic_data(n = 100)

Produce a plot of the distribution of the outcome data. This could be a histogram, a boxplot, a density plot, or whatever you think best communicates the distribution of the data. What do you note about this distribution?

outcome_histogram <- d %>%
  ggplot()
  # fill in the rest of this chunk to plot
  # you will need aes layers (to map data into the plot)
  # and geom_* layers to draw the plot. You can delete these
  # comments if you like.

"Fill in here: What do you notice about this distribution?"
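One way to draw the outcome distribution is a histogram. The following is a minimal sketch, not the expected answer: it assumes the `ggplot2` package is installed, re-creates the simulated data in base R so that it runs on its own, and uses a seed and a bin count that are arbitrary choices made only for illustration.

```r
library(ggplot2)

set.seed(1)  # hypothetical seed, used only so the sketch is reproducible
n <- 100

# Base-R re-creation of the data-generating process from the question
d <- data.frame(
  x1 = runif(n, min = 0, max = 10),
  x2 = rnorm(n, mean = 10, sd = 2),
  x3 = rnorm(n, mean = 0, sd = 2)
)
d$y <- 0.5 + 1 * d$x1 + 0 * d$x2 + 0.25 * d$x3^2 + rnorm(n, mean = 0, sd = 1)

# aes() maps the outcome to the x axis; geom_histogram() draws the bars
outcome_histogram <- ggplot(d, aes(x = y)) +
  geom_histogram(bins = 20) +  # bins = 20 is an arbitrary choice
  labs(title = "Distribution of the simulated outcome", x = "y", y = "Count")

outcome_histogram
```

Swapping `geom_histogram()` for `geom_density()` or `geom_boxplot()` gives the other views mentioned in the prompt.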

Are the assumptions of the large-sample model met so that you can use an OLS regression to produce consistent estimates?

"Fill in here: Are the large-sample assumptions satisfied?"

Estimate four models, called model_1, model_2, model_3 and model_4 that have the following form:

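\begin{align}
Y &= \beta_0 + \beta_1 x_1 + 0 x_2 + \beta_3 x_3 + \epsilon \tag{1} \\
Y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon \tag{2} \\
Y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3^2 + \epsilon \tag{3} \\
Y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_3^2 + \epsilon \tag{4}
\end{align}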

# If you want to read about specifying statistical models, you can read
# here: https://cran.r-project.org/doc/manuals/R-intro.html#Formulae-for-statistical-models
# note, using the I() function is preferred over using poly()
model_1 <- 'fill this in'
model_2 <- 'fill this in'
model_3 <- 'fill this in'
model_4 <- 'fill this in'

calculate_msr <- function(model) {
  # This function takes a model, and uses the `resid` function
  # together with the definition of the msr to produce
  # the MEAN of the squared residuals
  msr <- mean(resid(model)^2)
  return(msr)
}

model_1_msr <- 'fill this in'
model_2_msr <- 'fill this in'
model_3_msr <- 'fill this in'
model_4_msr <- 'fill this in'
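As an illustration of how the four specifications might be written as R formulas, here is a hedged sketch. It assumes model (1) simply omits `x2` (its coefficient is fixed at zero) and that the squared term is expressed with `I(x3^2)`, as the template's comment suggests. The data are re-created in base R with an arbitrary seed so the block runs on its own.

```r
set.seed(1)  # hypothetical seed, used only so the sketch is reproducible
n <- 100

# Base-R re-creation of the data-generating process from the question
d <- data.frame(
  x1 = runif(n, min = 0, max = 10),
  x2 = rnorm(n, mean = 10, sd = 2),
  x3 = rnorm(n, mean = 0, sd = 2)
)
d$y <- 0.5 + 1 * d$x1 + 0 * d$x2 + 0.25 * d$x3^2 + rnorm(n, mean = 0, sd = 1)

# Model (1): x2 is left out, which fixes its coefficient at zero
model_1 <- lm(y ~ x1 + x3, data = d)
# Model (2): adds x2
model_2 <- lm(y ~ x1 + x2 + x3, data = d)
# Model (3): x3 enters only as a squared term
model_3 <- lm(y ~ x1 + x2 + I(x3^2), data = d)
# Model (4): x3 enters both linearly and squared
model_4 <- lm(y ~ x1 + x2 + x3 + I(x3^2), data = d)

# Mean of the squared residuals for a fitted model
calculate_msr <- function(model) {
  mean(resid(model)^2)
}

msr <- sapply(
  list(model_1 = model_1, model_2 = model_2,
       model_3 = model_3, model_4 = model_4),
  calculate_msr
)
print(msr)
```

Because model (1) is nested in model (2), and models (2) and (3) are each nested in model (4), the in-sample mean squared residual cannot increase along those chains. A lower value by itself therefore does not show that the extra terms belong in the population model.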

Consider, for a moment, only the first model. Is it possible to select coefficients in this model that would produce a lower mean squared residual? Why or why not?
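A quick numerical check can inform this question. The sketch below assumes model (1) is fit by `lm()` as `y ~ x1 + x3`, then evaluates the mean squared residual at randomly perturbed coefficient vectors for the same specification. The helper `msr_at()`, the seed, and the perturbation scale are hypothetical choices made for illustration.

```r
set.seed(1)  # hypothetical seed, used only so the sketch is reproducible
n <- 100

# Base-R re-creation of the data-generating process from the question
d <- data.frame(
  x1 = runif(n, min = 0, max = 10),
  x2 = rnorm(n, mean = 10, sd = 2),
  x3 = rnorm(n, mean = 0, sd = 2)
)
d$y <- 0.5 + 1 * d$x1 + 0 * d$x2 + 0.25 * d$x3^2 + rnorm(n, mean = 0, sd = 1)

model_1 <- lm(y ~ x1 + x3, data = d)
X <- model.matrix(model_1)  # columns: intercept, x1, x3

# Mean squared residual for an arbitrary coefficient vector (hypothetical helper)
msr_at <- function(beta) {
  mean((d$y - drop(X %*% beta))^2)
}

ols_msr <- msr_at(coef(model_1))

# Try 1000 nearby coefficient vectors and record the MSR of each
perturbed_msr <- replicate(
  1000,
  msr_at(coef(model_1) + rnorm(length(coef(model_1)), mean = 0, sd = 0.1))
)

# TRUE if no perturbed coefficient vector beats the OLS fit
all(perturbed_msr >= ols_msr)
```

OLS coefficients are defined as the minimizers of the sum of squared residuals for a given specification, so within model (1) no other coefficients can give a lower in-sample value. Only a different specification can.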

Which of these models does the best job, in terms of mean squared residuals, at estimating the population coefficients?

Is there any evidence that the additional parameter that you have estimated in model_2 makes this second model more fully represent the true population? Conduct an F-test with the null hypothesis that model_1 is the correct population model, and evaluate whether you should reject the null to instead conclude that model_2 is more appropriate.

## anova(model_2, model_1, test = 'F')

Explain why the p-values for the tests that you have conducted in parts (a) and (b) are the same. Are these tests merely different ways of asking the same question of a model?
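The parts (a) and (b) referred to here are not labelled in this copy of the question. The sketch below assumes they are the nested-model F-test of model_1 against model_2 and the t-test on the `x2` coefficient in model_2, both with classical (non-robust) standard errors. Under that assumption, it compares the two p-values and the two test statistics directly.

```r
set.seed(1)  # hypothetical seed, used only so the sketch is reproducible
n <- 100

# Base-R re-creation of the data-generating process from the question
d <- data.frame(
  x1 = runif(n, min = 0, max = 10),
  x2 = rnorm(n, mean = 10, sd = 2),
  x3 = rnorm(n, mean = 0, sd = 2)
)
d$y <- 0.5 + 1 * d$x1 + 0 * d$x2 + 0.25 * d$x3^2 + rnorm(n, mean = 0, sd = 1)

model_1 <- lm(y ~ x1 + x3, data = d)       # restricted: coefficient on x2 fixed at 0
model_2 <- lm(y ~ x1 + x2 + x3, data = d)  # unrestricted

# F-test of the single restriction beta_2 = 0
f_test <- anova(model_1, model_2, test = "F")
f_stat <- f_test$F[2]
f_p    <- f_test$`Pr(>F)`[2]

# t-test on the x2 coefficient in the unrestricted model
coef_table <- summary(model_2)$coefficients
t_stat <- coef_table["x2", "t value"]
t_p    <- coef_table["x2", "Pr(>|t|)"]

# With one restriction, F equals t squared, so the p-values coincide
all.equal(f_stat, t_stat^2)
all.equal(f_p, t_p)
```

With a single restriction, the F statistic is the square of the t statistic, and an F(1, k) distribution is the distribution of a squared t(k) variable, so the two tests pose the same question of the model.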
