Question
Problem 3: Exploring statistical significance
This problem will guide you through the process of conducting a simulation to investigate statistical significance in regression.
In this setup, we'll repeatedly simulate $n$ observations of 3 variables: an outcome $y$, a covariate $x_1$ that's associated with $y$, and a covariate $x_2$ that is unassociated with $y$. Our model is:
$$y_i = 0.5\,x_{1i} + \epsilon_i$$

where the $\epsilon_i$ are independent normal noise variables having standard deviation 5 (i.e., Normal random variables with mean 0 and standard deviation 5).
Here's the setup.
```r
set.seed(12345)                    # Set random number generator seed
n <- 200                           # Number of observations
x1 <- runif(n, min = 0, max = 10)  # Random covariate
x2 <- rnorm(n, 0, 10)              # Another random covariate
```
To generate a random realization of the outcome $y$, use the following command.
```r
# Random realization of y
y <- 0.5 * x1 + rnorm(n, mean = 0, sd = 5)
```
Here are plots of that random realization of the outcome $y$, plotted against $x_1$ and $x_2$.

```r
library(ggplot2)  # qplot() comes from the ggplot2 package
qplot(x = x1, y = y)
qplot(x = x2, y = y)
```
(a) Write code that implements the following simulation (you'll want to use a for loop):

```
for 2000 simulations:
    generate a random realization of y
    fit regression of y on x1 and x2
    record the coefficient estimates, standard errors and p-values for x1 and x2
```

At the end you should have 2000 instances of estimated slopes, standard errors, and corresponding p-values for both $x_1$ and $x_2$. It's most convenient to store these in a data frame.
```r
# Note the cache = TRUE header on this chunk ({r, cache = TRUE}). This tells
# R Markdown to store the output of this code chunk and only re-run the code
# when code in this chunk changes. By caching you won't wind up re-running
# this code every time you knit.
# Edit me
```
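A minimal sketch of the part (a) loop, assuming the fixed `x1` and `x2` from the setup are re-created with the same seed. The data frame name `sim.df` and its column names (`est.x1`, `se.x1`, `p.x1`, ...) are hypothetical choices for this sketch, not names the assignment requires. It relies on `summary(lm(...))$coefficients`, a matrix with one row per term and columns Estimate, Std. Error, t value and Pr(>|t|).

```r
set.seed(12345)                    # Same seed as the setup chunk
n <- 200                           # Number of observations
x1 <- runif(n, min = 0, max = 10)  # Covariate associated with y (held fixed)
x2 <- rnorm(n, 0, 10)              # Covariate unassociated with y (held fixed)

n.sim <- 2000                      # Number of simulations
# sim.df and its column names are hypothetical names for this sketch.
# Preallocate one row per simulation instead of growing the data frame.
sim.df <- data.frame(est.x1 = numeric(n.sim), se.x1 = numeric(n.sim),
                     p.x1 = numeric(n.sim), est.x2 = numeric(n.sim),
                     se.x2 = numeric(n.sim), p.x2 = numeric(n.sim))

for (i in 1:n.sim) {
  # New realization of y; only the noise changes between simulations
  y <- 0.5 * x1 + rnorm(n, mean = 0, sd = 5)
  # Rows are terms; columns 1, 2, 4 are Estimate, Std. Error, Pr(>|t|)
  coef.tab <- summary(lm(y ~ x1 + x2))$coefficients
  sim.df[i, ] <- c(coef.tab["x1", c(1, 2, 4)], coef.tab["x2", c(1, 2, 4)])
}

head(sim.df)
```

Because `x1` and `x2` are generated once and held fixed, the row-to-row variation in `sim.df` comes only from the noise term, which is exactly what the later parts examine.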
(b) This problem has multiple parts.
- Construct a histogram of the coefficient estimates for $x_1$.
- Calculate the average of the coefficient estimates for $x_1$. Is the average close to the true value?
- Calculate the average of the standard errors for the coefficient of $x_1$. Calculate the standard deviation of the coefficient estimates for $x_1$. Are these numbers similar?
# Edit me
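A minimal sketch for part (b), assuming the results live in a data frame with one row per simulation. The names `sim.df`, `est.x1` and `se.x1` are hypothetical, and the first lines re-run the part (a) simulation so the chunk works on its own (drop them if your part (a) results already exist). Base R `hist()` is used to avoid extra packages; the ggplot2 `qplot()` from the setup would work as well.

```r
# Re-create the part (a) simulation so this chunk is self-contained
# (sim.df and its column names are hypothetical names for this sketch)
set.seed(12345)
n <- 200
x1 <- runif(n, min = 0, max = 10)
x2 <- rnorm(n, 0, 10)
sim.df <- data.frame(est.x1 = numeric(2000), se.x1 = numeric(2000), p.x1 = numeric(2000),
                     est.x2 = numeric(2000), se.x2 = numeric(2000), p.x2 = numeric(2000))
for (i in 1:2000) {
  y <- 0.5 * x1 + rnorm(n, mean = 0, sd = 5)
  coef.tab <- summary(lm(y ~ x1 + x2))$coefficients
  sim.df[i, ] <- c(coef.tab["x1", c(1, 2, 4)], coef.tab["x2", c(1, 2, 4)])
}

# Histogram of the x1 slope estimates, with the true value beta_1 = 0.5 marked
hist(sim.df$est.x1, breaks = 30, main = "Coefficient estimates for x1",
     xlab = "Estimated slope")
abline(v = 0.5, col = "red", lwd = 2)

mean(sim.df$est.x1)  # Average estimate; compare with the true value 0.5
mean(sim.df$se.x1)   # Average reported standard error
sd(sim.df$est.x1)    # Standard deviation of the estimates across simulations
```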
Take-away from this problem: the `Std. Error` value in the linear model summary is an estimate of the standard deviation of the coefficient estimates.
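For reference, standard least-squares theory (not something the problem itself states) gives the reported standard error of $\hat\beta_j$ as follows, where $X$ is the $n \times 3$ design matrix with columns for the intercept, $x_1$ and $x_2$, and $\hat\sigma$ is the residual standard error of the fit:

$$\widehat{se}(\hat\beta_j) = \hat\sigma \sqrt{\left[(X^\top X)^{-1}\right]_{jj}}, \qquad \hat\sigma^2 = \frac{1}{n-3}\sum_{i=1}^{n}(y_i - \hat y_i)^2$$

Replacing $\hat\sigma$ with the true noise standard deviation $\sigma = 5$ gives the actual standard deviation of $\hat\beta_j$ across simulations with $X$ held fixed, which is why the average reported standard error should track the spread of the estimates.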
(c) Repeat part (b) for $x_2$.
# Edit me
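A minimal sketch for part (c), mirroring part (b) with $x_2$ in place of $x_1$; the true coefficient of $x_2$ is 0. The names `sim.df`, `est.x2` and `se.x2` are hypothetical, and the first lines re-run the part (a) simulation so the chunk runs on its own.

```r
# Re-create the part (a) simulation so this chunk is self-contained
# (sim.df and its column names are hypothetical names for this sketch)
set.seed(12345)
n <- 200
x1 <- runif(n, min = 0, max = 10)
x2 <- rnorm(n, 0, 10)
sim.df <- data.frame(est.x1 = numeric(2000), se.x1 = numeric(2000), p.x1 = numeric(2000),
                     est.x2 = numeric(2000), se.x2 = numeric(2000), p.x2 = numeric(2000))
for (i in 1:2000) {
  y <- 0.5 * x1 + rnorm(n, mean = 0, sd = 5)
  coef.tab <- summary(lm(y ~ x1 + x2))$coefficients
  sim.df[i, ] <- c(coef.tab["x1", c(1, 2, 4)], coef.tab["x2", c(1, 2, 4)])
}

# Histogram of the x2 slope estimates, with the true value beta_2 = 0 marked
hist(sim.df$est.x2, breaks = 30, main = "Coefficient estimates for x2",
     xlab = "Estimated slope")
abline(v = 0, col = "red", lwd = 2)

mean(sim.df$est.x2)  # Average estimate; compare with the true value 0
mean(sim.df$se.x2)   # Average reported standard error
sd(sim.df$est.x2)    # Standard deviation of the estimates across simulations
```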
(d) Construct a histogram of the p-values for the coefficient of $x_1$. What do you see? What % of the time is the p-value significant at the 0.05 level?
# Edit me
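A minimal sketch for part (d). Since $x_1$ truly affects $y$, you would expect most p-values to sit near 0, but the exact percentage comes from your own simulation. The names `sim.df`, `p.x1` and `pct.sig.x1` are hypothetical, and the first lines re-run the part (a) simulation so the chunk runs on its own.

```r
# Re-create the part (a) simulation so this chunk is self-contained
# (sim.df and its column names are hypothetical names for this sketch)
set.seed(12345)
n <- 200
x1 <- runif(n, min = 0, max = 10)
x2 <- rnorm(n, 0, 10)
sim.df <- data.frame(est.x1 = numeric(2000), se.x1 = numeric(2000), p.x1 = numeric(2000),
                     est.x2 = numeric(2000), se.x2 = numeric(2000), p.x2 = numeric(2000))
for (i in 1:2000) {
  y <- 0.5 * x1 + rnorm(n, mean = 0, sd = 5)
  coef.tab <- summary(lm(y ~ x1 + x2))$coefficients
  sim.df[i, ] <- c(coef.tab["x1", c(1, 2, 4)], coef.tab["x2", c(1, 2, 4)])
}

# Histogram of the x1 p-values
hist(sim.df$p.x1, breaks = 20, main = "p-values for the x1 coefficient",
     xlab = "p-value")

# Percentage of simulations in which x1 is significant at the 0.05 level
pct.sig.x1 <- mean(sim.df$p.x1 < 0.05) * 100
pct.sig.x1
```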
(e) Repeat part (d) with $x_2$. What % of the time is the p-value significant at the 0.05 level?
# Edit me
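A minimal sketch for part (e). Because the true coefficient of $x_2$ is 0, the null hypothesis holds, so the p-values should look roughly uniform on [0, 1] and roughly 5% of them should fall below 0.05 by chance alone. The names `sim.df`, `p.x2` and `pct.sig.x2` are hypothetical, and the first lines re-run the part (a) simulation so the chunk runs on its own.

```r
# Re-create the part (a) simulation so this chunk is self-contained
# (sim.df and its column names are hypothetical names for this sketch)
set.seed(12345)
n <- 200
x1 <- runif(n, min = 0, max = 10)
x2 <- rnorm(n, 0, 10)
sim.df <- data.frame(est.x1 = numeric(2000), se.x1 = numeric(2000), p.x1 = numeric(2000),
                     est.x2 = numeric(2000), se.x2 = numeric(2000), p.x2 = numeric(2000))
for (i in 1:2000) {
  y <- 0.5 * x1 + rnorm(n, mean = 0, sd = 5)
  coef.tab <- summary(lm(y ~ x1 + x2))$coefficients
  sim.df[i, ] <- c(coef.tab["x1", c(1, 2, 4)], coef.tab["x2", c(1, 2, 4)])
}

# Histogram of the x2 p-values (expected to be roughly flat under the null)
hist(sim.df$p.x2, breaks = 20, main = "p-values for the x2 coefficient",
     xlab = "p-value")

# Percentage of simulations in which x2 is (falsely) significant at 0.05
pct.sig.x2 <- mean(sim.df$p.x2 < 0.05) * 100
pct.sig.x2
```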
(f) Given a coefficient estimate $\hat\beta$ and a standard error estimate $\widehat{se}(\hat\beta)$, we can construct an approximate 95% confidence interval using the "2 standard error rule", i.e.,

$$\left[\hat\beta - 2\,\widehat{se}(\hat\beta),\ \hat\beta + 2\,\widehat{se}(\hat\beta)\right]$$

is an approximate 95% confidence interval for the true unknown coefficient. As part of your simulation you stored $\hat\beta$ and $\widehat{se}(\hat\beta)$ values for 2000 simulation instances. Use these estimates to construct approximate confidence intervals and answer the following questions.
- Question: In your simulation, what % of such confidence intervals constructed for the coefficient of $x_1$ actually contain the true value of the coefficient ($\beta_1 = 0.5$)?
Replace this text with your answer. (do not delete the html tags)
- Question: In your simulation, what % of such confidence intervals constructed for the coefficient of $x_2$ actually contain the true value of the coefficient ($\beta_2 = 0$)?
Replace this text with your answer. (do not delete the html tags)
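A minimal sketch for part (f), applying the 2 standard error rule to every simulated estimate and counting how often the interval covers the true coefficient. The names `sim.df`, `cover.x1` and `cover.x2` are hypothetical, and the first lines re-run the part (a) simulation so the chunk runs on its own.

```r
# Re-create the part (a) simulation so this chunk is self-contained
# (sim.df and its column names are hypothetical names for this sketch)
set.seed(12345)
n <- 200
x1 <- runif(n, min = 0, max = 10)
x2 <- rnorm(n, 0, 10)
sim.df <- data.frame(est.x1 = numeric(2000), se.x1 = numeric(2000), p.x1 = numeric(2000),
                     est.x2 = numeric(2000), se.x2 = numeric(2000), p.x2 = numeric(2000))
for (i in 1:2000) {
  y <- 0.5 * x1 + rnorm(n, mean = 0, sd = 5)
  coef.tab <- summary(lm(y ~ x1 + x2))$coefficients
  sim.df[i, ] <- c(coef.tab["x1", c(1, 2, 4)], coef.tab["x2", c(1, 2, 4)])
}

# Approximate 95% intervals: estimate +/- 2 standard errors
lower.x1 <- sim.df$est.x1 - 2 * sim.df$se.x1
upper.x1 <- sim.df$est.x1 + 2 * sim.df$se.x1
lower.x2 <- sim.df$est.x2 - 2 * sim.df$se.x2
upper.x2 <- sim.df$est.x2 + 2 * sim.df$se.x2

# Percentage of intervals that contain the true coefficient
cover.x1 <- mean(lower.x1 <= 0.5 & upper.x1 >= 0.5) * 100  # true beta_1 = 0.5
cover.x2 <- mean(lower.x2 <= 0 & upper.x2 >= 0) * 100      # true beta_2 = 0
cover.x1
cover.x2
```

For $x_2$, an interval excludes 0 exactly when $|\hat\beta_2| > 2\,\widehat{se}(\hat\beta_2)$, i.e. when the t statistic exceeds 2 in absolute value, so the coverage for $x_2$ and the significance rate from part (e) should roughly add up to 100%.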
https://www.andrew.cmu.edu/user/achoulde/94842/homework/homework5.html