Question
1. Which of the following is NOT true about linear regression?
Linear regression is used to predict new values of the target variable.
Linear regression allows us to predict new values of the independent variable.
Linear regression allows us to model how the target variable changes with the independent variables.
In linear regression, the target variable is a continuous quantity.
2. The ordinary least squares (OLS) algorithm ________________ .
Maximizes the sum of squared residuals
Minimizes the sum of squared residuals
Minimizes the square of the sum of residuals
Maximizes the square of the sum of residuals
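The two quantities named in these options, the sum of squared residuals and the square of the sum of residuals, are easy to confuse. The following is a minimal sketch that computes both for a least-squares line fitted to a small made-up dataset. The data values and the helper names `fit_line` and `residuals` are assumptions for illustration, not part of the question.

```python
# Hypothetical toy data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys, slope, intercept):
    """Observed value minus fitted value for every point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)

# Sum of squared residuals: square each residual, then add them up.
sum_of_squares = sum(r ** 2 for r in res)

# Square of the sum of residuals: add them up first, then square.
square_of_sum = sum(res) ** 2

# The same data with a slightly different slope, for comparison.
perturbed = residuals(xs, ys, slope + 0.1, intercept)
perturbed_sum_of_squares = sum(r ** 2 for r in perturbed)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"sum of squared residuals: {sum_of_squares:.4f}")
print(f"square of the sum of residuals: {square_of_sum:.2e}")
print(f"sum of squared residuals, perturbed slope: {perturbed_sum_of_squares:.4f}")
```

With an intercept in the model, positive and negative residuals of the fitted line cancel, so the square of their sum is essentially zero and says little about fit quality, whereas the sum of squares grows as soon as the line is moved away from the fitted one.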
3. Overfitting occurs when _____________.
Our model becomes too specific to the training data
The sum of squared residuals is too large
The average of the errors is positive
Our model does not have enough complexity
4. Using multiple linear regression to add in more independent variables ___________.
reduces the overfitting of the data
allows us to add more observational data to the model
allows us to fit a non-linear model to the data
can help explain more variation in the target variable
5. Multicollinearity is the phenomenon where _________________.
the target variable is strongly correlated with an independent variable
the target variable is strongly correlated with the residuals
the independent variables are strongly correlated with the residuals
the independent variables are strongly correlated with other independent variables
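One common way to look for strong correlation among predictors is to compute pairwise Pearson correlation coefficients between the independent variables. The sketch below does this for three made-up columns. The column names `x1`, `x2`, `x3`, their values, and the helper `pearson` are assumptions for illustration only.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical independent variables (not from the question).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]   # moves almost in lockstep with x1
x3 = [5.0, 1.0, 4.0, 2.0, 3.0]    # only weakly related to x1

r12 = pearson(x1, x2)
r13 = pearson(x1, x3)

print(f"corr(x1, x2) = {r12:.3f}")
print(f"corr(x1, x3) = {r13:.3f}")
```

A coefficient close to 1 or -1 between two predictors, as for `x1` and `x2` here, is a warning sign worth investigating. Pairwise correlation is only a first check, since a predictor can also be a near-linear combination of several others.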
6. Which of the following is NOT an assumption of ordinary least squares (OLS)?
Linearity
Endogeneity
Random Sampling
Homoscedasticity of Errors
7. Which assumption of OLS assumes that there is no correlation between the error and the independent variables?
Zero Mean Errors
Multicollinearity
Autocorrelation of Errors
Endogeneity
8. A regression analysis between sales (S) (in $1000) and price (P) (in dollars) resulted in the following equation: S = 50,000 - 8P
The above equation implies that an ___________.
increase of $1 in price is associated with a decrease of $8 in sales
increase of $1 in price is associated with a decrease of $42,000 in sales
increase of $1 in price is associated with a decrease of $8000 in sales
increase of $8 in price is associated with an increase of $8,000 in sales
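Because S and P are stated in different units, the coefficient has to be converted before it can be read in dollars. The sketch below evaluates the equation at two prices one dollar apart and converts the difference. The function name `predicted_sales_thousands` and the starting price of $10 are assumptions for illustration.

```python
def predicted_sales_thousands(price_dollars):
    """S = 50,000 - 8P, with S in thousands of dollars and P in dollars."""
    return 50_000 - 8 * price_dollars

# Hypothetical starting price; any value gives the same difference
# because the model is linear in P.
base_price = 10

change_thousands = (
    predicted_sales_thousands(base_price + 1)
    - predicted_sales_thousands(base_price)
)

# Convert from thousands of dollars to dollars.
change_dollars = change_thousands * 1_000

print(f"change in S per $1 of price: {change_thousands} (in $1000)")
print(f"same change in dollars: {change_dollars}")
```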
9. Suppose we build a model to predict a store's sales with three independent variables: customers per day, average daily temperature, and number of products available. If we calculate the p-values for these variables as below, which variables are significant and should be kept in the model? Select all that apply.
Variable | p-Value |
Customers per day (I) | 0.0 |
Average daily temperature (II) | 0.54 |
Number of products available (III) | 0.03 |
Variable I
Variable II
Variable III
10. Suppose we have produced a simple linear regression model with the following form: y = 0.65x + 2.9
We then calculate the coefficient of determination as 0.92 and a p-value of 0.1. Which of the following best describes our model?
The model explains a high amount of variance, and the slope is statistically significant
The model explains a low amount of variance, but the slope is statistically significant
The model explains a low amount of variance but the slope is statistically significant
The model explains a high amount of variance but the slope is statistically insignificant
11. Which of the following evaluation metrics is relative to the total error?
Root mean square error
Coefficient of determination
Mean square error
Mean absolute error
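The four metrics listed can be compared side by side on a small example. The sketch below computes each from its standard definition for made-up observed and predicted values, then rescales the data to show which metrics depend on the units of the target. The data and the helper name `metrics` are assumptions for illustration.

```python
import math

def metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and the coefficient of determination (R^2)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e ** 2 for e in errors)            # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }

# Hypothetical observed and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

original = metrics(y_true, y_pred)

# The same data expressed in units 1000 times smaller.
scaled = metrics([1000 * t for t in y_true], [1000 * p for p in y_pred])

for name in ("mse", "rmse", "mae", "r2"):
    print(f"{name}: original={original[name]:.4f}, rescaled={scaled[name]:.4f}")
```

MSE, RMSE, and MAE are expressed in the units of the target (squared units for MSE), so they change when the data is rescaled. R^2 is a ratio of the residual sum of squares to the total sum of squares, so it stays the same.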
12. Which method of regression produces a probability distribution as opposed to a point estimate?
Poisson Regression
Logistic Regression
Bayesian Regression
LASSO Regression
13. You are given a dataset of air pollution readings from several locations in an urban setting. The measurements are taken every hour and include information about traffic flow. To perform regression on this longitudinal data, what kind of regression technique would you use?
Polynomial Regression
Log-Log Regression
Repeated Measures Regression
LASSO Regression
14. You are working with customer data from a large video-on-demand provider, which contains numerical fields with information such as average number of hours watched per month, number of logins per month, time spent browsing per month, etc. In this data, there is a flag that indicates whether the customer canceled the service or not (1 for yes, 0 for no). You are looking to build a model from this data to classify which current customers will cancel. What type of model would you use?
Logistic Regression
Random Effects
Bayesian Regression
Poisson Regression