  1. Work individually on this assignment. You are encouraged to collaborate on ideas and strategies pertinent to this assignment. The data for this assignment cover real estate transactions recorded from 1964 to 2016 and can be found in Housing.xlsx. Using your skills in statistical correlation, multiple regression, and R programming, you will examine Sale Price and several other possible predictors of it.
    1. If you worked with the Housing dataset in the previous week, you are in luck: you have likely already found any issues in the dataset and made the necessary transformations. If not, take some time to explore the data with your new skills and identify any cleanup that needs to happen.
  2. Complete the following:
    1. Explain any transformations or modifications you made to the dataset.
    2. Create a linear regression model where "sq_ft_lot" predicts Sale Price.
    3. Get a summary of your first model and explain your results (e.g., R², adjusted R², etc.).
    4. Get the residuals of your model (you can use the 'resid' or 'residuals' functions) and plot them. What does the plot tell you about your predictions?
    5. Use a qq plot to observe your residuals. Do your residuals meet the normality assumption?
    6. Now, create a linear regression model that uses multiple predictor variables to predict Sale Price (feel free to derive new predictors from existing ones). Explain why you think each of these variables may add explanatory value to the model.
    7. Get a summary of your next model and explain your results.
    8. Get the residuals of your second model (you can use the 'resid' or 'residuals' functions) and plot them. What does the plot tell you about your predictions?
    9. Use a qq plot to observe your residuals. Do your residuals meet the normality assumption?
    10. Compare the results (e.g., R², adjusted R², etc.) between your first and second models. Does your new model show an improvement over the first? To confirm a 'significant' improvement of the second model over the first, use ANOVA to compare them. What are the results?
    11. After observing both models (specifically, residual normality), give your thoughts on whether the model is biased.
    12. Another important aspect of regression tasks is determining the accuracy of your predictions. For this section, we will look at root mean square error (RMSE), a common accuracy metric for regression models.
      1. Install the 'Metrics' package in RStudio ('install.packages("Metrics")').
      2. Using the first model, make predictions on the dataset with the predict function. An example would look like this (your variable names will differ):
        1. 'preds <- predict(object = modelName, newdata = dataset)'
        2. Use the 'rmse' function to get RMSE for the model ('rmse(actual, predicted)')
      3. What is the RMSE for the first model?
      4. Perform the same task for the second model. Provide the RMSE for the second model.
      5. Did the second model's RMSE improve upon the first model? By how much?
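
The R sketch below walks through the modeling steps above (simple model, summary, residual and Q-Q plots, multiple model, ANOVA, RMSE). It is a minimal sketch under stated assumptions, not a solution: the data frame `housing` is randomly simulated as a stand-in for the cleaned Housing.xlsx data, and the column names `sale_price`, `living_sq_ft`, and `bedrooms` are hypothetical (only `sq_ft_lot` and Sale Price come from the assignment). With the real file you would load it instead, for example with `readxl::read_excel("Housing.xlsx")`, and adapt the column names. RMSE is computed by hand so the block runs without extra packages; `Metrics::rmse(actual, predicted)` should return the same value.

```r
set.seed(42)
n <- 500

# Simulated stand-in for the cleaned Housing data (hypothetical values).
# Only sq_ft_lot and the sale price come from the assignment; living_sq_ft
# and bedrooms are illustrative predictors, not confirmed Housing.xlsx columns.
housing <- data.frame(
  sq_ft_lot    = round(runif(n, 2000, 20000)),
  living_sq_ft = round(runif(n, 800, 4000)),
  bedrooms     = sample(1:6, n, replace = TRUE)
)
housing$sale_price <- 50000 + 2 * housing$sq_ft_lot +
  150 * housing$living_sq_ft + 10000 * housing$bedrooms +
  rnorm(n, sd = 50000)

# Model 1: lot size alone predicts sale price
model1 <- lm(sale_price ~ sq_ft_lot, data = housing)
summary(model1)  # reports coefficients, R-squared, and adjusted R-squared

# Residuals vs fitted values: look for curvature or a funnel shape
res1 <- resid(model1)
plot(fitted(model1), res1, xlab = "Fitted sale price", ylab = "Residual")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line suggest non-normal residuals
qqnorm(res1)
qqline(res1)

# Model 2: several predictors (hypothetical column names)
model2 <- lm(sale_price ~ sq_ft_lot + living_sq_ft + bedrooms, data = housing)
summary(model2)
res2 <- resid(model2)
plot(fitted(model2), res2, xlab = "Fitted sale price", ylab = "Residual")
abline(h = 0, lty = 2)
qqnorm(res2)
qqline(res2)

# F-test comparing the nested models fitted on the same rows
anova(model1, model2)

# RMSE by hand; equivalent to Metrics::rmse(actual, predicted)
rmse_manual <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
preds1 <- predict(object = model1, newdata = housing)
preds2 <- predict(object = model2, newdata = housing)
rmse1 <- rmse_manual(housing$sale_price, preds1)
rmse2 <- rmse_manual(housing$sale_price, preds2)
c(model1 = rmse1, model2 = rmse2, improvement = rmse1 - rmse2)
```

Because the second model nests the first and both are fitted on the same rows, its in-sample RMSE can never exceed the first model's; the `anova()` F-test is what indicates whether the added predictors help by more than chance.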
