Question
54667. Business Analytics and Decision Making
Project 2: Predicting Housing Prices
Name _________________________________________________
INSTRUCTION: Read each question carefully and answer in the space provided. Please show all work where necessary. Your solutions (both a Word file and an Excel file) must be submitted by 11:59 pm on Sunday, November 14. Late assignments are not accepted.
Goal
This exercise is intended to offer a simple hands-on experience with Excel for the following ideas/concepts:
Training and building a model based on data,
Validating a model,
Demonstrating the actual rollout of a model, and
The potential effect of adding more variables on in-sample and out-of-sample predictive accuracy.
Note: Excel comes with a linear regression tool that is not activated in default installations. To activate regression in Excel, you will need to:
Go to Excel Options -> choose Add-ins -> click Go -> check the box for Analysis ToolPak.
(Activation may vary between versions of Excel. Check Excel Help for "regression" or "Analysis ToolPak".)
[Here is one link that walks you through the steps needed to run a regression in Excel:
https://www.ablebits.com/office-addins-blog/2018/08/01/linear-regression-analysis-excel/#linear-regression-Excel-Analysis-ToolPak]
The Challenge
The owner of a small real-estate company is interested in identifying potential houses at a "bargain price" from a large volume of apartments that are advertised online and available for sale. Since the real estate company has a limited number of workers, the company's owner hired you to build a simple model that can "predict" the amount of money customers would be willing to pay for these apartments. The model will be used for the initial screening of apartments that appear to be undervalued. Once potentially highly undervalued apartments are identified, three real estate agents will conduct a lengthy and thorough evaluation of the economic value of each apartment.
As an initial pilot study for evaluating the usefulness of this procedure, you are supplied with a dataset of 310 apartments sold in the previous year in a coastal resort town. The data are already split into a randomly chosen training set (70% of the records) and a test set (the remaining 30% of the records). Due to various data restrictions, the dataset includes only the following variables:
y Apartment price (thousands of $)
X1 Distance from the sea (ft)
X2 House size (sq ft)
X3 Number of cafés and convenience stores within a 1-mile radius of the apartment
X4 Binary variable indicating a shopping mall within a 3-mile radius of the apartment {1=yes, 0=no}
Stage 1: Model Estimation
Use Excel to run the following linear regression models. For each model, report the intercept, the coefficients, and the Mean Squared Error (MSE)[1] for the training set.
a. A model that predicts housing prices (y) from all the available variables (X1, X2, X3, X4), based on the training set. (We will refer to this model as mdl1.)
b. A model that predicts housing prices (y) from only the variables X1 and X2, based on the training set. (We will refer to this model as mdl2.)
c. Based on the MSE criterion, which of the two models performs better on the training set?
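As a cross-check on the Excel output for Stage 1, the two fits can be sketched in Python. This is only a sketch under stated assumptions, not the required method: the ten training records are invented placeholders (not the project data), and `fit_ols`, `predict`, and `mse` are hypothetical helper names. MSE is taken here as the sum of squared residuals divided by the number of records n. The MS value in the Residual row of Excel's regression ANOVA table divides by the residual degrees of freedom (n minus the number of predictors minus 1) instead, so the two figures differ slightly.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations.

    Returns [intercept, b1, ..., bk] for predictor rows X and targets y.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(k):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(beta, X):
    """Apply intercept + sum(coef * x) to every row of X."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]


def mse(actual, predicted):
    """Mean squared error: sum of squared residuals divided by n."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Invented placeholder records: (X1 ft, X2 sq ft, X3 count, X4 0/1).
train_X = [
    (250, 900, 8, 1), (400, 1200, 6, 1), (800, 1000, 5, 0),
    (1200, 1500, 4, 1), (1500, 800, 3, 0), (2000, 1800, 2, 0),
    (2500, 1100, 2, 1), (3000, 1600, 1, 0), (600, 2000, 7, 1),
    (1800, 1300, 3, 0),
]
# Invented placeholder prices, in thousands of dollars.
train_y = [410, 455, 380, 430, 300, 440, 310, 350, 640, 360]

train_X12 = [r[:2] for r in train_X]   # keep only X1 and X2 for mdl2

mdl1 = fit_ols(train_X, train_y)       # all four variables
mdl2 = fit_ols(train_X12, train_y)     # X1 and X2 only

mse1 = mse(train_y, predict(mdl1, train_X))
mse2 = mse(train_y, predict(mdl2, train_X12))

print("mdl1 [intercept, b1..b4]:", mdl1, "training MSE:", mse1)
print("mdl2 [intercept, b1, b2]:", mdl2, "training MSE:", mse2)
```

With least squares, adding predictors cannot raise the training MSE, so this comparison alone does not show which model generalizes better.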
Stage 2: Model Evaluation and Selection
a. Apply the linear regression model mdl1 to the test set. (This can be done in Excel by creating a formula that uses the intercept and coefficients of mdl1 and applies them to each record.) What is the MSE for mdl1 on the test set?
b. Similarly, apply the linear regression model mdl2 to the test set. What is the MSE for mdl2 on the test set?
c. Based on the MSE criterion, which of the two models performs better on the test set? If your findings differ from those reported in section 1c, how would you explain the difference?
d. Which of the two models would you prefer to apply to new data?
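The Excel step described in Stage 2 (a formula that applies the intercept and coefficients to each test record) can be mirrored in a few lines of Python. The coefficient values and the two test records below are made-up placeholders chosen for easy arithmetic, not estimates from the project data, and `predict_price` and `mse` are hypothetical helper names.

```python
def predict_price(intercept, coefs, row):
    """Apply y_hat = intercept + sum(coef_i * x_i) to one record.

    Only the first len(coefs) variables of the row are used, so a
    two-variable model ignores X3 and X4.
    """
    return intercept + sum(c * x for c, x in zip(coefs, row[:len(coefs)]))


def mse(actual, predicted):
    """Mean squared error: average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Placeholder coefficients (assumed for illustration; use your Stage 1 output).
mdl1 = {"intercept": 100.0, "coefs": [-0.05, 0.2, 5.0, 10.0]}  # X1..X4
mdl2 = {"intercept": 120.0, "coefs": [-0.05, 0.2]}             # X1, X2

# Placeholder test records: (X1, X2, X3, X4) and observed price y.
test_X = [(1000, 1000, 4, 1), (2000, 1500, 2, 0)]
test_y = [300.0, 310.0]

test_mse = {}
for name, mdl in (("mdl1", mdl1), ("mdl2", mdl2)):
    preds = [predict_price(mdl["intercept"], mdl["coefs"], r) for r in test_X]
    test_mse[name] = mse(test_y, preds)
    print(name, "test MSE:", test_mse[name])
```

The coefficients stay fixed at their training-set values; nothing is re-estimated on the test set.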
Stage 3: Predicting Home Sale Prices
The real estate company downloaded data for approximately 9,100 apartments that are advertised online and are located in the same region. All these records appear in the "rollout" tab in the attached Excel sheet, but do not include the price. Apply the model you selected in section 2 to predict the price of the apartments whose details appear in the "rollout" tab. (NB: Include the equation of your preferred model and the predicted prices for the first 50 data points in your answer.)
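The rollout step can be sketched the same way: write out the chosen model's equation, score every record, and keep the first 50 predictions. Everything specific below is a placeholder: the intercept and coefficients are assumed values for a hypothetical two-variable model, and the rollout rows are generated stand-ins for the "rollout" tab. Substitute the model you actually selected and the real records.

```python
# Placeholder model (assumed values; replace with your selected model).
intercept = 120.0
coefs = {"X1": -0.05, "X2": 0.2}

# Equation string to report alongside the predictions.
equation = "y_hat = " + f"{intercept:g}" + "".join(
    f" {'+' if b >= 0 else '-'} {abs(b):g}*{name}" for name, b in coefs.items()
)

# Stand-in for the "rollout" tab: generated rows with no price column.
rollout = [{"X1": 100 * i, "X2": 800 + 10 * i} for i in range(60)]

# Score every record with the fixed equation (prices in thousands of $).
predicted = [
    intercept + sum(b * row[name] for name, b in coefs.items())
    for row in rollout
]
first_50 = predicted[:50]

print(equation)
for i, price in enumerate(first_50, start=1):
    print(f"{i:2d}  {price:8.1f}")
```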
[1] This exercise uses the MSE performance measure because you should be familiar with it from previous statistics courses.