Question
Description:
Zillow's Zestimate home valuation has shaken up the U.S. real estate industry since it was first released 11 years ago. A home is often the largest and most expensive purchase a person makes in his or her lifetime, so ensuring homeowners have a trusted way to monitor this asset is incredibly important. The Zestimate was created to give consumers as much information as possible about homes and the housing market, marking the first time consumers had access to this type of home value information at no cost. "Zestimates" are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. By continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.
This project is a highly simplified version of the Zillow Prize competition, which offered a one-million-dollar grand prize with the objective of pushing the accuracy of the Zestimate even further. Winning algorithms stand to impact the home values of 110M homes across the U.S.
Project
You will test and compare the following three models.
A) Build a regression [module 5] model and a decision tree [module 7] model that can accurately predict the price of a house based on several predictors (you select appropriate features).
B) Use classification [module 6] to model OverallQual (a rating of 7 and above is considered class 1, otherwise class 0).
Project Deliverables
The following items need to be delivered:
I. Project report:
This is your end-of-project delivery document. It is a document that summarizes different aspects of the project. It includes the following sections:
0. The first page of the document should include a table listing the names of the group participants and a summary of the contribution of each team member.
1. Project goal
2. Overview of data, including exploratory data analysis
3. Details of your modeling strategy (feature selection, how/why you chose specific features, optimal hyperparameters, etc.)
4. Estimation of the models' performance. It has two subsections: establishing the models (based on the train dataset) and prediction performance based on the test dataset. The reason to build any model is to be able to use it! Once you have constructed your models, you need to use them to predict. You must include a subsection that compares your prediction results. To compare models, you may use criteria based on the confusion matrix and/or criteria such as R2_adj, RMSE, and MAE.
5. Insights and conclusions
You can include snapshots of your R code and the outputs in the report. You must submit a single document per group in Word format.
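The comparison criteria named in section 4 can be computed by hand, which helps when checking what a modeling package reports. The sketch below is a minimal, language-neutral illustration written in Python rather than the R the project asks for; the function names (`rmse`, `mae`, `adjusted_r2`, `confusion_counts`, `accuracy`) are hypothetical helpers, not part of any course material, and the same arithmetic ports directly to R.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def adjusted_r2(actual, predicted, n_predictors):
    """Adjusted R-squared: R-squared penalized for the number of predictors."""
    n = len(actual)
    mean_y = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


def confusion_counts(actual, predicted):
    """Confusion-matrix cells for a binary (0/1) classifier."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(1 for a, p in pairs if a == 1 and p == 1),
        "TN": sum(1 for a, p in pairs if a == 0 and p == 0),
        "FP": sum(1 for a, p in pairs if a == 0 and p == 1),
        "FN": sum(1 for a, p in pairs if a == 1 and p == 0),
    }


def accuracy(counts):
    """Share of correct predictions, derived from the confusion counts."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())
```

RMSE, MAE, and adjusted R-squared apply to the two price models in part A, while the confusion-matrix counts apply to the OverallQual classifier in part B. Computing all of them on the test dataset, not the train dataset, is what makes the comparison a measure of prediction performance.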
II. R code and script: Submit the commented R Markdown file (*.html).
III. Presentation Slides:
Finally, prepare a few presentation slides sharing the insights from your project with the board of directors. These should focus mostly on high-level insights as opposed to technical details. As part of your submission, include a narrated PowerPoint presentation; you can easily do this by adding voice recordings to the slides.
Submit these three items, namely I-III, as a single zipped file, with one submission per group. The zip file should be named Group_X.zip, where X is your group number, e.g., Group_10.zip.
Appendix A:
Description of variables (DATA dictionary)
LotArea: Lot size in square feet
OverallQual: Rates the overall material and finish of the house. 10 Very Excellent; 9 Excellent; 8 Very Good; 7 Good; 6 Above Average; 5 Average; 4 Below Average; 3 Fair; 2 Poor; and 1 Very Poor.
YearBuilt: Original construction date
YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)
BsmtFinSF1: Finished basement area (type 1) in square feet
FullBath: Full bathrooms
HalfBath: Half baths
BedroomAbvGr: Number of bedrooms above ground
TotRmsAbvGrd: Total number of rooms above ground
Fireplaces: Number of fireplaces
GarageArea: Size of garage in square feet
YrSold: Year sold
SalePrice: The sale price of the property.
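Part B of the project turns the 1 to 10 OverallQual rating into a binary target. A minimal sketch of that rule, assuming the threshold of 7 stated in the project description, is shown below in Python for illustration; `quality_class`, `class_balance`, and the `threshold` parameter are hypothetical names, and the ratings are the OverallQual values from the first five rows of the House_Prices table.

```python
from collections import Counter


def quality_class(overall_qual, threshold=7):
    """Return 1 when the rating is at or above the threshold, otherwise 0."""
    return 1 if overall_qual >= threshold else 0


def class_balance(ratings, threshold=7):
    """Count how many observations fall into class 0 and class 1."""
    return Counter(quality_class(q, threshold) for q in ratings)


# OverallQual values from the first five rows of the House_Prices table.
ratings = [7, 6, 7, 7, 8]
labels = [quality_class(q) for q in ratings]
balance = class_balance(ratings)
```

Checking the class balance before fitting a classifier is worthwhile, because accuracy alone can look good on a target where one class dominates.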
The data set is shown below:
House_Prices
LotArea | OverallQual | YearBuilt | YearRemodAdd | BsmtFinSF1 | FullBath | HalfBath | BedroomAbvGr | TotRmsAbvGrd | Fireplaces | GarageArea | YrSold | SalePrice |
8450 | 7 | 2003 | 2003 | 706 | 2 | 1 | 3 | 8 | 0 | 548 | 2008 | 208500 |
9600 | 6 | 1976 | 1976 | 978 | 2 | 0 | 3 | 6 | 1 | 460 | 2007 | 181500 |
11250 | 7 | 2001 | 2002 | 486 | 2 | 1 | 3 | 6 | 1 | 608 | 2008 | 223500 |
9550 | 7 | 1915 | 1970 | 216 | 1 | 0 | 3 | 7 | 1 | 642 | 2006 | 140000 |
14260 | 8 | 2000 | 2000 | 655 | 2 | 1 | 4 | 9 | 1 | 836 | 2008 | 250000 |
14115 | 5 | 1993 | 1995 | 732 | 1 | 1 | 1 | 5 | 0 | 480 | 2009 | 143000 |
10084 | 8 | 2004 | 2005 | 1369 | 2 | 0 | 3 | 7 | 1 | 636 | 2007 | 307000 |
10382 | 7 | 1973 | 1973 | 859 | 2 | 1 | 3 | 7 | 2 | 484 | 2009 | 200000 |
6120 | 7 | 1931 | 1950 | 0 | 2 | 0 | 2 | 8 | 2 | 468 | 2008 | 129900 |
7420 | 5 | 1939 | 1950 | 851 | 1 | 0 | 2 | 5 | 2 | 205 | 2008 | 118000 |
11200 | 5 | 1965 | 1965 | 906 | 1 | 0 | 3 | 5 | 0 | 384 | 2008 | 129500 |
11924 | 9 | 2005 | 2006 | 998 | 3 | 0 | 4 | 11 | 2 | 736 | 2006 | 345000 |
12968 | 5 | 1962 | 1962 | 737 | 1 | 0 | 2 | 4 | 0 | 352 | 2008 | 144000 |
10652 | 7 | 2006 | 2007 | 0 | 2 | 0 | 3 | 7 | 1 | 840 | 2007 | 279500 |
10920 | 6 | 1960 | 1960 | 733 | 1 | 1 | 2 | 5 | 1 | 352 | 2008 | 157000 |
6120 | 7 | 1929 | 2001 | 0 | 1 | 0 | 2 | 5 | 0 | 576 | 2007 | 132000 |
11241 | 6 | 1970 | 1970 | 578 | 1 | 0 | 2 | 5 | 1 | 480 | 2010 | 149000 |
10791 | 4 | 1967 | 1967 | 0 | 2 | 0 | 2 | 6 | 0 | 516 | 2006 | 90000 |
13695 | 5 | 2004 | 2004 | 646 | 1 | 1 | 3 | 6 | 0 | 576 | 2008 | 159000 |
7560 | 5 | 1958 | 1965 | 504 | 1 | 0 | 3 | 6 | 0 | 294 | 2009 | 139000 |
14215 | 8 | 2005 | 2006 | 0 | 3 | 1 | 4 | 9 | 1 | 853 | 2006 | 325300 |
7449 | 7 | 1930 | 1950 | 0 | 1 | 0 | 3 | 6 | 1 | 280 | 2007 | 139400 |
9742 | 8 | 2002 | 2002 | 0 | 2 | 0 | 3 | 7 | 1 | 534 | 2008 | 230000 |
4224 | 5 | 1976 | 1976 | 840 | 1 | 0 | 3 | 6 | 1 | 572 | 2007 | 129900 |
8246 | 5 | 1968 | 2001 | 188 | 1 | 0 | 3 | 6 | 1 | 270 | 2010 | 154000 |
14230 | 8 | 2007 | 2007 | 0 | 2 | 0 | 3 | 7 | 1 | 890 | 2009 | 256300 |
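Before any modeling, the pipe-delimited table has to be read into records and divided into train and test sets. The sketch below shows one way this could look, using the first six rows of the table above; it is written in Python for illustration only (the project itself calls for R), and `parse_table`, `train_test_split`, `fit_simple_ols`, the 25% test fraction, and the seed are all assumptions. The one-predictor least-squares fit is a deliberately simple baseline, not the multi-predictor regression or decision tree the project requires.

```python
import random

HEADER = ("LotArea | OverallQual | YearBuilt | YearRemodAdd | BsmtFinSF1 | FullBath | "
          "HalfBath | BedroomAbvGr | TotRmsAbvGrd | Fireplaces | GarageArea | YrSold | SalePrice |")

# First six rows copied from the House_Prices table.
ROWS = """\
8450 | 7 | 2003 | 2003 | 706 | 2 | 1 | 3 | 8 | 0 | 548 | 2008 | 208500 |
9600 | 6 | 1976 | 1976 | 978 | 2 | 0 | 3 | 6 | 1 | 460 | 2007 | 181500 |
11250 | 7 | 2001 | 2002 | 486 | 2 | 1 | 3 | 6 | 1 | 608 | 2008 | 223500 |
9550 | 7 | 1915 | 1970 | 216 | 1 | 0 | 3 | 7 | 1 | 642 | 2006 | 140000 |
14260 | 8 | 2000 | 2000 | 655 | 2 | 1 | 4 | 9 | 1 | 836 | 2008 | 250000 |
14115 | 5 | 1993 | 1995 | 732 | 1 | 1 | 1 | 5 | 0 | 480 | 2009 | 143000 |
"""


def parse_table(header, body):
    """Turn the pipe-delimited text into a list of {column: int} records."""
    names = [cell.strip() for cell in header.split("|") if cell.strip()]
    records = []
    for line in body.strip().splitlines():
        values = [int(cell) for cell in line.split("|") if cell.strip()]
        records.append(dict(zip(names, values)))
    return records


def train_test_split(records, test_fraction=0.25, seed=42):
    """Shuffle reproducibly, then hold out a fraction of rows for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_simple_ols(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


records = parse_table(HEADER, ROWS)
train, test = train_test_split(records)

# Baseline: SalePrice as a linear function of GarageArea, fitted on train only.
slope, intercept = fit_simple_ols(
    [r["GarageArea"] for r in train], [r["SalePrice"] for r in train]
)
test_predictions = [intercept + slope * r["GarageArea"] for r in test]
```

Fitting on the train rows and predicting on the held-out rows mirrors the two subsections the report asks for: establishing the models, then measuring prediction performance on data the models have not seen.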