Question
Junction is a small town with two suburbs. The data file "Major Project - Data Set" contains data on 540 houses sold in Junction between 2017 and 2022. This data includes the price at which the house was sold, which of two agents sold the house (all houses are sold through an agent by law), the year in which the house was sold as well as data on various characteristics of each house sold (age, size, number of stories etc.). These characteristics serve as possible explanatory variables of sale price. Data definitions follow:
OBS = observation
AGE = age of house in years
SHOPS = 1 if house is close to a shopping precinct, 0 otherwise
CRIME = crime rate of the suburb within which the house is located
TOWN = distance in kilometres to the town centre
STORIES = number of dwelling stories
OCEAN = 1 if house has an ocean view, 0 otherwise
POOL = 1 if house has a pool, 0 otherwise
PRICE = price at which the house was sold (in dollars)
AGENT = selling agent - "W&M" (0) or "A&B" (1)
SIZE = size of the house in square metres
SUBURB = Mayfair (0) or Claygate (1)
TENNIS = 1 if house has a tennis court, 0 otherwise
SOLD = year of last sale (2016 to 2021)
Task 1
You are required to provide a comprehensive summary of the data set contained in the "Major Project - Data Set" file. How you choose to do so is entirely at your discretion. However, it is recommended that you consider using both summary statistics and graphical methods, while also noting any peculiarities within the data set.
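One way to approach the summary-statistics part of Task 1 is sketched below in plain Python. The rows are made-up illustrative values, not taken from the project file, and the helper names `summarise` and `unexpected_values` are hypothetical. The second helper shows how a dummy variable can be checked for values outside 0 and 1, which is the kind of peculiarity the task asks you to note.

```python
import statistics
from collections import Counter

# Illustrative rows only; these values are NOT from the project data file.
rows = [
    {"PRICE": 450000, "SIZE": 180, "SHOPS": 0},
    {"PRICE": 520000, "SIZE": 210, "SHOPS": 1},
    {"PRICE": 610000, "SIZE": 250, "SHOPS": 3},
    {"PRICE": 480000, "SIZE": 190, "SHOPS": 2},
]

def summarise(rows, column):
    """Return basic summary statistics for one numeric column."""
    values = [r[column] for r in rows]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def unexpected_values(rows, column, allowed):
    """Count values in a column that fall outside the allowed set."""
    counts = Counter(r[column] for r in rows)
    return {value: n for value, n in counts.items() if value not in allowed}

price_summary = summarise(rows, "PRICE")
bad_shops = unexpected_values(rows, "SHOPS", allowed={0, 1})

print(price_summary)
print(bad_shops)
```

The same check can be repeated for the other dummy variables (OCEAN, POOL, TENNIS, AGENT, SUBURB) by changing the column name.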
Task 1 directed you to take note of any peculiarities in the data set. There are additional errors in the data set that you may not have picked up on in Task 1. These will only become clear to you once you start working on Task 2. Several problems can result if you fail to handle these issues correctly, so be mindful to address them, both in your regression application and in your final report. If resolving any of the errors in the data set requires you to make assumptions, make sure to clearly state your reasoning and approach in your report.
Note: We have noted that there is an error in our data set for the SHOPS variable. This variable should be a dummy variable, taking a value of 0 or 1 to indicate whether the house is close to a shopping precinct. However, the SHOPS variable has values of 0, 1, 2 and 3 in the data set. Thus, we have removed the values of 2 and 3 by recoding them back to a value of 0 or 1.
HOW WILL WE RECODE THE DATA BACK TO THE VALUE OF 0 OR 1?
Excel data for the project
https://1drv.ms/x/s!AkCrFp_7u6LZizxkZpskccGjI5UA
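A minimal sketch of two possible recoding rules for SHOPS follows. The assignment text does not say what the values 2 and 3 mean, so both rules rest on assumptions that would need to be stated in the report. The "collapse" rule assumes 2 and 3 still indicate that the house is close to a shopping precinct (for example, a count of nearby precincts) and maps them to 1. The "missing" rule assumes they are data-entry errors and marks them as missing so those rows can be excluded or examined. The function name `recode_shops` and the sample list are hypothetical.

```python
def recode_shops(value, rule="collapse"):
    """Recode a raw SHOPS value to a 0/1 dummy under a stated assumption."""
    if value in (0, 1):
        return value  # already a valid dummy value
    if value in (2, 3):
        if rule == "collapse":
            # Assumption: 2 and 3 still mean "close to a shopping precinct".
            return 1
        if rule == "missing":
            # Assumption: 2 and 3 are entry errors; flag them as missing.
            return None
        raise ValueError(f"unknown rule: {rule!r}")
    # Anything else was not described in the question, so fail loudly.
    raise ValueError(f"unexpected SHOPS value: {value!r}")

# Illustrative values only; NOT taken from the project data file.
shops = [0, 1, 2, 3, 1, 0]

collapsed = [recode_shops(v) for v in shops]
with_missing = [recode_shops(v, rule="missing") for v in shops]

print(collapsed)
print(with_missing)
```

In Excel, the "collapse" rule corresponds to a helper column with a formula such as `=IF(B2>=1,1,0)`, where `B2` is a placeholder for whichever cell holds the SHOPS value. Whichever rule is chosen, it is worth running the regression both ways to see whether the SHOPS coefficient is sensitive to the choice.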