Question
The management of a real estate company is considering outsourcing to you, as an external consulting group, the task of developing a reliable predictive model that predicts the selling price of properties using the aforementioned historical dataset. You are required to build several predictive models and compare them to determine which is the best model for the selected dataset. You are also provided with a dataset of new properties about to be listed (the scoring dataset), for which you must predict the house prices.

Q1. Setting up the project and exploratory analysis (10%)
Provide a screenshot as evidence for each subsection of Q1.
a. Create a new project and create a data source based on the given datasets. Set the role of Price to Target and make sure the Role and Level assigned to each variable are correct.
b. Carry out data exploration using a StatExplore node. Explain your findings with regard to your property dataset.
c. Create a Data Partition with 70% of the data for training and 30% for validation.

Q2. Decision tree-based modeling and analysis (25%)
Carry out the following modeling tasks for the selected property value dataset.
a. Create two separate Decision Tree models, one using two-way splits and one using three-way splits. Provide the relevant diagrams of the decision trees. For each decision tree:
   I. How many leaves are in the optimal tree?
   II. Which variable was used for the first split?
   III. What were the competing splits for this first split?
b. Which of the decision tree models appears to be better? Justify your answer.
c. Referring to the decision tree model selected in part (b):
   I. Identify two leaf nodes with good predictive performance and two leaf nodes with poor predictive performance.
   II. Provide justifications for your selections.
   III. Write down the rules for the pathways leading up to each selected leaf node.

Q3. Regression-based modeling and analysis (25%)
a. In preparation for regression, is any missing-value imputation needed? If yes, should you do this imputation before generating the decision tree models? Why or why not?
b. Use an Impute node connected to the Data Partition node to handle missing values. Which variables have been imputed?
c. Are there any ordinal variables? Use the Replacement node to assign relevant values.
d. Conduct data exploration with the Variable Clustering node to select the best variables for the model. Describe and justify how you ascertained the best variables for the model.
e. Create a Regression model using the set of variables you identified as suitable in part (d). You can choose stepwise selection and use validation error as the selection criterion.
f. Run the Regression node and view the results.
   I. Which variables are included in the final model? Very briefly explain what this means to the real estate company.
   II. What is the validation Average Squared Error (ASE) (or Mean Squared Error (MSE))? What does this mean in a predictive model?
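The Data Partition, Impute, Replacement and Regression steps above are carried out through GUI nodes (StatExplore, Impute, Replacement and Variable Clustering are SAS Enterprise Miner nodes). As a rough illustration of what those steps compute, here is a plain-Python sketch, not SAS code and not the tool's implementation. It shows a seeded 70/30 random split and mean imputation whose fill value is learned from the training partition only, so no validation information leaks into it. It also maps hypothetical ordinal labels to scores and computes ASE as ASE = (1/n) * sum of (y_i - yhat_i)^2 over the validation rows. The column names (`Price`, `Lot_Area`, `Condition`), the condition levels, the toy data and the mean-price baseline are all hypothetical assumptions, and the tool's own defaults (partitioning and imputation methods) may differ.

```python
import random

# Hypothetical ordinal levels for a hypothetical "Condition" column.
CONDITION_SCORES = {"Poor": 1, "Fair": 2, "Good": 3, "Excellent": 4}


def partition(rows, train_frac=0.7, seed=42):
    """Shuffle a copy of rows with a fixed seed and split it ~70/30."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def fit_mean_imputer(train, column):
    """Learn the fill value (mean of non-missing values) from training rows only."""
    values = [r[column] for r in train if r[column] is not None]
    if not values:
        raise ValueError(f"no observed values for {column!r} in training data")
    return sum(values) / len(values)


def impute(rows, column, fill_value):
    """Return copies of rows with None in `column` replaced by fill_value."""
    return [{**r, column: fill_value if r[column] is None else r[column]} for r in rows]


def encode_ordinal(rows, column, mapping):
    """Return copies of rows with ordinal labels in `column` replaced by numeric scores."""
    return [{**r, column: mapping[r[column]]} for r in rows]


def average_squared_error(actual, predicted):
    """ASE = (1/n) * sum of (actual - predicted)^2."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Toy records (hypothetical): Lot_Area is missing for one property.
    levels = ["Poor", "Fair", "Good", "Excellent"]
    rows = [
        {"Price": 200 + 10 * i, "Lot_Area": None if i == 3 else 50 + i,
         "Condition": levels[i % 4]}
        for i in range(10)
    ]
    train, valid = partition(rows)
    fill = fit_mean_imputer(train, "Lot_Area")  # learned on training data only
    train = encode_ordinal(impute(train, "Lot_Area", fill), "Condition", CONDITION_SCORES)
    valid = encode_ordinal(impute(valid, "Lot_Area", fill), "Condition", CONDITION_SCORES)
    # Baseline "model": predict the mean training price for every validation row.
    baseline = sum(r["Price"] for r in train) / len(train)
    ase = average_squared_error([r["Price"] for r in valid], [baseline] * len(valid))
    print(f"train={len(train)} valid={len(valid)} validation ASE={ase:.1f}")
```

With a real model, the baseline predictions would be replaced by the model's predictions for the validation rows. A lower validation ASE means smaller squared prediction errors on data the model was not trained on, measured in squared units of Price.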