Answered step by step
Verified Expert Solution
Question
1 Approved Answer
SHOW STEP BY STEP HOW TO DO IN XLMINER Boston Housing This dataset contains information collected by the US Census Bureau concerning housing prices in
SHOW STEP BY STEP HOW TO DO IN XLMINER
Boston Housing
This dataset contains information collected by the US Census Bureau concerning housing prices in
the area of Boston, Massachusetts. The dataset has cases.
For the classification exercise, we have created a new variable CATMEDV that assigns a class
based on the median value of the tract. The class definition is as follows:
less than or equal to $Low value
less than or equal to $ but more than $Medium value
More than $High value
The following steps are recommended:
Start XLMiner
Open the Boston Housing file.
Partition the data you can use the defaults
Run classification tree, using the Prune tree with validation data option. Use the
"normalize data" option, and set both the "Min. # of records in a terminal node" and the
"Max. # of levels displayed in tree drawing" to Use the output options specified below.
The key ingredient in a tree algorithm is the splitting of the data according to the value of a
variable, and for the XLMiner tree algorithm the split for normalized data is the same as
that for nonnormalized data.
a Full Tree
b Best Pruned Tree
Study the results and answer the following questions:
Questions:
What is the error rate for the validation data, using the best pruned tree?
Study the best pruned tree and develop a classification rule, using ifthen statements.
What is the difference in terms of number of nodes between the best pruned tree and full
tree?
If you were to apply these trees to new data from the same source as these data, how
would you expect the error rate for the full tree to compare to the error rate for the best
pruned tree?
Supposing the local government decided to levy tax based on the predicted class of
houses and the tax rate is fixed as:
Class $
Class $
Class $
Estimate the overall net gain or loss the government will make due to misclassification
based on the misclassification of the validation data. Use the Assessment sheet in
BostonHousingxls file to explain your computation.
Charles Book Club
Identify the customers to whom you will send the mailers, using as a cutoff criterion a
probability of a purchase, as predicted by a logistic regression model. In other words, you would
propose mailing to any customer who has a predicted probability of purchase as predicted
by your logistic regression models Build models by selecting variables as follows partitioning
to training, to validation, and to test; we will not be using the test partition right
now:
a The full set of predictors as independent variables and "Florence" as the dependent
variable,
b A best subset selection of six variables note: the best subset selection process in
XLMiner is similar to that for multiple linear regression for subset selection
c Only the R F M variables.
Note: Once you have generated XLMiner output LRClassifyValid sheet, you can update the
classification of records by modifying the cutoff criterion field. Verify this for illustrative
purposes by sorting the data in descending order of values in the "Predicted Prob. of
Success" field in order to locate the cutoff value of or You will need this sort to
answer question below.
Indicate the trial that produced better results and explain why.
Tabulate the results of the three experiments conducted above and report which experiment
produced the best result. Tabulate your conclusion in the Assessment table below.
Assessment Table
Srl Evaluation Technique Target
Customer
Nos.
Actual
Response
Nos.
Response
Rate
If we sent mailers to all the customers in
validation data set.
Logistics Regression: cutoff rate of
probability of success.
a The full set of predictors as
dependent variables and "Florence"
as the independent variable,
b Best subset selection of six
variables, including constant
c Only the R F M variables.
If you wish, you can use the "best subset" option in Logistic Regression. When you implement it
should set "size of best subset" to and # of best subsets" to this will produce one best
subset for each size of and smaller ie one best subset with variables, one with one with
six, etc. After you run logistic regression with best subset selected, the summary output page will
give you information you can use in choosing which variables to use. You then need to run logistic
regression again, using those variables.
For the best model, indicate the lift ratio for first cases.
Note: The lift ratio is calculated as follows:
Sort by predicted probability of success
For the first rows, determine response rate actuals divided by
Divide this response rate by the response rate obtained by mailing to everyone # above
This is the "lift" obtained by mailing to cases.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started