[Solved] SHOW STEP BY STEP HOW TO DO IN XLMINER Bo

Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 09, 2024

SHOW STEP BY STEP HOW TO DO IN XLMINER Boston Housing This dataset contains information collected by the US Census Bureau concerning housing prices in

SHOW STEP BY STEP HOW TO DO IN XLMINER

Boston Housing

This dataset contains information collected by the US Census Bureau concerning housing prices in

the area of Boston, Massachusetts. The dataset has

506

cases.

For the classification exercise, we have created a new variable

CAT

.

MEDV

that assigns a class

based on the median value of the tract. The class definition is as follows:

1 =

less than or equal to $

15, 000 (

Low value

)

2 =

less than or equal to $

30, 000

but more than $

15, 000 (

Medium value

)

3 =

More than $

30, 000 (

High value

)

The following steps are recommended:

1 .

Start XLMiner

2 .

Open the Boston Housing file.

3 .

Partition the data

(

you can use the defaults

)

4 .

Run classification tree, using the

Prune tree with validation data option.

Use the

"normalize data" option, and set both the "Min. # of records in a terminal node" and the

"Max. # of levels displayed in tree drawing" to

5 .

Use the output options specified below.

The key ingredient in a tree algorithm is the splitting of the data according to the value of a

variable, and for the XLMiner tree algorithm the split for normalized data is the same as

that for non

-

normalized data.

.

Full Tree

.

Best Pruned Tree

Study the results and answer the following questions:

Questions:

1 .

What is the error rate for the validation data, using the best pruned tree?

2 .

Study the best pruned tree and develop a classification rule, using if

-

then statements.

3 .

What is the difference in terms of number of nodes between the best pruned tree and full

tree?

4 .

If you were to apply these trees to new data from the same source as these data, how

would you expect the error rate for the full tree to compare to the error rate for the best

pruned tree?

5 .

Supposing the local government decided to levy tax based on the predicted class of

houses and the tax rate is fixed as:

Class

1 =

250

Class

2 =

500

Class

3 =

800

2 / 3

Estimate the overall net gain or loss the government will make due to misclassification

(

based on the misclassification of the validation data.

)

Use the Assessment sheet in

BostonHousing

.

xls

file to explain your computation.

Charles Book Club

Identify the customers to whom you will send the mailers, using as a cut

-

off criterion a

20 %

probability of a purchase, as predicted by a logistic regression model. In other words, you would

propose mailing to any customer who has a predicted probability of purchase

> = 0.20 (

as predicted

by your logistic regression models

) .

Build

3

models by selecting variables as follows

(

partitioning

50 %

to training,

30 %

to validation, and

20 %

to test; we will not be using the test partition right

now

)

.

The full set of

15

predictors as independent variables and "Florence" as the dependent

variable,

.

A best subset selection of six variables

(

note: the best subset selection process in

XLMiner is similar to that for multiple linear regression for subset selection

)

.

Only the R

,

,

M variables.

(

Note: Once you have generated XLMiner output LR

_

ClassifyValid sheet, you can update the

classification of records by modifying the cut

-

off criterion field. Verify this

(

for illustrative

purposes

)

by sorting the data in descending order of values in the "Predicted Prob. of

Success" field in order to locate the cut

-

off value of

0.2

20 % .

You will need this sort to

answer question

2

below.

)

2 .

Indicate the trial that produced better results and explain why.

Tabulate the results of the three experiments conducted above and report which experiment

produced the best result. Tabulate your conclusion in the Assessment table below.

Assessment Table

Srl Evaluation Technique Target

Customer

Nos.

Actual

Response

Nos.

Response

Rate

(%)

1 .

If we sent mailers to all the customers in

validation data set.

2 .

Logistics Regression: cut

-

off rate of

20 %

probability of success.

.

The full set of

15

predictors as

3 / 3

dependent variables and "Florence"

as the independent variable,

.

Best subset selection of six

variables, including constant

*

.

Only the R

,

,

M variables.

*

If you wish, you can use the "best subset" option in Logistic Regression. When you implement it

,

should set "size of best subset" to

" 8 "

and

"

# of best subsets" to

" 1 " (

this will produce one best

subset for each size of

8

and smaller

.

.

one best subset with

8

variables, one with

7,

one with

six, etc.

) .

After you run logistic regression with best subset selected, the summary output page will

give you information you can use in choosing which variables to use. You then need to run logistic

regression again, using those variables.

3 .

For the best model, indicate the lift ratio for first

250

cases.

Note: The lift ratio is calculated as follows:

1 .

Sort by predicted probability of success

2 .

For the first

250

rows, determine response rate

("

actuals

"

divided by

250)

3 .

Divide this response rate by the response rate obtained by mailing to everyone

(

1

above

) .

4 .

This is the "lift" obtained by mailing to

250

cases.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Visual Basic6 Database Programming

Authors: John W. Fronckowiak, David J. Helda

1st Edition

★★★★★

7-6 Does anyone own the Internet? Discuss how the Internet is currently governed and consider whether countries should be able to impose its own rules on the Internet within their own borders.

Answered: 1 week ago

Previous Question Next Question