Question

Problem Context for Q5: (25 points)
Ana Gomez, a data analyst at Cha-Ching Bank, has compiled data on 500 past customers to whom Cha-Ching Bank marketed its Home Equity Line of Credit (HELOC) product. The data include each customer's age, sex, and income, and whether or not the customer responded to the HELOC offer. Ana would like to team up with you to accomplish two data mining tasks:
(a) Develop a k-NN model for predicting whether or not a bank customer will respond to a HELOC offer.
(b) Identify, for each of the 20 new customers, whether they are likely to respond to a HELOC offer.
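For intuition about what the k-NN model does when it scores the 20 new customers, here is a minimal Python sketch of an unweighted k-NN vote. The function names (knn_predict, euclidean) and the toy rows are hypothetical and not taken from HELOC.csv; each class's confidence is simply the share of the k nearest neighbours voting for it. RapidMiner's k-NN operator can also weight votes by distance, so its confidence(true) and confidence(false) values may differ from this sketch.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Plain Euclidean distance between two equal-length numeric tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k, distance=euclidean):
    """Classify one query row by an unweighted majority vote of its k nearest neighbours.

    train: list of (features, label) pairs.
    Returns (prediction, confidences), where confidences maps each label that
    received at least one vote to the fraction of the k neighbours carrying it.
    """
    # Keep the k training rows closest to the query.
    neighbours = sorted(train, key=lambda row: distance(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    confidences = {label: count / k for label, count in votes.items()}
    # Majority vote; on a tie, most_common keeps the label that appears first
    # among the distance-sorted neighbours.
    prediction = votes.most_common(1)[0][0]
    return prediction, confidences

# Hypothetical toy data: (normalized age, normalized income) -> responded?
toy_train = [((0.0, 0.0), "true"), ((0.0, 1.0), "true"), ((5.0, 5.0), "false")]
print(knn_predict(toy_train, (0.0, 0.5), k=3))
```

With k=3 every training row votes, so the toy query gets confidence(true) = 2/3 and confidence(false) = 1/3; smaller k makes the prediction more local.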
Follow the k-NN optimization (with normalization) process shown in the example process 07-01-RidingMowers k-NN Optimized Normalized.rmp, with the changes described below:
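Normalization matters for k-NN because distances are dominated by whichever attribute has the largest numeric range; raw income would likely swamp age. The sketch below shows a z-transformation, assuming that is the method the RidingMowers process uses (RapidMiner's Normalize operator also offers others, such as range transformation). The helper names fit_z and apply_z are hypothetical, and whether RapidMiner uses the sample or population standard deviation is not confirmed here. The design point is that the mean and standard deviation are learned once from the training data and then reused on the scoring data rather than recomputed.

```python
import statistics

def fit_z(columns):
    """Learn (mean, sample standard deviation) for each numeric column."""
    return {name: (statistics.mean(vals), statistics.stdev(vals))
            for name, vals in columns.items()}

def apply_z(columns, params):
    """Rescale each column with previously learned parameters: (x - mean) / sd."""
    return {name: [(x - params[name][0]) / params[name][1] for x in vals]
            for name, vals in columns.items()}

# Hypothetical values, not from HELOC.csv.
train_cols = {"age": [20, 30, 40], "income": [1000, 2000, 3000]}
params = fit_z(train_cols)
print(apply_z(train_cols, params))
print(apply_z({"age": [50], "income": [2500]}, params))
```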
Make a copy of the RidingMowers process mentioned above and rename the copy by right-clicking it. Double-click the copy to load it on the RapidMiner canvas, then start making changes to it.
Import the HELOC.csv and HELOC-score.csv data into the RapidMiner repository.
Load these files in the process, connecting them in place of the existing data files.
Remove the Nominal to Binominal operator from the original process.
Instead, use the Numerical to Binominal operator to convert the HELOC outcome variable to a binominal attribute.
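A sketch of the conversion, assuming the operator's documented behaviour: values inside the [min, max] range (0.0 to 0.0 by default) become false and all other values become true. Assuming HELOC is coded 0/1 and the thresholds are left at their defaults, 0 becomes false and 1 becomes true, which is what lets the Performance operator later treat true as the positive class. The function name is hypothetical.

```python
def numerical_to_binominal(values, min_value=0.0, max_value=0.0):
    """Map values inside [min_value, max_value] to "false" and all others to "true"."""
    return ["false" if min_value <= v <= max_value else "true" for v in values]

# Hypothetical HELOC column coded 0 = did not respond, 1 = responded.
print(numerical_to_binominal([0, 1, 1, 0]))
```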
Use the Set Role operator to assign the label role to HELOC.
In the Edit Parameter Settings panel of the Optimize Parameters (Grid) operator, change the range of k to run from a minimum of 1 to a maximum of 50 in 25 steps (linear scale).
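To see which k values such a grid covers, the sketch below assumes that "minimum 1, maximum 50, 25 steps, linear" means 26 evenly spaced points (both endpoints included) rounded to whole numbers, since k must be an integer. RapidMiner's exact rounding may differ, so treat the list as illustrative and check the actual values in the Optimize Parameters results table. The helper name linear_int_grid is hypothetical.

```python
def linear_int_grid(lo, hi, steps):
    """Evenly spaced values from lo to hi (steps intervals, steps + 1 points), rounded to ints."""
    return [int(round(lo + i * (hi - lo) / steps)) for i in range(steps + 1)]

k_values = linear_int_grid(1, 50, 25)
print(len(k_values), k_values)
```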
Inside the Optimize Parameters (Grid) operator, set the split ratio of the Validation (Split Validation) operator to 0.75 and use stratified sampling.
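Stratified sampling keeps the share of responders roughly the same in the 75% training part and the 25% validation part, which matters most when one class is rare. A Python sketch with a hypothetical function name, hypothetical labels, and an arbitrary seed:

```python
import random
from collections import defaultdict

def stratified_split(labels, ratio=0.75, seed=42):
    """Return (train_idx, test_idx) so each label keeps about `ratio` of its rows in train."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_label.values():
        rng.shuffle(idx)                  # randomise within each class
        cut = round(len(idx) * ratio)     # rows of this class sent to training
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels: 20 responders, 80 non-responders.
labels = ["true"] * 20 + ["false"] * 80
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```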
In the k-NN operator, change the measure types to MixedMeasures and the mixed measure to MixedEuclideanDistance, since the data contain two numeric attributes (age and income) and one categorical attribute (Sex).
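MixedEuclideanDistance combines both kinds of attribute in one distance. The sketch below assumes the common definition: squared differences for numeric attributes, plus 0 for a matching nominal value and 1 for a mismatch, all under one square root. The function name and attribute names (age, income, sex) are illustrative; the numeric values are assumed to be normalized already, so that a mismatch on Sex is on a comparable scale.

```python
import math

def mixed_euclidean(a, b, nominal):
    """Distance between two rows given as dicts; names in `nominal` are categorical."""
    total = 0.0
    for name in a:
        if name in nominal:
            total += 0.0 if a[name] == b[name] else 1.0  # mismatch counts as 1
        else:
            total += (a[name] - b[name]) ** 2            # numeric squared difference
    return math.sqrt(total)

# Hypothetical normalized rows.
x = {"age": 0.0, "income": 3.0, "sex": "F"}
y = {"age": 4.0, "income": 0.0, "sex": "F"}
z = {"age": 4.0, "income": 0.0, "sex": "M"}
print(mixed_euclidean(x, y, {"sex"}), mixed_euclidean(x, z, {"sex"}))
```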
In the Performance (Binominal Classification) operator, set the positive class to true and the main criterion for optimization to f_measure.
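With true as the positive class, the metrics reported by the Performance operator reduce to counts from the confusion matrix: precision = TP/(TP+FP), recall = TP/(TP+FN), f_measure is the harmonic mean of the two, and accuracy = (TP+TN)/N. A Python sketch with a hypothetical function name and made-up predictions, useful for cross-checking the reported numbers against the confusion matrix:

```python
def binominal_metrics(actual, predicted, positive="true"):
    """Precision, recall, F-measure and accuracy with `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}

# Made-up example: TP = 2, FN = 1, TN = 1, FP = 1.
actual = ["true", "true", "false", "false", "true"]
predicted = ["true", "false", "false", "true", "true"]
print(binominal_metrics(actual, predicted))
```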
Run the process. Report the following results and provide your interpretation of each (important).
1. What is the optimal k value obtained?
2. What is the optimal f_measure value for the validation partition?
3. What is the AUC of your model?
4. What are the precision, recall, and accuracy of the model?
5. Provide screenshots of the following:
a. Confusion matrix obtained from the Performance operator
b. Result from Optimize Parameters (Grid) showing the optimal k-value selected
c. Results table showing all the k-values and their performance metrics, sorted by f_measure in descending order.
d. The 20 new customers' data, clearly showing the confidence(true), confidence(false), and prediction(HELOC) columns.
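The AUC asked for in question 3 is computed from the confidence(true) scores rather than from the hard predictions: it is the probability that a randomly chosen responder receives a higher confidence(true) than a randomly chosen non-responder. The sketch below (hypothetical function name and toy scores) counts ties as one half. RapidMiner also reports optimistic and pessimistic AUC variants that treat tied confidences differently; with k-NN, confidences take only a few distinct values, so these variants can disagree noticeably.

```python
def auc(actual, confidence_true, positive="true"):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for a, s in zip(actual, confidence_true) if a == positive]
    neg = [s for a, s in zip(actual, confidence_true) if a != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy scores: one tie between a responder and a non-responder.
print(auc(["true", "false", "true", "false"], [0.9, 0.1, 0.4, 0.4]))
```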
