Question
Problem Context for Q5 (25 points)
Ana Gomez, a data analyst at Cha-Ching Bank, has compiled data on past customers to whom Cha-Ching Bank marketed its Home Equity Line of Credit (HELOC) product. The data includes each customer's age, sex, and income, and whether or not the customer responded to the HELOC offer. Ana would like to team up with you to accomplish two data mining tasks:
a) Develop a kNN model for predicting whether or not a bank customer will respond to a HELOC offer.
b) Identify, for each of the new customers, whether they are likely to respond to a HELOC offer.
Follow the kNN optimization with normalization process shown in the example process RidingMowers kNN Optimized Normalized.rmp, with the changes described below:
Make a copy of the RidingMowers process mentioned above. Rename the copy by right-clicking it. Double-click it to load the process on the RapidMiner canvas, then start making changes to it.
Import the HELOC.csv and HELOCscore.csv data into the RapidMiner repository.
Load the files into the process and connect them appropriately in place of the existing data files.
Remove the Nominal to Binominal operator from the original process.
Instead, use the Numerical to Binominal operator to convert the HELOC outcome variable to a binominal attribute.
Use the Set Role operator to assign HELOC the label role.
In the Edit Parameter Settings panel of the Optimize Parameters (Grid) operator, change the range of k to vary from a minimum of to maximum of in steps (linear scale).
Inside the Optimize Parameters (Grid) operator, change the split ratio of the Validation (Split Validation) operator to split ratio with stratified sampling.
In the kNN operator, change the measure types to MixedMeasures and the mixed measure to MixedEuclideanDistance, since the data contains both numeric attributes and a categorical attribute (Sex).
In the Performance (Binominal Classification) operator, set the positive class to true and the main criterion for optimization to f-measure.
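The normalization and mixed-distance settings above can be illustrated outside RapidMiner with a minimal Python sketch. It is not RapidMiner's implementation. It assumes range (min-max) normalization, which may differ from the normalization used in the RidingMowers process, and it assumes that a mixed Euclidean distance adds a squared difference for each numeric attribute and a 0/1 mismatch term for each nominal attribute. The toy rows, the attribute names Age, Sex, and Income, and all function names are hypothetical; the confidences are plain vote fractions, which may not match how RapidMiner computes them.

```python
import math
from collections import Counter

NUMERIC = ["Age", "Income"]   # assumed numeric attributes
NOMINAL = ["Sex"]             # assumed categorical attribute

def fit_bounds(rows):
    """Record the min and max of each numeric attribute in the training rows."""
    return {k: (min(r[k] for r in rows), max(r[k] for r in rows)) for k in NUMERIC}

def normalize(row, bounds):
    """Rescale numeric attributes to [0, 1] using the training bounds."""
    out = dict(row)
    for k, (lo, hi) in bounds.items():
        out[k] = (row[k] - lo) / (hi - lo) if hi > lo else 0.0
    return out

def mixed_euclidean(a, b):
    """Squared difference for numeric attributes, 0/1 mismatch for nominal ones."""
    total = sum((a[k] - b[k]) ** 2 for k in NUMERIC)
    total += sum(0.0 if a[k] == b[k] else 1.0 for k in NOMINAL)
    return math.sqrt(total)

def knn_predict(train, query, k, label="HELOC"):
    """Return the majority label among the k nearest rows and the vote fractions."""
    nearest = sorted(train, key=lambda r: mixed_euclidean(r, query))[:k]
    votes = Counter(r[label] for r in nearest)
    return votes.most_common(1)[0][0], {c: n / k for c, n in votes.items()}

# Hypothetical toy data, not taken from HELOC.csv.
raw_train = [
    {"Age": 30, "Sex": "F", "Income": 40, "HELOC": True},
    {"Age": 50, "Sex": "M", "Income": 90, "HELOC": False},
    {"Age": 35, "Sex": "F", "Income": 50, "HELOC": True},
    {"Age": 60, "Sex": "M", "Income": 100, "HELOC": False},
]
bounds = fit_bounds(raw_train)
train = [normalize(r, bounds) for r in raw_train]
query = normalize({"Age": 32, "Sex": "F", "Income": 45}, bounds)

prediction, confidence = knn_predict(train, query, k=3)
print(prediction, confidence)
```

The new customer is normalized with the bounds learned from the training rows, so that scoring data is put on the same scale as the data the model was built on.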
Run the process. Report the following results and provide your interpretation (important):
What is the optimal k value obtained?
What is the optimal f-measure value for the validation partition?
What is the AUC of your model?
What are the precision, recall, and accuracy of the model?
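For interpreting these results, precision, recall, accuracy, and f-measure can all be derived from the four cells of the confusion matrix. The following Python sketch uses the standard definitions with "true" as the positive class; the counts are hypothetical placeholders rather than results from the HELOC data, and the function name is made up for illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # f-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }

# Hypothetical counts: replace with the values from your own confusion matrix.
metrics = binary_metrics(tp=30, fp=10, fn=20, tn=40)
print(metrics)
```

Because f-measure balances precision and recall, the k that maximizes it is not necessarily the k with the highest accuracy, which is worth keeping in mind when interpreting the grid search table.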
Provide screenshots of the following:
a) The confusion matrix obtained from the Performance operator.
b) The result from Optimize Parameters (Grid) showing the optimal k value selected.
c) The result table showing all the k values and performance metrics, sorted by f-measure in descending order.
d) The new customer data, clearly showing the confidence(true), confidence(false), and prediction(HELOC) columns.