Question
Your boss, the leader of the Data Science Team at the bank where you work, has decided to extend their vacation for another 4 weeks and has left you in charge. Prior to leaving on vacation, your boss was just starting to work on a report requested by the bank's Vice President of Fraud. The Vice President is concerned about 'New Account Fraud' at the bank and has asked the bank's Data Science Team to answer some questions about new account application events. To answer these questions, your boss has compiled a data set containing relevant historical events. You must now complete the work for your boss by analyzing the data and writing a report to the Vice President.
Using the variant1.csv data file available at https://www.kaggle.com/datasets/sgpjesus/bank-account-fraud-dataset-neurips-2022?resource=download&select=Variant+I.csv, answer the questions below.
Select any two different machine learning methods, apply them to the variant1.csv data set, and analyze the results.
QUESTIONS TO ANSWER:
Which two methods did you select?
Why did you select those methods to analyze the data?
Explain your analysis.
What do the methods and data tell you about the bank's new account applicants? Explain.
What features in the data seem to matter most in your analysis and methods?
There are fraudulent applicant events flagged in the data. Is there anything unique about those events?
Are the fraud applicant events outliers or do they seem to blend in with or look the same as the rest of the bank's legitimate applicants?
Does the bank have multiple clusters of applicants? If yes, explain how they are unique. If no, explain why not.
Is there a chance that the bank might potentially miss applicant events that might actually be fraudulent? Explain your answer.
Within the data, are there certain applicant events that will never be considered fraudulent? If yes, what are the common data features of those applicants? If no, why?
Do you think the bank could predict future fraudulent events with this data set? Yes or No? Provide a short explanation.
What metrics did you use to evaluate the accuracy of the methods you chose? Please list.
At what specific step (or steps) within the fraud kill cycle could your method be utilized by the bank to help stop potential fraud amongst applicants?
How could your analysis be used by the bank to help proactively or reactively detect fraud?
What are the potential weaknesses in your chosen methods?
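Several of the questions above (evaluation metrics, and whether the bank might miss fraudulent events) depend on how predictions are scored when fraud is rare. The following is a minimal sketch, not part of the assignment, showing why plain accuracy can mislead on imbalanced labels and how precision, recall, and F1 are computed from a confusion matrix. The labels and predictions are made-up toy values, not taken from variant1.csv, and the names `ConfusionCounts`, `count_outcomes`, and `summarize` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # fraud events correctly flagged
    fp: int  # legitimate events wrongly flagged
    fn: int  # fraud events the model missed
    tn: int  # legitimate events correctly passed


def count_outcomes(y_true, y_pred):
    """Tally a binary confusion matrix (1 = fraud, 0 = legitimate)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def summarize(c):
    """Return accuracy, precision, recall, and F1, guarding against division by zero."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data (assumption): 1,000 application events, 10 of them fraudulent.
y_true = [1] * 10 + [0] * 990

# Baseline that never flags anything as fraud.
always_legit = [0] * 1000

# Hypothetical model: catches 6 of the 10 frauds, wrongly flags 20 legitimate events.
some_model = [1] * 6 + [0] * 4 + [1] * 20 + [0] * 970

baseline = summarize(count_outcomes(y_true, always_legit))
model = summarize(count_outcomes(y_true, some_model))

print("never-flag baseline:", baseline)
print("hypothetical model: ", model)
```

In this toy setup the never-flag baseline scores higher on accuracy than the hypothetical model while catching no fraud at all, which is why recall and precision on the fraud class are usually the more informative numbers to list for the metrics question.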