
Question


The goal of this project is to detect fraudulent online payments using machine learning methods. You will apply various supervised classification algorithms to identify fraudulent transactions in a provided dataset. This project will enhance your understanding of data preprocessing, feature engineering, model building, and hyperparameter tuning in the context of a real-world financial application. You will work with a dataset of online payment transaction records containing 1048576 rows and 11 columns. Each record has the following columns:
step: The time step of the transaction. (integer, e.g., 1)
type: The type of transaction. (string, e.g., CASH_IN, CASH_OUT, DEBIT, PAYMENT, ...)
amount: The amount of the transaction. (float, e.g., 9839.64)
nameOrig: The account identifier of the originator of the transaction. (string + integer, e.g., C1231006815)
oldbalanceOrg: The initial balance of the originator before the transaction. (float, e.g., 170136)
newbalanceOrig: The balance of the originator after the transaction. (float, e.g., 160296.4)
nameDest: The account identifier of the recipient of the transaction. (string + integer, e.g., M2044282225)
oldbalanceDest: The initial balance of the recipient before the transaction. (float, e.g., 0)
newbalanceDest: The balance of the recipient after the transaction. (float, e.g., 0)
isFraud: Binary indicator if the transaction is fraudulent (1) or not (0).
isFlaggedFraud: Binary indicator if the transaction is flagged as fraudulent by the system (1) or not (0).
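The column list above can be mirrored as an explicit schema when loading the data. The following is a minimal sketch, assuming pandas is available; the `DTYPES` mapping, the `load_transactions` helper, and the in-memory sample row (built from the example values in the list) are illustrative names and stand-ins, not part of the assignment. The real CSV path is not given here, so none is assumed.

```python
import io

import pandas as pd

# dtypes mirror the column list above. "category" keeps the handful of
# transaction types compact; the integer dtypes assume those columns
# contain no missing values (read_csv would raise otherwise).
DTYPES = {
    "step": "int32",
    "type": "category",
    "amount": "float64",
    "nameOrig": str,
    "oldbalanceOrg": "float64",
    "newbalanceOrig": "float64",
    "nameDest": str,
    "oldbalanceDest": "float64",
    "newbalanceDest": "float64",
    "isFraud": "int8",
    "isFlaggedFraud": "int8",
}

# One in-memory row built from the example values in the column list,
# standing in for the real file.
SAMPLE_CSV = (
    "step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,"
    "nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud\n"
    "1,PAYMENT,9839.64,C1231006815,170136,160296.4,M2044282225,0,0,0,0\n"
)


def load_transactions(source):
    """Read transactions from a path or file-like object with fixed dtypes."""
    return pd.read_csv(source, dtype=DTYPES)


df = load_transactions(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

Declaring dtypes up front may reduce memory use on a dataset of this size, since pandas would otherwise infer 64-bit types for every numeric column.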
Begin by importing the dataset into your Python environment and handling any missing values appropriately. Proceed to feature engineering, creating new features that may be useful for fraud detection. Normalize the numerical features to ensure all values are within a similar range.
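One possible shape for these preprocessing steps is sketched below on a tiny hand-made table. Everything beyond the column names is an assumption made for illustration: median imputation, the balance-discrepancy features `errorOrig` and `errorDest`, the `destIsMerchant` flag (which assumes an "M" prefix marks a merchant account), min-max scaling, and all account identifiers other than the two taken from the column list.

```python
import numpy as np
import pandas as pd

NUMERIC = ["amount", "oldbalanceOrg", "newbalanceOrig",
           "oldbalanceDest", "newbalanceDest"]


def preprocess(df):
    """Impute missing values and derive a few candidate features."""
    out = df.dropna(subset=["isFraud"]).copy()  # rows without a label are unusable
    out[NUMERIC] = out[NUMERIC].fillna(out[NUMERIC].median())
    # How far the balances are from what the amount alone would explain.
    out["errorOrig"] = out["oldbalanceOrg"] - out["amount"] - out["newbalanceOrig"]
    out["errorDest"] = out["oldbalanceDest"] + out["amount"] - out["newbalanceDest"]
    # Assumption: identifiers starting with "M" denote merchant accounts.
    out["destIsMerchant"] = out["nameDest"].str.startswith("M", na=False).astype(int)
    out = pd.get_dummies(out, columns=["type"], prefix="type", dtype=int)
    return out.drop(columns=["nameOrig", "nameDest"])


def min_max_scale(df, cols):
    """Rescale the given columns to [0, 1]; constant columns become 0."""
    lo, hi = df[cols].min(), df[cols].max()
    span = (hi - lo).replace(0, 1.0)
    scaled = df.copy()
    scaled[cols] = (df[cols] - lo) / span
    return scaled


# Hand-made rows; only the first uses values from the column list.
raw = pd.DataFrame({
    "step": [1, 1, 2, 3],
    "type": ["PAYMENT", "DEBIT", "DEBIT", "PAYMENT"],
    "amount": [9839.64, 181.0, np.nan, 1000.0],
    "nameOrig": ["C1231006815", "C0000000001", "C0000000002", "C0000000003"],
    "oldbalanceOrg": [170136.0, 181.0, 500.0, 3000.0],
    "newbalanceOrig": [160296.4, 0.0, 0.0, 2000.0],
    "nameDest": ["M2044282225", "C0000000004", "C0000000005", "M0000000006"],
    "oldbalanceDest": [0.0, 0.0, 100.0, 0.0],
    "newbalanceDest": [0.0, 0.0, 600.0, 0.0],
    "isFraud": [0, 1, 1, 0],
    "isFlaggedFraud": [0, 0, 0, 0],
})

clean = preprocess(raw)
scaled = min_max_scale(clean, NUMERIC + ["errorOrig", "errorDest"])
print(scaled.round(3))
```

In a real pipeline the scaling bounds would normally be computed on the training split only and then reused on the test split, so that no information from the test set leaks into preprocessing.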
Next, conduct an exploratory data analysis (EDA). Generate summary statistics for the dataset and create visualizations to understand the distribution of the data, identify patterns, and detect any anomalies.
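As a starting point for the summary statistics, the sketch below computes the overall fraud rate, the fraud rate per transaction type, and the amount distribution per class. The six-row table and the `fraud_summary` helper are made up for illustration; on the real data the same calls would apply to the full frame.

```python
import pandas as pd

# Made-up rows, just large enough to show the shape of each summary.
df = pd.DataFrame({
    "type": ["PAYMENT", "DEBIT", "PAYMENT", "DEBIT", "PAYMENT", "DEBIT"],
    "amount": [100.0, 5000.0, 250.0, 7000.0, 80.0, 300.0],
    "isFraud": [0, 1, 0, 1, 0, 0],
})


def fraud_summary(df):
    """Return overall fraud rate, per-type fraud counts/rates, amount stats by class."""
    overall_rate = df["isFraud"].mean()
    by_type = (
        df.groupby("type")["isFraud"]
        .agg(["count", "sum", "mean"])
        .rename(columns={"count": "n", "sum": "n_fraud", "mean": "fraud_rate"})
    )
    amount_by_class = df.groupby("isFraud")["amount"].describe()
    return overall_rate, by_type, amount_by_class


overall_rate, by_type, amount_by_class = fraud_summary(df)
print(f"overall fraud rate: {overall_rate:.3f}")
print(by_type)
print(amount_by_class)
```

Checking the class balance early matters because fraud labels are typically rare, which affects both the choice of metric and the way the data should be split.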
For model building, split the dataset into training and testing sets. Implement the following machine learning algorithms: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), and a hybrid algorithm of your choice. Initially, run each model with default parameters to establish a baseline performance.
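The split-and-baseline step might look like the following sketch, assuming scikit-learn. Synthetic imbalanced data from `make_classification` stands in for the preprocessed transactions, and the stratified split, the scaling pipelines for LR and SVM, and the use of F1 as the headline score are choices made here rather than requirements of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed data: roughly 10% positives.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)

# stratify=y keeps the fraud ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Default hyperparameters throughout; random_state only fixes the seed.
# LR and SVM are wrapped in a scaler because both are scale-sensitive.
baselines = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression()),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "GBM": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: F1 = {score:.3f}")
```

A kernel SVM scales poorly with the number of rows, so on the full dataset it may be necessary to train it on a subsample or to substitute a linear SVM; that trade-off would be worth noting in the write-up.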
After establishing baseline models, tune the hyperparameters of each algorithm using Grid Search, Random Search, or any other technique you prefer. Evaluate the performance of each model using a confusion matrix and calculate the Accuracy, Precision, Recall, and F1-Score. Additionally, plot the ROC curves and calculate the AUC for each model to evaluate the trade-off between the true positive rate and false positive rate.
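The tuning and evaluation loop could be sketched as follows for one model, assuming scikit-learn. The data are synthetic, and the small Random Forest parameter grid, the 3-fold cross-validation, and the choice of F1 as the search objective are illustrative assumptions; the same pattern would be repeated per algorithm.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, auc, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score,
                             roc_curve)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed data: roughly 10% positives.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A deliberately small grid; F1 is used because accuracy is misleading
# when one class is rare.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)

best = search.best_estimator_
y_pred = best.predict(X_test)
y_score = best.predict_proba(X_test)[:, 1]  # probability of the fraud class

# Rows are true classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),
}

# Points of the ROC curve; these are what would be plotted.
fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc_from_curve = auc(fpr, tpr)

print("best params:", search.best_params_)
print("confusion matrix (tn, fp, fn, tp):", (tn, fp, fn, tp))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The `fpr` and `tpr` arrays can be passed to any plotting library to draw the ROC curve, and collecting them per model gives the data for a combined comparison plot.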
Note: Enhance your model by identifying and visualizing the most important features for fraud detection. Explore and implement ensemble methods to combine multiple models for improved performance. Propose and implement any additional enhancements or optimizations, such as incorporating domain-specific knowledge or using advanced feature selection methods. Additionally, attempt to create a hybrid algorithm to further improve detection accuracy.
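One way to approach both the feature ranking and the hybrid model is sketched below, assuming scikit-learn. Impurity-based Random Forest importances and a soft-voting ensemble of LR, RF, and GBM are just one interpretation of "hybrid"; stacking or other combinations would be equally valid. The data are synthetic and the `feature_0` ... `feature_5` names are placeholders for the real engineered columns.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder names; in the project these would be the real columns.
FEATURES = [f"feature_{i}" for i in range(6)]

X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=3, n_redundant=1,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Impurity-based importances from a Random Forest, sorted high to low.
rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
ranking = sorted(
    zip(FEATURES, rf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")

# Soft voting averages the predicted probabilities of the base models.
hybrid = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(random_state=0)),
        ("gbm", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)
hybrid.fit(X_train, y_train)
hybrid_f1 = f1_score(y_test, hybrid.predict(X_test))
print(f"soft-voting ensemble F1 = {hybrid_f1:.3f}")
```

Impurity-based importances can favor high-cardinality or correlated features, so cross-checking the ranking with permutation importance may be worthwhile before drawing conclusions.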
Document your entire process, including data preprocessing steps, EDA findings, model building, tuning, and evaluation results. Analyze which model performed the best and explain why, discussing any challenges encountered and how you addressed them. Prepare a presentation summarizing your findings, model performance, and any proposed enhancements.
Submit the Python code (Jupyter notebooks or .py files) used for data preprocessing, EDA, model building, and evaluation.
Submit the graph obtained from each algorithm, along with a combined graph comparing all of them.
Tip: The dataset is available on Kaggle under the name "Online Payments Fraud Detection Dataset" (subtitle: "Online payment fraud big dataset for testing and practice purpose").
Note: Please do not submit ChatGPT-generated or already existing code. Write your own code, with enhancements.
