Project 2
Data Analysis and Visualisation of Malicious Credit Card Transactions
Unit: CITS2401 Computer Analysis and Visualisation
Worth: 15% of the unit
Submission: (1) your code and (2) your data analysis and visualisation report on the quiz server.
Deadline: 24th May 2024, 5 pm
Late submissions: late submissions attract a 5% raw penalty per day up to 7 days (i.e., 31st May 2024, 5 pm).
After that, the mark will be 0 (zero). Also, any plagiarised work will be marked zero.
1. Outline
In this project, we will continue from our Project 1 where we implemented a malicious credit card transaction
detection system. But instead of implementing the features (which we completed in Project 1), we will now
focus on data analysis and visualisation skills to better present what our datasets contain. For this project, you
will be given a dataset (CreditCard_2024_Project2.csv) that contains credit card transactions that are
already labelled normal or malicious. Your task is to perform the following steps (more details in the tasks
section):
Data analysis
Data visualisation
Write data analysis and visualisation report
(bonus) use machine learning to implement detection
Note 1: This is an individual project, so please refrain from sharing your code or files with others. However,
you may have high-level discussions about syntax or the use of modules, illustrated with other examples.
Please note that if it is discovered that you have submitted work that is not your own, you may face penalties. It
is also important to keep in mind that ChatGPT and other similar tools are limited in their ability to generate
outputs, and it is easy to detect if you use their outputs without understanding the underlying principles. The
main goal of this project is to demonstrate your understanding of programming principles and how they can be
applied in practical contexts.
Note 2: you do not necessarily have to complete Project 1 to do this project, as it is more about data analysis
and visualisation of the datasets you are given.
2. Tasks
To begin, you need to define a main(filename, filter_value, type_of_card) function that will
read the dataset and store the transaction records in data and call the below functions to display appropriate
results.
Sample Input:
main('CreditCard_2024_Project2.csv', 'Port Lincoln', 'ANZ')
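A minimal sketch of how main could be organized, assuming the CSV file has a header row; load_records is an illustrative helper name, and the task functions are the ones you define elsewhere in your submission:

```python
import csv

def load_records(filename):
    """Read the CSV and return (header, records). Assumes the first row is a header."""
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)            # column names
        return header, list(reader)      # each record is a list of column values

def main(filename, filter_value, type_of_card):
    header, data = load_records(filename)
    # Call the task functions (defined elsewhere in your submission)
    # and display their results, e.g.:
    # print(task1(data, filter_value, type_of_card))
    return data
```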
Task 1: Data Analysis using NumPy Mark: 15
Answer the following 5 NumPy related tasks for data analysis. These will require use of NumPy functions and
methods, matrix manipulations, vectorized computations, NumPy statistics, NumPy where function, etc. To
complete this task, write a function called task1(data, filter_value, type_of_card), where
data contains all records from the dataset, filter_value is an area name, and type_of_card is the
name of the card provider. The function should return a list containing values from the following questions.
Return all results rounded to two decimal points.
Input:
cos_dist, var, median, corr, pca = task1(data, 'Port Lincoln', 'ANZ')
output:
[0.06, 1337142.45, [5.75, 7.21], -0.06, [0.73, 0.81, 0.7, 0.93, 0.72, ...]]
i. cos_dist: Calculate cosine distance between normal and malicious transactions based on
IP_validity_score.
Formula:
cos_sim(A, B) = (A . B) / (||A|| ||B||)
cos_dist(A, B) = 1 - cos_sim(A, B)
Output:
print(cos_dist)=0.06
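The formula above can be sketched in NumPy as below. Note that the two vectors must have equal length; how to align the normal and malicious IP_validity_score vectors (e.g. truncating to the shorter length) is left to the unit's instructions:

```python
import numpy as np

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity = 1 - (a . b) / (||a|| ||b||)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```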
ii. var: Filter transactions by a given geographical area (e.g. Port Lincoln) and calculate and display
the variance of the transaction amount for that area. Note: use the Actual area column from the
dataset. Use the sample variance formula for the calculation.
Output:
print(var)=1337142.45
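One way to express this step, assuming the Amount and Actual area columns have already been extracted as arrays (area_variance is an illustrative name): ddof=1 gives the sample variance (n-1 denominator) the task asks for.

```python
import numpy as np

def area_variance(amounts, areas, filter_value):
    # Keep only transactions whose Actual area matches filter_value,
    # then compute the sample variance (ddof=1 -> n-1 denominator).
    amounts = np.asarray(amounts, dtype=float)
    areas = np.asarray(areas)
    selected = amounts[np.where(areas == filter_value)]
    return round(np.var(selected, ddof=1), 2)
```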
iii. median: Filter data based on Type_of_card and then calculate the median of the
Authentication_score values for transactions in the lower 25th (inclusive) and upper
75th (inclusive) percentiles.
Output:
print(median)=[5.75,7.21]
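One reading of this task, sketched below under the assumption that "lower 25th" and "upper 75th" mean the values at or below the 25th percentile and at or above the 75th percentile of the filtered scores (percentile_medians is an illustrative name):

```python
import numpy as np

def percentile_medians(scores):
    # Median of values at/below the 25th percentile, and median of
    # values at/above the 75th percentile, each rounded to 2 d.p.
    scores = np.asarray(scores, dtype=float)
    q25, q75 = np.percentile(scores, [25, 75])
    low = scores[scores <= q25]
    high = scores[scores >= q75]
    return [round(np.median(low), 2), round(np.median(high), 2)]
```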
iv. corr: Filter malicious transactions where the Actual and Origin places differ. Calculate the
elementwise product of Authentication_score and IP_validity_score, then compute the
correlation between the resultant vector and the Amount column.
Output:
print(corr)=-0.06
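The product-then-correlate step can be sketched as follows, assuming the three columns have already been filtered to the relevant malicious transactions (score_amount_corr is an illustrative name); np.corrcoef returns the Pearson correlation matrix, and the [0, 1] entry is the correlation between the two vectors:

```python
import numpy as np

def score_amount_corr(auth_scores, ip_scores, amounts):
    # Elementwise product of the two score columns, then the Pearson
    # correlation between that product and the Amount column.
    product = np.asarray(auth_scores, dtype=float) * np.asarray(ip_scores, dtype=float)
    corr = np.corrcoef(product, np.asarray(amounts, dtype=float))[0, 1]
    return round(corr, 2)
```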
v. pca: Create an N x 5 matrix, where N is the number of rows in the dataset and 5 is the number of
columns; we will call these features (Transaction_type, Entry_mode, Amount,
Authentication_score, and IP_validity_score). Before that, you need to convert all
string values to numerical values. You can assume there will always be 3 Transaction_type
values (ATM: 1, EFTPOS: 2, and Internet: 3) and four Entry_mode values (Magnetic
Stripe: 1, Manual: 2, Chip Card Read: 3, and NFC: 4). Calculate principal component analysis (PCA)
to reduce the dimensionality of the data to N x 1.
The algorithm for PCA is:
a) Standardize the data along all the features (subtract the mean and divide
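The source text cuts off after step a), so the sketch below assumes the remaining steps follow the standard covariance/eigenvector PCA recipe (standardize, covariance matrix, eigendecomposition, project onto the top eigenvector); check it against the full algorithm in the assignment before using:

```python
import numpy as np

def pca_first_component(X):
    # X: N x 5 feature matrix. Returns the N-vector projection onto the
    # first principal component (assumed standard PCA procedure).
    X = np.asarray(X, dtype=float)
    # a) Standardize: subtract the mean and divide by the std of each column.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # b) Covariance matrix of the standardized features (assumed step).
    cov = np.cov(Xs, rowvar=False)
    # c) Eigendecomposition; eigh is appropriate for symmetric matrices.
    vals, vecs = np.linalg.eigh(cov)
    # d) Project onto the eigenvector with the largest eigenvalue.
    top = vecs[:, np.argmax(vals)]
    return Xs @ top
```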
