Project 2
Data Analysis and Visualisation of Malicious Credit Card Transaction
Worth: 15% of the unit
Submission: (1) your code and (2) your data analysis and visualisation report on the quiz server.
Deadline: 24th May 2024, 5 pm
Late submissions: late submissions attract a 5% raw penalty per day up to 7 days (i.e., 31st May 2024, 5 pm).
After that, the mark will be 0 (zero). Also, any plagiarised work will be marked zero.
1. Outline
In this project, we will continue from Project 1, where we implemented a malicious credit card transaction
detection system. Instead of implementing the detection features (which we completed in Project 1), we will
now focus on data analysis and visualisation skills to better present what our datasets contain. For this project,
you will be given a dataset (CreditCard_2024_Project2.csv) that contains credit card transactions
already labelled as normal or malicious. Your task is to perform the following steps (more details in the tasks
section):
Data analysis
Data visualisation
Write data analysis and visualisation report
(bonus) use machine learning to implement detection
Note 1: This is an individual project, so please refrain from sharing your code or files with others. However,
you may have high-level discussions about syntax or the use of modules, using other examples. Please note
that if it is discovered that you have submitted work that is not your own, you may face penalties. It is also
important to keep in mind that ChatGPT and other similar tools are limited in their ability to generate outputs,
and it is easy to detect if you use their outputs without understanding the underlying principles. The main
goal of this project is to demonstrate your understanding of programming principles and how they can be
applied in practical contexts.
Note 2: You do not necessarily have to have completed Project 1 to do this project, as it is more about data
analysis and visualisation of the datasets you are given.
2. Tasks
To begin, you need to define a main(filename, filter_value, type_of_card) function that will
read the dataset, store the transaction records in data, and call the functions below to display the
appropriate results.
Sample Input:
main('CreditCard_2024_Project2.csv', 'Port Lincoln', 'ANZ')
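The exact column layout of CreditCard_2024_Project2.csv is not shown here, so the following is only a
minimal sketch of the driver function: it assumes the file has a header row, keeps every field as a string, and
leaves type conversion to the individual task functions (such as task1, defined in Task 1 below).

import csv

def main(filename, filter_value, type_of_card):
    # Read the dataset: skip the header row and store each transaction record.
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)            # column names
        data = [row for row in reader]   # one list of strings per transaction
    # Call the task functions described below and display their results.
    print(task1(data, filter_value, type_of_card))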
Task 1: Data Analysis using NumPy (Mark: 15)
Answer the following five NumPy-related tasks for data analysis. These will require the use of NumPy
functions and methods, matrix manipulation, vectorised computations, NumPy statistics, the NumPy where
function, etc. To complete this task, write a function called task1(data, filter_value, type_of_card), where
data contains all records from the dataset, filter_value is an area name, and type_of_card is the
name of the card provider. The function should return a list containing the values from the following
questions. Return all results rounded to two decimal places.
Input:
cos_dist, var, median, corr, pca = task1(data, 'Port Lincoln', 'ANZ')
Output:
[0.06, 1337142.45, [5.75, 7.21], -0.06, [0.73, 0.81, 0.7, 0.93, 0.72, ...]]
i. cos_dist: Calculate the cosine distance between normal and malicious transactions based on
IP_validity_score.
Formula: cos_sim(A, B) = (A · B) / (||A|| ||B||) and cos_dist(A, B) = 1 - cos_sim(A, B),
where A and B are the IP_validity_score vectors of the normal and malicious transactions.
Output:
print(cos_dist)=0.06
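A minimal sketch of this step, assuming normal and malicious are equal-length 1-D NumPy arrays holding the
IP_validity_score values of the normal and malicious transactions (the dot product requires matching lengths;
how the rows are split into the two groups depends on the label column of the dataset):

import numpy as np

def cosine_distance(normal, malicious):
    # cosine similarity = (A . B) / (||A|| ||B||); cosine distance = 1 - similarity
    cos_sim = np.dot(normal, malicious) / (np.linalg.norm(normal) * np.linalg.norm(malicious))
    return round(1 - cos_sim, 2)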
ii. var: Filter transactions based on a given geographical area (e.g. Port Lincoln) and calculate and
display the variance of the transaction amount for that area. Note: use the Actual area column from the
dataset and the sample variance formula for the calculation.
Output:
print(var)=1337142.45
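A minimal sketch of this step, assuming areas is a NumPy array of the Actual area column and amounts is the
matching Amount column as floats; ddof=1 selects the sample variance the brief asks for:

import numpy as np

def area_variance(areas, amounts, filter_value):
    mask = (areas == filter_value)                  # keep only rows for the requested area
    return round(np.var(amounts[mask], ddof=1), 2)  # sample variance (divide by n - 1)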
iii. median: Filter the data based on Type_of_card and then calculate the median of the
Authentication_score values for transactions that are in the lower 25th percentile (inclusive) and
the upper 75th percentile (inclusive).
Output:
print(median)=[5.75,7.21]
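A minimal sketch of this step, assuming scores is a NumPy array of Authentication_score values already
filtered to the requested Type_of_card:

import numpy as np

def percentile_medians(scores):
    q25, q75 = np.percentile(scores, [25, 75])
    lower_median = np.median(scores[scores <= q25])  # values in the lower 25th percentile (inclusive)
    upper_median = np.median(scores[scores >= q75])  # values in the upper 75th percentile (inclusive)
    return [round(lower_median, 2), round(upper_median, 2)]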
iv. corr: Filter malicious transactions where the Actual and Origin places are different. Calculate the
elementwise product of Authentication_score and IP_validity_score and then compute the
correlation between the resultant vector and the Amount column.
Output:
print(corr)=-0.06
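A minimal sketch of this step, assuming auth, ip_score and amount are NumPy arrays already filtered to
malicious transactions whose Actual and Origin places differ; np.corrcoef returns a 2 x 2 correlation matrix,
so the off-diagonal entry is the Pearson correlation:

import numpy as np

def product_amount_correlation(auth, ip_score, amount):
    product = auth * ip_score                    # elementwise product of the two score columns
    return round(np.corrcoef(product, amount)[0, 1], 2)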
v. pca: Create an N x 5 matrix, where N is the number of rows in the dataset and 5 is the number of
feature columns (Transaction_type, Entry_mode, Amount, Authentication_score, and
IP_validity_score). Before that, you need to convert all string values to numerical values. You can
assume there will always be 3 Transaction_type values and use the following mapping - ATM: 1,
EFTPOS: 2, and Internet: 3 - and 4 Entry_mode values - Magnetic Stripe: 1, Manual: 2, Chip Card
Read: 3, and NFC: 4. Calculate principal component analysis (PCA) to reduce the dimensionality of
the data to N x 1.
The algorithm for PCA is:
a) Standardise the data along all the features (subtract the mean and divide by the standard deviation)
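Only step (a) of the algorithm is listed above, so the sketch below assumes the remaining steps follow the
standard covariance/eigendecomposition form of PCA (covariance matrix, eigenvectors, projection onto the
leading component); features is assumed to be the N x 5 numeric matrix described in this task:

import numpy as np

def pca_first_component(features):
    # (a) standardise each feature: subtract the column mean, divide by the column std. dev.
    standardised = (features - features.mean(axis=0)) / features.std(axis=0)
    # covariance matrix of the standardised features (columns are variables)
    cov = np.cov(standardised, rowvar=False)
    # eigendecomposition; eigh suits the symmetric covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    leading = eigvecs[:, np.argmax(eigvals)]   # eigenvector with the largest eigenvalue
    # project the standardised data onto the leading component (one value per row)
    return standardised @ leading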
