Question

Posted on Sep 22, 2024

PLEASE HELP ME COMPLETE THIS ENTIRE PYTHON PROGRAMMING PROJECT

Activity 1: Analysing the Dataset

Create a Pandas DataFrame for the Spambase dataset using the link below. The dataset consists of the following 58 columns. Most of them represent the frequency of a particular word or character in the email:

Column #	Attribute	Description
0 - 47	word_freq_WORD	The first 48 columns are the percentage frequencies of a particular word, one column per word
48 - 53	char_freq_CHAR	The next 6 columns are the percentage frequencies of a particular special character such as semi-colon (;) or exclamation mark (!), one column per character
54	capital_run_length_average	Average length of uninterrupted sequences of capital letters
55	capital_run_length_longest	Length of the longest uninterrupted sequence of capital letters
56	capital_run_length_total	Sum of the lengths of uninterrupted sequences of capital letters
57	email_type	Denotes whether the email is spam (1) or not (0)

Dataset Link: (cannot be included because Chegg does not allow links, so use a placeholder to represent it)

Print the first five rows of the dataset. Check for null values and treat them accordingly (if any).

[ ]

# Import modules
# Load the dataset
# Print the first five rows of the DataFrame
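A minimal sketch of this first cell, under stated assumptions: the real dataset link is not given, so `SAMPLE_CSV` is a made-up four-column stand-in, and both the helper name `load_dataset` and the `header=None` setting are assumptions about how the file would be read. Dropping rows is only one possible way to treat nulls.

```python
import io

import pandas as pd

# Assumption: a tiny made-up stand-in for the Spambase CSV, because the real
# link is not available. Three feature columns plus a 0/1 label.
SAMPLE_CSV = """0.00,0.64,3.756,1
0.21,0.28,5.114,1
0.06,0.00,9.821,0
0.00,0.00,3.537,0
0.00,,1.000,0
0.10,0.32,2.450,1
"""


def load_dataset(source):
    """Read a CSV without a header row (header=None is an assumption)."""
    return pd.read_csv(source, header=None)


df = load_dataset(io.StringIO(SAMPLE_CSV))

# Print the first five rows of the DataFrame.
print(df.head())

# Count the null values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# One possible treatment: drop rows that contain nulls.
df_clean = df.dropna().reset_index(drop=True)
```

With the real file, `load_dataset` would receive the dataset path or URL instead of the in-memory buffer.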

Rename the last column of the DataFrame as target.

[ ]

# Rename the last column as 'target'
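The rename step could look roughly like the sketch below. The three-column DataFrame is a hypothetical stand-in for the 58-column dataset; looking the last column up by position avoids assuming what it is currently called.

```python
import pandas as pd

# Hypothetical stand-in with integer column labels, as read_csv(header=None)
# would produce. The real dataset has 58 columns.
df = pd.DataFrame([[0.0, 0.64, 1], [0.21, 0.28, 1], [0.06, 0.0, 0]])

# Rename whatever the last column is currently called to 'target'.
last_col = df.columns[-1]
df = df.rename(columns={last_col: "target"})

# info() lists column names, dtypes and non-null counts, which verifies the
# rename and also shows whether any values are missing.
df.info()
```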

Print the information of the DataFrame to verify the above update.

[ ]

# Print the dataset information

Q: Are there any missing values?

Activity 2: Train-Test Split

You have to determine the effect of all the features on the 'target' variable. Thus, every column other than target is a feature variable, and the target column is the target variable.

Steps:

Create a list of all the features.

Split the "DataFrame" into features and target arrays using the features list.

Split the dataset into a training set and test set such that the training set contains 70% of the instances and the remaining instances will become the test set.

Reshape the target variable arrays into two-dimensional arrays by using reshape(-1, 1) function of the numpy module.

[ ]

# Split the DataFrame into the train and test sets.
# Import the module
# Create a features list
# Split the DataFrame into the train and test sets such that test set has 30% of the values.
# Reshape target arrays to 2-dimensional array.
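The steps above can be sketched as follows. The data is synthetic (20 rows, three hypothetical feature columns `f0` to `f2`), and `random_state=42` is an arbitrary choice made only so the split is repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 20 rows, 3 features, binary 'target'.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, 3)), columns=["f0", "f1", "f2"])
df["target"] = [0, 1] * 10

# Every column except 'target' is a feature.
features = [col for col in df.columns if col != "target"]

X = df[features]
y = df["target"]

# test_size=0.3 leaves 70% of the rows for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Reshape the target arrays into two-dimensional column vectors.
y_train = y_train.values.reshape(-1, 1)
y_test = y_test.values.reshape(-1, 1)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Some scikit-learn estimators expect a one-dimensional target, so the reshaped arrays may need `ravel()` again before fitting.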

Activity 3: Normalisation of the Features

Get a descriptive analysis of the feature set and decide whether any normalisation is needed.

Describe the features for training data.

[ ]

# Get the descriptive statistics for 'X_train'.

Describe the features for the testing data.

[ ]

# Get the descriptive statistics for 'X_test'.
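As a rough illustration of the two describe steps, the sketch below uses synthetic frames with two hypothetical columns on very different scales. The `scale_ratio` figure is an assumption about one way to judge whether normalisation is needed, not part of the assignment.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins (assumption) with deliberately different column scales.
rng = np.random.default_rng(1)
X_train = pd.DataFrame({
    "word_freq_a": rng.random(14),
    "capital_run_length_total": rng.integers(1, 5000, size=14).astype(float),
})
X_test = pd.DataFrame({
    "word_freq_a": rng.random(6),
    "capital_run_length_total": rng.integers(1, 5000, size=6).astype(float),
})

# describe() reports count, mean, std, min, quartiles and max per column.
train_stats = X_train.describe()
test_stats = X_test.describe()
print(train_stats)
print(test_stats)

# Comparing the largest and smallest column maxima shows how far apart the
# feature scales are.
scale_ratio = train_stats.loc["max"].max() / train_stats.loc["max"].min()
print(scale_ratio)
```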

Q: Does the data need normalisation? Why?

If the answer to the above question is yes, normalise the data by calculating Z-scores (standard scaling) in the following code sections.

Define the Standard Normalisation function.

[ ]

# Define the 'standard_scalar()' function for calculating Z-scores

Hint: The Z-score for each value can be calculated by the following expression:

$$Z = \frac{x - \mu}{\sigma}$$

Where,

$x$ is an observation

$\mu$ is the population mean

$\sigma$ is the population standard deviation
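One way the Z-score hint above, (observation - mean) / standard deviation, might translate into the `standard_scalar()` function is sketched below. Using `ddof=0` follows the hint's wording about the population standard deviation; pandas defaults to the sample version (`ddof=1`), so that argument is an assumption about what is intended.

```python
import pandas as pd


def standard_scalar(series):
    """Return the Z-scores of a pandas Series."""
    # ddof=0 gives the population standard deviation named in the hint;
    # series.std() without it would use the sample standard deviation.
    return (series - series.mean()) / series.std(ddof=0)


# Made-up values with mean 5 and population standard deviation 2.
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
z_scores = standard_scalar(values)
print(z_scores.tolist())
```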

Apply the normalisation function to the features of the training data.

[ ]

# Apply the 'standard_scalar()' on X_train using apply() function and get the descriptive statistics of the normalised X_train

Apply the normalisation function to the features of the test data.

[ ]

# Apply the 'standard_scalar()' on X_test and get the descriptive statistics of the normalised X_test
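Applying the function to both feature sets could look like the sketch below. The frames are synthetic, and `standard_scalar()` is repeated here (with the assumed `ddof=0`) so the example runs on its own. Each set is scaled with its own statistics, as the activity describes.

```python
import numpy as np
import pandas as pd


def standard_scalar(series):
    """Return the Z-scores of a pandas Series (population std, an assumption)."""
    return (series - series.mean()) / series.std(ddof=0)


# Synthetic stand-ins (assumption) for the real feature sets.
rng = np.random.default_rng(2)
X_train = pd.DataFrame(rng.random((14, 3)) * [1, 10, 1000], columns=["f0", "f1", "f2"])
X_test = pd.DataFrame(rng.random((6, 3)) * [1, 10, 1000], columns=["f0", "f1", "f2"])

# apply() passes each column to standard_scalar() as a Series.
norm_X_train = X_train.apply(standard_scalar, axis=0)
norm_X_test = X_test.apply(standard_scalar, axis=0)

# After scaling, every column should have mean close to 0.
print(norm_X_train.describe())
print(norm_X_test.describe())
```

A common alternative is to scale the test set with the training set's mean and standard deviation, which keeps information about the test data out of the preprocessing.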

Activity 4: Logistic Regression - Model Training

Implement Logistic Regression classification using the sklearn module to estimate the values of the coefficients in the following way:

Deploy the model by importing the LogisticRegression class and creating an object of this class.

Call the fit() function on the Logistic Regression object and print score using the score() function.

Print the coefficient values.

[ ]

# Deploy the 'LogisticRegression' model using the 'fit()' function.

Get the beta coefficients for the features using the model object trained in the above code.

[ ]

# Print the beta coefficient values
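The Activity 4 steps above can be sketched as follows. The training data is synthetic, with the label depending only on the first feature, and the variable name `log_reg` is an assumption. The default `LogisticRegression()` settings are used because the task does not specify any.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in (assumption): the label is 1 when the first feature is positive.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = (X_train[:, 0] > 0).astype(int).reshape(-1, 1)

# Create the model object and fit it on the training data.
log_reg = LogisticRegression()
# ravel() flattens the (n, 1) target back to the 1-D shape that fit() expects.
log_reg.fit(X_train, y_train.ravel())

# score() returns the mean accuracy on the data it is given.
train_score = log_reg.score(X_train, y_train.ravel())
print(train_score)

# coef_ holds one beta coefficient per feature; intercept_ holds beta_0.
print(log_reg.coef_)
print(log_reg.intercept_)
```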

Activity 5: Logistic Regression - Model Prediction and Evaluation

Predict the values for both training and test sets by calling the predict() function on the Logistic Regression object.

[ ]

# Make predictions on the test dataset by using the 'predict()' function.

Also, display the confusion matrix.

[ ]

# Display the results of confusion_matrix
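A hedged sketch of the prediction and confusion-matrix steps, again on synthetic data with an assumed model name `log_reg`. Passing `labels=[0, 1]` fixes the row and column order, so that unpacking the flattened matrix gives true negatives, false positives, false negatives and true positives in that order when 1 (spam) is treated as the positive label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Synthetic stand-ins (assumption) for the normalised train and test sets.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(40, 3))
y_test = (X_test[:, 0] > 0).astype(int)

log_reg = LogisticRegression().fit(X_train, y_train)

# Predict labels for both the training and the test set.
y_train_pred = log_reg.predict(X_train)
y_test_pred = log_reg.predict(X_test)

# Rows are actual labels, columns are predicted labels, both ordered [0, 1].
cm = confusion_matrix(y_test, y_test_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print("True negatives:", tn, "True positives:", tp)
```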

Q: Which of the two labels is the positive outcome?

Q: What are the counts of True Positives and True Negatives?

Print the classification report values to evaluate the accuracy of your model.

[ ]

# Display the results of classification_report
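The report step might look like the sketch below. The actual and predicted labels are hypothetical values chosen for illustration, not results from the Spambase model. `f1_score` with `average=None` is shown as one way to pull out the per-label f1-scores that the report also prints.

```python
from sklearn.metrics import classification_report, f1_score

# Hypothetical actual and predicted labels (not real model output).
y_test = [0, 0, 0, 1, 1, 1, 1, 0]
y_test_pred = [0, 0, 1, 1, 1, 0, 1, 0]

# The report lists precision, recall, f1-score and support for each label.
print(classification_report(y_test, y_test_pred))

# average=None returns one f1-score per label, ordered [0, 1].
f1_per_label = f1_score(y_test, y_test_pred, average=None)
print(f1_per_label)
```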

Q: What is the f1-score of each label?
