Question

PLEASE HELP ME COMPLETE THIS ENTIRE PYTHON PROGRAMMING PROJECT

Activity 1: Analysing the Dataset

Create a Pandas DataFrame for the Spambase dataset using the link below. The dataset consists of the following 58 columns. Most of the columns represent the frequency of a particular word or character in the email:

Column #   Attribute                    Description
0 - 47     word_freq_WORD               Percentage of words in the email that match WORD, one column per word
48 - 53    char_freq_CHAR               Percentage of characters in the email that match CHAR, a special character such as semi-colon (;) or exclamation mark (!), one column per character
54         capital_run_length_average   Average length of uninterrupted sequences of capital letters
55         capital_run_length_longest   Length of the longest uninterrupted sequence of capital letters
56         capital_run_length_total     Sum of the lengths of uninterrupted sequences of capital letters
57         email_type                   Denotes whether the email is spam (1) or not (0)

Dataset Link: (not included because Chegg does not allow links; use a placeholder to represent it)

Print the first five rows of the dataset. Check for null values and treat them accordingly (if any).

[ ]

 
 
# Import modules
# Load the dataset
# Print the first five rows of the DataFrame
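A minimal sketch of this step, under stated assumptions: the dataset link is not given, so a small in-memory CSV (hypothetical values, four columns instead of 58) stands in for the real file, and the file is assumed to have no header row. Dropping rows is only one possible way to treat nulls.

```python
import io
import pandas as pd

# Hypothetical stand-in for the dataset: three rows, four columns, no header row.
# With the real file, pass its path or URL to read_csv instead of this buffer.
sample_csv = io.StringIO(
    "0.00,0.64,3.756,1\n"
    "0.21,0.28,5.114,1\n"
    "0.00,0.00,1.000,0\n"
)

# header=None assumes the file has no header line, so pandas labels the columns 0..n-1.
df = pd.read_csv(sample_csv, header=None)

print(df.head())          # first five rows (here, all three)
print(df.isnull().sum())  # number of null values in each column

# One possible treatment if nulls were present: drop the affected rows.
df = df.dropna()
```

If the null counts are all zero, `dropna()` leaves the DataFrame unchanged.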

Rename the last column of the DataFrame as target.

[ ]

 
 
# Rename the last column as 'target' 
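One way to rename the last column without hard-coding its label is sketched below. The toy DataFrame is a hypothetical stand-in; it uses integer column labels, which is what pandas assigns when a file is read without a header.

```python
import pandas as pd

# Toy frame with integer column labels 0, 1, 2 (hypothetical data).
df = pd.DataFrame([[0.0, 0.64, 1], [0.21, 0.28, 0]])

# Look up the current label of the last column and map it to 'target'.
last_col = df.columns[-1]
df = df.rename(columns={last_col: "target"})

print(df.columns.tolist())
```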

Print the information of the DataFrame to verify the above update.

[ ]

 
 
# Print the dataset information 

Q: Are there any missing values?

A:

Activity 2: Train-Test Split

You have to determine the effect of all the features on the 'target' variable. Thus, every column other than 'target' is a feature variable, and the 'target' column is the target variable.

Steps:

Create a list of all the features.

Split the "DataFrame" into features and target arrays using the features list.

Split the dataset into a training set and test set such that the training set contains 70% of the instances and the remaining instances will become the test set.

Reshape the target variable arrays into two-dimensional arrays by using reshape(-1, 1) function of the numpy module.

[ ]

 
 
# Split the DataFrame into the train and test sets.
# Import the module
# Create a features list
# Split the DataFrame into the train and test sets such that test set has 30% of the values.
# Reshape target arrays to 2-dimensional array.
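The steps above could be sketched as follows. The DataFrame here is a small random stand-in with hypothetical column names `f0` and `f1`, and `random_state=42` is an arbitrary choice made only so the split is repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in: 10 rows, two feature columns and a 'target' column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f0": rng.random(10),
    "f1": rng.random(10),
    "target": [0, 1] * 5,
})

# Every column except 'target' is a feature.
features = [col for col in df.columns if col != "target"]
X = df[features]
y = df["target"]

# test_size=0.3 puts 30% of the rows in the test set and keeps 70% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Convert the target Series to NumPy arrays and reshape them to a single column.
y_train_2d = y_train.values.reshape(-1, 1)
y_test_2d = y_test.values.reshape(-1, 1)

print(X_train.shape, X_test.shape, y_train_2d.shape, y_test_2d.shape)
```

Note that scikit-learn estimators generally expect a one-dimensional target in `fit()`, so a reshaped target may need `.ravel()` before training.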

Activity 3: Normalisation of the Features

Get a descriptive analysis of the feature set and decide whether any normalisation is needed.

Describe the features for training data.

[ ]

 
 
# Get the descriptive statistics for 'X_train'. 

Describe the features for the testing data.

[ ]

 
 
# Get the descriptive statistics for 'X_test'. 

Q: Does the data need normalisation? Why?

A:

If the answer to the above question is yes, normalise the data by calculating Z-scores (standard scaling) in the following code sections.

Define the Standard Normalisation function.

[ ]

 
 
# Define the 'standard_scalar()' function for calculating Z-scores 

Hint: The Z-score for each value can be calculated by the following expression:

$$Z = \frac{x - \mu}{\sigma}$$

Where,

$x$ is an observation

$\mu$ is the population mean

$\sigma$ is the population standard deviation
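A possible definition of `standard_scalar()` that follows the Z-score expression is sketched here. It assumes the input is a pandas Series and uses `ddof=0` to get the population standard deviation named in the hint; pandas defaults to the sample version (`ddof=1`).

```python
import pandas as pd

def standard_scalar(series):
    # Z-score: subtract the mean, then divide by the population standard deviation.
    return (series - series.mean()) / series.std(ddof=0)

values = pd.Series([2.0, 4.0, 6.0])
z = standard_scalar(values)
print(z)
```

A column whose values are all identical has a standard deviation of zero, so this sketch would return NaN for it.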

Apply the normalisation function to the features of the training data.

[ ]

 
 
# Apply the 'standard_scalar()' on X_train using apply() function and get the descriptive statistics of the normalised X_train 
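A short sketch of applying the function column by column with `apply()`. The `X_train` values are hypothetical, and `standard_scalar()` is redefined so the snippet runs on its own.

```python
import pandas as pd

def standard_scalar(series):
    # Z-score using the population standard deviation (ddof=0).
    return (series - series.mean()) / series.std(ddof=0)

# Toy training features on very different scales.
X_train = pd.DataFrame({
    "f0": [1.0, 2.0, 3.0, 4.0],
    "f1": [10.0, 20.0, 30.0, 40.0],
})

# DataFrame.apply passes each column to the function (axis=0 is the default).
norm_X_train = X_train.apply(standard_scalar)
print(norm_X_train.describe())
```

After scaling, each column has a mean of about 0. The `std` row printed by `describe()` uses the sample standard deviation, so it reads slightly above 1 on small data and approaches 1 as the number of rows grows.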

Apply the normalisation function to the features of the test data.

[ ]

 
 
# Apply the 'standard_scalar()' on X_test and get the descriptive statistics of the normalised X_test 

Activity 4: Logistic Regression - Model Training

Implement Logistic Regression Classification using sklearn module to estimate the values of coefficients in the following way:

Deploy the model by importing the LogisticRegression class and create an object of this class.

Call the fit() function on the Logistic Regression object and print score using the score() function.

Print the coefficient values.

[ ]

 
 
# Deploy the 'LogisticRegression' model using the 'fit()' function. 

Get the beta coefficients for the features using the model object trained in the above code.

[ ]

 
 
# Print the beta coefficient values 
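The training steps in this activity might look like the sketch below. It uses randomly generated toy data (label 1 whenever the first feature is positive) rather than the Spambase features, and the variable name `log_reg` is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: 100 rows, 3 features; the label depends only on the first feature.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = (X_train[:, 0] > 0).astype(int)

# Create the model object and fit it; fit() expects a 1-D target array.
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

print(log_reg.score(X_train, y_train))  # mean accuracy on the given data
print(log_reg.coef_)                    # beta coefficients, shape (1, n_features)
print(log_reg.intercept_)               # intercept (bias) term
```

If the target was reshaped to two dimensions earlier, passing `y_train.ravel()` to `fit()` avoids a shape warning.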

Activity 5: Logistic Regression - Model Prediction and Evaluation

Predict the values for both training and test sets by calling the predict() function on the Logistic Regression object.

[ ]

 
 
# Make predictions on the test dataset by using the 'predict()' function. 

Also, display the confusion matrix.

[ ]

 
 
# Display the results of confusion_matrix 
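As an illustration of reading the confusion matrix, the sketch below uses short hypothetical label lists in place of real predictions. For binary labels 0 and 1, scikit-learn lays the matrix out with actual labels as rows and predicted labels as columns, so `ravel()` yields the counts in the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels (1 = spam, 0 = not spam).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```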

Q: Which of the two labels is the positive outcome?

A:

Q: What are the counts of True Positives and True Negatives?

A:

Print the classification report values to evaluate the accuracy of your model.

[ ]

# Display the results of classification_report
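A small sketch of the classification report, again on hypothetical label lists. The call to `f1_score` with `average=None` is an optional extra that returns the per-label f1-scores as numbers, in sorted label order, which can be easier to read off than the text report.

```python
from sklearn.metrics import classification_report, f1_score

# Hypothetical actual and predicted labels (1 = spam, 0 = not spam).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Text table with precision, recall, f1-score and support for each label.
print(classification_report(y_true, y_pred))

# One f1-score per label: index 0 is label 0, index 1 is label 1.
f1_per_label = f1_score(y_true, y_pred, average=None)
print(f1_per_label)
```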

Q: What is the f1-score for each label?

A:
