Question
Please help me complete this entire Python programming project.
Activity 1: Analysing the Dataset
Create a Pandas DataFrame for the Spambase dataset using the link below. This dataset consists of the following 58 columns. Most of the columns represent the frequency of a particular word or character in the email:
Column # | Attribute | Description |
---|---|---|
0 - 47 | word_freq_WORD | The first 48 columns are the percentage frequencies of a particular word, one column each |
48 - 53 | word_freq_CHAR | The next 6 columns are the percentage frequencies of a particular special character, such as semi-colon (;) or exclamation mark (!), one column each |
54 | capital_run_length_average | average length of uninterrupted sequences of capital letters |
55 | capital_run_length_longest | length of longest uninterrupted sequence of capital letters |
56 | capital_run_length_total | sum of the length of uninterrupted sequences of capital letters |
57 | email_type | denotes whether the email is spam (1) or not (0) |
Dataset Link: (cannot be included because Chegg does not allow links, so use a placeholder to represent it)
Print the first five rows of the dataset. Check for null values and treat them accordingly (if any).
[ ]
# Import modules
# Load the dataset
# Print the first five rows of the DataFrame
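A minimal sketch of the loading step. Because the dataset link is not given in the question, the snippet reads a tiny in-memory CSV as a stand-in; `SAMPLE_CSV`, its column names, and its values are hypothetical. With the real file, its path or URL would be passed to `pd.read_csv` instead. The median fill at the end is only one possible way to treat nulls, not something the task prescribes.

```python
import io
import pandas as pd

# Hypothetical stand-in for the Spambase file (the real link is not given).
SAMPLE_CSV = """word_freq_make,char_freq_!,capital_run_length_total,email_type
0.00,0.778,278,1
0.21,0.372,1028,1
0.00,,54,0
0.06,0.000,112,0
0.00,0.135,40,0
0.10,0.000,15,0
"""

# Load the dataset; replace io.StringIO(SAMPLE_CSV) with the real path or URL.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Print the first five rows of the DataFrame.
print(df.head())

# Count null values per column.
null_counts = df.isnull().sum()
print(null_counts)

# One possible treatment (an assumption): fill missing numeric values
# with the median of their column.
df_clean = df.fillna(df.median(numeric_only=True))
print(df_clean.isnull().sum())
```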
Rename the last column of the DataFrame as target.
[ ]
# Rename the last column as 'target'
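One way to do the rename without hard-coding the current name of the last column is sketched below. The three-column DataFrame is a hypothetical miniature of the real data.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the Spambase DataFrame.
df = pd.DataFrame({
    "word_freq_make": [0.00, 0.21, 0.06],
    "capital_run_length_total": [278, 1028, 112],
    "email_type": [1, 1, 0],
})

# Rename whichever column is last, regardless of what it is currently called.
df = df.rename(columns={df.columns[-1]: "target"})
print(df.columns.tolist())
```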
Print the information of the DataFrame to verify the above update.
[ ]
# Print the dataset information
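The check can be sketched as follows, again on a hypothetical miniature frame. `info()` lists the non-null count and dtype of every column, and `isnull()` gives a direct answer to the question about missing values.

```python
import pandas as pd

# Hypothetical miniature frame with the last column already renamed.
df = pd.DataFrame({
    "word_freq_make": [0.00, 0.21, 0.06],
    "capital_run_length_total": [278, 1028, 112],
    "target": [1, 1, 0],
})

# Print column names, non-null counts and dtypes.
df.info()

# True if at least one cell in the DataFrame is null.
has_missing = bool(df.isnull().values.any())
print("Any missing values:", has_missing)
```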
Q: Are there any missing values?
A:
Activity 2: Train-Test Split
You have to determine the effect of all the features on the 'target' variable. Thus, every column other than the target is a feature variable, and the target column is the target variable.
Steps:
Create a list of all the features.
Split the "DataFrame" into features and target arrays using the features list.
Split the dataset into a training set and test set such that the training set contains 70% of the instances and the remaining instances will become the test set.
Reshape the target variable arrays into two-dimensional arrays by using reshape(-1, 1) function of the numpy module.
[ ]
# Split the DataFrame into the train and test sets.
# Import the module
# Create a features list
# Split the DataFrame into the train and test sets such that test set has 30% of the values.
# Reshape target arrays to 2-dimensional array.
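The four steps could look roughly like this. The DataFrame here is synthetic (random numbers with hypothetical column names `f0` to `f2`), and `random_state=42` is an arbitrary choice made only so the split is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): 100 rows, 3 features, binary target.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.random((100, 3)), columns=["f0", "f1", "f2"])
df["target"] = rng.integers(0, 2, size=100)

# Create a features list: every column except the target.
features = list(df.columns.values)
features.remove("target")

# Split the DataFrame into feature and target arrays.
X = df[features]
y = df["target"]

# 70% of the rows go to the training set, 30% to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Reshape the target arrays into two-dimensional (n, 1) arrays.
y_train_reshaped = y_train.values.reshape(-1, 1)
y_test_reshaped = y_test.values.reshape(-1, 1)

print(X_train.shape, X_test.shape)
print(y_train_reshaped.shape, y_test_reshaped.shape)
```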
Activity 3: Normalisation of the Features
Get a descriptive analysis of the feature set and decide whether any normalisation is needed.
Describe the features for training data.
[ ]
# Get the descriptive statistics for 'X_train'.
Describe the features for the testing data.
[ ]
# Get the descriptive statistics for 'X_test'.
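A sketch of both descriptive steps, using synthetic features whose scales differ on purpose (the column names are borrowed from the table above, the values are made up). Comparing the `mean`, `std`, `min` and `max` rows across columns is what indicates whether normalisation is worthwhile.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical features on very different scales.
X_train = pd.DataFrame({
    "word_freq_make": rng.random(70),
    "capital_run_length_total": rng.integers(1, 5000, size=70),
})
X_test = pd.DataFrame({
    "word_freq_make": rng.random(30),
    "capital_run_length_total": rng.integers(1, 5000, size=30),
})

# Descriptive statistics for the training and test features.
train_stats = X_train.describe()
test_stats = X_test.describe()
print(train_stats)
print(test_stats)

# A large spread between columns here suggests the features need scaling.
print(train_stats.loc["std"])
```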
Q: Does the data need normalisation? Why?
A:
If the answer to the above question is yes, normalise the data by calculating the Z-scores (standard scaling) in the following code sections.
Define the Standard Normalisation function.
[ ]
# Define the 'standard_scalar()' function for calculating Z-scores
Hint: the Z-score for each value can be calculated by the following expression:
$$Z = \frac{x - \mu}{\sigma}$$
Where,
$x$ is an observation
$\mu$ is the population mean
$\sigma$ is the population standard deviation
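A minimal sketch of the `standard_scalar()` function described in the hint. It assumes the input is a pandas Series and uses `ddof=0` so that the standard deviation matches the population formula; pandas defaults to the sample standard deviation (`ddof=1`), and the question does not say which one the course expects.

```python
import pandas as pd

def standard_scalar(series):
    """Return the Z-scores of a pandas Series using its own mean and std."""
    # ddof=0 gives the population standard deviation named in the hint;
    # the pandas default (ddof=1) would give the sample standard deviation.
    return (series - series.mean()) / series.std(ddof=0)

# Small made-up example: mean is 5 and population standard deviation is 2.
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
z = standard_scalar(values)
print(z)
```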
Apply the normalisation function to the features of the training data.
[ ]
# Apply the 'standard_scalar()' on X_train using apply() function and get the descriptive statistics of the normalised X_train
Apply the normalisation function to the features of the test data.
[ ]
# Apply the 'standard_scalar()' on X_test and get the descriptive statistics of the normalised X_test
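Applying the function column by column might look like the sketch below, which uses synthetic data and redefines `standard_scalar()` so the snippet runs on its own. It follows the exercise by scaling each set with its own mean and standard deviation. A common alternative, not asked for here, is to reuse the training-set statistics on the test set.

```python
import numpy as np
import pandas as pd

def standard_scalar(series):
    """Z-scores of a Series, using the population standard deviation."""
    return (series - series.mean()) / series.std(ddof=0)

rng = np.random.default_rng(1)

# Hypothetical features on different scales.
X_train = pd.DataFrame({
    "f0": rng.random(70),
    "f1": rng.integers(1, 5000, size=70).astype(float),
})
X_test = pd.DataFrame({
    "f0": rng.random(30),
    "f1": rng.integers(1, 5000, size=30).astype(float),
})

# apply() with axis=0 passes each column to the function as a Series.
norm_X_train = X_train.apply(standard_scalar, axis=0)
norm_X_test = X_test.apply(standard_scalar, axis=0)

# After scaling, every column should have mean ~0 and std ~1.
print(norm_X_train.describe())
print(norm_X_test.describe())
```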
Activity 4: Logistic Regression - Model Training
Implement Logistic Regression Classification using sklearn module to estimate the values of coefficients in the following way:
Deploy the model by importing the LogisticRegression class and create an object of this class.
Call the fit() function on the Logistic Regression object and print score using the score() function.
Print the coefficient values.
[ ]
# Deploy the 'LogisticRegression' model using the 'fit()' function.
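A sketch of the training step on synthetic data generated with `make_classification` (a stand-in for the normalised Spambase features). Note that `fit()` expects a one-dimensional target, so a target reshaped to `(-1, 1)` is flattened with `ravel()` before fitting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the scaled features and target (hypothetical data).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Target reshaped to 2-D, as in the train-test split activity.
y_train_2d = y_train.reshape(-1, 1)

# Create the model object and train it; ravel() restores the 1-D shape.
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train_2d.ravel())

# score() returns the mean accuracy on the given data.
train_score = log_reg.score(X_train, y_train)
print("Training accuracy:", train_score)
```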
Get the beta coefficients for the features using the model object trained in the above code.
[ ]
# Print the beta coefficient values
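The coefficients can be read from the fitted model's `coef_` and `intercept_` attributes. The sketch below pairs each coefficient with its feature name; the data and the names `f0` to `f3` are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with hypothetical feature names.
features = ["f0", "f1", "f2", "f3"]
X_arr, y = make_classification(n_samples=200, n_features=4, random_state=0)
X = pd.DataFrame(X_arr, columns=features)

log_reg = LogisticRegression().fit(X, y)

# For a binary problem coef_ has shape (1, n_features);
# intercept_ holds the constant term (beta_0).
betas = pd.Series(log_reg.coef_[0], index=features)
print("Intercept:", log_reg.intercept_[0])
print(betas)
```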
Activity 5: Logistic Regression - Model Prediction and Evaluation
Predict the values for both training and test sets by calling the predict() function on the Logistic Regression object.
[ ]
# Make predictions on the test dataset by using the 'predict()' function.
Also, display the confusion matrix.
[ ]
# Display the results of confusion_matrix
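Prediction and the confusion matrix could be sketched as below, again on synthetic data. With `labels=[0, 1]`, scikit-learn puts actual labels on the rows and predicted labels on the columns, so `ravel()` yields the counts in the order TN, FP, FN, TP when label 1 (spam) is treated as the positive class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

log_reg = LogisticRegression().fit(X_train, y_train)

# Predict labels for both the training and the test set.
y_train_pred = log_reg.predict(X_train)
y_test_pred = log_reg.predict(X_test)

# Rows are actual labels, columns are predicted labels, ordered [0, 1].
cm = confusion_matrix(y_test, y_test_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print("True Negatives:", tn, "True Positives:", tp)
```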
Q: Which of the two labels is the positive outcome?
A:
Q: What are the counts of True Positives and True Negatives?
A:
Print the classification report values to evaluate the accuracy of your model.
[ ]
# Display the results of classification_report
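A sketch of the report step on synthetic data. `classification_report` prints precision, recall and f1-score per label. Passing `output_dict=True` returns the same numbers as a dictionary keyed by the label as a string, which makes the f1-scores easy to read off.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

log_reg = LogisticRegression().fit(X_train, y_train)
y_test_pred = log_reg.predict(X_test)

# Human-readable report.
print(classification_report(y_test, y_test_pred))

# Same values as a dictionary, to extract the f1-score of each label.
report = classification_report(y_test, y_test_pred, output_dict=True)
f1_label_0 = report["0"]["f1-score"]
f1_label_1 = report["1"]["f1-score"]
print("f1-score (label 0):", f1_label_0)
print("f1-score (label 1):", f1_label_1)
```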
Q: What is the f1-score of each label?
A: