Question
PLEASE HELP ME COMPLETE THIS WHOLE PYTHON PROGRAMMING PROJECT
List of Activities
Activity 1: Loading and Analysing the Dataset
Activity 2: Data Visualization
Activity 3: Train-Test Split
Activity 4: Support Vector Classifier - Model Training
Activity 5: Model Prediction and Evaluation
Activity 1: Analysing the Dataset
You are given the Seaborn dataset on penguins. This dataset consists of the following columns:
| Field | Description |
|---|---|
| species | Categorical; species of the penguin |
| island | Categorical; name of the penguin's home island in Antarctica |
| bill_length_mm | Numeric; length measured from the upper edge of the beak (bill) to the base of the skull or the first feathers, in mm |
| bill_depth_mm | Numeric; depth measured from the lower edge of the beak to the upper edge, in mm |
| flipper_length_mm | Numeric; length of the penguin's flipper, in mm |
| body_mass_g | Numeric; body mass of the penguin, in grams |
| sex | Categorical; sex of the penguin |
Dataset Link: https://s3-student-datasets-bucket.whjr.online/whitehat-ds-datasets/penguin.csv
Dataset Credits: Python Seaborn Package
Citation
Allison Marie Horst, Alison Presmanes Hill, & Kristen B Gorman. (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data.
1. Load the dataset in a DataFrame
2. Print the first five rows of the dataset.
[ ]
# Import the required modules and load the dataset
# Load the DataFrame
# Display the first five rows of the DataFrame
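One way to approach steps 1 and 2 is sketched below, assuming pandas is installed. The three inline rows are made-up stand-ins so the snippet runs offline, and `load_penguins` is a hypothetical helper name; in the notebook, pass the dataset link to it instead.

```python
import io

import pandas as pd

# Made-up sample rows (illustrative only) with the same columns as the dataset.
SAMPLE_CSV = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.0,18.5,181.0,3750.0,Male
Chinstrap,Dream,46.5,17.9,192.0,3500.0,Female
Gentoo,Biscoe,46.1,13.2,211.0,4500.0,Female
"""


def load_penguins(source):
    """Read the penguin CSV from a URL, a file path, or a file-like object."""
    return pd.read_csv(source)


# In the notebook, use the dataset link from the assignment instead:
# df = load_penguins("https://s3-student-datasets-bucket.whjr.online/whitehat-ds-datasets/penguin.csv")
df = load_penguins(io.StringIO(SAMPLE_CSV))

# head() returns at most the first five rows.
print(df.head())
```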
3. Print the information of the DataFrame.
[ ]
# Print the dataset information
Q: Which are the object type (categorical) columns?
A:
4. Find the number of missing values in each column of the DataFrame
[ ]
# Print the number of missing values in each column
Q: Are there any missing values?
A:
Q: Which columns have missing values?
A:
5. Drop the missing values from all the columns and verify the same
[ ]
# Drop the missing values and verify
# Drop the NaN values
# Verify the above by printing number of missing values in each column.
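Steps 3 to 5 could look roughly like this. The small DataFrame is made up for illustration; on the real data, `df` would be the DataFrame loaded from the CSV, and the names `missing_before` and `missing_after` are only illustrative.

```python
import numpy as np
import pandas as pd

# Made-up rows (illustrative only); NaN and None mark missing entries.
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "bill_length_mm": [39.0, np.nan, 46.1, 46.5],
    "sex": ["Male", None, "Female", "Female"],
})

# Step 3: column names, non-null counts and dtypes.
df.info()

# Step 4: number of missing values in each column.
missing_before = df.isnull().sum()
print(missing_before)

# Step 5: drop every row that has at least one missing value, then verify.
df = df.dropna().reset_index(drop=True)
missing_after = df.isnull().sum()
print(missing_after)
```

Columns reported with dtype `object` by `info()` are the categorical ones asked about in step 3.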
6. Print the number of occurrences of each species in the species column.
[ ]
# Display the number of occurrences of each species of Penguin in the 'species' column.
Q: What are the different species of Penguin available in the column species?
A:
Q: What is the type of the column species?
A:
7. Add another column, label, to the DataFrame to convert the non-numeric target column species into numeric. Print the first five rows of the DataFrame.
[ ]
# Add numeric column 'label' to resemble non-numeric column 'species'
# Print first five rows of the DataFrame
8. Print the number of occurrences of each species in the label column.
[ ]
# Display the number of occurrences of each species of Penguin in the 'label' column.
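A possible sketch of steps 6 to 8 follows. The rows are made up for illustration, and the integer codes in `species_to_label` are an arbitrary assumption; any consistent mapping of the three species would do.

```python
import pandas as pd

# Made-up rows (illustrative only); the real DataFrame comes from the CSV.
df = pd.DataFrame(
    {"species": ["Adelie", "Gentoo", "Adelie", "Chinstrap", "Gentoo", "Adelie"]}
)

# Step 6: occurrences of each species.
print(df["species"].value_counts())

# Step 7: numeric 'label' column. The codes below are an arbitrary choice.
species_to_label = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}
df["label"] = df["species"].map(species_to_label)
print(df.head())

# Step 8: occurrences of each label; the counts should mirror step 6.
label_counts = df["label"].value_counts()
print(label_counts)
```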
Q: What are the different labels available in the column label?
A:
9. Convert the non-numeric column sex into numeric.
[ ]
# Convert the non-numeric column 'sex' to numeric in the DataFrame
# Print the number of occurrences of each label in 'sex' column
# Convert the 'sex' column to numeric
# Print the number of occurrences of each label in 'sex' column after converting
# Print the datatype of the 'sex' column
10. Convert the non-numeric column island into numeric.
[ ]
# Convert the non-numeric column 'island' to numeric in the DataFrame
# Print the number of occurrences of each label in 'island' column
# Convert the 'island' column to numeric
# Print the number of occurrences of each label in 'island' column after converting
# Print the datatype of the 'island' column
Hint: For conversion of non-numeric columns to numeric use the map() function
After this activity, the dataset should be loaded in the DataFrame and the required columns should be of numeric type.
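Steps 9 and 10 might be handled with `map()` as hinted. This is a sketch on made-up rows: the numeric codes are arbitrary, and the lower-casing step is an assumption added because the exact spelling of the values in the CSV (for example `Male` versus `MALE`) is not stated in the assignment.

```python
import pandas as pd

# Made-up rows (illustrative only) with mixed spellings of the 'sex' values.
df = pd.DataFrame({
    "island": ["Torgersen", "Biscoe", "Dream", "Biscoe"],
    "sex": ["Male", "FEMALE", "female", "Male"],
})

# Arbitrary numeric codes; keys are lower-case to match the normalised values.
sex_codes = {"male": 0, "female": 1}
island_codes = {"biscoe": 0, "dream": 1, "torgersen": 2}

for column, codes in [("sex", sex_codes), ("island", island_codes)]:
    # Occurrences of each value before conversion.
    print(df[column].value_counts())
    # Lower-case first so the mapping does not depend on the CSV's spelling.
    df[column] = df[column].str.lower().map(codes)
    # Occurrences of each value after conversion.
    print(df[column].value_counts())
    # The dtype is numeric when every value was found in the mapping.
    print(df[column].dtype)
```

Any value missing from the mapping becomes NaN, so a quick `isnull().sum()` check after conversion is a cheap safeguard.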
Activity 2: Data Visualization
In this activity, you have to create scatter plots for different features, with each plot differentiating between the data points of different classes (species of the penguin).
1. Create a scatter plot between bill_length_mm and bill_depth_mm
[ ]
# Create a scatter plot between 'bill_length_mm' and 'bill_depth_mm'
Q Write your interpretation about the output of the graph.
A
2. Create a scatter plot between bill_length_mm and flipper_length_mm.
[ ]
# Create a scatter plot between 'bill_length_mm' and 'flipper_length_mm'
Q Write your interpretation about the output of the graph.
A
3. Create a scatter plot between bill_depth_mm and flipper_length_mm.
[ ]
# Create a scatter plot between 'bill_depth_mm' and 'flipper_length_mm'
Q Write your interpretation about the output of the graph.
A
After this activity, the relation between the independent features of the penguins and their species should be recognised. The student can also create more such visualizations to understand the relation between the rest of the columns.
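The three scatter plots in this activity could be drawn with one small helper, sketched here with matplotlib on made-up rows. `scatter_by_species` is a hypothetical name, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows (illustrative only).
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.0, 38.0, 46.5, 50.0, 46.1, 48.0],
    "bill_depth_mm": [18.5, 18.0, 17.9, 19.5, 13.2, 14.5],
    "flipper_length_mm": [181.0, 186.0, 192.0, 196.0, 211.0, 215.0],
})


def scatter_by_species(data, x, y):
    """Scatter plot of column x against column y, one colour per species."""
    fig, ax = plt.subplots(figsize=(8, 5))
    for species, group in data.groupby("species"):
        # Each call adds one point collection with its own default colour.
        ax.scatter(group[x], group[y], label=species)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(f"{x} vs {y}")
    ax.legend()
    return ax


ax1 = scatter_by_species(df, "bill_length_mm", "bill_depth_mm")
ax2 = scatter_by_species(df, "bill_length_mm", "flipper_length_mm")
ax3 = scatter_by_species(df, "bill_depth_mm", "flipper_length_mm")
# In a notebook, call plt.show() here to display the figures.
```

If the DataFrame only has the numeric label column left, grouping by `"label"` instead of `"species"` works the same way.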
Activity 3: Train-Test Split
We need to predict the value of the label variable, using other variables to predict the species of the Penguin. Thus, label is the dependent variable and island, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, sex columns are the independent variables.
1. Split the dataset into the training set and test set such that the testing set contains 33% of the instances and the remaining instances will become the training set.
2. Set random_state = 42.
[ ]
# Split the data into Training and Testing set
# Import all the libraries
# Create X and y variables
# Split the data into training and testing sets
After this activity, the features and target data should be split into training and testing data.
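A sketch of the split described in this activity, using `train_test_split` from scikit-learn. The DataFrame is randomly generated and already numeric, purely as a stand-in for the cleaned penguin data; only the column names come from the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Randomly generated stand-in for the cleaned, all-numeric DataFrame (illustrative only).
rng = np.random.default_rng(0)
n = 30
df = pd.DataFrame({
    "island": rng.integers(0, 3, n),
    "bill_length_mm": rng.normal(44, 5, n),
    "bill_depth_mm": rng.normal(17, 2, n),
    "flipper_length_mm": rng.normal(200, 14, n),
    "body_mass_g": rng.normal(4200, 800, n),
    "sex": rng.integers(0, 2, n),
    "label": np.repeat([0, 1, 2], n // 3),
})

# Independent variables (X) and dependent variable (y).
features = ["island", "bill_length_mm", "bill_depth_mm",
            "flipper_length_mm", "body_mass_g", "sex"]
X = df[features]
y = df["label"]

# 33% of the instances go to the test set; random_state makes the split repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
print(x_train.shape, x_test.shape)
```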
Activity 4: Support Vector Classifier - Model Training
Implement linear Support Vector Classification using the sklearn.svm module as follows:
Import the SVC class and create an object of this class.
Call the fit() function on the Support Vector Classifier object and print the score using the score() function.
[ ]
# Build an SVC model using the 'sklearn' module.
# 1. First, call the linear 'SVC' module and store it in a variable.
# 2. Call the 'fit()' function with 'x_train' and 'y_train' as inputs.
# 3. Call the 'score()' function with 'x_train' and 'y_train' as inputs to check the accuracy score of the model.
Q What is the accuracy score?
A
After this activity, an SVC model object should be trained for multiclass classification.
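The training step might look like the following sketch, which reads "linear SVC" as `SVC(kernel="linear")`. The data is synthetic (`make_blobs`) and only stands in for the penguin features, so the printed score says nothing about the real dataset.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic three-class data standing in for the penguin features (illustrative only).
X, y = make_blobs(n_samples=90, centers=3, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# 1. Create the classifier with a linear kernel and store it in a variable.
svc_model = SVC(kernel="linear")

# 2. Fit the model on the training data.
svc_model.fit(x_train, y_train)

# 3. Accuracy on the training data (fraction of correctly classified rows).
train_score = svc_model.score(x_train, y_train)
print(train_score)
```

On the real data the features are on very different scales (body mass in grams versus bill depth in millimetres), so standardising them before fitting may be worth trying.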
Activity 5: Model Prediction and Evaluation
In this activity, you will make predictions for the training and testing sets and evaluate the model.
1. Predict the values for the training set by calling the predict() function on the Support Vector Classifier object.
2. Print the distribution of the labels predicted in the predicted target series for the training features.
[ ]
# Make predictions on the train dataset by using the 'predict()' function.
# Compute the predictions
# Print the occurrence of each type computed in the predictions.
Q: Are all the label values predicted for the training features data?
A:
3. Predict the values for the testing set by calling the predict() function on the Support Vector Classifier object.
4. Print the distribution of the labels predicted in the predicted target series for the testing features.
[ ]
# Make predictions on the test dataset by using the 'predict()' function.
# Compute the predictions
# Print the occurrence of each Penguin type computed in the predictions.
Q: Are all the labels predicted for the test features data?
A:
5. Display the confusion matrix for the test set:
[ ]
# Print the confusion matrix for the actual and predicted data of the test set
Q Are there any False Positives or False Negatives?
A
6. Display the classification report for the test set:
[ ]
# Print the classification report for the actual and predicted data of the testing set (if required)
Q What is the f1-score for all the labels?
A
After this activity, labels should be predicted for the target column using the test feature set, and the model should be evaluated on those predictions.
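The prediction and evaluation steps of this activity can be sketched as follows. The data is again synthetic and stands in for the penguin features, so the printed numbers are not answers to the questions above; the variable names are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic three-class data and a fitted linear SVC (illustrative only).
X, y = make_blobs(n_samples=90, centers=3, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
svc_model = SVC(kernel="linear").fit(x_train, y_train)

# Steps 1-2: predictions on the training set and their label distribution.
y_train_pred = pd.Series(svc_model.predict(x_train))
print(y_train_pred.value_counts())

# Steps 3-4: predictions on the test set and their label distribution.
y_test_pred = pd.Series(svc_model.predict(x_test))
print(y_test_pred.value_counts())

# Step 5: rows are actual labels, columns are predicted labels;
# any non-zero off-diagonal entry is a misclassification.
cm = confusion_matrix(y_test, y_test_pred)
print(cm)

# Step 6: precision, recall and f1-score for each label.
print(classification_report(y_test, y_test_pred))
```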
Write your interpretation of the results here.
Interpretation 1:
Interpretation 2: