

Posted on Aug 28, 2024


PLEASE HELP ME COMPLETE THESE PYTHON PROGRAMMING ACTIVITIES

Activity 1: Create the Dummy Dataset

In this activity, you have to create a dummy dataset for multiclass classification.

The steps to be followed are as follows:

1. Create a dummy dataset having two columns representing two independent variables and a third column representing the target.

The number of records should be divided into 6 random groups like [500, 2270, 1900, 41, 2121, 272] such that the target column has 6 different labels [0, 1, 2, 3, 4, 5].

Recall:

To create a dummy DataFrame, use the make_blobs() function of the sklearn.datasets module, which returns two arrays: the feature array and the target array. The syntax for the make_blobs() function is as follows:

Syntax: make_blobs(n_samples, centers, n_features, random_state, cluster_std)

[ ]

# Instruction to remove warning messages

[ ]

# Create two arrays using the 'make_blobs()' function and store them in the 'features_array' and 'target_array' variables.

Hint:

In the make_blobs() function, use n_samples=[500, 2270, 1900, 41, 2121, 272] and centers=None to divide the target labels into six groups.
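The hint above can be sketched as follows. This is a minimal example, not the course's reference solution: random_state=42 and cluster_std=1.0 are assumed values that the task does not specify, and group_sizes is an illustrative variable name.

```python
import warnings

from sklearn.datasets import make_blobs

# Suppress warning messages, as the first notebook cell asks.
warnings.filterwarnings("ignore")

# One entry per label: label 0 gets 500 rows, label 1 gets 2270 rows, and so on.
group_sizes = [500, 2270, 1900, 41, 2121, 272]

# When n_samples is a list, centers must be None (or one centre per group).
# random_state and cluster_std are assumptions, not values given in the task.
features_array, target_array = make_blobs(
    n_samples=group_sizes,
    centers=None,
    n_features=2,
    random_state=42,
    cluster_std=1.0,
)
```

Passing a list to n_samples is what produces unequal group sizes; a single integer would split the rows evenly across the centres.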

2. Print the object type of the arrays created by the make_blobs() function and also print the number of rows and columns in them:

[ ]

# Find out the object type of the arrays created by the 'make_blobs()' function and the number of rows and columns in them.
# Print the type of 'features_array' and 'target_array'
# Print the number of rows and columns of 'features_array'
# Print the number of rows and columns of 'target_array'

Q: How many rows are created in the feature and target columns?
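One possible way to inspect the arrays is shown below. The arrays are rebuilt here so the snippet runs on its own; random_state=42 is an assumed seed.

```python
from sklearn.datasets import make_blobs

# Rebuild the arrays so this snippet is self-contained (random_state is an assumption).
features_array, target_array = make_blobs(
    n_samples=[500, 2270, 1900, 41, 2121, 272],
    centers=None,
    n_features=2,
    random_state=42,
)

# Both objects are NumPy arrays.
print(type(features_array), type(target_array))

# The feature array is 2-D: (rows, columns).
print(features_array.shape)  # (7104, 2)

# The target array is 1-D: one label per row.
print(target_array.shape)  # (7104,)
```

The row count is the sum of the six group sizes: 500 + 2270 + 1900 + 41 + 2121 + 272 = 7104.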

3. Create a DataFrame from the two arrays using a Python dictionary.

Steps: (Learned in "Logistic Regression - Decision Boundary" lesson)

Create a dummy dictionary.

Add the feature columns under the keys col 1 and col 2, and the target column under the key target.

Fill in the values for each key from the feature and target arrays, one item at a time, using list comprehensions.

Convert the dictionary into a DataFrame

Print the first five rows of the DataFrame.

[ ]

# Create a Pandas DataFrame containing the items from the 'features_array' and 'target_array' arrays.
# Import the module
# Create a dummy dictionary
# Convert the dictionary into DataFrame
# Print first five rows of the DataFrame

Hint:

Use the from_dict() function to convert a Python dictionary to a DataFrame.

Syntax: pd.DataFrame.from_dict(some_dictionary)
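A sketch of the dictionary-to-DataFrame steps, assuming the arrays come from make_blobs() with an assumed random_state=42. The names dummy_dict and dummy_df are illustrative, not prescribed by the task.

```python
import pandas as pd
from sklearn.datasets import make_blobs

# Rebuild the arrays so this snippet is self-contained (random_state is an assumption).
features_array, target_array = make_blobs(
    n_samples=[500, 2270, 1900, 41, 2121, 272],
    centers=None,
    n_features=2,
    random_state=42,
)

n_rows = features_array.shape[0]

# Build each column with a list comprehension, one value per row.
dummy_dict = {
    "col 1": [features_array[i][0] for i in range(n_rows)],
    "col 2": [features_array[i][1] for i in range(n_rows)],
    "target": [target_array[i] for i in range(n_rows)],
}

# Convert the dictionary into a DataFrame and show the first five rows.
dummy_df = pd.DataFrame.from_dict(dummy_dict)
print(dummy_df.head())
```

The list comprehensions follow the steps as written; slicing such as features_array[:, 0] would give the same columns more compactly.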

After this activity, the DataFrame should be created with two independent feature columns and one dependent target column.

Activity 2: Dataset Inspection

In this activity, you have to look into the distribution of the labels in the target column of the DataFrame.

1. Print the number of occurrences of each label in the target column:

[ ]

# Display the number of occurrences of each label in the 'target' column.

2. Print the percentage of the samples for each label in the target column:

[ ]

# Get the percentage of samples for each label in the dataset.
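The two inspection steps above might look like this. The target column is rebuilt from make_blobs() with an assumed random_state=42; label_counts and label_percent are illustrative names.

```python
import pandas as pd
from sklearn.datasets import make_blobs

# Rebuild the target labels so this snippet is self-contained (random_state is an assumption).
_, target_array = make_blobs(
    n_samples=[500, 2270, 1900, 41, 2121, 272],
    centers=None,
    n_features=2,
    random_state=42,
)
target = pd.Series(target_array, name="target")

# Number of occurrences of each label.
label_counts = target.value_counts()
print(label_counts)

# Share of each label as a percentage of all rows.
label_percent = target.value_counts(normalize=True) * 100
print(label_percent.round(2))
```

With the group sizes from the task, label 3 has only 41 of the 7104 rows while label 1 has 2270, so the labels are far from evenly distributed.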

Q: How many unique labels are present in the DataFrame? What are they?

Q: Is the DataFrame balanced?

3. Create a scatter plot between the columns col 1 and col 2 for all the labels to visualize the clusters of every label (or points):

[ ]

# Create a scatter plot between 'col 1' and 'col 2' columns separately for all the labels in the same plot.
# Import the module
# Define the size of the graph
# Create a for loop executing for every unique label in the 'target' column.
# Plot the scatter plot for 'col 1' and 'col 2' where 'target == i'
# Plot the x and y labels
# Display the legends and the graph

Hint: Revise the lesson "Logistic Regression - Decision Boundary".
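The plotting steps could be sketched as below. The figure size, the random_state=42 seed, and the names dummy_df, fig and ax are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_blobs

# Rebuild the DataFrame so this snippet is self-contained (random_state is an assumption).
features_array, target_array = make_blobs(
    n_samples=[500, 2270, 1900, 41, 2121, 272],
    centers=None,
    n_features=2,
    random_state=42,
)
dummy_df = pd.DataFrame(
    {"col 1": features_array[:, 0], "col 2": features_array[:, 1], "target": target_array}
)

# Define the size of the graph (the size itself is an arbitrary choice).
fig, ax = plt.subplots(figsize=(12, 6))

# One scatter call per label, so each label gets its own colour and legend entry.
for i in sorted(dummy_df["target"].unique()):
    subset = dummy_df[dummy_df["target"] == i]
    ax.scatter(subset["col 1"], subset["col 2"], label=f"target {i}")

ax.set_xlabel("col 1")
ax.set_ylabel("col 2")
ax.legend()
```

In a notebook the figure renders at the end of the cell; in a script, call plt.show() afterwards to display it.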

After this activity, the labels to be predicted (that is, the values of the target variable) and their distribution should be known.

Activity 3: Train-Test Split

We need to predict the value of the target variable, using other variables. Thus, the target is the dependent variable and other columns are the independent variables.

1. Split the dataset into the training set and test set such that the training set contains 70% of the instances and the remaining instances will become the test set.

2. Set random_state = 42.

[ ]

# Import 'train_test_split' module
# Create the features data frame holding all the columns except the last column
# and print first five rows of this dataframe
# Create the target series that holds last column 'target'
# and print first five rows of this series
# Split the train and test sets using the 'train_test_split()' function.

3. Print the number of rows and columns in the training and testing set:

[ ]

# Print the shape of all the four variables i.e. 'X_train', 'X_test', 'y_train' and 'y_test'
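A minimal sketch of the split and the shape check. The data is rebuilt with an assumed random_state=42 for make_blobs(); features_df and target_series are illustrative names, while test_size=0.3 and random_state=42 for the split follow the task.

```python
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Rebuild the DataFrame so this snippet is self-contained (this random_state is an assumption).
features_array, target_array = make_blobs(
    n_samples=[500, 2270, 1900, 41, 2121, 272],
    centers=None,
    n_features=2,
    random_state=42,
)
dummy_df = pd.DataFrame(
    {"col 1": features_array[:, 0], "col 2": features_array[:, 1], "target": target_array}
)

# Features: every column except the last one. Target: the last column.
features_df = dummy_df.iloc[:, :-1]
target_series = dummy_df["target"]
print(features_df.head())
print(target_series.head())

# 70% of the rows go to training, the remaining 30% to testing.
X_train, X_test, y_train, y_test = train_test_split(
    features_df, target_series, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

This split is not stratified, so the label proportions in the training set can differ slightly from those in the full dataset; passing stratify=target_series would preserve them, but the task does not ask for it.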

After this activity, the features and target data should be split into training and testing data.

Activity 4: Apply SMOTE

In this activity, if the data is imbalanced, oversample the data for the minority classes in the following way:

Create a SMOTE object using the SMOTE() function.

Synthesize the data for the minority classes using the fit_sample() function (named fit_resample() in recent imbalanced-learn releases) by passing the feature and target training variables.

Save the output of the above step, artificial data, in the new feature and target training variables.

[ ]

# Write the code to oversample the data.
# Import the 'SMOTE' module from the 'imblearn.over_sampling' library.
# Call the 'SMOTE()' function and store it in a variable.
# Call the 'fit_sample()' function.

Print the number of rows and columns in the original and artificial feature and target data:

[ ]

# Print the number of rows and columns in the original and resampled data.

Q: How many rows and columns are there in the original features data?

Q: How many rows and columns are there in the artificially generated features data?

Print the number of occurrences of labels in the artificially generated target data:

[ ]

# Display the number of occurrences of each label in the artificially generated target data.

Q: Is the number of occurrences equal for all the labels?

After this activity, the training feature and target data should include the synthetic data, such that all the labels have equal occurrences and the data is balanced.