Question
Make use of the scikit-learn (sklearn) and yellowbrick Python packages in your function implementations.
Complete the KmeansClustering class in task3.py:
kmeans_train
Initialize a sklearn KMeans model using random_state and n_init=10. Initialize a yellowbrick KElbowVisualizer to search for the optimal value of k (between 1 and 10). Train the KElbowVisualizer on the training data and determine the optimal k value. Then train a KMeans model with the same initialization (random_state and n_init=10) for that optimal value of k, and return the cluster ids for each row of the training set as a list.
kmeans_test
Using the model you trained in the previous function (kmeans_train), return the cluster ids for each row of the test set as a list.
train_add_kmeans_cluster_id_feature
Using kmeans_train, add an additional column to the training features and return the training DataFrame with all input features untouched, plus the additional cluster id column named kmeans_cluster_id.
test_add_kmeans_cluster_id_feature
Using kmeans_test, add an additional column to the test features and return the test DataFrame with all input features untouched, plus the additional cluster id column named kmeans_cluster_id.
import numpy as np
import pandas as pd
import sklearn.cluster
import yellowbrick.cluster


class KmeansClustering:
    def __init__(self, train_features: pd.DataFrame, test_features: pd.DataFrame, random_state: int):
        # TODO: Add any state variables you may need to make your functions work
        pass

    def kmeans_train(self) -> list:
        # TODO: train a kmeans model using the training data, determine the optimal value of k (between 1 and 10) with n_init set to 10 and return a list of cluster ids
        # corresponding to the cluster id of each row of the training data
        cluster_ids = list()
        return cluster_ids

    def kmeans_test(self) -> list:
        # TODO: return a list of cluster ids corresponding to the cluster id of each row of the test data
        cluster_ids = list()
        return cluster_ids

    def train_add_kmeans_cluster_id_feature(self) -> pd.DataFrame:
        # TODO: return the training dataset with a new feature called kmeans_cluster_id
        output_df = pd.DataFrame()
        return output_df

    def test_add_kmeans_cluster_id_feature(self) -> pd.DataFrame:
        # TODO: return the test dataset with a new feature called kmeans_cluster_id
        output_df = pd.DataFrame()
        return output_df