
Question


Make use of the scikit-learn (sklearn) and yellowbrick Python packages in your function implementations.

Complete the KmeansClustering class in task3.py

kmeans_train

Initialize a sklearn KMeans model using random_state and n_init=10. Initialize a yellowbrick KElbowVisualizer to search for the optimal value of k (between 1 and 10). Fit the KElbowVisualizer on the training data and determine the optimal k value. Then train a KMeans model, initialized the same way (random_state, n_init=10), for that optimal value of k and return the cluster ids for each row of the training set as a list.

kmeans_test

Using the model you trained in the previous function, return the cluster ids for each row of the test set as a list.
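A small sketch of why kmeans_test should reuse the fitted model rather than refit: calling predict on the already trained KMeans assigns each test row to the nearest existing centroid, so train and test ids refer to the same clusters. The arrays and k=2 below are made-up values for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical, well-separated toy data.
train = np.array([[0.0, 0.0], [0.2, 0.1], [9.0, 9.0], [9.1, 8.8]])
test = np.array([[0.1, 0.0], [8.9, 9.2]])

# k=2 is assumed here; in the assignment it comes from the elbow search.
model = KMeans(n_clusters=2, random_state=0, n_init=10)
train_ids = model.fit_predict(train).tolist()  # fit once on the training data

# Reuse the same fitted model: no refit, only nearest-centroid assignment.
test_ids = model.predict(test).tolist()
```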

train_add_kmeans_cluster_id_feature

Using kmeans_train, add an additional column to the training features and return the training dataframe with all input features untouched and the additional cluster id column named kmeans_cluster_id.

test_add_kmeans_cluster_id_feature

Using kmeans_test, add an additional column to the test features and return the test dataframe with all input features untouched and the additional cluster id column named kmeans_cluster_id.
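Both feature functions follow the same pattern, which might look like the sketch below. The helper name with_cluster_id and the sample frame are hypothetical; the point is that working on a copy leaves the caller's dataframe untouched, and that assigning a plain list fills the column by position regardless of the index.

```python
import pandas as pd


def with_cluster_id(features: pd.DataFrame, cluster_ids: list) -> pd.DataFrame:
    """Hypothetical helper: return a copy of features plus kmeans_cluster_id."""
    if len(cluster_ids) != len(features):
        raise ValueError("need exactly one cluster id per row")
    output_df = features.copy()  # do not mutate the input dataframe
    # A list is assigned positionally, so a non-default index is not a problem.
    output_df["kmeans_cluster_id"] = list(cluster_ids)
    return output_df


# Made-up data with a non-default index, for illustration only.
train_features = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}, index=[10, 11, 12]
)
train_with_ids = with_cluster_id(train_features, [0, 1, 0])
```

In the class, cluster_ids would come from kmeans_train for the training frame and from kmeans_test for the test frame.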

import numpy as np
import pandas as pd
import sklearn.cluster
import yellowbrick.cluster

class KmeansClustering:
    def __init__(self,
                 train_features: pd.DataFrame,
                 test_features: pd.DataFrame,
                 random_state: int
                 ):
        # TODO: Add any state variables you may need to make your functions work
        pass

    def kmeans_train(self) -> list:
        # TODO: train a kmeans model using the training data, determine the optimal value of k
        # (between 1 and 10) with n_init set to 10 and return a list of cluster ids
        # corresponding to the cluster id of each row of the training data
        cluster_ids = list()
        return cluster_ids

    def kmeans_test(self) -> list:
        # TODO: return a list of cluster ids corresponding to the cluster id of each row of the test data
        cluster_ids = list()
        return cluster_ids

    def train_add_kmeans_cluster_id_feature(self) -> pd.DataFrame:
        # TODO: return the training dataset with a new feature called kmeans_cluster_id
        output_df = pd.DataFrame()
        return output_df

    def test_add_kmeans_cluster_id_feature(self) -> pd.DataFrame:
        # TODO: return the test dataset with a new feature called kmeans_cluster_id
        output_df = pd.DataFrame()
        return output_df
