Question

For this practice work, you are to determine which model is best for prediction, report the right hyperparameters, and the resulting accuracy for the Digit Recognition data set.

Steps are as follows:

1. Separate your data into training and test sets. We will use cross-validation over the training set to select the right hyperparameters.

a. Use train_test_split to create a separate training and test set. Note that stratify takes the label array, not a boolean:

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.20)

b. For the training set, you have two choices to perform hyperparameter selection.

i. Use cross-validation to evaluate each model variant and select the best hyperparameters (standard practice, most recommended)

ii. Create a hold-out validation set and train on one portion of the data and use the accuracy on the hold-out validation set to pick the right hyperparameters (also valid)
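The two selection strategies in step 1b can be sketched side by side. This is only an illustration under assumptions: the k-NN model, the candidate values in `candidate_ks`, the 25% validation fraction, and the `random_state` values are picked for the example and are not prescribed by the assignment.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.20, random_state=42)

# Hypothetical candidate values, for illustration only.
candidate_ks = [1, 3, 5]

# Option i: 5-fold cross-validation over the training set.
cv_scores = {}
for k in candidate_ks:
    fold_scores = cross_val_score(
        KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5)
    cv_scores[k] = fold_scores.mean()
best_k_cv = max(cv_scores, key=cv_scores.get)

# Option ii: a single hold-out validation set carved out of the training set.
# The test set is not touched in either option.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, stratify=y_train, test_size=0.25, random_state=42)
holdout_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_fit, y_fit)
    holdout_scores[k] = model.score(X_val, y_val)
best_k_holdout = max(holdout_scores, key=holdout_scores.get)

print(f'Cross-validation picks k={best_k_cv}: {cv_scores[best_k_cv]:.3f}')
print(f'Hold-out picks k={best_k_holdout}: {holdout_scores[best_k_holdout]:.3f}')
```

Cross-validation averages over several splits, so its estimate usually depends less on one lucky or unlucky split than a single hold-out set does.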

2. Steps to turn in for the assignment (Deliverables):

a. Train the four models with their default parameters. Report the resulting accuracy of each model using the default parameters.

b. For each of the four models, find the hyperparameters giving the highest accuracy on the validation set by performing an exhaustive grid search. Report the hyperparameter values and accuracy on the validation set.

i. Consider using sklearn.model_selection.GridSearchCV

ii. For the models with two hyperparameters, you will need to search both simultaneously to find the optimum combination

c. Now apply the highest accuracy trained models to the test set. Report the accuracy of each model.

This is the Python code I have so far. I need help calculating the logistic regression results (validation accuracy and hyperparameters) and the final test set accuracy for each model:

import numpy as np
import pandas as pd
from sklearn.datasets import load_digits

# Load the digits data set
digits = load_digits()
X = digits.data
y = digits.target

from sklearn.model_selection import train_test_split

# Stratified 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=42)

from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Baseline models; logistic regression uses an L1 penalty with the saga solver
svm = SVC()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
lr = LogisticRegression(penalty='l1', solver='saga', max_iter=10000)

models = [svm, knn, dt, lr]
model_names = ['SVM', 'k-NN', 'Decision Trees', 'Logistic Regression']

# Fit each baseline model and report its accuracy
for i, model in enumerate(models):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f'{model_names[i]} default accuracy: {acc:.3f}')

from sklearn.model_selection import GridSearchCV

# SVM: search C and gamma jointly
svm_param_grid = {'C': 10.0 ** np.arange(-5, 6),
                  'gamma': 10.0 ** np.arange(-5, 6)}
svm_grid = GridSearchCV(SVC(kernel='rbf'), svm_param_grid, cv=5, n_jobs=-1)
svm_grid.fit(X_train, y_train)
print(f'SVM best accuracy: {svm_grid.best_score_:.3f}')
print(f'SVM best parameters: {svm_grid.best_params_}')

# k-NN: search the number of neighbors
knn_param_grid = {'n_neighbors': [1, 3, 5, 7, 9]}
knn_grid = GridSearchCV(KNeighborsClassifier(), knn_param_grid, cv=5, n_jobs=-1)
knn_grid.fit(X_train, y_train)
print(f'k-NN best accuracy: {knn_grid.best_score_:.3f}')
print(f'k-NN best parameters: {knn_grid.best_params_}')

# Decision tree: search min_samples_split
dt_param_grid = {'min_samples_split': np.arange(2, 11)}
dt_grid = GridSearchCV(DecisionTreeClassifier(), dt_param_grid, cv=5, n_jobs=-1)
dt_grid.fit(X_train, y_train)
print(f'Decision Trees best accuracy: {dt_grid.best_score_:.3f}')
print(f'Decision Trees best parameters: {dt_grid.best_params_}')
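For the missing logistic regression search, one possible sketch follows the same GridSearchCV pattern as the other three models. Several choices here are assumptions rather than part of the assignment or the code above: the grid of `C` values, the `StandardScaler` step (added because saga tends to converge faster on scaled features), and the looser `tol=1e-3` used to keep the run time reasonable. The names `lr_pipe`, `lr_param_grid`, and `lr_grid` are made up for this example.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)

# Assumption: scale inside a Pipeline so each CV fold is scaled using
# only its own training portion (no leakage from the validation fold).
lr_pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(penalty='l1', solver='saga',
                               tol=1e-3, max_iter=5000)),
])

# Assumption: an illustrative grid for the inverse regularization strength C.
# The 'clf__' prefix routes the parameter to the pipeline step named 'clf'.
lr_param_grid = {'clf__C': [0.01, 0.1, 1.0, 10.0]}

# n_jobs=-1 can be added, as in the other searches, to run folds in parallel.
lr_grid = GridSearchCV(lr_pipe, lr_param_grid, cv=5)
lr_grid.fit(X_train, y_train)

print(f'Logistic Regression best accuracy: {lr_grid.best_score_:.3f}')
print(f'Logistic Regression best parameters: {lr_grid.best_params_}')
```

Here `best_score_` is the mean cross-validated accuracy of the best `C`, which is the validation accuracy to report for step 2b.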

Specifically, you are to test the following models. Fill the following table with the information.
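The final test set accuracy (step 2c) could be computed along these lines. This is a self-contained sketch, not the required solution: the grids are deliberately smaller than the ones in the question so that it runs quickly, and the scaling step, `tol`, and `random_state` values are assumptions. With the default `refit=True`, GridSearchCV retrains the best configuration on the whole training set, so calling `predict` on the search object uses that refitted model.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)

# Assumption: reduced grids for speed; swap in the full grids as needed.
lr_pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(penalty='l1', solver='saga',
                               tol=1e-3, max_iter=5000)),
])
searches = {
    'SVM': GridSearchCV(
        SVC(kernel='rbf'),
        {'C': [0.1, 1.0, 10.0, 100.0], 'gamma': [1e-4, 1e-3, 1e-2]}, cv=5),
    'k-NN': GridSearchCV(
        KNeighborsClassifier(), {'n_neighbors': [1, 3, 5, 7, 9]}, cv=5),
    'Decision Trees': GridSearchCV(
        DecisionTreeClassifier(random_state=42),
        {'min_samples_split': list(range(2, 11))}, cv=5),
    'Logistic Regression': GridSearchCV(
        lr_pipe, {'clf__C': [0.1, 1.0, 10.0]}, cv=5),
}

results = {}
for name, search in searches.items():
    # Hyperparameters are selected using the training set only.
    search.fit(X_train, y_train)
    # The test set is used once, after selection, for the final report.
    test_acc = accuracy_score(y_test, search.predict(X_test))
    results[name] = (search.best_params_, search.best_score_, test_acc)
    print(f'{name}: params={search.best_params_}, '
          f'validation accuracy={search.best_score_:.3f}, '
          f'test accuracy={test_acc:.3f}')
```

Each entry in `results` holds the best hyperparameters, the cross-validated accuracy, and the test accuracy for one model, which are the values the table asks for.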
