Question
For this practice work, you are to determine which model is best for prediction, report the right hyperparameters, and the resulting accuracy for the Digit Recognition data set.
Steps are as follows:
1. Separate your data into training and testing sets. We will use cross-validation over the training set to select the right parameters.
a. Use train_test_split to create a separate training and test set: X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.20)
b. For the training set, you have two choices to perform hyperparameter selection:
i. Use cross-validation to evaluate each model variant and select the best hyperparameters (standard practice, most recommended).
ii. Create a hold-out validation set, train on the remaining portion of the training data, and use the accuracy on the hold-out validation set to pick the right hyperparameters (also valid).
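Option ii can be sketched roughly as follows. This is a minimal illustration, not the assignment's required method: the 75/25 inner split, the choice of k-NN as the example model, and the names X_fit, X_val and val_scores are all assumptions made for the example.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)

# Outer split: the test set is put aside and not touched during selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.20, random_state=42)

# Inner split (assumed 75/25): carve a hold-out validation set out of the training data.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, stratify=y_train, test_size=0.25, random_state=42)

# Fit each candidate on X_fit and score it on the validation set only.
val_scores = {}
for k in [1, 3, 5, 7, 9]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_fit, y_fit)
    val_scores[k] = model.score(X_val, y_val)

# Pick the value with the highest validation accuracy.
best_k = max(val_scores, key=val_scores.get)
print(f'best n_neighbors: {best_k}, validation accuracy: {val_scores[best_k]:.3f}')
```

Compared with cross-validation, a single hold-out split is cheaper but gives a noisier accuracy estimate, which is one reason option i is usually preferred.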
2. Steps to turn in for the assignment (Deliverables):
a. Train the four models with their default parameters. Report the resulting accuracy of each model using the default parameters.
b. For each of the four models, find the hyperparameters giving the highest accuracy on the validation set by performing an exhaustive grid search. Report the hyperparameter values and accuracy on the validation set.
i. Consider using sklearn.model_selection.GridSearchCV
ii. For the models with two hyperparameters, you will need to search both simultaneously to find the optimum combination
c. Now apply the highest accuracy trained models to the test set. Report the accuracy of each model.
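Step 2c might look something like the sketch below. The grids are deliberately tiny and are assumptions chosen only to keep the example short, not the ranges the assignment asks for. The logistic regression settings (default penalty, max_iter=5000) are also assumed. The dictionary names searches and test_accuracy are hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.20, random_state=42)

# Small illustrative grids (assumed); widen them for the real search.
searches = {
    'SVM': GridSearchCV(SVC(kernel='rbf'),
                        {'C': [1.0, 10.0], 'gamma': [1e-4, 1e-3]}, cv=5),
    'k-NN': GridSearchCV(KNeighborsClassifier(),
                         {'n_neighbors': [1, 3, 5]}, cv=5),
    'Decision Trees': GridSearchCV(DecisionTreeClassifier(random_state=0),
                                   {'min_samples_split': [2, 5, 10]}, cv=5),
    'Logistic Regression': GridSearchCV(LogisticRegression(max_iter=5000),
                                        {'C': [0.1, 1.0]}, cv=5),
}

test_accuracy = {}
for name, search in searches.items():
    # With the default refit=True, the best combination is retrained on all of X_train.
    search.fit(X_train, y_train)
    # score() evaluates that refit best estimator; for classifiers this is accuracy.
    test_accuracy[name] = search.score(X_test, y_test)
    print(f'{name} test accuracy: {test_accuracy[name]:.3f}')
```

The test set is used exactly once per model here, after the hyperparameters are fixed, so the reported numbers are not biased by the selection step.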
This is the Python code I have so far. I need help calculating the logistic regression validation accuracy and hyperparameters, and the final test set accuracy for each model:
import numpy as np
import pandas as pd
from sklearn.datasets import load_digits
digits = load_digits()
X = digits.data
y = digits.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=42)
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
svm = SVC()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
lr = LogisticRegression(penalty='l1', solver='saga', max_iter=10000)
models = [svm, knn, dt, lr]
model_names = ['SVM', 'k-NN', 'Decision Trees', 'Logistic Regression']
for i, model in enumerate(models):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f'{model_names[i]} default accuracy: {acc:.3f}')
from sklearn.model_selection import GridSearchCV
svm_param_grid = {'C': 10.0 ** np.arange(-5, 6),
'gamma': 10.0 ** np.arange(-5, 6)}
svm_grid = GridSearchCV(SVC(kernel='rbf'), svm_param_grid, cv=5, n_jobs=-1)
svm_grid.fit(X_train, y_train)
print(f'SVM best accuracy: {svm_grid.best_score_:.3f}')
print(f'SVM best parameters: {svm_grid.best_params_}')
knn_param_grid = {'n_neighbors': [1, 3, 5, 7, 9]}
knn_grid = GridSearchCV(KNeighborsClassifier(), knn_param_grid, cv=5, n_jobs=-1)
knn_grid.fit(X_train, y_train)
print(f'k-NN best accuracy: {knn_grid.best_score_:.3f}')
print(f'k-NN best parameters: {knn_grid.best_params_}')
dt_param_grid = {'min_samples_split': np.arange(2, 11)}
dt_grid = GridSearchCV(DecisionTreeClassifier(), dt_param_grid, cv=5, n_jobs=-1)
dt_grid.fit(X_train, y_train)
print(f'Decision Trees best accuracy: {dt_grid.best_score_:.3f}')
print(f'Decision Trees best parameters: {dt_grid.best_params_}')
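The missing logistic regression search could follow the same pattern as the three searches above. The sketch below is one possible version under stated assumptions: it tunes only C, the range of C values is a guess because the assignment's table is not shown, and it uses the default penalty and solver with max_iter=2000 to keep the search fast. If the assignment requires the L1 penalty, the estimator from the listing above, LogisticRegression(penalty='l1', solver='saga', max_iter=10000), can be passed to GridSearchCV instead, at the cost of a much slower search.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)

# Assumed search space: inverse regularisation strength C from 1e-3 to 1e1.
lr_param_grid = {'C': 10.0 ** np.arange(-3, 2)}

# Assumption: default penalty and solver; only max_iter is raised to help convergence.
lr_grid = GridSearchCV(LogisticRegression(max_iter=2000), lr_param_grid, cv=5)
lr_grid.fit(X_train, y_train)

# best_score_ is the mean cross-validated accuracy of the best C.
print(f'Logistic Regression best accuracy: {lr_grid.best_score_:.3f}')
print(f'Logistic Regression best parameters: {lr_grid.best_params_}')
```

Here best_score_ plays the role of the validation accuracy requested in step 2b, since it is averaged over the five cross-validation folds of the training set.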
Specifically, you are to test the following models. Fill the following table with the information.