Question

The MNIST handwritten digits dataset will be used. The dataset consists of
70,000 small square 28×28 pixel grayscale images of handwritten single digits between 0 and
9. The dataset can be downloaded through the Python sklearn package.
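The download code referred to in the question did not survive on this page. A minimal sketch of one common way to fetch the data is given here, assuming the OpenML copy named "mnist_784" is the intended source; the helper name `load_mnist` is hypothetical.

```python
import numpy as np
from sklearn.datasets import fetch_openml


def load_mnist():
    """Download MNIST from OpenML and return (images, labels).

    Assumption: the OpenML dataset "mnist_784" is the intended source.
    The call needs network access the first time it runs.
    """
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    images = X.reshape(-1, 28, 28).astype(np.float64)  # one 28x28 array per digit
    labels = y.astype(int)                              # labels arrive as strings
    return images, labels

# Usage (not run here because it downloads the data):
# images, y = load_mnist()
```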
From each image in the dataset, extract the following features:
1) The average intensity of all pixels in the image.
2) The area of black pixels.
3) The symmetry around the x-axis.
4) The symmetry around the y-axis.
After extracting these features, the shape of the data matrix D should be (70000×4) for rows
and columns respectively.
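The four features are not defined precisely in the question, so the sketch below fixes one possible reading of each. The assumptions are: "black" means a pixel value at or below a threshold (0 by default, which in MNIST is the background), and symmetry is scored as the negative mean absolute difference between an image and its flipped copy, so a perfectly symmetric image scores 0. The function name `extract_features` is hypothetical.

```python
import numpy as np


def extract_features(images, black_threshold=0.0):
    """Map images of shape (n, h, w) to a feature matrix of shape (n, 4)."""
    images = np.asarray(images, dtype=np.float64)
    n = images.shape[0]
    # 1) Average intensity of all pixels.
    mean_intensity = images.reshape(n, -1).mean(axis=1)
    # 2) Area of black pixels: count of pixels <= threshold (assumed definition).
    black_area = (images <= black_threshold).reshape(n, -1).sum(axis=1)
    # 3) Symmetry around the x-axis: compare with the top-bottom flipped image.
    sym_x = -np.abs(images - images[:, ::-1, :]).reshape(n, -1).mean(axis=1)
    # 4) Symmetry around the y-axis: compare with the left-right flipped image.
    sym_y = -np.abs(images - images[:, :, ::-1]).reshape(n, -1).mean(axis=1)
    return np.column_stack([mean_intensity, black_area, sym_x, sym_y])


# Small random stand-in for the real images, only to show the output shape.
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(5, 28, 28))
D_demo = extract_features(demo_images)
print(D_demo.shape)
```

Applied to the full set of 70,000 images, the same function would give a matrix with 70,000 rows and 4 columns.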
Calculate the correlation between feature 1 and 2. Interpret and discuss the results.
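The correlation between two columns of D can be computed as a Pearson coefficient. The sketch below assumes feature 1 and feature 2 are stored in columns 0 and 1; the demo matrix is synthetic and only shows the call.

```python
import numpy as np


def feature_correlation(D, i=0, j=1):
    """Pearson correlation between columns i and j of the feature matrix."""
    D = np.asarray(D, dtype=np.float64)
    return float(np.corrcoef(D[:, i], D[:, j])[0, 1])


# Synthetic demo: the second column is a decreasing linear function of the first.
first = np.arange(10.0)
demo = np.column_stack([first, 100.0 - 2.0 * first])
r_demo = feature_correlation(demo)
print(round(r_demo, 6))
```

A value near +1 or -1 would indicate that the two features carry largely the same information. If black area is counted as the number of background pixels, a negative correlation with average intensity would be the natural expectation, though that depends on the feature definition chosen.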
Using principal component analysis (PCA), visualize (i.e., by using plots) the dataset with
the extracted features.
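One way to approach the PCA visualization is to standardize the four features, project them onto the first two principal components, and draw a scatter plot coloured by digit. Standardizing first is an assumption (the features have very different scales), and the function names and output file name are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def project_2d(D):
    """Standardize the features and project them onto two principal components."""
    scaled = StandardScaler().fit_transform(D)
    pca = PCA(n_components=2)
    Z = pca.fit_transform(scaled)
    return Z, pca.explained_variance_ratio_


def plot_projection(Z, y, path="pca_features.png"):
    """Scatter plot of the projection, one colour per digit (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 5))
    points = ax.scatter(Z[:, 0], Z[:, 1], c=y, cmap="tab10", s=4)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    fig.colorbar(points, ax=ax, label="digit")
    fig.savefig(path, dpi=150)
    plt.close(fig)


# Random stand-in for the real feature matrix, only to show the shapes.
rng = np.random.default_rng(0)
D_demo = rng.normal(size=(200, 4))
Z_demo, ratio_demo = project_2d(D_demo)
print(Z_demo.shape, ratio_demo)
```

The explained variance ratio indicates how much of the spread in the four features the two plotted components retain.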
Randomly split the dataset (D and y) into 60% and 40% for training and testing purposes
respectively.
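The 60/40 split can be sketched with `train_test_split`. The fixed `random_state` and the stratification by label are assumptions added for reproducibility and balanced classes; the question only asks for a random split. The arrays are synthetic stand-ins for D and y.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the feature matrix D and the labels y.
rng = np.random.default_rng(0)
D_demo = rng.normal(size=(100, 4))
y_demo = np.repeat(np.arange(10), 10)

# 60% training, 40% testing; stratify keeps the class proportions equal (assumption).
D_train, D_test, y_train, y_test = train_test_split(
    D_demo, y_demo, test_size=0.4, random_state=42, stratify=y_demo
)
print(D_train.shape, D_test.shape)
```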
The extracted datasets (i.e., train and test) will be used to train (fit) and evaluate the following
ML algorithms:
1- Support vector machine (SVM) algorithm:
a. Linear SVM (soft-margin): for the value of C, use grid-search cross-validation to
obtain the best value from the following set of values [10, 5, 1, 0.5, 0.1, 0.05, 0.01,
0.005, 0.001]. Use overall accuracy in the cross-validation process.
b. SVM with RBF kernel: for the values of C and γ, use grid-search cross-validation
to obtain the best value from the following set of ranges (C: [10, 5, 1, 0.5, 0.1, 0.05,
0.01, 0.005, 0.001]; γ: [10, 5, 1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001]). Use overall
accuracy in the cross-validation process.
c. Using the results from both a and b parts, use the testing set to report the final
evaluation result of each model; overall accuracy, and F-score as the evaluation
metrics.
2- K-nearest neighbor (KNN) algorithm:
a. Use grid-search cross-validation to obtain the best value K from the following set
of values [3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25]. Use overall accuracy in the
cross-validation process.
b. Using the results from part a, use the testing set to report the final evaluation result
of the KNN model; overall accuracy, and F-score as the evaluation metrics.
3- Naive Bayes algorithm: Fit the model by using the training dataset. Then, use the testing set
to report the final evaluation result of the Naive Bayes model; overall accuracy, and F-score as the
evaluation metrics.
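The three parts above can be sketched together as follows. This is one possible setup, not a reference solution: the feature scaling step, the 3-fold cross-validation, the macro-averaged F-score, and the Gaussian variant of Naive Bayes are all assumptions, and the helper names `tune` and `evaluate` are hypothetical. A small synthetic dataset stands in for D and y so the grids run quickly.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

C_GAMMA = [10, 5, 1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001]  # grid from the question
K_VALUES = list(range(3, 26, 2))                           # 3, 5, ..., 25


def tune(estimator, grid, X_train, y_train, cv=3):
    """Grid-search cross-validation scored by overall accuracy."""
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    params = {f"clf__{name}": values for name, values in grid.items()}
    search = GridSearchCV(pipe, params, scoring="accuracy", cv=cv)
    return search.fit(X_train, y_train)


def evaluate(model, X_test, y_test):
    """Overall accuracy and macro-averaged F-score on the held-out test set."""
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "f1_macro": f1_score(y_test, pred, average="macro"),
    }


# Synthetic stand-in for the extracted features (4 columns, 3 classes).
X, y = make_classification(
    n_samples=150, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)

searches = {
    "linear SVM": tune(SVC(kernel="linear"), {"C": C_GAMMA}, X_train, y_train),
    "RBF SVM": tune(SVC(kernel="rbf"), {"C": C_GAMMA, "gamma": C_GAMMA}, X_train, y_train),
    "KNN": tune(KNeighborsClassifier(), {"n_neighbors": K_VALUES}, X_train, y_train),
}
best_params = {name: s.best_params_ for name, s in searches.items()}
results = {name: evaluate(s, X_test, y_test) for name, s in searches.items()}

# Naive Bayes has no grid in the question, so it is fitted directly.
results["naive Bayes"] = evaluate(GaussianNB().fit(X_train, y_train), X_test, y_test)
```

On the real 42,000-row training set the RBF grid has 81 parameter combinations, so running the search on a subsample or with parallel jobs (`n_jobs=-1`) may be worth considering.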
Task 3: ML models:
Use your results in task 2 to create useful plots and tables that can be used to compare the
performance of the three algorithms. Use these plots and tables to discuss and interpret the
performance of these models on this specific dataset.
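A comparison table could be assembled along these lines. The sketch uses synthetic data and untuned estimators purely to show the mechanics; in the assignment, the tuned models and the real train/test split would be passed in instead. The helper names `compare_models` and `format_table` are hypothetical, and the macro-averaged F-score is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each model and return (name, accuracy, macro F1), best accuracy first."""
    rows = []
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        rows.append((
            name,
            accuracy_score(y_test, pred),
            f1_score(y_test, pred, average="macro"),
        ))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def format_table(rows):
    """Render the rows as a fixed-width plain-text table."""
    lines = [f"{'model':<14}{'accuracy':>10}{'macro F1':>10}"]
    lines += [f"{name:<14}{acc:>10.3f}{f1:>10.3f}" for name, acc, f1 in rows]
    return "\n".join(lines)


# Synthetic stand-in data; the numbers printed say nothing about MNIST.
X, y = make_classification(
    n_samples=150, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)
models = {
    "linear SVM": SVC(kernel="linear"),
    "RBF SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "naive Bayes": GaussianNB(),
}
rows = compare_models(models, X_train, y_train, X_test, y_test)
table = format_table(rows)
print(table)
```

The same rows can feed a bar chart of accuracy and F-score per model, which tends to make small differences between the algorithms easier to discuss.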
