Question

Posted on Sep 22, 2024

The MNIST handwritten digits dataset will be used. The dataset consists of 70,000 small square 28×28 pixel grayscale images of handwritten single digits between 0 and 9. The dataset can be downloaded from the Python sklearn package.
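
The download code that the question refers to is not reproduced on this page. A minimal sketch of one common way to do it, assuming scikit-learn's `fetch_openml` loader and the OpenML dataset name `mnist_784` (both are assumptions about what the assignment intended), could look like this. The helper names `load_mnist` and `to_images` are hypothetical.

```python
import numpy as np
from sklearn.datasets import fetch_openml


def load_mnist():
    """Download MNIST from OpenML (needs network access on the first call).

    Assumption: the assignment's loader is equivalent to
    fetch_openml("mnist_784"). Returns X with shape (70000, 784)
    and integer labels y with shape (70000,).
    """
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    return X, y.astype(int)


def to_images(X, side=28):
    """Reshape flat pixel rows of length side*side into (n, side, side) images."""
    return np.asarray(X).reshape(-1, side, side)
```

Calling `X, y = load_mnist()` followed by `images = to_images(X)` would give one 28×28 array per digit, which is a convenient form for the per-image features requested next.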

From each image in the dataset, extract the following features:

1) The average intensity of all pixels in the image.

2) The area of black pixels.

3) The symmetry around the x-axis.

4) The symmetry around the y-axis.

After extracting these features, the shape of the data matrix D should be (70000, 4), for rows and columns respectively.
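
The question does not define the four features precisely, so the sketch below fixes one possible reading. It assumes "black pixels" means stroke pixels whose intensity exceeds a threshold (the `ink_threshold` value is hypothetical), and it scores symmetry as the negative mean absolute difference between an image and its mirror, so that 0 means perfectly symmetric. Other definitions are equally defensible.

```python
import numpy as np


def extract_features(images, ink_threshold=127):
    """Build the (n, 4) data matrix D from images of shape (n, 28, 28).

    Columns (assumed definitions, not given in the question):
      0: average intensity of all pixels
      1: number of pixels above ink_threshold ("area of black pixels")
      2: symmetry around the x-axis (image vs. its top-bottom mirror)
      3: symmetry around the y-axis (image vs. its left-right mirror)
    """
    images = np.asarray(images, dtype=np.float64)
    mean_intensity = images.mean(axis=(1, 2))
    ink_area = (images > ink_threshold).sum(axis=(1, 2)).astype(np.float64)
    # Reversing the row axis mirrors each image across the horizontal axis.
    sym_x = -np.abs(images - images[:, ::-1, :]).mean(axis=(1, 2))
    # Reversing the column axis mirrors each image across the vertical axis.
    sym_y = -np.abs(images - images[:, :, ::-1]).mean(axis=(1, 2))
    return np.column_stack([mean_intensity, ink_area, sym_x, sym_y])
```

Applied to all 70,000 images, `extract_features(images).shape` would be `(70000, 4)`, matching the shape stated in the question.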

Calculate the correlation between features 1 and 2. Interpret and discuss the results.
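
One way to compute this is the Pearson correlation coefficient between the first two columns of D. The sketch below assumes Pearson correlation is what the question means (it does not say) and uses a hypothetical helper name. Since both features plausibly grow with the amount of ink in the image, a positive value would not be surprising, but the actual number has to come from the data.

```python
import numpy as np


def pearson_correlation(D, i=0, j=1):
    """Pearson correlation between columns i and j of the data matrix D."""
    D = np.asarray(D, dtype=np.float64)
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal entry.
    return float(np.corrcoef(D[:, i], D[:, j])[0, 1])
```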

Using principal component analysis (PCA), visualize (i.e., by using plots) the dataset with the extracted features.
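
A possible approach is to project the four features onto the first two principal components and draw a scatter plot coloured by digit. The sketch below assumes the features are standardized first (the question does not require this, but the features are on very different scales) and uses hypothetical helper names and a hypothetical output file name.

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def project_2d(D):
    """Standardize D and project it onto its first two principal components.

    Returns the (n, 2) projection and the explained variance ratio of each
    component.
    """
    scaled = StandardScaler().fit_transform(D)
    pca = PCA(n_components=2)
    Z = pca.fit_transform(scaled)
    return Z, pca.explained_variance_ratio_


def plot_projection(Z, y, path="pca_features.png"):
    """Save a scatter plot of the projection, coloured by digit label."""
    # Imported here so project_2d can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(7, 6))
    points = ax.scatter(Z[:, 0], Z[:, 1], c=y, cmap="tab10", s=2, alpha=0.5)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    fig.colorbar(points, ax=ax, label="digit")
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

The explained variance ratios are worth reporting next to the plot, since they indicate how much of the four-feature structure the two-dimensional view retains.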

Randomly split the dataset (D and y) into 60% and 40% for training and testing purposes, respectively.
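
A minimal sketch of the split using scikit-learn's `train_test_split` follows. The fixed `seed` is an assumption added for reproducibility, and the helper name is hypothetical.

```python
from sklearn.model_selection import train_test_split


def split_60_40(D, y, seed=0):
    """Randomly split (D, y) into 60% training and 40% testing rows.

    Returns D_train, D_test, y_train, y_test.
    """
    return train_test_split(D, y, test_size=0.4, random_state=seed, shuffle=True)
```

Passing `stratify=y` to `train_test_split` would additionally keep the digit proportions equal in both parts; the question only asks for a random split, so that is left out here.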

The extracted datasets (i.e., train and test) will be used to train (fit) and evaluate the following ML algorithms:

1- Support vector machine (SVM) algorithm:

a. Linear SVM (soft-margin): for the value of C, use grid-search cross-validation to obtain the best value from the following set of values: [10, 5, 1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001]. Use overall accuracy in the cross-validation process.

b. SVM with RBF kernel: for the values of C and γ, use grid-search cross-validation to obtain the best values from the following ranges (C: [10, 5, 1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001]; γ: [10, 5, 1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001]). Use overall accuracy in the cross-validation process.

c. Using the results from both parts a and b, use the testing set to report the final evaluation result of each model, with overall accuracy and F-score as the evaluation metrics.

2- K-nearest neighbor (KNN) algorithm:

a. Use grid-search cross-validation to obtain the best value of K from the following set of values: [3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25]. Use overall accuracy in the cross-validation process.

b. Using the results from part a, use the testing set to report the final evaluation result of the KNN model, with overall accuracy and F-score as the evaluation metrics.

3- Naive Bayes algorithm: Fit the model by using the training dataset. Then, use the testing set to report the final evaluation result of the Naive Bayes model, with overall accuracy and F-score as the evaluation metrics.
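
The three algorithms above can be sketched with scikit-learn as follows. Several choices here are assumptions not stated in the question: 5-fold cross-validation, feature standardization inside a pipeline, `LinearSVC` for the linear soft-margin SVM, Gaussian Naive Bayes as the Naive Bayes variant, and macro averaging for the multiclass F-score. The helper names `build_models` and `fit_and_evaluate` are hypothetical.

```python
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Candidate values copied from the question.
C_VALUES = [10, 5, 1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001]
GAMMA_VALUES = [10, 5, 1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001]
K_VALUES = [3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25]


def build_models(cv=5):
    """Return the estimators to fit; grid searches score by overall accuracy."""

    def search(estimator, grid):
        # Assumption: features are standardized before each classifier.
        pipe = make_pipeline(StandardScaler(), estimator)
        return GridSearchCV(pipe, grid, scoring="accuracy", cv=cv)

    return {
        "linear_svm": search(LinearSVC(max_iter=10000), {"linearsvc__C": C_VALUES}),
        "rbf_svm": search(
            SVC(kernel="rbf"), {"svc__C": C_VALUES, "svc__gamma": GAMMA_VALUES}
        ),
        "knn": search(
            KNeighborsClassifier(), {"kneighborsclassifier__n_neighbors": K_VALUES}
        ),
        # Naive Bayes has no grid in the question, so it is fitted directly.
        "naive_bayes": GaussianNB(),
    }


def fit_and_evaluate(models, X_train, y_train, X_test, y_test):
    """Fit every model on the training set and score it on the testing set."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            # Assumption: macro-averaged F-score over the ten digit classes.
            "f1_macro": f1_score(y_test, y_pred, average="macro"),
            # Only the grid searches expose best_params_.
            "best_params": getattr(model, "best_params_", None),
        }
    return results
```

On the full 42,000-row training set, the RBF grid has 81 parameter combinations per fold and can take a long time; `GridSearchCV` accepts an `n_jobs` argument to run the fits in parallel.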

Task 3: ML models:

Use your results in task 2 to create useful plots and tables that can be used to compare the performance of the three algorithms. Use these plots and tables to discuss and interpret the performance of these models on this specific dataset.
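
A comparison table and bar chart could be produced along these lines. The sketch assumes a hypothetical `results` dictionary that maps each model name to its test-set `accuracy` and `f1_macro` values; the function names and output file name are also hypothetical, and no actual scores are implied.

```python
def format_results_table(results):
    """Return a plain-text table of test metrics, best accuracy first."""
    header = f"{'model':<15}{'accuracy':>10}{'f1_macro':>10}"
    lines = [header, "-" * len(header)]
    ranked = sorted(results.items(), key=lambda kv: kv[1]["accuracy"], reverse=True)
    for name, metrics in ranked:
        lines.append(
            f"{name:<15}{metrics['accuracy']:>10.3f}{metrics['f1_macro']:>10.3f}"
        )
    return "\n".join(lines)


def plot_results(results, path="model_comparison.png"):
    """Save a grouped bar chart of accuracy and F-score per model."""
    # Imported here so the table helper works without matplotlib installed.
    import matplotlib.pyplot as plt

    names = list(results)
    positions = range(len(names))
    width = 0.4
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.bar([p - width / 2 for p in positions],
           [results[n]["accuracy"] for n in names], width, label="accuracy")
    ax.bar([p + width / 2 for p in positions],
           [results[n]["f1_macro"] for n in names], width, label="F-score (macro)")
    ax.set_xticks(list(positions))
    ax.set_xticklabels(names)
    ax.set_ylim(0, 1)
    ax.legend()
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

When discussing the numbers, it helps to keep in mind that all models see only four hand-crafted features, so differences between them say as much about the features as about the algorithms.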
