
Question


Instructions
The MNIST dataset is a set of 70,000 small images of digits handwritten by high school students and employees of the US Census Bureau. Each image is labeled with the digit it represents. This set has been studied so much that it is often called the "hello world" of Machine Learning: whenever people come up with a new classification algorithm, they are curious to see how it will perform on MNIST, and anyone who learns Machine Learning tackles this dataset sooner or later.
The instructions for exploring this dataset are:
Data Acquisition and Initial Analysis:
Retrieve the MNIST dataset.
Perform exploratory data analysis to understand the dataset's structure, including:
i. how many images there are
ii. how many features there are and the range of feature values (e.g., a histogram of the data values), relating them to real-world images
iii. how many categories/labels there are, whether they are discrete or continuous, and what they are
iv. a visualization of at least three randomly selected samples within each category (to get a feel for the variance of the data)
v. a visualization of more data samples to check whether there are bad samples that need to be removed. What kinds of bad samples do you think there could be?
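The exploratory steps above could be sketched roughly as follows. This is only an illustration, not the approved answer: the helper names (`load_mnist`, `summarize`, `sample_per_class`, `plot_samples`) are hypothetical, and the sketch assumes scikit-learn's `fetch_openml("mnist_784")` copy of the data, where each image is a flat row of 784 pixel intensities in the range 0 to 255.

```python
import numpy as np


def load_mnist():
    """Download MNIST from OpenML (assumed source; needs network access)."""
    from sklearn.datasets import fetch_openml
    X, y = fetch_openml("mnist_784", version=1, as_frame=False, return_X_y=True)
    return X, y.astype(int)  # OpenML returns the labels as strings


def summarize(X, y, bins=16):
    """Collect the basic structure facts asked for in items i to iii."""
    labels, counts = np.unique(y, return_counts=True)
    hist, _ = np.histogram(X, bins=bins, range=(0, 255))
    return {
        "n_images": X.shape[0],
        "n_features": X.shape[1],
        "value_min": float(X.min()),
        "value_max": float(X.max()),
        "labels": labels.tolist(),
        "counts": dict(zip(labels.tolist(), counts.tolist())),
        "histogram": hist.tolist(),
    }


def sample_per_class(y, k=3, seed=0):
    """Pick k random row indices per label (each label needs >= k rows)."""
    rng = np.random.default_rng(seed)
    return {
        lab.item(): rng.choice(np.flatnonzero(y == lab), size=k, replace=False)
        for lab in np.unique(y)
    }


def plot_samples(X, y, k=3, seed=0):
    """Draw a grid with one row per label and k random images per row."""
    import matplotlib.pyplot as plt
    side = int(np.sqrt(X.shape[1]))  # 28 for MNIST's 784 features
    picks = sample_per_class(y, k, seed)
    fig, axes = plt.subplots(len(picks), k, squeeze=False)
    for row, idx in enumerate(picks.values()):
        for col, i in enumerate(idx):
            axes[row, col].imshow(X[i].reshape(side, side), cmap="gray")
            axes[row, col].axis("off")
    return fig
```

A typical session would call `X, y = load_mnist()`, print `summarize(X, y)`, and then call `plot_samples(X, y, k=10)` to scan for candidates for bad samples, such as blank, heavily smudged, or apparently mislabeled digits.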
Data Preparation and Manipulation:
Apply dimensionality reduction techniques (PCA and t-SNE) to the MNIST dataset and visualize the results.
Split the dataset into training (60,000 samples) and testing (10,000 samples) sets.
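A minimal sketch of the preparation steps, assuming scikit-learn's `PCA` and `TSNE` and the common convention that the first 60,000 rows of the OpenML copy form the training set. The function names and the subset size used for t-SNE are illustrative choices, not part of the assignment; t-SNE is run on a random subset only because it is slow on all 70,000 images.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def split_mnist(X, y, n_train=60_000):
    """Split by position: first n_train rows for training, the rest for testing."""
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


def embed_2d(X, n_tsne=5_000, seed=42):
    """Return a 2-D PCA projection of X and a 2-D t-SNE embedding of a subset.

    idx holds the row indices of the subset, so labels can be matched
    to the t-SNE points when colouring a scatter plot.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_tsne, len(X)), replace=False)
    pca_2d = PCA(n_components=2, random_state=seed).fit_transform(X)
    tsne_2d = TSNE(n_components=2, random_state=seed).fit_transform(X[idx])
    return pca_2d, tsne_2d, idx
```

Both embeddings can then be drawn with a scatter plot coloured by label, for example `plt.scatter(tsne_2d[:, 0], tsne_2d[:, 1], c=y[idx], cmap="tab10", s=2)`.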
Machine Learning Model Implementation:
Train a Random Forest classifier on the original dataset and record its performance.
Use PCA to reduce the dataset's dimensionality to 174. Train a new Random Forest classifier on the reduced dataset and see how long it takes. Was training much faster? Then, evaluate the classifier on the test set. How does it compare to the previous classifier?
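One way to run this comparison is sketched below, under the assumption that scikit-learn's `RandomForestClassifier` and `PCA` are used. The helper names and the choice of 100 trees are illustrative. PCA is fitted on the training set only and then applied to the test set, so no test information leaks into the projection. Whether the reduced model trains faster is something to measure rather than assume; fewer dimensions do not guarantee a faster Random Forest.

```python
import time

from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def fit_and_score(X_train, y_train, X_test, y_test, seed=42):
    """Train a Random Forest; return (training seconds, test accuracy)."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    return elapsed, accuracy_score(y_test, clf.predict(X_test))


def compare_raw_vs_pca(X_train, y_train, X_test, y_test,
                       n_components=174, seed=42):
    """Time and score a forest on the raw features and on PCA-reduced ones."""
    raw = fit_and_score(X_train, y_train, X_test, y_test, seed)

    pca = PCA(n_components=n_components, random_state=seed)
    X_train_red = pca.fit_transform(X_train)  # fit on training data only
    X_test_red = pca.transform(X_test)        # reuse the same projection
    reduced = fit_and_score(X_train_red, y_train, X_test_red, y_test, seed)

    return {"raw": raw, "pca": reduced}
```

Calling `compare_raw_vs_pca(X_train, y_train, X_test, y_test)` on the 60,000/10,000 split returns the two (time, accuracy) pairs side by side, which is the raw material for the comparison the question asks for.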
Critical Evaluation and Conclusion:
Provide a comprehensive evaluation of the performance of the models.
Summarize findings and insights.
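For the evaluation, overall accuracy alone hides which digits are confused with each other. The sketch below, which assumes scikit-learn's `confusion_matrix` and uses a hypothetical helper name `evaluate`, reports accuracy, per-class recall, and the most frequent misclassification.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def evaluate(y_true, y_pred):
    """Summarize predictions: accuracy, recall per class, worst confusion."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = np.unique(y_true)
    cm = confusion_matrix(y_true, y_pred, labels=labels)

    # Recall per class: correct predictions divided by true count of that class.
    recall = cm.diagonal() / cm.sum(axis=1)

    # Zero the diagonal to find the largest off-diagonal (true, predicted) cell.
    off = cm.copy()
    np.fill_diagonal(off, 0)
    i, j = np.unravel_index(off.argmax(), off.shape)

    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "recall_per_class": dict(zip(labels.tolist(), recall.tolist())),
        "most_confused": (labels[i].item(), labels[j].item(), int(off[i, j])),
    }
```

Running `evaluate` on the test-set predictions of both classifiers gives comparable summaries for the final write-up.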
