Question

1. Data Acquisition and Initial Analysis:
Retrieve the MNIST dataset.
Perform exploratory data analysis to understand the dataset's structure, including
i. how many images
ii. how many features and the range of feature values (e.g., a histogram of the data values),
relating them to the real world, such as real images.
iii. how many categories/labels (discrete or continuous type) and what they are?
iv. visualize at least three randomly selected samples within each category (to get a feel for the
variance of the data)
v. visualize more data samples to see whether there are bad data samples that need to be
removed. What do you think bad data samples could be?
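The exploratory questions above (image count, feature count and range, label set, random samples per category, candidate bad samples) could be approached with a small helper set like the one below. This is a minimal sketch, not a reference solution. It assumes the images arrive as a flattened array X of shape (n_images, n_features) with a matching label array y. The function names and the min_ink_pixels threshold are hypothetical.

```python
import numpy as np

# Assumption: the data is loaded elsewhere, for example with
#   from sklearn.datasets import fetch_openml
#   mnist = fetch_openml("mnist_784", as_frame=False)
#   X, y = mnist.data, mnist.target
# Nothing in this block touches the network.


def summarize_dataset(X, y):
    """Return basic structure facts for a flattened image dataset."""
    X = np.asarray(X)
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    return {
        "n_images": X.shape[0],
        "n_features": X.shape[1],
        "value_min": X.min(),
        "value_max": X.max(),
        "labels": labels.tolist(),
        "counts": dict(zip(labels.tolist(), counts.tolist())),
    }


def sample_indices_per_class(y, k=3, seed=0):
    """Pick up to k random row indices for every label, for plotting."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    picks = {}
    for label in np.unique(y).tolist():
        rows = np.flatnonzero(y == label)
        picks[label] = rng.choice(rows, size=min(k, rows.size), replace=False)
    return picks


def find_nearly_blank(X, min_ink_pixels=10):
    """Flag rows with very few non-zero pixels as candidates for manual review."""
    X = np.asarray(X)
    return np.flatnonzero((X > 0).sum(axis=1) < min_ink_pixels)
```

The indices returned by sample_indices_per_class can be reshaped to 28 x 28 and passed to a plotting call such as matplotlib's imshow. A histogram of X.ravel() addresses the value-range question. The blank-image check is only one possible notion of a bad sample; mislabeled or heavily distorted digits would need visual inspection.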
2. Data Preparation and Manipulation:
Apply dimensionality reduction techniques (PCA and t-SNE) to the MNIST dataset and visualize the
results.
Split the dataset into training (60,000 samples) and testing (10,000 samples) sets.
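One possible way to set up the 2-D projections and the split is sketched below using scikit-learn's PCA and TSNE classes. The helper names, the subsample size, and the perplexity value are assumptions made for illustration. The split assumes the rows are already ordered with the 60,000 training images first, which is the convention followed in the textbook for the OpenML copy of MNIST.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def project_2d(X, method="pca", max_samples=5000, perplexity=30.0, seed=42):
    """Project a random subsample of X to 2-D with PCA or t-SNE.

    Returns (idx, Z): the sampled row indices and their 2-D coordinates.
    Subsampling keeps t-SNE tractable; max_samples is an arbitrary choice.
    """
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    n = min(max_samples, X.shape[0])
    idx = rng.choice(X.shape[0], size=n, replace=False)
    if method == "pca":
        reducer = PCA(n_components=2, random_state=seed)
    elif method == "tsne":
        reducer = TSNE(n_components=2, init="pca",
                       perplexity=perplexity, random_state=seed)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return idx, reducer.fit_transform(X[idx])


def split_train_test(X, y, n_train=60_000):
    """Positional split: first n_train rows for training, the rest for testing."""
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]
```

The returned coordinates can be drawn as a scatter plot colored by y[idx] to see how well the digit classes separate under each method.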
3. Machine Learning Model Implementation:
Train a Random Forest classifier on the original dataset and record its performance.
Use PCA to reduce the dataset's dimensionality to 174. Train a new Random Forest classifier on the
reduced dataset and see how long it takes. Was training much faster? Then, evaluate the classifier on
the test set. How does it compare to the previous classifier?
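The comparison described above could be organized around a single helper that optionally applies PCA, times the fit, and scores the result. The sketch below is one such arrangement under stated assumptions: the function name, n_estimators=100, and the seed are illustrative choices, and the timer covers only the Random Forest fit, not the PCA fit.

```python
import time

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier


def fit_and_score_forest(X_train, y_train, X_test, y_test,
                         n_components=None, n_estimators=100,
                         seed=42, n_jobs=-1):
    """Train a Random Forest, optionally on PCA-reduced features.

    PCA is fitted on the training set only and then applied to the test
    set, so no test information leaks into the projection.
    """
    if n_components is not None:
        pca = PCA(n_components=n_components, random_state=seed)
        X_train = pca.fit_transform(X_train)
        X_test = pca.transform(X_test)

    clf = RandomForestClassifier(n_estimators=n_estimators,
                                 random_state=seed, n_jobs=n_jobs)
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    return {
        "n_features": X_train.shape[1],
        "fit_seconds": fit_seconds,
        "test_accuracy": clf.score(X_test, y_test),
    }
```

Calling it once with n_components=None and once with n_components=174 on the same split yields the two timing and accuracy figures the task asks to compare. Keeping n_jobs and the seed identical across both runs makes the timings more comparable.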
4. Critical Evaluation and Conclusion:
Provide a comprehensive evaluation of the performance of the models.
Summarize findings and insights.
5. Research Question: Explore how various image preprocessing methods (e.g., normalization, binarization,
noise reduction, and image augmentation) influence the performance of at least two different machine
learning models (e.g., Convolutional Neural Networks and Random Forest classifiers) trained on the MNIST
dataset. Analyze the models' accuracy, training time, and ability to generalize to test data. Discuss your
findings' implications for designing machine learning pipelines in digit recognition tasks.
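The four preprocessing families named in the research question can each be expressed as a small array transform. The sketch below is only one interpretation: it assumes images are stored as an array of shape (n, 28, 28) with values in 0 to 255, uses a fixed threshold of 127 for binarization, a 3 x 3 median filter for noise reduction, and pixel shifts as the augmentation. All function names and parameter values are hypothetical.

```python
import numpy as np
from scipy.ndimage import median_filter


def normalize(images):
    """Scale pixel values from [0, 255] to [0.0, 1.0]."""
    return np.asarray(images, dtype=np.float32) / 255.0


def binarize(images, threshold=127):
    """Map every pixel to 0 or 1 using a fixed threshold (assumed value)."""
    return (np.asarray(images) > threshold).astype(np.uint8)


def denoise(images):
    """Apply a 3x3 median filter to each image independently."""
    # size=(1, 3, 3) keeps the filter from mixing neighbouring images.
    return median_filter(np.asarray(images), size=(1, 3, 3))


def shift_images(images, dy, dx):
    """Shift every image by (dy, dx) pixels, filling vacated pixels with 0."""
    images = np.asarray(images)
    h, w = images.shape[1:]
    out = np.zeros_like(images)
    src = images[:, max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[:, max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)] = src
    return out
```

Each variant can be flattened with reshape(len(images), -1) for a Random Forest or kept as 28 x 28 for a convolutional network. Shifted copies are normally appended to the training set only, so the test set stays untouched.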
6. Reflect on the composition and diversity of the MNIST dataset, considering its impact on the training process
and model performance. Explore how the inclusion of a more diverse set of handwriting samples (e.g.,
different handwriting styles, inclusion of characters from non-Latin alphabets, or samples from wider age
groups) might affect the accuracy and generalizability of machine learning models trained for digit
recognition tasks.
Structure
Prepare a Jupyter notebook for this assignment. The notebook should alternate between text and
Python code and cover the topics listed in the specific tasks above. Always refer to the textbook Hands-On Machine
Learning with Scikit-Learn, Keras & TensorFlow for coding help.
How do I submit?
1. Prepare Your Submission: Ensure your Jupyter notebook (.ipynb) is complete with all required work.
