
Posted on Jul 29, 2024


1. Data Acquisition and Initial Analysis:

Retrieve the MNIST dataset.

Perform exploratory data analysis to understand the dataset's structure, including:

i. how many images there are;
ii. how many features there are and the range of feature values (e.g., a histogram of the data values), relating them to the real world, such as real images;
iii. how many categories/labels there are (discrete or continuous type) and what they are;
iv. visualize at least three randomly selected samples within each category (to get a feel for the variance of the data);
v. visualize more data samples to see whether there are bad data samples that need to be removed. What kinds of bad data samples do you think there could be?
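A minimal sketch of the exploratory analysis for items i to v follows. It assumes scikit-learn's `fetch_openml("mnist_784", version=1)`, the loader used in the Hands-On Machine Learning textbook, and raw pixel values in 0..255. The helper names are hypothetical. The "ink" thresholds in `suspicious_samples` are arbitrary assumptions for flagging near-blank or overly dark images; they are not an established definition of a bad sample.

```python
import numpy as np


def load_mnist():
    """Download MNIST from OpenML (needs network access on the first call)."""
    from sklearn.datasets import fetch_openml

    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    # OpenML returns labels as strings ("0".."9"); cast them to integers.
    return mnist.data, mnist.target.astype(np.int64)


def describe(X, y, bins=16):
    """Items i-iii: sample count, feature count, value range, label set."""
    counts, _ = np.histogram(X, bins=bins, range=(0, 255))  # flattens X
    labels, per_label = np.unique(y, return_counts=True)
    return {
        "n_images": X.shape[0],
        "n_features": X.shape[1],   # 784 = 28 x 28 pixels for MNIST
        "min": float(X.min()),
        "max": float(X.max()),      # grayscale intensities, 0 = background
        "hist_counts": counts,
        "labels": labels,
        "per_label": per_label,
    }


def random_per_class(y, k=3, seed=0):
    """Item iv: pick k random sample indices for every label."""
    rng = np.random.default_rng(seed)
    return {label: rng.choice(np.flatnonzero(y == label), size=k, replace=False)
            for label in np.unique(y)}


def suspicious_samples(X, min_ink=0.02, max_ink=0.5, threshold=128):
    """Item v (heuristic, assumed thresholds): flag almost-empty or overly dark images."""
    ink = (X >= threshold).mean(axis=1)  # fraction of "on" pixels per image
    return np.flatnonzero((ink < min_ink) | (ink > max_ink))


def show(X, indices, title=""):
    """Plot the selected rows of X as 28x28 images (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(indices), figsize=(1.5 * len(indices), 1.8))
    for ax, idx in zip(np.atleast_1d(axes), indices):
        ax.imshow(X[idx].reshape(28, 28), cmap="binary")
        ax.set_axis_off()
    fig.suptitle(title)
    plt.show()
```

In a notebook you would call `X, y = load_mnist()` and then `describe(X, y)`. Next, loop over `random_per_class(y).items()` and pass each index array to `show`. Finally, inspect the images returned by `suspicious_samples(X)` by eye before deciding whether any should be removed.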

2. Data Preparation and Manipulation:

Apply dimensionality reduction techniques (PCA and t-SNE) to the MNIST dataset and visualize the results.
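One way to produce the two 2-D views is sketched below, assuming scikit-learn's `PCA` and `TSNE` and pixel values in 0..255. t-SNE is slow on all 70,000 images, so the sketch embeds a random subset. The subset size of 5,000 is an assumption you can change. `embed_2d` and `plot_embeddings` are hypothetical helper names.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def embed_2d(X, n_samples=5000, seed=42):
    """Project a random subset of X to 2-D with PCA and with t-SNE."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_samples, len(X)), replace=False)
    X_sub = X[idx] / 255.0  # scale pixels to [0, 1]

    # Linear projection onto the two directions of largest variance.
    pca = PCA(n_components=2, random_state=seed)
    X_pca = pca.fit_transform(X_sub)

    # Non-linear embedding; subsampled because t-SNE is slow on 70,000 images.
    tsne = TSNE(n_components=2, init="pca", random_state=seed)
    X_tsne = tsne.fit_transform(X_sub)
    return idx, X_pca, X_tsne, pca.explained_variance_ratio_


def plot_embeddings(X_pca, X_tsne, labels):
    """Side-by-side scatter plots coloured by digit (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    for ax, emb, name in zip(axes, (X_pca, X_tsne), ("PCA", "t-SNE")):
        sc = ax.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=2)
        ax.set_title(name)
    fig.colorbar(sc, ax=axes, label="digit")
    plt.show()
```

Usage: `idx, X_pca, X_tsne, evr = embed_2d(X)` followed by `plot_embeddings(X_pca, X_tsne, y[idx])`. The returned `evr` shows how little of the total variance two principal components capture. This is a useful point of contrast with the t-SNE plot.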

Split the dataset into training (60,000 samples) and testing (10,000 samples) sets.
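A minimal split sketch follows. It assumes the OpenML `mnist_784` ordering, in which the first 60,000 rows are the original MNIST training images and the last 10,000 rows are the original test images, as described in the Hands-On Machine Learning textbook. `split_mnist` is a hypothetical helper.

```python
import numpy as np


def split_mnist(X, y, n_train=60_000, n_test=10_000):
    """Split into the conventional 60,000 training / 10,000 test samples.

    Assumes rows are ordered as in OpenML 'mnist_784': original training
    images first, original test images last.
    """
    if len(X) != n_train + n_test or len(y) != len(X):
        raise ValueError(f"expected {n_train + n_test} samples, got {len(X)}")
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]
```

Keeping the original split makes results comparable with published MNIST numbers. If you shuffle the data instead, `sklearn.model_selection.train_test_split(X, y, test_size=10_000, stratify=y)` gives a random split that keeps the class proportions.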

3. Machine Learning Model Implementation:

Train a Random Forest classifier on the original dataset and record its performance.

Use PCA to reduce the dataset's dimensionality to 174 dimensions.

Train a new Random Forest classifier on the reduced dataset and see how long it takes. Was training much faster? Then, evaluate the classifier on the test set. How does it compare to the previous classifier?
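The comparison asked for above can be sketched as follows. This assumes scikit-learn's `RandomForestClassifier` with 100 trees and wall-clock timing via `time.perf_counter`. The forest settings and the helper names `timed_rf` and `compare_original_vs_pca` are assumptions, not part of the assignment. PCA is fitted on the training set only and then applied to the test set, so no test information leaks into the reduction. The reported training time covers the forest alone, not the PCA fit.

```python
import time

from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def timed_rf(X_train, y_train, X_test, y_test, seed=42):
    """Fit a Random Forest; return (training seconds, test accuracy)."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed, n_jobs=-1)
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    return elapsed, accuracy_score(y_test, clf.predict(X_test))


def compare_original_vs_pca(X_train, y_train, X_test, y_test, n_components=174):
    """Train on the raw pixels and on a PCA projection; report time and accuracy."""
    # Fit the projection on training data only, then reuse it for the test set.
    pca = PCA(n_components=n_components, random_state=42)
    X_train_red = pca.fit_transform(X_train)
    X_test_red = pca.transform(X_test)

    t_orig, acc_orig = timed_rf(X_train, y_train, X_test, y_test)
    t_red, acc_red = timed_rf(X_train_red, y_train, X_test_red, y_test)
    return {
        "original": {"train_s": t_orig, "accuracy": acc_orig},
        "pca": {"train_s": t_red, "accuracy": acc_red,
                "explained_variance": pca.explained_variance_ratio_.sum()},
    }
```

Whether the reduced model trains faster depends on hardware, `n_jobs` and the forest settings. Report the measured numbers rather than assuming a speed-up.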

4. Critical Evaluation and Conclusion:

Provide a comprehensive evaluation of the performance of the models.

Summarize findings and insights.
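A comprehensive evaluation usually goes beyond a single accuracy figure. The sketch below assumes scikit-learn's `confusion_matrix` and `classification_report` and reports per-class recall and the most frequent confusion between two digits. `evaluate` is a hypothetical helper; pass it the test labels and each model's predictions.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix


def evaluate(y_true, y_pred):
    """Summarise a classifier with accuracy, per-class recall and top confusion."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    per_class_recall = cm.diagonal() / cm.sum(axis=1)

    # Largest off-diagonal entry = the most frequent (true, predicted) mistake.
    off = cm.copy()
    np.fill_diagonal(off, 0)
    i, j = np.unravel_index(off.argmax(), off.shape)
    return {
        "accuracy": cm.trace() / cm.sum(),
        "per_class_recall": per_class_recall,
        "most_confused": (int(labels[i]), int(labels[j]), int(off[i, j])),
        "report": classification_report(y_true, y_pred, labels=labels, digits=3),
    }
```

Running `evaluate` on the predictions of both Random Forests (and of any later models) gives directly comparable tables for the written summary.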

5. Research Question: Explore how various image preprocessing methods (e.g., normalization, binarization, noise reduction, and image augmentation) influence the performance of at least two different machine learning models (e.g., Convolutional Neural Networks and Random Forest classifiers) trained on the MNIST dataset. Analyze the models' accuracy, training time, and ability to generalize to test data. Discuss the implications of your findings for designing machine learning pipelines for digit recognition tasks.
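The four preprocessing methods named above can be sketched as interchangeable functions, so that each variant of the data can be fed to both models. This assumes flattened 28x28 images and uses NumPy plus `scipy.ndimage` (`median_filter` for noise reduction, `shift` for augmentation). The binarization threshold, the 3x3 filter size and the one-pixel shifts are illustrative assumptions, and the function names are hypothetical. The CNN itself is not shown here.

```python
import numpy as np
from scipy import ndimage


def normalize(X):
    """Scale raw 0..255 pixel intensities to [0, 1]."""
    return X.astype(np.float32) / 255.0


def binarize(X, threshold=0.5):
    """Map normalised pixels to {0, 1}; the threshold is an assumption to tune."""
    return (X >= threshold).astype(np.float32)


def denoise(X, size=3):
    """Median-filter each 28x28 image to suppress isolated speckle noise."""
    imgs = X.reshape(-1, 28, 28)
    # size=(1, size, size): filter within each image, never across images.
    return ndimage.median_filter(imgs, size=(1, size, size)).reshape(len(X), -1)


def augment_with_shifts(X, y, shifts=((0, 1), (0, -1), (1, 0), (-1, 0))):
    """Return X, y plus copies translated by one pixel in each listed direction."""
    imgs = X.reshape(-1, 28, 28)
    X_parts, y_parts = [X], [y]
    for dy, dx in shifts:
        # order=0 with integer shifts is an exact translation; edges fill with 0.
        moved = ndimage.shift(imgs, shift=(0, dy, dx), order=0, cval=0.0)
        X_parts.append(moved.reshape(len(X), -1))
        y_parts.append(y)
    return np.concatenate(X_parts), np.concatenate(y_parts)
```

Apply augmentation to the training set only, so the test set still measures generalization to unseen, unmodified digits. Note also that augmentation multiplies the training set size, which affects the training-time comparison.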

6. Reflect on the composition and diversity of the MNIST dataset, considering its impact on the training process and model performance. Explore how the inclusion of a more diverse set of handwriting samples (e.g., different handwriting styles, inclusion of characters from non-Latin alphabets, or samples from wider age groups) might affect the accuracy and generalizability of machine learning models trained for digit recognition tasks.
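For the composition part, the class balance can be measured directly. The effect of more varied handwriting can only be approximated without new data. The sketch below rotates the test images with `scipy.ndimage.rotate` as a crude stand-in for slanted writing styles. This proxy is an assumption for exploration only and does not substitute for genuinely diverse samples. `class_balance`, `rotated_copy` and `robustness_curve` are hypothetical helpers, and `model` is any already-fitted classifier with a `predict` method.

```python
import numpy as np
from scipy import ndimage


def class_balance(y):
    """Share of each label in the data set (composition check)."""
    labels, counts = np.unique(y, return_counts=True)
    return dict(zip(labels.tolist(), (counts / counts.sum()).tolist()))


def rotated_copy(X, angle, side=28):
    """Rotate every image by `angle` degrees (crude proxy for slanted handwriting)."""
    imgs = X.reshape(-1, side, side)
    out = ndimage.rotate(imgs, angle, axes=(1, 2), reshape=False, order=1, cval=0.0)
    return out.reshape(len(X), -1)


def robustness_curve(model, X_test, y_test, angles=(-30, -15, 0, 15, 30)):
    """Accuracy of a fitted classifier on increasingly rotated test digits."""
    return {a: float(np.mean(model.predict(rotated_copy(X_test, a)) == y_test))
            for a in angles}
```

Pass `X_test` to `robustness_curve` with the same scaling the model was trained on. A steep drop away from 0 degrees suggests the model relies on the fairly uniform writing style in MNIST.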

Structure

Prepare a Jupyter notebook for this assignment. The notebook should alternate text and Python code and cover the topics listed in the specific tasks above. Always refer to the textbook Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow for coding help.

How do I submit?

1. Prepare Your Submission: Ensure your Jupyter notebook (.ipynb) is complete with all required work.
