Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

The classic MNIST Digit RecognizerLinks to an external site. problem is a competition on Kaggle.com, and you will compete in this competition. For this assignment,

The classic MNIST Digit RecognizerLinks to an external site. problem is a competition on Kaggle.com, and you will compete in this competition. For this assignment, you will develop a classifier that may be used to predict which of the 10 digits is being written.
Management/Research Question
In laymans terms, what is the management/research question of interest, and why would anyone care?
Requirements
Fit a random forest classifier using the full set of explanatory variables and the model training set (csv).
Record the time it takes to fit the model and then evaluate the model on the csvdata by submitting to Kaggle.com. Provide your Kaggle.com score and user ID.
Execute principal components analysis (PCA) on the combined training and test set data together, generating principal components that represent 95 percent of the variability in the explanatory variables. The number of principal components in the solution should be substantially fewer than the explanatory variables.
Record the time it takes to identify the principal components.
Using the identified principal components from step (2), use thecsvto build another random forest classifier.
Record the time it takes to fit the model and to evaluate the model on the csvdata by submitting to Kaggle.com. Provide your Kaggle.com score and user ID.
Use k-means clustering to group MNIST observations into 1 of 10 categories and then assign labels. (Follow the example here if needed: kmeans mnist.pdf Download kmeans mnist.pdf).kmeans mnist-2.pdf Download kmeans mnist-2.pdf
Submit the RF Classifier, the PCA RF, and k-means estimations to Kaggle.com, and provide screen snapshots of your scores as well as your Kaggle.com user name.
The experiment we have proposed has a major design flaw. Identify the flaw. Fix it. Rerun the experiment in a way that is consistent with a training-and-test regimen, and submit this to Kaggle.com.
Report total elapsed time measures for the training set analysis. It is sufficient to run a single time-elapsed test for this assignment. In practice, we might consider the possibility of repeated executions of the relevant portions of the programs, much as the Benchmark Example programs do. Some code that might help you with reporting elapsed total time follows.
start=datetime.now()
rf2.fit(trainimages,labels)
end=datetime.now()
print(end-start)
Python Programming
All programming will be done in Python.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access with AI-Powered Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Students also viewed these Databases questions