Question


Posted on Sep 06, 2024


Load & check the data

1. Load the MNIST data into a pandas dataframe named MINST_firstname, where firstname is your first name.

2. List the keys

3. Assign the data to a ndarray named X_firstname where firstname is your first name.

4. Assign the target to a variable named y_firstname where firstname is your first name.

5. Print the types of X_firstname and y_firstname.

6. Print the shape of X_firstname and y_firstname.

7. Create three variables named: some_digit1, some_digit2, some_digit3. Store in these variables the values from X_firstname indexed 7,5,0 in order.
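Steps 1 to 7 might be sketched as follows. This is only an illustration under stated assumptions: it uses scikit-learn's small bundled digits dataset (`load_digits`, 8x8 images) as a stand-in so that it runs offline, and `firstname` is left as the placeholder from the question. For the real data the usual call is `fetch_openml("mnist_784", version=1, as_frame=True)`, which downloads 70,000 images of 28x28 pixels. In both cases the loader returns a dictionary-like object whose `data` entry is the pandas dataframe.

```python
import numpy as np
from sklearn.datasets import load_digits

# Assumption: load_digits is a small offline stand-in for MNIST.
# For the real dataset, swap in:
#   from sklearn.datasets import fetch_openml
#   MINST_firstname = fetch_openml("mnist_784", version=1, as_frame=True)
MINST_firstname = load_digits(as_frame=True)

# Step 2: list the keys of the returned dictionary-like object.
print(list(MINST_firstname.keys()))

# Steps 3-4: pull the features and the target out as NumPy arrays.
X_firstname = MINST_firstname["data"].to_numpy()
y_firstname = MINST_firstname["target"].to_numpy()

# Steps 5-6: types and shapes.
print(type(X_firstname), type(y_firstname))
print(X_firstname.shape, y_firstname.shape)

# Step 7: the rows at index 7, 5 and 0, in that order.
some_digit1 = X_firstname[7]
some_digit2 = X_firstname[5]
some_digit3 = X_firstname[0]

# Each row is a flattened square image; reshape it before plotting.
side = int(np.sqrt(some_digit1.size))  # 8 for the stand-in, 28 for MNIST
some_digit1_image = some_digit1.reshape(side, side)
print(some_digit1_image.shape)
```

The reshaped 2-D array is what `matplotlib.pyplot.imshow` expects in the next step, for example `plt.imshow(some_digit1_image, cmap="binary")`.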

8. Use the imshow method to plot the values of the three variables you defined in the point above. Note the values in your written response.

Pre-process the data

9. Change the type of y to uint8.

10. The current target values range from 0 to 9 i.e. 10 classes. Transform the target variable to 3 classes as follows:

a. Any digit between 0 and 3 inclusive should be assigned a target value of 0

b. Any digit between 4 and 6 inclusive should be assigned a target value of 1

c. Any digit between 7 and 9 inclusive should be assigned a target value of 9 (Hint: you can use numpy.where to carry out the transformation on the target.)

11. Print the frequencies of each of the three target classes and note them in your written report. In addition, provide a screenshot.
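The pre-processing in steps 9 to 11 can be sketched with NumPy alone. The ten labels below are hypothetical stand-in values, chosen only to show the mapping. They are written as strings because MNIST labels fetched from OpenML arrive as strings, which is presumably why the cast to uint8 is requested.

```python
import numpy as np

# Hypothetical stand-in labels; the real y_firstname has 70,000 entries.
y_firstname = np.array(["5", "0", "4", "1", "9", "2", "1", "3", "7", "6"])

# Step 9: cast the labels to unsigned 8-bit integers.
y_firstname = y_firstname.astype(np.uint8)

# Step 10: nested np.where maps 0-3 -> 0, 4-6 -> 1, 7-9 -> 9.
y_firstname = np.where(
    y_firstname <= 3, 0, np.where(y_firstname <= 6, 1, 9)
).astype(np.uint8)

# Step 11: frequency of each of the three target classes.
classes, counts = np.unique(y_firstname, return_counts=True)
for target_class, count in zip(classes, counts):
    print(f"class {target_class}: {count}")
```

The outer `np.where` handles the first range and hands everything else to the inner call, so the two conditions are checked in order and no digit is mapped twice.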

12. Split your data into train and test sets. Assign the first 60,000 records for training and the last 10,000 records for testing. (Hint: you don't need sklearn's train_test_split, as the data is already randomized.)

Build Classification Models

Naive Bayes

13. Train a Naive Bayes classifier using the training data. Name the classifier NB_clf_firstname.

14. Use the classifier to predict the three variables you defined in point 7 above. Note the results in your written response and compare against the actual results.

15. Use 3-fold cross validation against the train data and note the results in your written response.

16. Use the model to score the accuracy against the test data, note the result in your written response.
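The split from step 12 and steps 13 to 16 could look roughly like the sketch below. Several things here are assumptions rather than part of the question: the bundled `load_digits` data again stands in for MNIST, the split point is scaled down to 1,500 rows (60,000 for the real data), and `GaussianNB` is just one of the Naive Bayes variants scikit-learn offers, since the question does not name one.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Assumption: offline stand-in data, relabelled into the classes 0, 1, 9.
X_firstname, y_digits = load_digits(return_X_y=True)
y_firstname = np.where(
    y_digits <= 3, 0, np.where(y_digits <= 6, 1, 9)
).astype(np.uint8)

# Step 12: positional split, no shuffling. Use n_train = 60_000 for MNIST.
n_train = 1500
X_train, X_test = X_firstname[:n_train], X_firstname[n_train:]
y_train, y_test = y_firstname[:n_train], y_firstname[n_train:]

# Step 13: train the classifier (GaussianNB is an assumed choice).
NB_clf_firstname = GaussianNB()
NB_clf_firstname.fit(X_train, y_train)

# Step 14: predict the three digits from step 7 and compare with the truth.
indices = [7, 5, 0]
predicted = NB_clf_firstname.predict(X_firstname[indices])
print("predicted:", predicted)
print("actual:   ", y_firstname[indices])

# Step 15: 3-fold cross validation on the training data only.
cv_scores = cross_val_score(
    NB_clf_firstname, X_train, y_train, cv=3, scoring="accuracy"
)
print("cv accuracy per fold:", cv_scores)

# Step 16: accuracy on the held-out test data.
test_accuracy = NB_clf_firstname.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```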

17. Generate the accuracy matrix.

Logistic regression

18. Train a Logistic regression classifier using the same training data. Name the classifier LR_clf_firstname. (Note: this is a multi-class problem, so make sure to check all the parameters and set multi_class='multinomial'.) Set max_iter to 1000 and tolerance to 0.1 in both cases. Try training the classifier using two solvers, first lbfgs and then saga. Note the results in both cases in your written response, and explain the main differences between them.
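One way to compare the two solvers in step 18 is sketched below, again on the assumed `load_digits` stand-in with a scaled-down split. The sketch leaves `multi_class` unset: that parameter is deprecated as of scikit-learn 1.5, and with the lbfgs and saga solvers a multi-class target is fitted as a multinomial model by default. On an older version you can pass `multi_class='multinomial'` explicitly, as the question asks. The `results` and `models` dictionaries and the fixed `random_state` are illustrative choices, not part of the question.

```python
import time

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Assumption: offline stand-in data, relabelled into the classes 0, 1, 9.
X_firstname, y_digits = load_digits(return_X_y=True)
y_firstname = np.where(
    y_digits <= 3, 0, np.where(y_digits <= 6, 1, 9)
).astype(np.uint8)

n_train = 1500  # 60_000 for the real MNIST data
X_train, X_test = X_firstname[:n_train], X_firstname[n_train:]
y_train, y_test = y_firstname[:n_train], y_firstname[n_train:]

models = {}
results = {}
for solver in ("lbfgs", "saga"):
    # Same max_iter and tolerance for both solvers, as the question requires.
    clf = LogisticRegression(
        solver=solver, max_iter=1000, tol=0.1, random_state=42
    )
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    elapsed = time.perf_counter() - start

    models[solver] = clf
    results[solver] = {
        "fit_seconds": elapsed,
        "iterations": int(clf.n_iter_.max()),
        "test_accuracy": clf.score(X_test, y_test),
    }

for solver, summary in results.items():
    print(solver, summary)
```

Recording fit time and iteration count alongside accuracy gives you concrete numbers to cite when explaining how the two solvers differ.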

19. Use the classifier that worked best from the above point to predict the three variables you defined in point 7 above. Note the results in your written response and compare against the actual results.

20. Use 3-fold cross validation against the training data and note the results in your written response.

21. Use the model to score the accuracy against the test data, note the result in your written response.

22. Generate the accuracy matrix.

23. Generate the precision and recall of the model and note them in your written response.

Finally, in your written response, compare the results from both models (i.e. the Naive Bayes and the Logistic regression) and write your conclusions.
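Steps 19 to 23 might be sketched as below. The assumptions: `load_digits` stands in for MNIST with a scaled-down split, lbfgs is used purely as a placeholder for whichever solver worked best for you, and "accuracy matrix" is taken to mean a confusion matrix. Precision and recall are reported per class (`average=None`); the question does not say which averaging to use.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import cross_val_score

# Assumption: offline stand-in data, relabelled into the classes 0, 1, 9.
X_firstname, y_digits = load_digits(return_X_y=True)
y_firstname = np.where(
    y_digits <= 3, 0, np.where(y_digits <= 6, 1, 9)
).astype(np.uint8)

n_train = 1500  # 60_000 for the real MNIST data
X_train, X_test = X_firstname[:n_train], X_firstname[n_train:]
y_train, y_test = y_firstname[:n_train], y_firstname[n_train:]

# Placeholder choice of solver; substitute the one that worked best.
LR_clf_firstname = LogisticRegression(solver="lbfgs", max_iter=1000, tol=0.1)
LR_clf_firstname.fit(X_train, y_train)

# Step 19: predict the three digits from step 7.
indices = [7, 5, 0]
print("predicted:", LR_clf_firstname.predict(X_firstname[indices]))
print("actual:   ", y_firstname[indices])

# Step 20: 3-fold cross validation on the training data.
cv_scores = cross_val_score(
    LR_clf_firstname, X_train, y_train, cv=3, scoring="accuracy"
)
print("cv accuracy per fold:", cv_scores)

# Step 21: accuracy on the test data.
test_accuracy = LR_clf_firstname.score(X_test, y_test)
print("test accuracy:", test_accuracy)

# Step 22: confusion matrix (rows = actual class, columns = predicted class).
labels = [0, 1, 9]
y_pred = LR_clf_firstname.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(cm)

# Step 23: per-class precision and recall, in the order of `labels`.
precision = precision_score(
    y_test, y_pred, labels=labels, average=None, zero_division=0
)
recall = recall_score(
    y_test, y_pred, labels=labels, average=None, zero_division=0
)
print("precision:", precision)
print("recall:   ", recall)
```

The same `confusion_matrix`, `precision_score` and `recall_score` calls work unchanged on the Naive Bayes predictions, which makes the final comparison between the two models straightforward.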