Question

1 Approved Answer

Posted on Sep 05, 2024

A4,A5 and A6 Table 1: Data Description Field Description STU ID Student ID index). Honor Class Output Column. Honor class of the student at the

A4,A5 and A6

image text in transcribed

Table 1: Data Description Field Description STU ID Student ID index). Honor Class Output Column. Honor class of the student at the end of his/her undergrad- uate studies. It has values 1, 2 and 3 which corresponds to first class, second class, and third class respectively. High School Graduated honor class from high school, can be 'First', 'Second', or "Third'. TOEFL score TOEFL Score of student (out of 120) before joining the first year. Math-101 Grade obtained in the first year mathematics course (letter grade) in the first attempt. Phys-101 Grade obtained in the first year physics course (letter grade) in the first at- tempt. Chem-101 Grade obtained in the first year chemistry course (letter grade) in the first attempt. Engl-101 Grade obtained in the first year English course (letter grade) in the first at- tempt. Prog-101 Grade obtained in the first year programming course (letter grade) in the first attempt. Do the following tasks using data given in HW6DataA and Table-1: A-1: Given Data. Read and display the data. Check if there are any missing values. Check if there is any type inconsistency. Handle missing values and resolve inconsistencies (if any). A-2: Data Preparation. Update the dataframe as follows: drop/remove column 'STU ID'; encode all input columns into numeric type. A-3: Decision Tree Analysis. The hypothesis is that the input columns can be helpful in predicting the output column. Do the following: Split the given data into train and test data set, such that train data set contains 70% of the given data. Fit and display the decision tree classifier using train data set. Using the decision tree classifier, calculate and display the confusion matrix for the above test data. . Using the confusion matrix, calculate accuracy. A-4: Classification & Association Rules. Identify and write any three classification rules. Write an association rule between 'High School Math-101', which has the maximum accuracy. Write an association rule between 'TOEFL & High School Engl-101 & Phys-101', which has the maximum support. A-5: Random Forest Analysis. Same hypothesis from A-3. Do the following: Fit a random forest classifier using the train data set from Part A-3. Use 5 estimators. Display the above decision trees with max depth of 3. Using the random forest classifier, calculate and display the confusion matrix for the test data from Part A-3. Using the confusion matrix, calculate accuracy. A-6: Naive Bayes Classification. Same hypothesis from A-3. Do the following: . Fit a naive Bayes classifier using the train data set from Part A-3. Use Categorical NB. Using the classifier, display the confusion matrix for the above test data. . Using the confusion matrix, calculate accuracy. Note: Solve all the above questions using Python. Use Pandas, Seaborn, Sklearn, etc. libraries for all the above analysis Table 1: Data Description Field Description STU ID Student ID index). Honor Class Output Column. Honor class of the student at the end of his/her undergrad- uate studies. It has values 1, 2 and 3 which corresponds to first class, second class, and third class respectively. High School Graduated honor class from high school, can be 'First', 'Second', or "Third'. TOEFL score TOEFL Score of student (out of 120) before joining the first year. Math-101 Grade obtained in the first year mathematics course (letter grade) in the first attempt. Phys-101 Grade obtained in the first year physics course (letter grade) in the first at- tempt. Chem-101 Grade obtained in the first year chemistry course (letter grade) in the first attempt. Engl-101 Grade obtained in the first year English course (letter grade) in the first at- tempt. Prog-101 Grade obtained in the first year programming course (letter grade) in the first attempt. Do the following tasks using data given in HW6DataA and Table-1: A-1: Given Data. Read and display the data. Check if there are any missing values. Check if there is any type inconsistency. Handle missing values and resolve inconsistencies (if any). A-2: Data Preparation. Update the dataframe as follows: drop/remove column 'STU ID'; encode all input columns into numeric type. A-3: Decision Tree Analysis. The hypothesis is that the input columns can be helpful in predicting the output column. Do the following: Split the given data into train and test data set, such that train data set contains 70% of the given data. Fit and display the decision tree classifier using train data set. Using the decision tree classifier, calculate and display the confusion matrix for the above test data. . Using the confusion matrix, calculate accuracy. A-4: Classification & Association Rules. Identify and write any three classification rules. Write an association rule between 'High School Math-101', which has the maximum accuracy. Write an association rule between 'TOEFL & High School Engl-101 & Phys-101', which has the maximum support. A-5: Random Forest Analysis. Same hypothesis from A-3. Do the following: Fit a random forest classifier using the train data set from Part A-3. Use 5 estimators. Display the above decision trees with max depth of 3. Using the random forest classifier, calculate and display the confusion matrix for the test data from Part A-3. Using the confusion matrix, calculate accuracy. A-6: Naive Bayes Classification. Same hypothesis from A-3. Do the following: . Fit a naive Bayes classifier using the train data set from Part A-3. Use Categorical NB. Using the classifier, display the confusion matrix for the above test data. . Using the confusion matrix, calculate accuracy. Note: Solve all the above questions using Python. Use Pandas, Seaborn, Sklearn, etc. libraries for all the above analysis