Question
Fundamentals of Data Science Assignment 1

Objective
In this assignment, you will implement a predictive modeling approach based on the decision tree.

Detailed Requirement
We have introduced a predictive modeling approach based on the decision tree in class. In this assignment, you will implement and evaluate this approach on the Vertebral Column dataset from the UCI Machine Learning Repository: https://archive.ics.uci.edu.

You should partition the dataset into two subsets: one for training and the other for evaluation. The partitioning should be performed so that the proportions of data records belonging to the different classes in the training set and the test set are similar to those of the original dataset.

Please note that there are two versions of the Vertebral Column dataset. Use the version in which the orthopedic patients are categorized into three classes: disk hernia (DH), spondylolisthesis (SL), or normal (NO).

You can implement a decision tree model using the Python package scikit-learn, and visualize the model by installing the package python-graphviz. You may refer to the following references for more details about Python and its packages:
Data mining tutorials using Python (https://www.cse.msu.edu/~ptan/dmbook/software)
Scikit-learn website (https://scikit-learn.org)

Assignment Submission
You should submit a report to summarize your work. The following tasks are to be performed:
a. Construct multiple decision trees based on different partitions of the dataset into a training set and a test set. You should clearly specify which impurity measure you have used for tree construction, and the parameters you have selected. (25%)
b. Compare the structures and classification performances of these different trees. (25%)
c. For selected trees, observe the classification performance associated with the different classes, and determine which pair(s) of classes are likely to be confused with each other. (25%)
d. For selected confused class pairs in c., identify the corresponding leaf node(s) and analyze the sequence of decisions that lead to the misclassification. (25%)

Please provide a detailed description of the results of the above tasks in your report.
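
One possible way to approach tasks a. to d. with scikit-learn is sketched below. It is a starting point under stated assumptions, not a reference solution. The data here is a synthetic three-class placeholder produced by make_classification, not the Vertebral Column dataset; the real feature matrix and labels would need to be loaded from the downloaded UCI file and passed in instead. The helper names (make_placeholder_data, fit_and_evaluate, most_confused_pair, explain_path), the Gini criterion, max_depth=4, and the particular test sizes and seeds are all illustrative choices, not values given in the assignment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_placeholder_data(seed=0):
    """Synthetic imbalanced 3-class data (ASSUMPTION: stands in for the real dataset)."""
    return make_classification(
        n_samples=310, n_features=6, n_informative=4, n_redundant=0,
        n_classes=3, n_clusters_per_class=1, weights=[0.2, 0.3, 0.5],
        random_state=seed,
    )


def fit_and_evaluate(X, y, test_size, seed, criterion="gini", max_depth=4):
    """Task a: stratified split, then fit one tree with an explicit impurity measure."""
    # stratify=y keeps class proportions in both subsets close to the original ones.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    clf = DecisionTreeClassifier(
        criterion=criterion, max_depth=max_depth, random_state=seed
    )
    clf.fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    # Rows are true classes, columns are predicted classes, ordered as clf.classes_.
    cm = confusion_matrix(y_te, y_pred, labels=clf.classes_)
    return clf, X_te, y_te, y_pred, cm


def most_confused_pair(cm):
    """Task c: class pair with the most mutual misclassifications, or None."""
    off = cm + cm.T            # count errors in both directions for each pair
    np.fill_diagonal(off, 0)   # ignore correct predictions
    if off.max() == 0:
        return None
    i, j = np.unravel_index(np.argmax(off), off.shape)
    i, j = sorted((int(i), int(j)))
    return i, j, int(off[i, j])


def explain_path(clf, x):
    """Task d: list the threshold tests one record passes on its way to a leaf."""
    tree = clf.tree_
    # Node ids visited by this record, from the root down to the leaf.
    node_ids = clf.decision_path(x.reshape(1, -1)).indices
    steps = []
    for node, nxt in zip(node_ids[:-1], node_ids[1:]):
        # Going to the left child means the test "feature <= threshold" held.
        op = "<=" if nxt == tree.children_left[node] else ">"
        steps.append(
            f"node {node}: feature[{tree.feature[node]}] {op} "
            f"{tree.threshold[node]:.3f}"
        )
    steps.append(f"leaf {node_ids[-1]}")
    return steps


if __name__ == "__main__":
    X, y = make_placeholder_data()
    # Tasks a and b: several partitions, then compare structure and accuracy.
    for test_size, seed in [(0.2, 0), (0.3, 1), (0.4, 2)]:
        clf, X_te, y_te, y_pred, cm = fit_and_evaluate(X, y, test_size, seed)
        acc = float((y_pred == y_te).mean())
        print(f"test_size={test_size} seed={seed} depth={clf.get_depth()} "
              f"leaves={clf.get_n_leaves()} accuracy={acc:.3f}")
        print(cm)
        pair = most_confused_pair(cm)
        if pair is None:
            continue
        i, j, n = pair
        print(f"most confused: {clf.classes_[i]} vs {clf.classes_[j]} ({n} errors)")
        # Task d: trace one misclassified record from that pair to its leaf.
        wrong = np.where((y_pred != y_te)
                         & np.isin(y_te, clf.classes_[[i, j]])
                         & np.isin(y_pred, clf.classes_[[i, j]]))[0]
        if len(wrong) > 0:
            print("\n".join(explain_path(clf, X_te[wrong[0]])))
```

Switching criterion to "entropy" or changing max_depth gives further trees to compare for task b. For the visualization mentioned in the assignment, sklearn.tree.export_graphviz can write a fitted tree in DOT format, which the graphviz package can then render.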