Question


Posted on Oct 16, 2024


ID3 decision tree

In this part, you will implement the ID3 decision tree learning algorithm using Java or Python. You cannot use any package or library for this assignment (NumPy and SciPy are OK).

To simplify things, you can assume that the data used to test your implementation will contain only Boolean (0 or 1) attributes and Boolean (0 or 1) class values, with no missing data or attributes. You can also assume that the first row of the dataset will contain column names and each non-blank line after that will contain a new data instance.

Within these constraints, your program should be able to read and process any dataset containing any number of attributes. You can assume that the last column will contain the class labels.
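The input format described above can be sketched as a small parser. This is only an illustration: the assignment does not state the column delimiter, so the sketch assumes commas or whitespace, and the names `parse_dataset` and `load_dataset` are hypothetical.

```python
import io

def parse_dataset(lines):
    """Parse text lines into (attribute_names, rows, labels)."""
    header = None
    rows, labels = [], []
    for line in lines:
        # Assumption: columns are separated by commas or whitespace.
        fields = line.replace(",", " ").split()
        if not fields:              # blank line: skip it
            continue
        if header is None:          # first non-blank line holds the column names
            header = fields
            continue
        values = [int(v) for v in fields]
        rows.append(values[:-1])    # every column except the last is an attribute
        labels.append(values[-1])   # the last column is the class label
    if header is None:
        raise ValueError("dataset has no header row")
    return header[:-1], rows, labels

def load_dataset(path):
    """Read a dataset file from disk using the same parsing rules."""
    with open(path) as handle:
        return parse_dataset(handle)

# Small in-memory example with a blank line in the middle.
sample = io.StringIO("wesley,honor,class\n0,1,1\n\n1,0,0\n")
names, rows, labels = parse_dataset(sample)
print(names, rows, labels)
```

Keeping the parser separate from file access makes it easy to check the blank-line handling without creating files.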

Below is a summary of the requirements:

Build a binary decision tree classifier using the ID3 algorithm

Your program should read three arguments from the command line: the complete path of the training dataset, the complete path of the test dataset, and the pruning factor (explained later).

The datasets can contain any number of Boolean attributes and one Boolean class label. The class label will always be the last column.

The first row will define column names and every subsequent non-blank line will contain a data instance. If there is a blank line, your program should skip it.
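For the first requirement, a minimal sketch of ID3 on Boolean data might look like the following. It uses the standard entropy and information-gain definitions. The nested-dict tree representation, the function names, and the tie-breaking rule in `majority` are assumptions made for illustration, not part of the assignment.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index `attr`."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def majority(labels):
    """Most common label; ties go to 0 (an arbitrary choice for this sketch)."""
    counts = Counter(labels)
    return 1 if counts[1] > counts[0] else 0

def build_tree(rows, labels, attrs):
    """Recursively build a tree of nested dicts from the training data."""
    if len(set(labels)) == 1:               # pure node: make a leaf
        return {"label": labels[0]}
    if not attrs:                           # no attributes left: majority leaf
        return {"label": majority(labels)}
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    node = {"attr": best, "children": {}}
    for value in (0, 1):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        if pairs:
            sub_rows = [r for r, _ in pairs]
            sub_labels = [l for _, l in pairs]
            node["children"][value] = build_tree(sub_rows, sub_labels, remaining)
        else:                               # empty branch: fall back to majority
            node["children"][value] = {"label": majority(labels)}
    return node

# Tiny example: the class label equals attribute 0.
rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 1, 1]
tree = build_tree(rows, labels, [0, 1])
print(tree)
```

In this example attribute 0 has an information gain of 1 bit and attribute 1 has none, so the root splits on attribute 0 and both children are leaves.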

Printing Decision Tree:

Your program should contain a print method that should output the current tree to the screen. It should be in the following format:

wesley = 0 :
| honor = 0 :
| | barclay = 0 : 1
| | barclay = 1 : 0
| honor = 1 :
| | tea = 0 : 0
| | tea = 1 : 1
wesley = 1 : 0
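One possible way to produce this layout is a recursive walk that adds one `| ` prefix per level of depth. The sketch below assumes a hypothetical nested-dict tree (internal nodes hold `attr` and `children`, leaves hold `label`) and writes each line as `name = value :` followed by the label for leaves. The function name `format_tree` and the handling of a tree that is a single leaf are also assumptions.

```python
def format_tree(node, names, depth=0):
    """Return the printed lines for `node`, one per branch."""
    if "label" in node:
        # Assumption: a tree consisting of a single leaf prints just its label.
        return [str(node["label"])]
    lines = []
    for value in (0, 1):
        child = node["children"][value]
        prefix = "| " * depth + f"{names[node['attr']]} = {value} :"
        if "label" in child:
            lines.append(f"{prefix} {child['label']}")   # leaf: label on same line
        else:
            lines.append(prefix)                          # subtree: recurse deeper
            lines.extend(format_tree(child, names, depth + 1))
    return lines

def leaf(value):
    return {"label": value}

# Hand-built tree mirroring the sample printout.
names = ["wesley", "honor", "barclay", "tea"]
tree = {"attr": 0, "children": {
    0: {"attr": 1, "children": {
        0: {"attr": 2, "children": {0: leaf(1), 1: leaf(0)}},
        1: {"attr": 3, "children": {0: leaf(0), 1: leaf(1)}},
    }},
    1: leaf(0),
}}
output = format_tree(tree, names)
print("\n".join(output))
```

Returning a list of lines instead of printing directly keeps the formatter easy to test and lets the same code print both the pre-pruned and post-pruned trees.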

Printing Summary and Results:

After reading all the data instances, you should output a summary of the datasets, compute the pre-pruned accuracy on the training data and the accuracy of the model on the test dataset, and output them to the screen. You should also output the plot of the decision tree model. For example,

Pre-Pruned Accuracy

-------------------------------------

Number of training instances = 100

Number of training attributes = 5

Total number of nodes in the tree = 20

Number of leaf nodes in the tree = 8

Accuracy of the model on the training dataset = 81.2%

Number of testing instances = 20

Number of testing attributes = 5

Accuracy of the model on the testing dataset = 60.8%
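The figures in such a summary can be computed with a few small helpers. The sketch below is illustrative only: it assumes a hypothetical nested-dict tree (internal nodes hold `attr` and `children`, leaves hold `label`), and the function names and the tiny dataset are made up for the example.

```python
def classify(node, row):
    """Follow the branches matching `row` until a leaf is reached."""
    while "label" not in node:
        node = node["children"][row[node["attr"]]]
    return node["label"]

def accuracy(tree, rows, labels):
    """Percentage of instances whose predicted label matches the true label."""
    if not labels:
        return 0.0
    correct = sum(classify(tree, r) == l for r, l in zip(rows, labels))
    return 100.0 * correct / len(labels)

def count_nodes(node):
    """Total number of nodes, counting internal nodes and leaves."""
    if "label" in node:
        return 1
    return 1 + sum(count_nodes(c) for c in node["children"].values())

def count_leaves(node):
    """Number of leaf nodes."""
    if "label" in node:
        return 1
    return sum(count_leaves(c) for c in node["children"].values())

def leaf(value):
    return {"label": value}

# Hand-built tree over attributes [wesley, honor, barclay, tea].
tree = {"attr": 0, "children": {
    0: {"attr": 1, "children": {
        0: {"attr": 2, "children": {0: leaf(1), 1: leaf(0)}},
        1: {"attr": 3, "children": {0: leaf(0), 1: leaf(1)}},
    }},
    1: leaf(0),
}}
rows = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1]]
labels = [1, 0, 1, 1]

print(f"Number of training instances = {len(rows)}")
print(f"Number of training attributes = {len(rows[0])}")
print(f"Total number of nodes in the tree = {count_nodes(tree)}")
print(f"Number of leaf nodes in the tree = {count_leaves(tree)}")
print(f"Accuracy of the model on the training dataset = {accuracy(tree, rows, labels):.1f}%")
```

The same helpers can be called again with the test rows and labels to fill in the testing lines of the summary.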

Pruning of the Decision Tree (Most important)

After the decision tree has been constructed, you will check the pruning factor, which will be given by the third argument to your program. The pruning factor is defined as the fraction of the nodes that you will prune. For example, if you have 20 nodes in your tree and the pruning factor is 0.2, you will prune 0.2 * 20 = 4 nodes randomly from your tree. After pruning the tree, you will re-compute the training and test accuracy and output the summary on the screen as before. You should also output the plot of the post-pruned decision tree model.
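The assignment text does not say exactly what pruning a node means, so the sketch below makes several assumptions and labels them. It treats pruning a node as replacing a randomly chosen internal node with a leaf, it never prunes the root, it rounds the node count to the nearest integer, and it expects each internal node to carry a hypothetical `majority` field recorded during training. All names are illustrative.

```python
import random

def count_nodes(node):
    """Total number of nodes, counting internal nodes and leaves."""
    if "label" in node:
        return 1
    return 1 + sum(count_nodes(c) for c in node["children"].values())

def internal_nodes(node):
    """All internal nodes in pre-order, so the root comes first."""
    if "label" in node:
        return []
    found = [node]
    for child in node["children"].values():
        found.extend(internal_nodes(child))
    return found

def nodes_to_prune(total_nodes, factor):
    """Number of nodes to prune; rounding is an assumption of this sketch."""
    return round(factor * total_nodes)

def prune(tree, factor, rng):
    """Collapse randomly chosen internal nodes into majority-label leaves."""
    pruned = 0
    for _ in range(nodes_to_prune(count_nodes(tree), factor)):
        candidates = internal_nodes(tree)[1:]   # assumption: keep the root
        if not candidates:
            break
        victim = rng.choice(candidates)
        label = victim["majority"]              # majority class seen in training
        victim.clear()                          # mutate in place so the parent
        victim["label"] = label                 # now points at a leaf
        pruned += 1
    return pruned

def leaf(value):
    return {"label": value}

def split(attr, majority, zero, one):
    return {"attr": attr, "majority": majority, "children": {0: zero, 1: one}}

tree = split(0, 0,
             split(1, 1,
                   split(2, 1, leaf(1), leaf(0)),
                   split(3, 0, leaf(0), leaf(1))),
             leaf(0))
before = count_nodes(tree)
pruned = prune(tree, 0.2, random.Random(0))
print(before, pruned, count_nodes(tree))
```

Because collapsing an internal node also removes its whole subtree, the node count can drop by more than the number of pruning steps. An implementation that must remove exactly the computed number of nodes would need a different selection rule.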

Post-Pruned Accuracy

-------------------------------------

Number of training instances = 100

Number of training attributes = 5

Total number of nodes in the tree = 16

Number of leaf nodes in the tree = 6

Accuracy of the model on the training dataset = 90.2%

Number of testing instances = 20

Number of testing attributes = 5

Accuracy of the model on the testing dataset = 72.4%

- README file indicating which language you used, how to compile and run your code.

- source code (no executables)

- A brief report indicating any assumptions that you made, what you accomplished, and what you learned. The report should clearly indicate the names of the team members.