
Question



Decision Trees (DTs) are a non-parametric supervised learning method used for
classification and regression. The goal is to create a model that predicts the value of a
target variable by learning simple decision rules inferred from the data features.
Training produces the tree. In practice, a predictive decision tree model incrementally
selects the best attribute to split on (the one that most reduces entropy, i.e. has the
highest information gain) to provide an output classification based on the input data. For this
assessment, you will be describing a new problem and utilising some machine learning
Python modules to create an ID3 predictive decision tree model, along with a
visualisation to better understand the classification process.
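To make the entropy principle concrete, here is a minimal sketch of how the quality of a candidate split can be scored. The function names and the toy labels are illustrative assumptions, not part of the assignment; the sketch only shows that a split separating the classes cleanly earns a higher information gain than one that leaves them mixed.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical toy labels (not taken from the wine data).
parent = ["a", "a", "b", "b"]
perfect_split = [["a", "a"], ["b", "b"]]   # each child is pure
useless_split = [["a", "b"], ["a", "b"]]   # each child is as mixed as the parent

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
print(gain_perfect, gain_useless)
```

A tree builder of this kind would evaluate every candidate split at a node and keep the one with the largest gain.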
Task 3 will measure your ability to 1) study a problem that can be tackled by an
artificial intelligence method, e.g., a decision tree; 2) implement the decision tree using
the given instructions; 3) visualise and analyse the decision tree. The objective of this
assessment is to utilise the pandas and scikit-learn libraries to implement the ID3
decision tree machine learning algorithm to create a classifier that can tackle a specific
problem. By the end of this assessment, you should have a better understanding of how
decision trees work. Utilising the trained tree visualisation, you should have a reinforced
understanding of the core decision tree principles and how the tree's split evaluations
operate.
Problem Description
You will be creating a decision tree that will predict wine classes based on provided attributes.
Imagine that you are a wine producer compiling data for a study. The data is the results of a
chemical analysis of wines grown in the same region in Italy by three different cultivators. There
are thirteen different measurements taken for different constituents found in the three types of
wine, class_0, class_1, and class_2.
Those thirteen different measurements include: Alcohol, Malic acid, Ash, Alcalinity of ash,
Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color
intensity, Hue, OD280/OD315 of diluted wines and Proline.
Therefore, the model's input parameters are those thirteen measurements, and the
model's output is the wine class.
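As a quick sanity check of the inputs and outputs described above, the sketch below inspects the wine dataset bundled with scikit-learn. It assumes the assignment's dataset is the one returned by load_wine, which the loading instructions later in the brief suggest; the variable names X and y are illustrative.

```python
from sklearn.datasets import load_wine

data = load_wine()

# X holds the thirteen chemical measurements, y the class index (0, 1 or 2).
X, y = data.data, data.target

print(X.shape)                    # (number of wines, number of measurements)
print(list(data.feature_names))   # names of the thirteen measurements
print(list(data.target_names))    # names of the three wine classes
```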
Implementation instructions
Assignment dependency installation
You will have to install the following required dependencies:
scikit-learn
pandas
matplotlib
Imports
You will be utilising a number of well-known machine learning Python modules in this
task. These modules make the decision tree easier to implement and include
numerous learning tools to help boost your understanding.
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
Load the dataset and format the training/testing data
You will be using the following code to load the dataset:
# Load wine dataset
from sklearn.datasets import load_wine
data = load_wine()
The pandas Python module is a very powerful, frequently used data analysis tool in all
forms of machine learning; it allows you to store and manipulate large datasets very
easily and is highly compatible/integrated with other machine learning tools/modules.
You will be storing your dataset into a pandas DataFrame. A DataFrame is very similar to
a dictionary in standard Python but has many additional useful features. Add our data
into this DataFrame by specifying the data keys and corresponding values.
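One possible way to build the DataFrame is sketched below: each feature name becomes a key and the matching column of measurements becomes its value. The column name "target" for the class labels is an assumption made for illustration; the assignment does not prescribe one.

```python
import pandas as pd
from sklearn.datasets import load_wine

data = load_wine()

# Map each feature name (key) to its column of measurements (values).
columns = {
    name: data.data[:, i] for i, name in enumerate(data.feature_names)
}
df = pd.DataFrame(columns)

# "target" is a hypothetical column name for the class index of each wine.
df["target"] = data.target

print(df.head())
```

Recent scikit-learn versions can also return a DataFrame directly via load_wine(as_frame=True); building it by hand as above follows the key/value wording of the instructions more closely.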
Next, we need to create a training set for training the classifier and a test
set to evaluate the quality of the trained classifier. Creating the training and test sets
is straightforward for this task: we can simply split the collected data into two groups.
For example, if we have collected 100 records, we can use 80 records as the training
set, leaving the remaining 20 records as the test set.
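The 80/20 split can be sketched with scikit-learn's train_test_split. Note that the rows returned by load_wine are ordered by class, so taking the first 80% as training data and the last 20% as test data would leave the test set with a single class; shuffling avoids this. The random_state value and the use of stratify are assumptions chosen for reproducibility and balanced classes, not requirements of the assignment.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

data = load_wine()

# Shuffle and hold out 20% of the records for testing.
# stratify keeps the class proportions similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    data.data,
    data.target,
    test_size=0.2,
    random_state=42,       # assumed seed, only for repeatable results
    stratify=data.target,
)

print(len(X_train), len(X_test))
```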
Train the decision tree
Once you have correctly formatted your data, you can move on to creating the decision
tree. Scikit-learn is a powerful machine learning framework; you will be utilising its
included DecisionTreeClassifier class to create and train your decision tree. Create a
new DecisionTreeClassifier, passing "entropy" as the criterion so that candidate splits
are evaluated by information gain.
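A minimal sketch of the training step follows, under the same split assumptions as before (80/20, shuffled, with an arbitrary seed). One caveat: scikit-learn's documentation describes its tree implementation as an optimised version of CART rather than literal ID3, so criterion="entropy" gives ID3-style split scoring on top of binary CART splits.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42, stratify=data.target
)

# criterion="entropy" scores candidate splits by information gain.
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(X_train, y_train)

# Fraction of held-out wines classified correctly.
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Plain-text view of the learned decision rules.
print(export_text(clf, feature_names=list(data.feature_names)))
```

For a graphical view, sklearn.tree.plot_tree can draw the same fitted classifier with matplotlib.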
