Question
import numpy as np
from collections import Counter
from sklearn import datasets, model_selection
# No other libraries will be imported

# load the Iris Dataset, which contains 150 samples.
# each sample has 4 features.
# the dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant.
iris = datasets.load_iris()
X = np.array(iris.data)    # features, 4 numeric attributes: Sepal length, Sepal width, Petal length, Petal width
Y = np.array(iris.target)  # labels: class 0, class 1, class 2
# the original split parameters were lost; test_size=0.2 and random_state=0 are assumed here
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=0.2, random_state=0)
print("Train Shape:", X_train.shape)
print("Test Shape:", X_test.shape)
Calculate the information gain for each numeric attribute and show which feature should be used first when building a decision tree.
Step 1: find the best cutpoint for each attribute, i.e. the value used to split the data.
Step 2: calculate the information gain for each attribute, which decides the order of attributes when building the decision tree.
# Some helper functions
# calculate the entropy for a given distribution, H(X)
def entropy(probabilities: list) -> float:
    return sum(-p * np.log2(p) for p in probabilities if p > 0)

# given a list of labels, return the probability for each class, P(Y)
def class_probabilities(labels: list) -> list:
    total_count = len(labels)
    return [label_count / total_count for label_count in Counter(labels).values()]

# calculate the entropy H(Y) for a given list of labels
def data_entropy(labels: list) -> float:
    return entropy(class_probabilities(labels))

# split data into two subgroups (group1, group2) based on attribute feature_idx and value feature_val
# if sample[feature_idx] < feature_val:
#     group1 gets the sample
# else:
#     group2 gets the sample
def split_data(data: np.ndarray, feature_idx: int, feature_val: float) -> tuple:
    mask_below_threshold = data[:, feature_idx] < feature_val
    group1 = data[mask_below_threshold]
    group2 = data[~mask_below_threshold]
    return group1, group2

# calculate the entropy for the current partition, H(Y | X split at feature_val)
def partition_entropy(g1_labels: list, g2_labels: list) -> float:
    total_count = len(g1_labels) + len(g2_labels)
    # weighted combination of the conditional entropy of both groups
    return (data_entropy(g1_labels) * len(g1_labels) / total_count
            + data_entropy(g2_labels) * len(g2_labels) / total_count)
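Before applying these helpers to the Iris data, it helps to sanity-check them on toy labels whose entropies are known. The sketch below restates the helpers so it is self-contained; the toy label lists are illustrative, not part of the assignment.

```python
import numpy as np
from collections import Counter

def entropy(probabilities):
    return sum(-p * np.log2(p) for p in probabilities if p > 0)

def class_probabilities(labels):
    total_count = len(labels)
    return [c / total_count for c in Counter(labels).values()]

def data_entropy(labels):
    return entropy(class_probabilities(labels))

# a 50/50 label mix carries 1 bit of entropy; a pure group carries 0 bits
print(data_entropy([0, 0, 1, 1]))  # 1.0
print(data_entropy([0, 0, 0, 0]))  # 0.0

# the weighted partition entropy of a perfect split is 0
g1, g2 = [0, 0, 0], [1, 1]
total = len(g1) + len(g2)
print(data_entropy(g1) * len(g1) / total
      + data_entropy(g2) * len(g2) / total)  # 0.0
```

A split whose weighted partition entropy is 0 separates the classes completely, so its information gain equals the entropy of the unsplit data.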
#
# Examples of how to use the helper functions
# calculate H(Y) for the train and test data:
print(data_entropy(Y_train))
print(data_entropy(Y_test))

## to split the data based on feature_idx and feature_val:
train_data = np.concatenate((X_train, np.reshape(Y_train, (-1, 1))), axis=1)  # concatenate X_train, Y_train
print(train_data.shape)

# split the data into two subgroups
# (the original index/value were lost; 0 and 5.0 below are example values)
g1, g2 = split_data(train_data, feature_idx=0, feature_val=5.0)
print(g1.shape)
print(g2.shape)

# calculate the weighted entropy for the current split (labels are the last column)
print(partition_entropy(g1[:, -1], g2[:, -1]))
#
# Your implementation
# Initialize variables to store the best cutpoint and information gain for each attribute
#
# Printing
# print the calculated cutpoint (feature_val) and information gain for each attribute.
# print the feature that should be used first when building the decision tree.
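A minimal sketch of one way to complete this section. It assumes candidate cutpoints are taken as midpoints between consecutive sorted unique feature values (the assignment does not fix this choice), restates the entropy helper so the block is self-contained, and the variable names (`best_gains`, `best_cuts`, `best_feature`) are my own.

```python
import numpy as np
from collections import Counter
from sklearn import datasets, model_selection

def data_entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

iris = datasets.load_iris()
X, Y = iris.data, iris.target
# split parameters assumed, matching the setup above
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    X, Y, test_size=0.2, random_state=0)

base_entropy = data_entropy(Y_train)  # H(Y) before any split
best_gains, best_cuts = [], []
for feature_idx in range(X_train.shape[1]):
    # candidate cutpoints: midpoints between consecutive sorted unique values
    values = np.unique(X_train[:, feature_idx])
    midpoints = (values[:-1] + values[1:]) / 2
    best_gain, best_cut = -1.0, None
    for cut in midpoints:
        mask = X_train[:, feature_idx] < cut
        g1, g2 = Y_train[mask], Y_train[~mask]
        # weighted conditional entropy of the two groups
        w_entropy = (len(g1) * data_entropy(g1)
                     + len(g2) * data_entropy(g2)) / len(Y_train)
        gain = base_entropy - w_entropy  # information gain of this cut
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    best_gains.append(best_gain)
    best_cuts.append(best_cut)
    print(f"feature {feature_idx}: cutpoint={best_cut:.2f}, "
          f"information gain={best_gain:.4f}")

# the attribute with the highest information gain is used first
best_feature = int(np.argmax(best_gains))
print("Use feature", best_feature, "first:", iris.feature_names[best_feature])
```

On Iris, a petal-based feature (index 2 or 3) should come out on top: a single petal-length or petal-width threshold already separates setosa perfectly, so those attributes yield far higher gain than the sepal measurements.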
Please help me complete this code