Question

Please complete the code in Python (urgent).

The decision tree should work for four cases:

i) discrete features, discrete output;
ii) discrete features, real output;
iii) real features, discrete output;
iv) real features, real output.

The decision tree should be able to use either the Gini index or information gain as the splitting criterion. The code should also be able to plot/display the decision tree.
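All four cases hinge on telling real-valued columns from discrete ones. The question leaves that rule open, so the sketch below assumes a simple dtype heuristic: float columns are real, while integer, boolean, string, and `category` columns are discrete. `is_real` and `tree_case` are hypothetical helper names, not part of the assignment skeleton.

```python
import numpy as np
import pandas as pd


def is_real(series: pd.Series) -> bool:
    """Hypothetical helper: float dtype means real-valued, anything else discrete."""
    return pd.api.types.is_float_dtype(series)


def tree_case(X: pd.DataFrame, y: pd.Series) -> str:
    """Name which of the four cases a dataset falls into (mixed features count as discrete)."""
    features = "real" if all(is_real(X[c]) for c in X.columns) else "discrete"
    output = "real" if is_real(y) else "discrete"
    return f"{features} features, {output} output"


X = pd.DataFrame(np.random.randn(6, 2))
y = pd.Series(np.random.randint(2, size=6), dtype="category")
print(tree_case(X, y))  # real features, discrete output
```

Under this heuristic, integer-coded labels are treated as classes; cast them to float if they are meant as a regression target.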

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from .utils import entropy, information_gain, gini_index

np.random.seed(42)


class DecisionTree():
    def __init__(self, criterion, max_depth):
        """
        Put all information to initialize your tree here.
        Inputs:
        > criterion : {"information_gain", "gini_index"}  # criterion won't be used for regression
        > max_depth : The maximum depth the tree can grow to
        """
        pass

    def fit(self, X, y):
        """
        Function to train and construct the decision tree
        Inputs:
        X: pd.DataFrame with rows as samples and columns as features (shape of X is N X P) where N is the number of samples and P is the number of columns.
        y: pd.Series with rows corresponding to output variable (shape of y is N)
        """
        pass

    def predict(self, X):
        """
        Function to run the decision tree on the given data points
        Input:
        X: pd.DataFrame with rows as samples and columns as features
        Output:
        y: pd.Series with rows corresponding to output variable. The output variable in a row is the prediction for the sample in the corresponding row in X.
        """
        pass

    def plot(self):
        """
        Function to plot the tree

        Output Example:
        ?(X1 > 4)
            Y: ?(X2 > 7)
                Y: Class A
                N: Class B
            N: Class C
        Where Y => Yes and N => No
        """
        pass
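One way to fill in the skeleton is sketched below. It is a minimal, self-contained sketch rather than the intended solution, and it rests on several assumptions: float columns are real and everything else is discrete; every split is a binary yes/no test (a `<=` threshold halfway between sorted unique values for real features, a one-vs-rest `==` test for discrete features) so the tree matches the `Y:`/`N:` plot format; and for real output, variance reduction replaces the criterion, consistent with the docstring note that the criterion is not used for regression. The impurity helpers are inlined instead of imported from `.utils`, the constructor defaults are illustrative, and `max_depth=None` ("no limit") is a hypothetical extension.

```python
import numpy as np
import pandas as pd


def _entropy(y):
    p = y.value_counts(normalize=True).to_numpy()
    p = p[p > 0]  # skip unused categorical levels (log2(0) is undefined)
    return float(-(p * np.log2(p)).sum())


def _gini(y):
    p = y.value_counts(normalize=True).to_numpy()
    return float(1.0 - (p ** 2).sum())


def _is_real(s):  # assumption: float dtype = real-valued, anything else = discrete
    return pd.api.types.is_float_dtype(s)


class DecisionTree:
    """Sketch: every internal node is a binary yes/no test, as in the plot() example."""

    def __init__(self, criterion="information_gain", max_depth=5):
        if criterion not in ("information_gain", "gini_index"):
            raise ValueError(f"unknown criterion: {criterion}")
        self.criterion = criterion
        self.max_depth = max_depth  # hypothetical extension: None means no depth limit
        self.root = None

    def _impurity(self, y):
        if _is_real(y):  # real output: variance; the criterion is ignored
            return float(y.var(ddof=0))
        return _entropy(y) if self.criterion == "information_gain" else _gini(y)

    def _leaf(self, y):
        # Mean for real output, most frequent class for discrete output.
        return float(y.mean()) if _is_real(y) else y.mode().iloc[0]

    def _candidates(self, x):
        """Yield ((op, value), boolean mask of rows that answer 'yes')."""
        if _is_real(x):  # thresholds halfway between consecutive unique values
            v = np.sort(x.unique())
            for t in (v[:-1] + v[1:]) / 2:
                yield ("<=", t), (x <= t).to_numpy()
        else:  # one-vs-rest equality test for each observed category
            for v in x.unique():
                yield ("==", v), (x == v).to_numpy()

    def _best_split(self, X, y):
        parent, best, best_gain = self._impurity(y), None, 1e-12
        for col in X.columns:
            for test, mask in self._candidates(X[col]):
                n_yes = int(mask.sum())
                if n_yes in (0, len(y)):
                    continue  # a split must send rows both ways
                w = n_yes / len(y)
                gain = (parent - w * self._impurity(y[mask])
                        - (1 - w) * self._impurity(y[~mask]))
                if gain > best_gain:
                    best, best_gain = (col, test, mask), gain
        return best

    def _build(self, X, y, depth):
        can_grow = self.max_depth is None or depth < self.max_depth
        split = self._best_split(X, y) if can_grow and y.nunique() > 1 else None
        if split is None:
            return {"leaf": self._leaf(y)}
        col, test, mask = split
        return {"col": col, "test": test,
                "yes": self._build(X[mask], y[mask], depth + 1),
                "no": self._build(X[~mask], y[~mask], depth + 1)}

    def fit(self, X, y):
        self.root = self._build(X, y, 0)
        return self

    def _predict_row(self, row):
        node = self.root
        while "leaf" not in node:
            op, t = node["test"]
            x = row[node["col"]]
            node = node["yes"] if (x <= t if op == "<=" else x == t) else node["no"]
        return node["leaf"]

    def predict(self, X):
        return pd.Series([self._predict_row(r) for _, r in X.iterrows()], index=X.index)

    def plot(self):
        """Print the tree in the text format shown in the skeleton's docstring."""
        def show(node, indent, prefix):
            if "leaf" in node:
                print(f"{indent}{prefix}{node['leaf']}")
                return
            op, t = node["test"]
            print(f"{indent}{prefix}?(X{node['col']} {op} {t})")
            show(node["yes"], indent + "    ", "Y: ")
            show(node["no"], indent + "    ", "N: ")
        show(self.root, "", "")
```

Nodes are stored as nested dicts to keep the sketch short, and `plot` prints text in the skeleton's format rather than drawing with matplotlib. Multiway splits on discrete features (one branch per value, as in ID3) are a common alternative to the one-vs-rest tests used here.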

utils.py (these functions also need to be completed):

def entropy(Y):
    """
    Function to calculate the entropy
    Inputs:
    > Y: pd.Series of Labels
    Outputs:
    > Returns the entropy as a float
    """
    pass


def gini_index(Y):
    """
    Function to calculate the gini index
    Inputs:
    > Y: pd.Series of Labels
    Outputs:
    > Returns the gini index as a float
    """
    pass


def information_gain(Y, attr):
    """
    Function to calculate the information gain
    Inputs:
    > Y: pd.Series of Labels
    > attr: pd.Series of attribute at which the gain should be calculated
    Outputs:
    > Return the information gain as a float
    """
    pass
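A minimal sketch of the three utility functions follows. Assumptions not fixed by the skeleton: entropy uses base-2 logarithms; `gini_index` returns the Gini impurity 1 - Σ p_k², where p_k is the fraction of labels in class k; and `information_gain` takes an optional `impurity` argument (a hypothetical extension, defaulting to `entropy`) so the same function also gives the Gini-based gain. These label-based measures only make sense for discrete output; for real output a variance-reduction measure is the usual substitute.

```python
import numpy as np
import pandas as pd


def entropy(Y: pd.Series) -> float:
    """Shannon entropy, in bits, of the label distribution in Y."""
    p = Y.value_counts(normalize=True).to_numpy()
    p = p[p > 0]  # unused categorical levels report probability 0; log2(0) is undefined
    return float(-(p * np.log2(p)).sum())


def gini_index(Y: pd.Series) -> float:
    """Gini impurity 1 - sum_k p_k**2; 0 means every label is the same."""
    p = Y.value_counts(normalize=True).to_numpy()
    return float(1.0 - (p ** 2).sum())


def information_gain(Y: pd.Series, attr: pd.Series, impurity=entropy) -> float:
    """Impurity of Y minus the size-weighted impurity of Y within each value of attr."""
    remainder = 0.0
    for value in attr.unique():
        subset = Y[(attr == value).to_numpy()]  # positional mask, so index labels need not match
        remainder += len(subset) / len(Y) * impurity(subset)
    return impurity(Y) - remainder


Y = pd.Series(["yes", "yes", "no", "no"])
attr = pd.Series(["a", "a", "b", "b"])
print(entropy(Y), gini_index(Y))                       # 1.0 0.5
print(information_gain(Y, attr))                       # 1.0
print(information_gain(Y, attr, impurity=gini_index))  # 0.5
```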

"""
The current code given is for Assignment 1.
You will be expected to use this to make trees for:
> discrete input, discrete output
> real input, real output
> real input, discrete output
> discrete input, real output
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tree.base import DecisionTree
from metrics import *

np.random.seed(42)

# Test case 1
# Real Input and Real Output
N = 30
P = 5
X = pd.DataFrame(np.random.randn(N, P))
y = pd.Series(np.random.randn(N))

for criteria in ['information_gain', 'gini_index']:
    # max_depth is a required constructor argument; 5 is an illustrative value
    tree = DecisionTree(criterion=criteria, max_depth=5)  # split on the current criterion
    tree.fit(X, y)
    y_hat = tree.predict(X)
    tree.plot()
    print('Criteria :', criteria)
    print('RMSE: ', rmse(y_hat, y))
    print('MAE: ', mae(y_hat, y))
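The script relies on `rmse` and `mae` from a `metrics` module that the question does not include. Assuming each takes `(y_hat, y)` as two equal-length pd.Series, as the calls above suggest, a minimal sketch could look like this; comparing by position rather than by index label is a design assumption.

```python
import numpy as np
import pandas as pd


def rmse(y_hat: pd.Series, y: pd.Series) -> float:
    """Root mean squared error between predictions and true values."""
    assert len(y_hat) == len(y), "predictions and targets must have the same length"
    err = y_hat.to_numpy(dtype=float) - y.to_numpy(dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_hat: pd.Series, y: pd.Series) -> float:
    """Mean absolute error between predictions and true values."""
    assert len(y_hat) == len(y), "predictions and targets must have the same length"
    return float(np.mean(np.abs(y_hat.to_numpy(dtype=float) - y.to_numpy(dtype=float))))


print(mae(pd.Series([2.0, 4.0]), pd.Series([1.0, 1.0])))  # 2.0
```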
