
Question


python

Split each of the three datasets randomly into training and testing subsets with an 80/20 ratio.

a. On each dataset, train a decision tree classifier with entropy as the node selection criterion. What is the prediction accuracy of each classifier? What are the height and the number of leaves of each tree?

b. For voting, what is the prediction accuracy of the classifier with gini as the node selection criterion? Which feature provides the highest gini value during the first node selection?

c. For spam, train the decision tree classifier with entropy as the node selection criterion but with different depths. Plot the accuracy as the depth of the tree increases from 1 to 50 (the x-axis is the depth of the tree and the y-axis is the accuracy of the model). What do you observe from the graph?
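For reference, the two node selection criteria named in the question can be sketched as plain NumPy functions. This is only an illustration of the standard definitions (Shannon entropy in bits and Gini impurity) applied to a label array; the function names and the toy labels are assumptions made for this sketch and are not part of the provided code.

```python
import numpy as np


def class_probabilities(labels):
    """Return the empirical class frequencies of a 1-D label array."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the observed classes."""
    p = class_probabilities(labels)
    return float(-np.sum(p * np.log2(p)))


def gini(labels):
    """Gini impurity: 1 - sum(p ** 2) over the observed classes."""
    p = class_probabilities(labels)
    return float(1.0 - np.sum(p ** 2))


# Toy labels (hypothetical) in the -1/+1 encoding: three positives, one negative
toy_labels = np.array([1, 1, 1, -1])
toy_entropy = entropy(toy_labels)
toy_gini = gini(toy_labels)
```

Both measures are zero for a node whose labels all agree and largest for an even class mix, which is why a tree builder prefers the split that reduces them the most.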

--------------------------------------------------------------------------------------------------------------------------------

main

#!/usr/bin/env python
"""
The main script for running experiments
"""
from data import get_dataset
import numpy as np


def main():
    dataset_directory = 'data'
    dataset = 'spam'  # one of: volcanoes, voting, spam
    schema, X, y = get_dataset(dataset, dataset_directory)
    print(len(X))


def _1_1_a():
    # Placeholder for part (a) of the question
    pass


if __name__ == "__main__":
    main()
    _1_1_a()

----------------------------------------------------------------------------------------------------

data

"""
Handle reading in the data and representing feature values
"""
import os
import numpy as np
from scipy.io import loadmat


class Schema(object):

    def __init__(self, ids, feature_names, is_nominal, nominal_values):
        self.ids = ids
        self.feature_names = feature_names
        self._is_nominal = is_nominal
        self.nominal_values = nominal_values

    def get_nominal_value(self, feature_index, value_index):
        if not self._is_nominal[feature_index]:
            raise ValueError('Feature %d is not nominal.' % feature_index)

        return self.nominal_values[feature_index][value_index]

    def is_nominal(self, feature_index):
        return self._is_nominal[feature_index]


def get_dataset(dataset_name, base_directory='.'):
    """
    Loads a dataset with the given name. The associated `.mat` file
    must be in the directory `base_directory`.

    @param dataset_name : name of `.mat` file holding dataset
    @param base_directory : location of `.mat` file holding dataset
    @return (Schema, X, y) : X is an examples-by-features sized NumPy array,
                             and y is a 1-D array of associated -1/+1 labels
    """
    mat = loadmat(os.path.join(base_directory, dataset_name),
                  appendmat=True, chars_as_strings=True, squeeze_me=True)

    feature_names = [str(s) for s in mat['feature_names']]
    is_nominal = [bool(b) for b in mat['is_nominal']]
    nominal_values = [[str(s) for s in values]
                      for values in mat['nominal_values']]

    ids = [str(s) for s in mat['ids']]
    X = mat['examples']
    y = mat['labels']

    schema = Schema(ids, feature_names, is_nominal, nominal_values)

    return schema, X, y
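
Since `get_dataset` returns `X` and `y` as NumPy arrays, the random 80/20 split the question asks for could be done by shuffling row indices. The following is a minimal sketch under that assumption; `split_80_20`, the `seed` parameter, and the synthetic demo arrays are hypothetical names introduced here, not part of the provided scripts.

```python
import numpy as np


def split_80_20(X, y, seed=0):
    """Randomly split examples and labels into 80% train / 20% test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))          # shuffled row indices
    n_train = int(round(0.8 * len(X)))       # size of the training subset
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Synthetic stand-in (hypothetical) for the arrays returned by get_dataset
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([1, -1] * 5)
X_train, X_test, y_train, y_test = split_80_20(X_demo, y_demo)
```

Fixing the seed keeps the split reproducible across runs, so accuracies measured for different criteria or depths are compared on the same test examples.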
