Question

Posted on Aug 12, 2024

Read carefully please, I need an expert for this.

Here is Task 1 (the assignment text is reproduced at the end of this question). Here is what I have implemented:

Code:

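# Write your helper functions here, if needed
def mean(numbers):
    # Guard against division by zero for an empty list
    return float(sum(numbers)) / max(len(numbers), 1)

def squared_error_loss(numbers, mean_value):
    # Sum of squared deviations from the given mean
    return sum((n-mean_value)**2 for n in numbers)

def split_data(df, feature, perf_values):
    # Rows with the feature set to 0 go left, rows with the feature set to 1 go right
    left_df = df[df[feature]==0]
    # The posted line was cut off after "if i"; the bounds check below is reconstructed
    left_perf = [perf_values[i] for i in left_df.index.tolist() if i < len(perf_values)]
    right_df = df[df[feature]==1]
    right_perf = [perf_values[i] for i in right_df.index.tolist() if i < len(perf_values)]
    return left_df, left_perf, right_df, right_perf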

def build_tree(df, feature_list, perf_values):
    # if not feature_list or len(set(perf_values)) == 1:
    #     return None
    current_mean = mean(perf_values)
    best_feature = None
    best_error = float("inf")
    # Try every feature and keep the split with the smallest summed squared error
    for feature in feature_list:
        left_df, left_perf, right_df, right_perf = split_data(df, feature, perf_values)
        left_mean = mean(left_perf)
        right_mean = mean(right_perf)
        left_error = squared_error_loss(left_perf, left_mean)
        right_error = squared_error_loss(right_perf, right_mean)
        error = left_error + right_error
        if error < best_error:
            best_feature = feature
            best_error = error
            best_split = (left_df, left_perf, right_df, right_perf)
    print(feature_list)
    print(best_feature)
    if len(feature_list) != 0:
        feature_list.remove(best_feature)
        left_df, left_perf, right_df, right_perf = best_split
        left_tree = build_tree(left_df, feature_list, left_perf)
        right_tree = build_tree(right_df, feature_list, right_perf)
        return {"name":best_feature, "mean":current_mean, "split_by_feature": best_feature,
                "error_of_split": best_error, "successor_left": left_tree, "successor_right": right_tree}

def get_cart(sample_set_csv):
    # The sample_set_csv is a file path to a csv data, this can be imported into a dataframe
    df = pd.read_csv(sample_set_csv)
    # TODO: Write your code here. And change the return.
    features = df.columns[1:-1].tolist()
    performance_values = df['performance'].tolist()
    mean_performance = mean(performance_values)
    # recursive function to make the tree
    return build_tree(df, features, performance_values)
    # return { "name":"X", "mean":1234, "split_by_feature": "rar", "error_of_split":42,
    #          "successor_left":None,"successor_right":None}

Here is the test case that it should pass:

# Task 1
test_cart = {'name': 'X', 'mean': 763.2, 'split_by_feature': 'segmentation', 'error_of_split': 6.0,
             'successor_left':
                 {'name': 'XL', 'mean': 772.0, 'split_by_feature': 'onegb', 'error_of_split': 0.0,
                  'successor_left':
                      {'name': 'XLL', 'mean': 770.0, 'split_by_feature': None, 'error_of_split': None,
                       'successor_left': None,
                       'successor_right': None
                      },
                  'successor_right':
                      {'name': 'XLR', 'mean': 773.0, 'split_by_feature': None, 'error_of_split': None,
                       'successor_left': None,
                       'successor_right': None
                      }
                 },
             'successor_right':
                 {'name': 'XR', 'mean': 750.0, 'split_by_feature': None, 'error_of_split': None,
                  'successor_left': None,
                  'successor_right': None}
            }

if get_cart("Performance_01.csv") == test_cart:
    print("passed")
else:
    print("failed")

Here are the Performance_01.csv dataset values:

Id,secompress,encryption,aes,blowfish,algorithm,rar,zip,signature,timestamp,segmentation,onehundredmb,onegb,performance
0,1,0,0,0,1,1,0,0,0,0,0,0,750
1,1,0,0,0,1,1,0,0,0,1,1,0,773
2,1,0,0,0,1,1,0,0,0,1,0,1,770
3,1,0,0,0,1,1,0,0,1,0,0,0,750
4,1,0,0,0,1,1,0,0,1,1,1,0,773

Kindly resolve the issues in this code so that it passes this test case and other similar ones. You can tweak the code or give a better implementation as well. Waiting, it's required ASAP.
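The numbers in the expected root node can be checked by hand from the five rows above. The sketch below is only a sanity check, not part of the required solution: it pairs each row's segmentation flag with its performance value and recomputes the mean and the summed squared error of the segmentation split. The helper name `sse` is made up for this illustration.

```python
# (segmentation, performance) for the five rows of Performance_01.csv
rows = [(0, 750), (1, 773), (1, 770), (0, 750), (1, 773)]

def mean(values):
    return sum(values) / len(values)

def sse(values):
    # Sum of squared deviations from the group's own mean
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

all_perf = [p for _, p in rows]
with_seg = [p for s, p in rows if s == 1]      # segmentation selected
without_seg = [p for s, p in rows if s == 0]   # segmentation not selected

root_mean = mean(all_perf)
split_error = sse(with_seg) + sse(without_seg)
print(root_mean, mean(with_seg), mean(without_seg), split_error)
```

This prints 763.2 772.0 750.0 6.0. Since the test expects 'XL' to have mean 772.0, the left successor appears to hold the rows where the feature is 1, whereas the posted split_data puts the rows where the feature is 0 on the left.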

In this assignment, we analyze a non-functional property, performance, of a fictional tool "SECompress", a configurable command-line tool for compressing data. In addition, the data can be encrypted, signed, segmented, and/or timestamped. All this functionality is modeled by the feature diagram in Figure 1. For all the tasks in this assignment we will use this tool with the features given in this diagram.

[Figure 1: The Feature Diagram of "SECompress". Legend: Abstract Feature, Concrete Feature, Mandatory, Optional, Alternative Group]

Jupyter Notebook

You have to implement your solution using the provided Jupyter Notebook "Template Assignment 04.ipynb" that can be downloaded from the CMS.
- This Jupyter Notebook contains comments where your implementation goes, marked with "TODO".
- It contains example test cases. For the tests, you need to download the "*.csv" files from the material section in the CMS and save them in the same folder.
- No additional imports are allowed, otherwise your submission is invalid and leads to a desk reject.
- No additional libraries are allowed, otherwise your submission is invalid and leads to a desk reject.
- No changing of signatures is allowed, otherwise your submission is invalid and leads to a desk reject.

To use the Jupyter Notebook template you need to install Jupyter. You can use the install guideline https://jupyter.org/install of Jupyter, but on Windows we recommend using Anaconda to use Jupyter: https://docs.anaconda.com/ae-notebooks/user-guide/basic-tasks/apps/jupyter/index.html. If you encounter any problems, try googling them; there are lots of tutorials and forum discussions already online. Otherwise you can use the SE forum, but please try to google it on your own first.

CART Data Structure

For this assignment sheet we use the following internal data structure for a CART. We represent a CART as a Python dict with exactly the entries as shown in the example below. This example CART has three nodes: the root node "X" and the two child nodes "XL" and "XR". As you can see, the child nodes only have a name and a mean, but all other fields are set to None. A parent node also has a name and a mean, but additionally a feature by which the split is performed, the error of the split, and two successors. It is important to use exactly these names for the features:

features = ["secompress", "encryption", "aes", "blowfish", "algorithm", "rar", "zip", "signature", "timestamp", "segmentation", "onehundredmb", "onegb"]

Performance Data

The performance data, given in a csv file, contains different configurations of SECompress with performance measurements, in the same format you saw in the lecture (Slide 26).

Id,secompress,encryption,aes,blowfish,algorithm,rar,zip,signature,timestamp,segmentation,onehundredmb,onegb,performance
0,1,0,0,0,1,1,0,0,0,0,0,0,750
1,1,0,0,0,1,1,0,0,0,1,1,0,773
2,1,0,0,0,1,1,0,0,0,1,0,1,770
3,1,0,0,0,1,1,0,0,1,0,0,0,750
4,1,0,0,0,1,1,0,0,1,1,1,0,773

Task 1 [15 Points]

a) Implement an algorithm that creates a CART from performance data as presented in the lecture. The CART must be implemented by yourself; you must not take a prefabricated Python implementation. The format of the sample data and the representation of the CART are described below. If two split options are equally good, we use the alphabetic ordering of feature names as tie breaker. You can use the file "Performance_01.csv" from the CMS as an example test case. Then you should read this sample set csv file into a dataframe to work on this data. Now it is your turn: you must implement how we get a CART out of this dataframe, as presented in the lecture (Slides 26 to 28).
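One possible way to build such a tree is sketched below. It is not the lecture's reference algorithm, and several points are assumptions inferred from the expected test output rather than stated in the task: the left successor holds the rows where the feature is 1; a node whose performance values are all equal becomes a leaf (in the test, "XR" is a leaf although its rows still differ in timestamp); a split that leaves one side empty is skipped; and child names are formed by appending "L" or "R". To stay self-contained the sketch works on a list of row dicts instead of a dataframe. In the notebook the rows could presumably come from `df.to_dict("records")`, and the `build_tree` signature here is hypothetical.

```python
FEATURES = ["secompress", "encryption", "aes", "blowfish", "algorithm", "rar", "zip",
            "signature", "timestamp", "segmentation", "onehundredmb", "onegb"]

def mean(values):
    return sum(values) / len(values)

def sse(values):
    # Sum of squared deviations from the group's own mean
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, features, name="X"):
    """rows: list of dicts with 0/1 feature entries and a 'performance' entry."""
    perf = [r["performance"] for r in rows]
    node = {"name": name, "mean": mean(perf), "split_by_feature": None,
            "error_of_split": None, "successor_left": None, "successor_right": None}
    # Assumed stopping rule: identical performance values make the node a leaf
    if len(set(perf)) == 1:
        return node
    best = None
    # Alphabetical iteration plus a strict '<' keeps the alphabetically first feature on ties
    for feature in sorted(features):
        left = [r for r in rows if r[feature] == 1]   # assumed: feature selected goes left
        right = [r for r in rows if r[feature] == 0]
        if not left or not right:
            continue  # the feature is constant in this subset, so it cannot split it
        error = (sse([r["performance"] for r in left])
                 + sse([r["performance"] for r in right]))
        if best is None or error < best[0]:
            best = (error, feature, left, right)
    if best is None:
        return node  # no usable split left
    error, feature, left, right = best
    node["split_by_feature"] = feature
    node["error_of_split"] = error
    node["successor_left"] = build_tree(left, features, name + "L")
    node["successor_right"] = build_tree(right, features, name + "R")
    return node

# The five sample rows from the task, without the Id column
DATA = [
    [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 750],
    [1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 773],
    [1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 770],
    [1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 750],
    [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 773],
]
rows = [dict(zip(FEATURES + ["performance"], line)) for line in DATA]
cart = build_tree(rows, FEATURES)
print(cart["split_by_feature"], cart["error_of_split"])
```

On the sample rows this splits the root on segmentation and the "XL" node on onegb, where onegb and onehundredmb tie with error 0.0 and the alphabetical order decides. Compared with the posted code, the sketch passes the node name down the recursion instead of reusing the feature name, never removes features from a list shared between sibling calls, and reads the performance values from the rows themselves, so subset row labels cannot drift out of step with a separate list.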