Question
Read carefully please, I need an expert for this.
Here is the Task1: Here is what I have implemented: Code:
# Write your helper functions here, if needed
def mean(numbers):
    return float(sum(numbers)) / max(len(numbers), 1)

def squared_error_loss(numbers, mean_value):
    return sum((n - mean_value) ** 2 for n in numbers)

def split_data(df, feature, perf_values):
    left_df = df[df[feature] == 0]
    left_perf = [perf_values[i] for i in left_df.index.tolist() if i < len(perf_values)]
    right_df = df[df[feature] == 1]
    right_perf = [perf_values[i] for i in right_df.index.tolist() if i < len(perf_values)]
    return left_df, left_perf, right_df, right_perf

def build_tree(df, feature_list, perf_values):
    # if not feature_list or len(set(perf_values)) == 1:
    #     return None
    current_mean = mean(perf_values)
    best_feature = None
    best_error = float("inf")
    for feature in feature_list:
        left_df, left_perf, right_df, right_perf = split_data(df, feature, perf_values)
        left_mean = mean(left_perf)
        right_mean = mean(right_perf)
        left_error = squared_error_loss(left_perf, left_mean)
        right_error = squared_error_loss(right_perf, right_mean)
        error = left_error + right_error
        if error < best_error:
            best_feature = feature
            best_error = error
            best_split = (left_df, left_perf, right_df, right_perf)
    print(feature_list)
    print(best_feature)
    if len(feature_list) != 0:
        feature_list.remove(best_feature)
        left_df, left_perf, right_df, right_perf = best_split
        left_tree = build_tree(left_df, feature_list, left_perf)
        right_tree = build_tree(right_df, feature_list, right_perf)
        return {"name": best_feature, "mean": current_mean, "split_by_feature": best_feature,
                "error_of_split": best_error, "successor_left": left_tree, "successor_right": right_tree}

def get_cart(sample_set_csv):
    # The sample_set_csv is a file path to a csv data, this can be imported into a dataframe
    df = pd.read_csv(sample_set_csv)
    # TODO: Write your code here. And change the return.
    features = df.columns[1:-1].tolist()
    performance_values = df['performance'].tolist()
    mean_performance = mean(performance_values)
    # recursive function to make the tree
    return build_tree(df, features, performance_values)
    # return { "name":"X", "mean":1234, "split_by_feature": "rar", "error_of_split":42,
    #          "successor_left":None,"successor_right":None}

Here is the test case that it should pass:
# Task 1
test_cart = {'name': 'X', 'mean': 763.2, 'split_by_feature': 'segmentation', 'error_of_split': 6.0,
             'successor_left':
                 {'name': 'XL', 'mean': 772.0, 'split_by_feature': 'onegb', 'error_of_split': 0.0,
                  'successor_left':
                      {'name': 'XLL', 'mean': 770.0, 'split_by_feature': None, 'error_of_split': None,
                       'successor_left': None,
                       'successor_right': None
                      },
                  'successor_right':
                      {'name': 'XLR', 'mean': 773.0, 'split_by_feature': None, 'error_of_split': None,
                       'successor_left': None,
                       'successor_right': None
                      }
                 },
             'successor_right':
                 {'name': 'XR', 'mean': 750.0, 'split_by_feature': None, 'error_of_split': None,
                  'successor_left': None,
                  'successor_right': None}
            }
if get_cart("Performance_01.csv") == test_cart:
    print("passed")
else:
    print("failed")

Here are the Performance_01.csv dataset values:
Id,secompress,encryption,aes,blowfish,algorithm,rar,zip,signature,timestamp,segmentation,onehundredmb,onegb,performance
0,1,0,0,0,1,1,0,0,0,0,0,0,750
1,1,0,0,0,1,1,0,0,0,1,1,0,773
2,1,0,0,0,1,1,0,0,0,1,0,1,770
3,1,0,0,0,1,1,0,0,1,0,0,0,750
4,1,0,0,0,1,1,0,0,1,1,1,0,773

Kindly resolve the issues in this code so that it passes this test case and other similar ones. You can tweak the code or give a better implementation as well. Waiting, it's required ASAP.
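As a sanity check on the expected values, the root node of test_cart can be reproduced by hand from the five rows above. The sketch below is only illustrative: it hard-codes the segmentation, onehundredmb, onegb and performance columns, and the helper name sse is made up for this example. It also suggests that the expected tree puts the rows with the feature selected (value 1) into successor_left, whereas the posted split_data puts value 0 on the left.

```python
# Rows of Performance_01.csv reduced to the columns needed for this check:
# (segmentation, onehundredmb, onegb, performance).
rows = [
    (0, 0, 0, 750),
    (1, 1, 0, 773),
    (1, 0, 1, 770),
    (0, 0, 0, 750),
    (1, 1, 0, 773),
]


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean (hypothetical helper name)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


all_perf = [r[3] for r in rows]
selected = [r[3] for r in rows if r[0] == 1]    # segmentation == 1
deselected = [r[3] for r in rows if r[0] == 0]  # segmentation == 0

root_mean = mean(all_perf)
split_error = sse(selected) + sse(deselected)
print(root_mean, mean(selected), mean(deselected), split_error)
```

Running it prints 763.2 772.0 750.0 6.0, which matches 'mean' and 'error_of_split' of node X and the means of XL and XR in test_cart.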
In this assignment, we analyze a non-functional property, performance, of a fictional tool "SECompress", a configurable command-line tool for compressing data. In addition, the data can be encrypted, signed, segmented, and/or timestamped. All this functionality is modeled by the feature diagram in Figure 1. For all the tasks in this assignment we will use this tool with the features given in this diagram.
Figure 1: The Feature Diagram of "SECompress" (legend: abstract feature, concrete feature, mandatory, optional, alternative group)
Jupyter Notebook
You have to implement your solution using the provided Jupyter Notebook "Template Assignment 04.ipynb" that can be downloaded from the CMS.
- This Jupyter Notebook contains comments where your implementation goes, marked with "TODO".
- It contains example test cases. For the tests, you need to download the "*.csv" files from the material section in the CMS and save them in the same folder.
- No additional imports are allowed, otherwise your submission is invalid and leads to a desk reject.
- No additional libraries are allowed, otherwise your submission is invalid and leads to a desk reject.
- No changing of signatures is allowed, otherwise your submission is invalid and leads to a desk reject.
To use the Jupyter Notebook template you need to install Jupyter. You can use the install guideline of Jupyter, https://jupyter.org/install, but on Windows we recommend using Anaconda to use Jupyter: https://docs.anaconda.com/ae-notebooks/user-guide/basic-tasks/apps/jupyter/index.html. If you encounter any problems, try googling them; there are lots of tutorials and forum discussions already online. Otherwise you can use the SE forum, but please try to google it on your own first.
CART Data Structure
For this assignment sheet we use the following internal data structure for a CART.
We represent a CART as a Python dict with exactly the entries as shown in the example below. This example CART has three nodes: the root node "X" and the two child nodes "XL" and "XR". As you can see, the child nodes only have a name and a mean, but all other fields are set to None. A parent node also has a name and a mean, but additionally a feature by which the split is performed, the error of the split, and two successors.
It is important to use exactly these names for the features:
features = ["secompress", "encryption", "aes", "blowfish", "algorithm", "rar", "zip", "signature", "timestamp", "segmentation", "onehundredmb", "onegb"]
Performance Data
The performance data, given in a csv file, contains different configurations of SECompress with performance measurements, in the same format you saw in the lecture (Slide 26).
Id,secompress,encryption,aes,blowfish,algorithm,rar,zip,signature,timestamp,segmentation,onehundredmb,onegb,performance
0,1,0,0,0,1,1,0,0,0,0,0,0,750
1,1,0,0,0,1,1,0,0,0,1,1,0,773
2,1,0,0,0,1,1,0,0,0,1,0,1,770
3,1,0,0,0,1,1,0,0,1,0,0,0,750
4,1,0,0,0,1,1,0,0,1,1,1,0,773
Task 1 [15 Points]
a) Implement an algorithm that creates a CART from performance data as presented in the lecture. The CART must be implemented by yourself; you must not take a prefabricated Python implementation. The format of the sample data and the representation of the CART are described below. If two split options are equally good, we use the alphabetic ordering of feature names as tie breaker. You can use the file "Performance_01.csv" from the CMS as an example test case. Then you should read this sample set csv file into a dataframe to work on this data. Now it is your turn: you must implement how we get a CART out of this dataframe, as presented in the lecture (Slides 26 to 28).
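One possible way to make the posted code pass the test is sketched below. It is a sketch under stated assumptions, not the reference solution from the lecture. The assumptions are: successor_left holds the rows where the split feature is 1 and successor_right the rows where it is 0 (inferred from test_cart); a node whose performance values are all identical becomes a leaf (the stopping rule from the slides is not quoted in the post); features that leave one side empty are not valid splits; and node names are built by appending "L" or "R" to the parent name. Compared with the posted build_tree, it iterates over the features in sorted order so that ties fall to the alphabetically first name, passes a fresh feature list to each child instead of calling remove on a shared list, and returns a full leaf dict rather than None. The helper names squared_error and leaf are made up for this example.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def leaf(name, perf):
    """Leaf node: only name and mean are set, everything else is None."""
    return {"name": name, "mean": mean(perf), "split_by_feature": None,
            "error_of_split": None, "successor_left": None,
            "successor_right": None}


def build_tree(rows, features, name="X"):
    """rows: list of dicts mapping column name to value, incl. 'performance'."""
    perf = [r["performance"] for r in rows]
    # Assumed stopping rule: identical measurements cannot be improved by a split.
    if len(set(perf)) <= 1:
        return leaf(name, perf)

    best = None  # (error, feature, selected_rows, deselected_rows)
    for feature in sorted(features):  # alphabetical order acts as tie-breaker
        selected = [r for r in rows if r[feature] == 1]
        deselected = [r for r in rows if r[feature] == 0]
        if not selected or not deselected:
            continue  # this feature does not separate the rows of this node
        error = (squared_error([r["performance"] for r in selected])
                 + squared_error([r["performance"] for r in deselected]))
        if best is None or error < best[0]:  # strict '<' keeps the earlier name
            best = (error, feature, selected, deselected)

    if best is None:
        return leaf(name, perf)

    error, feature, selected, deselected = best
    remaining = [f for f in features if f != feature]  # new list per call
    return {"name": name, "mean": mean(perf), "split_by_feature": feature,
            "error_of_split": error,
            "successor_left": build_tree(selected, remaining, name + "L"),
            "successor_right": build_tree(deselected, remaining, name + "R")}


def get_cart(sample_set_csv):
    # Imported here only to keep the sketch self-contained; the notebook
    # template is assumed to provide pd already, so drop this line there.
    import pandas as pd
    df = pd.read_csv(sample_set_csv)
    features = df.columns[1:-1].tolist()  # everything between Id and performance
    return build_tree(df.to_dict("records"), features)
```

Working on a list of row dicts instead of a dataframe plus a separate performance list avoids the index mismatch in the posted split_data, where dataframe index labels are used as positions in a list that shrinks with every recursion step.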