Question
Problem 2: Decision Tree, post-pruning and cost complexity parameter using sklearn 0.22
We will use a pre-processed natural language dataset in the CSV file "spamdata.csv" to classify emails as spam or not. Each row contains the word frequency for 54 words plus statistics on the longest "run" of capital letters.
Word frequency is given by: f_i = n_i / N
where f_i is the frequency for word i, n_i is the number of times word i appears in the email, and N is the total number of words in the email. We will use decision trees to classify the emails.
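The word-frequency formula can be illustrated with a short snippet (the sample email text and the helper name `word_frequency` are made up for illustration; the real dataset already has these frequencies precomputed):

```python
def word_frequency(email_text, word):
    """Compute f = n / N: occurrences of `word` divided by total word count."""
    words = email_text.lower().split()
    n = words.count(word.lower())  # n: times the word appears in the email
    N = len(words)                 # N: total number of words in the email
    return n / N if N else 0.0

freq = word_frequency("free money free offer", "free")  # 2 of 4 words -> 0.5
```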
TO DO 1: Complete the function get_spam_dataset to read in values from the dataset and split the data into train and test sets.
```python
def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1):
    '''
    get_spam_dataset

    Loads the csv file located at "filepath". Shuffles the data and splits
    it so that you have (1 - test_split)*100% training examples and
    (test_split)*100% testing examples.

    Args:
        filepath: location of the csv file
        test_split: percentage/100 of the data that should be the testing split

    Returns:
        X_train, X_test, y_train, y_test, feature_names (in that order)
        (the first four are np.ndarray)
    '''
    # complete your code here
    return 0
```
TO DO 2: Import the dataset into five variables: X_train, X_test, y_train, y_test, label_names. Uncomment and edit the line below to complete this task.

```python
test_split = 0.1  # default test_split; change it if you'd like; ensure that
                  # this variable is used as an argument to your function
# your code here
# X_train, X_test, y_train, y_test, label_names = np.arange(5)
```
TO DO 3: Build a decision tree classifier using the sklearn toolbox. Then compute performance metrics such as precision and recall. This is a binary classification problem, so we can label every point as either positive (SPAM) or negative (NOT SPAM).
```python
def build_dt(data_X, data_y, max_depth=None, max_leaf_nodes=None):
    '''
    This function builds the decision tree classifier and fits it to the
    provided data.

    Arguments:
        data_X - a np.ndarray
        data_y - a np.ndarray
        max_depth - None if unrestricted, otherwise an integer for the
                    maximum depth the tree can reach
        max_leaf_nodes - None if unrestricted, otherwise an integer for the
                         maximum number of leaf nodes

    Returns:
        A trained DecisionTreeClassifier
    '''
    # complete your code here
```
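A minimal sketch of `build_dt`, plus the precision/recall computation the task asks for. The synthetic data from `make_classification` stands in for the spam features here so the example is self-contained; on the real assignment you would pass in the splits from TO DO 2 instead:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def build_dt(data_X, data_y, max_depth=None, max_leaf_nodes=None):
    """Fit a DecisionTreeClassifier with the given complexity limits."""
    clf = DecisionTreeClassifier(max_depth=max_depth,
                                 max_leaf_nodes=max_leaf_nodes)
    clf.fit(data_X, data_y)
    return clf

# Synthetic binary data standing in for the spam dataset:
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = build_dt(X_train, y_train, max_depth=5)
y_pred = tree.predict(X_test)
precision = precision_score(y_test, y_pred)  # SPAM (1) as the positive class
recall = recall_score(y_test, y_pred)
```

Passing `max_depth=None` and `max_leaf_nodes=None` straight through to `DecisionTreeClassifier` works because `None` is also sklearn's own "unrestricted" default for both parameters.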