Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Feb 29, 2024

Problem 2: Decision Tree, post-pruning and cost complexity parameter using sklearn 0.22 [10 points, Peer Review] We will use a pre-processed natural language dataset

Problem 2: Decision Tree, post-pruning and cost complexity parameter using sklearn 0.22 [10 points, Peer Review] We will use a pre-processed natural language dataset in the CSV file "spamdata.csv" to classify emails as spam or not. Each row contains the word frequency for 54 words plus statistics on the longest "run" of captial letters. Word frequency is given by: fi = m/N Where f; is the frequency for word i, m; is the number of times word i appears in the email, and N is the total number of words in the email. We will use decision trees to classify the emails. Part A [5 points]: Complete the function get_spam_dataset to read in values from the dataset and split the data into train and test sets. In [ ]: def get_spam_dataset (filepath="data/spamdata.csv", test_split-0.1): get_spam_dataset Loads csv file located at "filepath". Shuffles the data and splits it so that the you have (1-test_split)100% training examples and (test_split)100% testing examples. Args: filepath: location of the csv file test_split: percentage/100 of the data should be the testing split Returns: x_train, x_test, y_train, y_test, feature_names Note: feature_names is a list of all column names including isspam. (in that order) first four are np.ndarray # your code here return 0 In [ ]: # TO-DO: import the data set into five variables: x_train, x_test, y_train, y_test, Label_names # Uncomment and edit the Line below to complete this task. test_split = 0.1 # default test_split; change it if you'd Like; ensure that this variable is used as an argument to your functio # your code here # X_train, x_test, y_train, y_test, Label_names = np.arange (5) In [ ]: # tests X_train, x_test, y_train, y_test, and Label_names

Step by Step Solution

There are 3 Steps involved in it

Step: 1

In import numpy as np Load the spam dataset using the getspamdataset function def getsp... blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Numerical Methods With Chemical Engineering Applications

Authors: Kevin D. Dorfman, Prodromos Daoutidis

1st Edition

1107135117, 978-1107135116

More Books

Students also viewed these Programming questions

Question

9 To predict the average speeds of vehicles on the interstates, the DOT plans to use a multiple regression model. They feel that the vehicle speed is determined by a variety of factors including the...

Answered: 1 week ago

Question

Q1. You have identified a market opportunity for home media players that would cater for older members of the population. Many older people have difficulty in understanding the operating principles...

Answered: 1 week ago

Question

CANMNMM January of this year. (a) Each item will be held in a record. Describe all the data structures that must refer to these records to implement the required functionality. Describe all the...

Answered: 1 week ago

Question

★★★★★

Argon gas enters a constant cross-sectional area duct at Ma1 = 0.2, P1 = 320 kPa, and T1 = 400 K at a rate of 1.2 kg/s. Disregarding frictional losses, determine the highest rate of heat transfer to...

Answered: 1 week ago

Question

★★★★★

Colson Company has a line of credit with Federal Bank. Colson can borrow up to $800,000 at any time over the course of the 2018 calendar year. The following table shows the prime rate expressed as an...

Answered: 1 week ago

Question

★★★★★

Refer to the Environmental & Engineering Geoscience (Nov. 2012) study of methods for predicting settlement of shallow foundations on cohesive soil, Exercise 7.50. Actual settlement values for a...

Answered: 1 week ago

Question

★★★★★

What are the formulas for (a) ????????????????????? (b) ????????????????????? (c) ????????????? (d) ????2?

Answered: 1 week ago

Question

★★★★★

Would Aimer likely be able to avoid reinstating Saldona under the key employee exception? Why or why not? Rick Saldona began working as a traveling salesperson for Aimer Winery in 1979. Sales...

Answered: 1 week ago

Question

★★★★★

Question 4. 4. (TCOs 3, 4, 5, and 7) Which, if any, of the following expenses are deductible as miscellaneous itemized deductions? (Points : 5) Union dues Appraisal fees to establish the amount of a...

Answered: 1 week ago

Question

★★★★★

It is September 2019 and you have just met with Ron Smith. Ron owns a battery-operated children's car manufacturing company, Car-It, and needs your advice on strategy. Ron is a self-made car builder....

Answered: 1 week ago

Question

★★★★★

Apply an appropriate hypothesis test to determine if the household expenditure of Singapore residents in 2020 is lower than that in 2019, at the significance level of 1%. Examine the assumptions...

Answered: 1 week ago

Question

★★★★★

B Remember to SAVE your work C D QUESTION 3-REVENUE VARIANCE ANALYSIS-Version C (25 marks) ABC Co. manufactures widgets and has the following operating Information for the past year: Product Line...

Answered: 1 week ago

Question

★★★★★

5. Use any method (including calculators!) to compute the following volumes. a) The region bounded by y = sin(x) and the lines y = 0, x = 0, and x = 2 rotated around the x-axis. b) The region bounded...

Answered: 1 week ago

Question

★★★★★

solve this problem by showing the manual calculation and choose the correct answer(mechanical vibrations) Find the natural frequencies of the system for k = 300 N/m, k2 = 500 N/m, k3 = 200 N/m, m = 2...

Answered: 1 week ago

Question

★★★★★

MegaHoldings Group, a significant conglomerate, and MiniFirm Ltd, its subsidiary, are involved in a financial transaction. Initially, on January 1, 2021, MegaHoldings Group issued bonds into the...

Answered: 1 week ago

Question

★★★★★

D Check out the figure below: 10. Annual rate of per capita GDP growth (%) 4 2 Z " Annual rate of population growth (%) What does the above graph imply about the relationship between income growth...

Answered: 1 week ago

Question

★★★★★

CH. 9 APPLYING EXCEL. Managerial accounting. Please help and label the answers with their corresponding letter, THANK YOU. X Exit full screen Requirement 2: Revise the data in your worksheet to...

Answered: 1 week ago

Question

★★★★★

A new car sold for $31,000. If the vehicle loses 15% of its value each year, how much will it be worth after 10 years?

Answered: 1 week ago

Question

★★★★★

What is the grid spacing x if you discretize the space x [1, 1] with 51 nodes?

Answered: 1 week ago

Question

★★★★★

Use separation of variables to obtain an eigenfunction solution of the form for the unsteady-diffusion equation subject to the constant concentration boundary condition c(0, t) = 0 and the reaction...

Answered: 1 week ago

Question

★★★★★

What are the criteria for the number of iterations such that Jacobis method is faster than Gauss elimination for solving a linear problem that is not banded?

Answered: 1 week ago

Question

★★★★★

1. Analyze how large the gaps are that exist in health indicators worldwide.

Answered: 1 week ago

Question

★★★★★

12.1 Describe issues concerning defining normality across cultures.

Answered: 1 week ago

Question

★★★★★

1. Create a list of the ecological-level variables that have been documented to be associated with health across countries.

Answered: 1 week ago

Previous Question Next Question