Question

Posted on Feb 25, 2024

Create a tokenizer

Write a function tokenize that takes a string and returns a list of tokens.

def tokenize(doc):
    return
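A minimal sketch of one way to fill in the stub. The assignment does not say how tokens should be delimited, so lowercasing and splitting on whitespace is an assumption here, chosen only as a simple baseline.

```python
def tokenize(doc):
    """Split a message into lowercase, whitespace-delimited tokens.

    Assumption: no tokenization rule is given in the assignment, so this
    baseline lowercases the text and splits on runs of whitespace.
    """
    return doc.lower().split()


print(tokenize("Oh k...i'm watching here:)"))
```

Punctuation stays attached to the neighbouring word with this baseline, which is one of the things a refined tokenizer could change.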

Calculate token scores

Calculate scores for every token in the corpus, using the method discussed in class. Store these scores in a dictionary called token_scores.

token_scores = {}
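The scoring method "discussed in class" is not described in the question, so the sketch below substitutes a common choice as an assumption: the log ratio of add-one-smoothed token frequencies in SPAM versus HAM. The helper name compute_token_scores and the toy corpus are hypothetical; the real sms_corpus uses the same [label, text] layout.

```python
import math
from collections import Counter


def tokenize(doc):
    # Baseline tokenizer (assumption): lowercase and split on whitespace.
    return doc.lower().split()


def compute_token_scores(corpus):
    """Return {token: score}; positive leans SPAM, negative leans HAM.

    Assumption: the class method is unknown, so this uses the log ratio of
    add-one-smoothed token frequencies in SPAM versus HAM messages.
    """
    spam_counts, ham_counts = Counter(), Counter()
    for label, text in corpus:
        target = spam_counts if label == "spam" else ham_counts
        target.update(tokenize(text))

    vocab = set(spam_counts) | set(ham_counts)
    # Add-one smoothing: every vocabulary token gets one extra count per class.
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)

    return {
        tok: math.log((spam_counts[tok] + 1) / spam_total)
        - math.log((ham_counts[tok] + 1) / ham_total)
        for tok in vocab
    }


# Hypothetical toy corpus in the same [label, text] layout as sms_corpus.
toy_corpus = [
    ["spam", "win a free prize now"],
    ["spam", "free entry win cash"],
    ["ham", "are you free for lunch"],
    ["ham", "see you at home now"],
]

token_scores = compute_token_scores(toy_corpus)
```

With the real data, the call would be compute_token_scores(sms_corpus). The smoothing keeps the score finite for tokens that appear in only one class.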

Create a score message function

Write a function score_message that takes an SMS message doc and returns a SPAM score, using the method discussed in class.

def score_message(doc):
    return
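One plausible reading of the stub, offered as an assumption since the class method is not given: the SPAM score of a message is the sum of the scores of its tokens, with unseen tokens contributing nothing. The token_scores values below are hypothetical placeholders, not results from the corpus.

```python
def tokenize(doc):
    # Baseline tokenizer (assumption): lowercase and split on whitespace.
    return doc.lower().split()


# Hypothetical scores for illustration only; positive leans SPAM.
token_scores = {"free": 0.45, "win": 1.14, "prize": 0.70, "you": -1.06, "home": -0.65}


def score_message(doc):
    """Sum the scores of the tokens in doc; unknown tokens count as 0.

    Assumption: additive scoring stands in for the unspecified class method.
    """
    return sum(token_scores.get(tok, 0.0) for tok in tokenize(doc))


print(score_message("Win a FREE prize"))
print(score_message("see you at home"))
```

A positive total suggests SPAM and a negative total suggests HAM under this convention.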

What tokens are most predictive of a message being SPAM? (coding)
What tokens are most predictive of a message being HAM? (coding)
How many documents are misclassified by the model? (coding)
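The three coding questions above could be approached as sketched below. Everything here rests on assumptions: the smoothed log-ratio scoring stands in for the unspecified class method, a score above 0 is treated as a SPAM prediction, and the helper names top_tokens and count_misclassified, along with the toy corpus, are hypothetical.

```python
import math
from collections import Counter


def tokenize(doc):
    # Baseline tokenizer (assumption): lowercase and split on whitespace.
    return doc.lower().split()


def compute_token_scores(corpus):
    # Assumed method: log ratio of add-one-smoothed SPAM/HAM token frequencies.
    spam_counts, ham_counts = Counter(), Counter()
    for label, text in corpus:
        (spam_counts if label == "spam" else ham_counts).update(tokenize(text))
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)
    return {
        tok: math.log((spam_counts[tok] + 1) / spam_total)
        - math.log((ham_counts[tok] + 1) / ham_total)
        for tok in vocab
    }


# Hypothetical toy corpus in the same [label, text] layout as sms_corpus.
toy_corpus = [
    ["spam", "win a free prize now"],
    ["spam", "free entry win cash"],
    ["ham", "are you free for lunch"],
    ["ham", "see you at home now"],
]
token_scores = compute_token_scores(toy_corpus)


def score_message(doc):
    # Sum of token scores; tokens never seen in training contribute 0.
    return sum(token_scores.get(tok, 0.0) for tok in tokenize(doc))


def top_tokens(n, spam=True):
    # Highest scores are most SPAM-like; lowest scores are most HAM-like.
    return sorted(token_scores, key=token_scores.get, reverse=spam)[:n]


def count_misclassified(corpus, threshold=0.0):
    # Predict SPAM when the score exceeds the (assumed) threshold of 0.
    errors = 0
    for label, text in corpus:
        predicted = "spam" if score_message(text) > threshold else "ham"
        errors += predicted != label
    return errors


print("most SPAM-like:", top_tokens(1, spam=True))
print("most HAM-like:", top_tokens(1, spam=False))
print("misclassified:", count_misclassified(toy_corpus))
```

Counting errors on the same documents used to compute the scores tends to understate the true error rate, so a held-out split would give a fairer figure.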

import urllib.request, json

sms_corpus = []

with urllib.request.urlopen("https://storage.googleapis.com/wd13/SMSSpamCollection.txt") as url:
    for line in url.readlines():
        # each line holds a label and a message separated by a tab character
        sms_corpus.append(line.decode().split('\t'))

# print the text and label of document 16
docid = 16
print(sms_corpus[docid])

# print the label of document 16
docid = 16
print(sms_corpus[docid][0])

# print the text of document 16
docid = 16
print(sms_corpus[docid][1])

Can you get better results by improving your tokenizer?
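One direction to try, sketched under assumptions: a regex-based tokenizer that separates punctuation from words, keeps a few symbols that may matter in SPAM (currency signs and exclamation marks), and collapses every number into a placeholder token. The pattern and the "<num>" placeholder are illustrative choices, and whether they actually reduce misclassifications has to be measured on the corpus.

```python
import re

# Words (with an optional apostrophe part), digit runs, or selected symbols.
TOKEN_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?|\d+|[£$€!]")


def tokenize(doc):
    """Regex tokenizer: lowercase words, kept symbols, numbers as '<num>'.

    Assumption: these rules are one illustrative refinement, not the
    tokenizer prescribed by the assignment.
    """
    tokens = TOKEN_PATTERN.findall(doc.lower())
    # Replace every digit run with a shared placeholder so that distinct
    # phone numbers and prize amounts map to the same token.
    return ["<num>" if tok.isdigit() else tok for tok in tokens]


print(tokenize("Oh k...i'm watching here:)"))
print(tokenize("WIN £1000 cash!!! Call 09061701461"))
```

Recomputing token_scores and the misclassification count with this tokenizer, then comparing against the whitespace baseline, would show whether the change helps.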

SMS SPAM Collection

The SMS SPAM Collection is a corpus of real text messages (SMS messages) that have been classified as either SPAM or HAM (i.e. not SPAM). The corpus contains 5,574 documents, 747 of which are SPAM and 4,827 of which are HAM. You can find the readme for the corpus here.

The following code downloads a copy of the SMS SPAM Corpus and saves it in a variable sms_corpus.

import urllib.request, json

sms_corpus = []

with urllib.request.urlopen("https://storage.googleapis.com/wd13/SMSSpamCollection.txt") as url:
    for line in url.readlines():
        sms_corpus.append(line.decode().split('\t'))

sms_corpus is a list. Each element of the list is another list which stores a document and its label.

# print the text and label of document 16
docid = 16
print(sms_corpus[docid])

["ham", "Oh k...i'm watching here:) "]

# print the label of document 16
docid = 16
print(sms_corpus[docid][0])

ham

# print the text of document 16
docid = 16
print(sms_corpus[docid][1])

Oh k...i'm watching here:)
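As a sanity check on the corpus layout described above, the sketch below parses raw lines into [label, text] pairs and tallies the labels. The helper name parse_lines and the sample byte strings are hypothetical stand-ins for what url.readlines() returns; on the full corpus the tally could be compared against the stated 747 SPAM and 4,827 HAM documents.

```python
from collections import Counter


def parse_lines(raw_lines):
    """Turn raw byte lines of the form b'label<TAB>text' into [label, text] pairs.

    Unlike the plain split in the download code, this also strips the
    trailing newline and skips empty lines (both are assumptions about
    what is convenient, not requirements of the assignment).
    """
    corpus = []
    for raw in raw_lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if not line:
            continue
        # Split on the first tab only, in case the message itself contains tabs.
        label, text = line.split("\t", 1)
        corpus.append([label, text])
    return corpus


# Hypothetical sample lines standing in for url.readlines().
sample = [
    b"ham\tOh k...i'm watching here:)\n",
    b"spam\tFree entry in a weekly comp\n",
    b"\n",
]

corpus = parse_lines(sample)
label_counts = Counter(label for label, _ in corpus)
print(label_counts)
```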

Step by Step Solution

There are 3 Steps involved in it

Step: 1

To complete the tasks you've outlined, let's start by creating a tokenizer function. Then we'll calculate token scores based on the method discussed in cla...

Step: 2

Step: 3
