Question
A spoiler for 2(a): there is a deficiency in how we implemented the get_words function. When we are counting words, we probably don't care whether the word was adjacent to a punctuation mark. For example, the word hatter appears in the text 57 times, but if we queried the count_words dictionary, we would see it only appeared 24 times. The remaining occurrences were adjacent to a punctuation mark, so those instances got counted separately:
```python
>>> word_freq = words_by_frequency(words)
>>> for (word, freq) in word_freq:
...     if 'hatter' in word:
...         print('{:10} {:3d}'.format(word, freq))
hatter      24
hatter.     13
hatter,     10
hatter:      6
hatters      1
hatter's     1
hatter;      1
hatter.'     1
```
Our get_words function would be better if it separated punctuation from words. We can accomplish this by using the re.split function (be sure to import re first so that re.split() works). Below is a small example that demonstrates how str.split behaves on a short text and compares it to re.split:
```python
>>> import re
>>> text = '"Oh no, no," said the little Fly, "to ask me is in vain."'
>>> text.split()
['"Oh', 'no,', 'no,"', 'said', 'the', 'little', 'Fly,', '"to', 'ask', 'me', 'is', 'in', 'vain."']
>>> re.split(r'(\W)', text)
['', '"', 'Oh', ' ', 'no', ',', '', ' ', 'no', ',', '', '"', '', ' ', 'said', ' ', 'the', ' ', 'little', ' ', 'Fly', ',', '', ' ', '', '"', 'to', ' ', 'ask', ' ', 'me', ' ', 'is', ' ', 'in', ' ', 'vain', '.', '', '"', '']
```
Note that this is not exactly what we want, but it is a lot closer. In the resulting list, we find empty strings and spaces, but we have also successfully separated the punctuation from the words.
Using the above example as a guide, write and test a function called tokenize that takes a string as an input and returns a list of words and punctuation, but not extraneous spaces and empty strings. Like get_words, tokenize should take an optional argument do_lower that determines whether the string should be case-normalized before the words are separated. You don't need to modify the re.split() line: just remove the empty strings and spaces.
```python
>>> tokenize(text, do_lower=True)
['"', 'oh', 'no', ',', 'no', ',', '"', 'said', 'the', 'little', 'fly', ',', '"', 'to', 'ask', 'me', 'is', 'in', 'vain', '.', '"']
>>> print(' '.join(tokenize(text, do_lower=True)))
" oh no , no , " said the little fly , " to ask me is in vain . "
```
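If you get stuck, here is one possible sketch of tokenize. The name do_lower and the re.split() call come from the exercise; filtering with str.strip() is just one way to drop the unwanted tokens.

```python
import re

def tokenize(text, do_lower=False):
    """Split text into words and punctuation, dropping the spaces and
    empty strings that re.split leaves behind. One possible solution."""
    if do_lower:
        text = text.lower()               # case-normalize before splitting
    tokens = re.split(r'(\W)', text)      # capture group keeps the delimiters
    # tok.strip() is '' for both empty strings and whitespace-only tokens,
    # so one falsiness check removes both kinds of unwanted entries
    return [tok for tok in tokens if tok.strip()]
```

The capture group in the pattern is what makes re.split return the punctuation marks instead of silently discarding them.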
Checking In
Use your tokenize function in conjunction with your count_words function to list the top 5 most frequent words in carroll-alice.txt. You should get this:
```
'          2871    <-- single quote
,          2418    <-- comma
the        1642
.           988    <-- period
and         872
```
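A minimal sketch of the full pipeline, using collections.Counter as a stand-in for the count_words dictionary built earlier in the exercise (the small Fly sentence stands in for the file here; reading carroll-alice.txt instead should reproduce the counts above):

```python
import re
from collections import Counter

def tokenize(text, do_lower=False):
    # sketch from the exercise: split on non-word chars, drop blanks
    if do_lower:
        text = text.lower()
    return [tok for tok in re.split(r'(\W)', text) if tok.strip()]

# Counter behaves like a token -> frequency dict; to check against the
# numbers above, replace `sample` with the contents of carroll-alice.txt.
sample = '"Oh no, no," said the little Fly, "to ask me is in vain."'
counts = Counter(tokenize(sample, do_lower=True))

for word, freq in counts.most_common(5):
    print('{:10} {:5d}'.format(word, freq))
```

Counter.most_common(5) handles the "top 5" step directly, which is why no explicit sort is needed.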