Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 26, 2024

Help me figure out whats wrong with my python code. Thats the code: import nltk import re import pickle raw = open('tom_sawyer_shrt.txt').read() ### this is

Help me figure out whats wrong with my python code.

Thats the code:

import nltk import re import pickle

raw = open('tom_sawyer_shrt.txt').read()

### this is how the basic Punkt sentence tokenizer works #sent_tokenizer=nltk.data.load('tokenizers/punkt/english.pickle') #sents = sent_tokenizer.tokenize(raw)

### train & tokenize text using text sent_trainer = nltk.tokenize.punkt.PunktSentenceTokenizer().train(raw) sent_tokenizer = nltk.tokenize.punkt.PunktSentenceTokenizer(sent_trainer) # break in to sentences sents = sent_tokenizer.tokenize(raw) # get sentence start/stop indexes sentspan = sent_tokenizer.span_tokenize(raw)

### Remove in the middle of setences, due to fixed-width formatting for i in range(0,len(sents)-1): sents[i] = re.sub('(?

for i in range(1,len(sents)): if (sents[i][0:3] == '" '): sents[i-1] = sents[i-1]+'" ' sents[i] = sents[i][3:]

### Loop thru each sentence, fix to 140char i=0 tweet=[] while (i 140): ntwt = int(len(sents[i])/140) + 1 words = sents[i].split(' ') nwords = len(words) for k in range(0,ntwt): tweet = tweet + [ re.sub('\A\s|\s\Z', '', ' '.join( words[int(k*nwords/float(ntwt)): int((k+1)*nwords/float(ntwt))] ))] i=i+1 else: if (i

### A last pass to clean up leading/trailing newlines/spaces. for i in range(0,len(tweet)): tweet[i] = re.sub('\A\s|\s\Z','',tweet[i])

for i in range(0,len(tweet)): tweet[i] = re.sub('\A" ','',tweet[i])

### Save tweets to pickle file for easy reading later output = open('tweet_list.pkl','wb') pickle.dump(tweet,output,-1) output.close()

listout = open('tweet_lis.txt','w') for i in range(0,len(tweet)): listout.write(tweet[i]) listout.write(' ----------------- ')

listout.close()

And thats the eroor message:

Traceback (most recent call last): File "twain_prep.py", line 13, in sent_trainer = nltk.tokenize.punkt.PunktSentenceTokenizer().train(raw) File "/home/user/.local/lib/python2.7/site-packages/nltk/tokenize/punkt.py", line 1227, in train token_cls=self._Token).get_params() File "/home/user/.local/lib/python2.7/site-packages/nltk/tokenize/punkt.py", line 649, in __init__ self.train(train_text, verbose, finalize=True) File "/home/user/.local/lib/python2.7/site-packages/nltk/tokenize/punkt.py", line 713, in train self._train_tokens(self._tokenize_words(text), verbose) File "/home/user/.local/lib/python2.7/site-packages/nltk/tokenize/punkt.py", line 729, in _train_tokens tokens = list(tokens) File "/home/user/.local/lib/python2.7/site-packages/nltk/tokenize/punkt.py", line 542, in _tokenize_words for line in plaintext.split(' '): UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 6: ordinal not in range(128)

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image_2

Step: 3

blur-text-image_3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Readings In Database Systems

Readings In Database Systems

Authors: Michael Stonebraker

2nd Edition

0934613656, 9780934613651

More Books

Students explore these related Databases questions

Question

What are the major differences between the economic model of social responsibility and the socioeconomic model?

Answered: 3 weeks ago

Question

=+24. Software company. A small software company will bid on a major contract. It anticipates a profit of $50,000 if it gets it, but thinks there is only a 30% chance of that happening.

Answered: 3 weeks ago

Question

=+j Describe the problems associated with implementing an effective global HR information system.

Answered: 3 weeks ago

Question

Danville Company specializes in the repair of music equipment and is owned and operated by Harry Nagel. On April 30, 2008, the end of the current year, the accountant for Danville Company prepared...

Answered: 3 weeks ago

Question

Help me figure out whats wrong with my python code. Thats the code: import nltk import re import pickle raw = open('tom_sawyer_shrt.txt').read() ### this is how the basic Punkt sentence tokenizer...

Answered: 3 weeks ago

Question

Should diversity training be mandatory compliance for all companies? Would there be anyway to enforce that and who would do so?

Answered: 3 weeks ago

Question

you will be introduced with the Dynamic Host Control Protocol (DHCP) service available on Cisco routers. You are then required to design and setup a small enterprise network by putting together the...

Answered: 3 weeks ago

Question

1- Please look over the last problem for the week. Duffy Corporation has prepared the following sales budget: Month Cash Sales Credit Sales May $16,000 $68,000 June 20,000 80,000 July 18,000 74,000...

Answered: 3 weeks ago

Question

You are assigned the role of security manager. The Director of Operations, thinks this is a good opportunity to train all the facilities management staff in key elements of security. Recommending a...

Answered: 3 weeks ago

Question

Provide an outline of the steps for the change management and configuration management processes of the new server as well as the website that needs to be updated and hosted. This information should...

Answered: 3 weeks ago

Question

At the end of Year o Jarrett Corp. developed the following forecasts of net income: Forecasted Year Net Income Year 1 $20,856 Year 2 $22,733 Year 3 $24,552 Year 4 $27,252 Year 5 $29,978 Management...

Answered: 3 weeks ago

Question

Economist John Taylor has suggested that the Fed use the following rule for choosing its target for the federal funds interest rate (r): r = 2% + + 12 (y y*) / y* + 12 ( *), where is the average...

Answered: 3 weeks ago

Question

Suppose banks install automatic teller machines on every block and, by making cash readily available, reduce the amount of money people want to hold. a. Assume the Fed does not change the money...

Answered: 3 weeks ago

Question

Consider two policiesa tax cut that will last for only 1 year and a tax cut that is expected to be permanent. Which policy will stimulate greater spending by consumers? Which policy will have the...

Answered: 3 weeks ago

Previous Question Next Question