Help me figure out what's wrong with my Python code.
Here's the code:
import nltk
import re
import pickle
raw = open('tom_sawyer_shrt.txt').read()
### this is how the basic Punkt sentence tokenizer works
#sent_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
#sents = sent_tokenizer.tokenize(raw)
### train & tokenize using the text itself
sent_trainer = nltk.tokenize.punkt.PunktSentenceTokenizer().train(raw)
sent_tokenizer = nltk.tokenize.punkt.PunktSentenceTokenizer(sent_trainer)

# break into sentences
sents = sent_tokenizer.tokenize(raw)

# get sentence start/stop indexes
sentspan = sent_tokenizer.span_tokenize(raw)
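For comparison, here is a minimal, self-contained sketch of training and using a Punkt tokenizer (the sample text is invented; note that `PunktSentenceTokenizer` can also be trained directly through its `train_text` constructor argument, which avoids the two-step train-then-construct pattern above):

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

text = "It was noon. The sun was high over the river. Tom was nowhere in sight."

# Train the tokenizer on the text itself via the constructor.
tokenizer = PunktSentenceTokenizer(train_text=text)

# Break into sentences, and also get (start, stop) character spans.
sents = tokenizer.tokenize(text)
spans = list(tokenizer.span_tokenize(text))

print(sents)
print(spans)
```

Each span slices the original string back out to the matching sentence, i.e. `text[spans[0][0]:spans[0][1]] == sents[0]`.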
### Remove newlines in the middle of sentences, due to fixed-width formatting
for i in range(0, len(sents)-1):
    sents[i] = re.sub(r'(?<=\S)\n(?=\S)', ' ', sents[i])
for i in range(1, len(sents)):
    if (sents[i][0:3] == '" '):
        sents[i-1] = sents[i-1] + '" '
        sents[i] = sents[i][3:]
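The mid-sentence newline removal can be done with a lookbehind/lookahead pair: replace a single newline squeezed between two non-whitespace characters with a space, while leaving blank-line paragraph breaks alone. A standalone sketch (the sample string is invented):

```python
import re

wrapped = "It was a dark\nand stormy night.\n\nA new paragraph."

# A newline with non-whitespace on both sides is a hard wrap, not a
# paragraph break, so turn it into a space.
unwrapped = re.sub(r'(?<=\S)\n(?=\S)', ' ', wrapped)

print(unwrapped)  # → "It was a dark and stormy night.\n\nA new paragraph."
```

The double newline survives because neither `\n` has non-whitespace on both sides.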
### Loop thru each sentence, fix to 140 chars
i = 0
tweet = []
while (i

### A last pass to clean up leading/trailing newlines/spaces.
for i in range(0, len(tweet)):
    tweet[i] = re.sub(r'\A\s|\s\Z', '', tweet[i])
for i in range(0, len(tweet)):
    tweet[i] = re.sub(r'\A" ', '', tweet[i])

### Save tweets to pickle file for easy reading later
output = open('tweet_list.pkl', 'wb')
pickle.dump(tweet, output, -1)
output.close()

listout = open('tweet_lis.txt', 'w')
for i in range(0, len(tweet)):
    listout.write(tweet[i])
    listout.write(' ----------------- ')
listout.close()

And that's the error message:

Traceback (most recent call last):
  File "twain_prep.py", line 13, in
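For reference, the save step at the end round-trips cleanly on its own. A minimal sketch of pickling a list with the highest protocol (the `-1` passed to `pickle.dump`) and reading it back, using a temp-file path rather than the script's `tweet_list.pkl`:

```python
import os
import pickle
import tempfile

tweets = ['Tweet one.', 'Tweet two.']
path = os.path.join(tempfile.gettempdir(), 'tweet_list_demo.pkl')

# protocol=-1 selects the highest pickle protocol available.
with open(path, 'wb') as output:
    pickle.dump(tweets, output, -1)

# Read the list back; it compares equal to the original.
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded)
os.remove(path)
```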