
Question


Python SPYDER

1) Now add code to convert all your tokens to lower case.

2) Now add code to stem your tokens. Describe how the parsed output has changed from the previous step.
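As a rough illustration of the two steps asked for above, the sketch below lower-cases a small hand-written token list and then stems it. It assumes NLTK's `PorterStemmer` is an acceptable stemmer for the assignment (the question does not name one), and the sample tokens are made up for the example. Lower-casing makes tokens that differ only in capitalization identical, while stemming strips suffixes, so some stems are no longer dictionary words.

```python
from nltk.stem import PorterStemmer

# Hypothetical tokens standing in for one tokenized sentence.
tokens = ["The", "Cats", "were", "Running", "."]

# Step 1: convert every token to lower case.
lowered = [token.lower() for token in tokens]

# Step 2: stem each lower-cased token.
# PorterStemmer is an assumption; the assignment does not specify a stemmer.
stemmer = PorterStemmer()
stemmed = [stemmer.stem(token) for token in lowered]

print(lowered)
print(stemmed)
```

Comparing the two printed lists side by side is one way to describe how the output changes: for instance, "cats" is reduced to "cat" and "running" to "run".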


"""
Created on Mon Feb 15 10:11:39 2021

@author: alana
"""

import nltk
from pprint import pprint

#REPLACE THE LINE BELOW WITH YOUR FILE.
f = open("sample.txt", "r")

#THIS WILL JUST READ IN YOUR FILE AND PRINT IT OUT.
#IF THIS DOESN'T PRINT ANYTHING, CHECK YOUR FILE & FILEPATH/FILE LOCATION
sample_text = f.read()
print(sample_text)

#THIS CODE TOKENIZES THE SENTENCES IN YOUR FILE
default_st = nltk.sent_tokenize
sample_sent = default_st(text=sample_text)
print('total sentences in sample: ' + str(len(sample_sent)))
print("the first 2 sentences in sample are:")
pprint(sample_sent[0:2])

#THIS LINE CREATES A NEW FILE TO HOLD YOUR PROCESSED TEXT.
#RENAME IT FOR EACH OF THE FILES YOU'RE GOING TO TOKENIZE
f2 = open("parsedSample.txt", "w+")

#THIS SECTION WILL LOOP THROUGH YOUR SENTENCES AND WORD TOKENIZE THEM
wordpunkt_wt = nltk.WordPunctTokenizer()
for line in sample_sent:
    words = wordpunkt_wt.tokenize(line)
    print(words)
    #THIS LOOP WILL PRINT YOUR TOKENIZED WORDS TO THE FILE
    for word in words:
        f2.write(word + " ")

#THINGS YOU SHOULDN'T LEAVE OPEN WHEN DONE WITH THEM:
#CONVERTIBLES, REFRIGERATORS, AND FILESTREAMS
f.close()
f2.close()
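One possible way to fit both steps into the word-tokenizing loop of the starter code is sketched below. The helper names `normalize_sentence` and `write_parsed` are hypothetical, and the choice of `PorterStemmer` is again an assumption rather than something the assignment specifies. The tokenizer is the same `nltk.WordPunctTokenizer` the starter code uses.

```python
import nltk
from nltk.stem import PorterStemmer


def normalize_sentence(sentence, tokenizer, stemmer):
    """Tokenize one sentence, lower-case each token, then stem it."""
    tokens = tokenizer.tokenize(sentence)
    lowered = [token.lower() for token in tokens]
    return [stemmer.stem(token) for token in lowered]


def write_parsed(sentences, out_path):
    """Write the normalized tokens of every sentence to out_path."""
    tokenizer = nltk.WordPunctTokenizer()
    stemmer = PorterStemmer()  # assumed stemmer, not named in the question
    # The with-block closes the file even if an error occurs mid-loop.
    with open(out_path, "w") as out_file:
        for sentence in sentences:
            for word in normalize_sentence(sentence, tokenizer, stemmer):
                out_file.write(word + " ")
```

With the starter code's variables, a call such as `write_parsed(sample_sent, "parsedSample.txt")` would take the place of the nested loops and the explicit `f2.close()`.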

