Question: Tokenizing Text

You will process text saved to the text_data variable. The variable is of type String. This data is known as a (very small) dataset, sometimes referred to as a corpus (plural: corpora). The word comes from the Latin corpus, meaning body (the same root as corpse), so a corpus is a body of texts (a collection of texts).

    text_data = """ Here's to the crazy ones, the misfits, the rebels, the troublemakers, the round pegs in the square holes. The ones who see things that they can change the world, are the ones who do. The quote baove is by Steve Jobs. Mr. Jobs also said: I choose a lazy person to do a hard job. Because a lazy person will find ar """

Break the above text into **paragraph tokens** (a list of paragraphs). How many paragraphs do you have?

[ ]:
    # YOUR CODE IN THIS CELL
    # raise NotImplementedError()  # Remove this after you have started implementing your code below
    number_of_paragraphs = 0
    print(number_of_paragraphs)

Break the above text into sentence tokens. How many sentences do you have?

[ ]:
    # YOUR CODE IN THIS CELL
    raise NotImplementedError()  # Remove this after you have started implementing your code below
    number_of_sentences = 0

Break the above text into word tokens. How many words do you have?

[ ]:
    # YOUR CODE IN THIS CELL
    raise NotImplementedError()  # Remove this after you have started implementing your code below
    number_of_words = 0
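One possible way to approach the three counts is sketched below using only Python's standard `re` module. The helper names (`paragraph_tokens`, `sentence_tokens`, `word_tokens`), the splitting rules, and `sample_text` are assumptions made for illustration and are not part of the assignment. The sample stands in for text_data, whose own line breaks determine the paragraph count.

```python
import re


def paragraph_tokens(text):
    """Split on blank lines (assumed paragraph separator); drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def sentence_tokens(text):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_tokens(text):
    """Runs of letters, digits and apostrophes; other punctuation is dropped."""
    return re.findall(r"[A-Za-z0-9']+", text)


# Hypothetical sample with one blank line between two paragraphs.
sample_text = """Here's to the crazy ones. The rebels.

I choose a lazy person to do a hard job."""

number_of_paragraphs = len(paragraph_tokens(sample_text))
number_of_sentences = len(sentence_tokens(sample_text))
number_of_words = len(word_tokens(sample_text))
print(number_of_paragraphs, number_of_sentences, number_of_words)  # 2 3 17
```

The sentence rule is deliberately simple: it would treat the period in "Mr. Jobs" as a sentence end, so the count for text_data depends on how abbreviations are handled. A trained tokenizer such as NLTK's `sent_tokenize` is designed to cope with common abbreviations and may be the tool the assignment expects. Likewise, the word count depends on whether punctuation marks are counted as tokens.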