Question
Question 2

This question is about word co-occurrences, collocations and distributional similarity. Throughout this question, reference will be made to the sample of English stored in text1 (Lewis Carroll's Alice in Wonderland), a sample of which is output below.

    ### Run this cell. Do not change the code in this cell
    from nltk.tokenize import sent_tokenize, word_tokenize
    from nltk.corpus import gutenberg

    def get_rawtext(filename='carroll-alice.txt'):
        text = gutenberg.raw(filename)
        return text

    def get_text(filename='carroll-alice.txt'):
        text = gutenberg.raw(filename)
        sentences = sent_tokenize(text)
        tokenized = [word_tokenize(sent.lower()) for sent in sentences]
        normalised = [["Nth" if (token.endswith(("nd", "st", "th")) and token[:-2].isdigit()) else token for token in sent] for sent in tokenized]
        normalised = [["NUM" if token.isdigit() else token for token in sent] for sent in normalised]
        filtered = [[word for word in sent if word.isalpha()] for sent in normalised]
        return filtered

    text1 = get_text()
    text1[:10]

a) Explain what each step in the get_text() function does. [10 marks]
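As a rough illustration of the normalisation and filtering steps in get_text(), the sketch below applies the same three list comprehensions to one hand-tokenized, lowercased sentence. The sample tokens and the helper name normalise_and_filter are assumptions made for illustration only; the NLTK sentence and word tokenization steps are skipped so that the snippet runs without downloading any corpus data.

```python
def normalise_and_filter(tokenized):
    """Apply the post-tokenization steps of get_text() to lowercased tokens.

    `tokenized` is a list of sentences, each a list of token strings.
    This helper is hypothetical; it only mirrors the comprehensions
    shown in the question.
    """
    # Step 1: a number followed by "nd", "st" or "th" (e.g. "2nd", "10th")
    # is replaced by the placeholder "Nth".
    normalised = [
        ["Nth" if (token.endswith(("nd", "st", "th")) and token[:-2].isdigit()) else token
         for token in sent]
        for sent in tokenized
    ]
    # Step 2: tokens made up only of digits are replaced by "NUM".
    normalised = [
        ["NUM" if token.isdigit() else token for token in sent]
        for sent in normalised
    ]
    # Step 3: keep only purely alphabetic tokens, which drops punctuation
    # and mixed tokens such as "n't" or "3rd".
    filtered = [[word for word in sent if word.isalpha()] for sent in normalised]
    return filtered


# Made-up sample sentence, already lowercased and split into tokens.
sample = [["alice", "ate", "the", "2nd", "and", "3rd", "of", "42",
           "tarts", ",", "did", "n't", "she", "?"]]
result = normalise_and_filter(sample)
print(result)
```

Note that "3rd" is removed rather than mapped to "Nth", because "rd" is not among the suffixes the function checks, while the placeholders "Nth" and "NUM" survive the final filter because they are alphabetic.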