
Question


Question 2

This question is about word co-occurrences, collocations and distributional similarity. Throughout this question, reference will be made to the sample of English stored in text1 (Lewis Carroll's Alice in Wonderland), a sample of which is output below.

    ### Run this cell. Do not change the code in this cell
    from nltk.tokenize import sent_tokenize, word_tokenize
    from nltk.corpus import gutenberg

    def get_rawtext(filename='carroll-alice.txt'):
        text = gutenberg.raw(filename)
        return text

    def get_text(filename='carroll-alice.txt'):
        text = gutenberg.raw(filename)
        sentences = sent_tokenize(text)
        tokenized = [word_tokenize(sent.lower()) for sent in sentences]
        normalised = [["Nth" if (token.endswith(("nd", "st", "th")) and token[:-2].isdigit()) else token for token in sent] for sent in tokenized]
        normalised = [["NUM" if token.isdigit() else token for token in sent] for sent in normalised]
        filtered = [[word for word in sent if word.isalpha()] for sent in normalised]
        return filtered

    text1 = get_text()
    text1[:10]

a) Explain what each step in the get_text() function does. [10 marks]
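As a rough illustration of part (a): get_text() loads the raw text, splits it into sentences, lowercases and word-tokenizes each sentence, and then applies three list comprehensions. The sketch below replays only those last three steps on hand-made token lists, so it runs without NLTK or its corpus data. The helper name normalise_steps and the sample sentences are hypothetical and are not part of the original notebook; the hand-written tokens only approximate what word_tokenize would produce.

```python
# Hypothetical walk-through of the three normalisation steps in get_text().
# The input is already sentence-split, lowercased and tokenized by hand,
# standing in for the output of sent_tokenize / word_tokenize.

def normalise_steps(tokenized):
    # Step 1: ordinals such as "1st", "22nd", "4th" become "Nth".
    # The suffix tuple from the question has no "rd", so "3rd" is not matched.
    ordinals = [["Nth" if (token.endswith(("nd", "st", "th")) and token[:-2].isdigit())
                 else token for token in sent] for sent in tokenized]
    # Step 2: tokens made up only of digits become "NUM".
    numbers = [["NUM" if token.isdigit() else token for token in sent]
               for sent in ordinals]
    # Step 3: keep only purely alphabetic tokens; punctuation, contractions
    # such as "n't", and unmatched forms such as "3rd" are dropped.
    filtered = [[word for word in sent if word.isalpha()] for sent in numbers]
    return ordinals, numbers, filtered


sample = [
    ["alice", "ate", "3", "cakes", "on", "the", "4th", "of", "july", ",",
     "did", "n't", "she", "?"],
    ["the", "3rd", "rabbit"],
]

ordinals, numbers, filtered = normalise_steps(sample)
for sent in filtered:
    print(sent)
```

The placeholders "Nth" and "NUM" survive the final filter because they are alphabetic, which is why the substitutions happen before the isalpha() check rather than after it.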

