Question
#do not change the code in this cell
#make sure you run this cell once if you are working on colab or on a fresh installation of anaconda
import nltk
nltk.download('twitter_samples')
nltk.download('punkt')
# do not change the code in this cell
# make sure you run this cell
from nltk.corpus import twitter_samples
from nltk.tokenize import word_tokenize
import random
import math
def sample_sentences(corpus, sample_size):
    size = len(corpus)
    ids = random.sample(range(size), sample_size)
    sample = [corpus[i] for i in ids]
    return sample
random.seed(37)
tsample = sample_sentences(twitter_samples.strings(), 1000)
twittertokens = [word_tokenize(tweet.lower()) for tweet in tsample]
twittertokens[:5]
iii) Find the token with the highest part-of-speech tag ambiguity in the sample. Explain how you arrived at your answer.
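One possible way to approach part iii), sketched under stated assumptions: the question does not define "ambiguity", so here it is taken to mean the number of distinct part-of-speech tags a token receives across the sample. The tagger is also an assumption (NLTK's default `nltk.pos_tag`); the assignment may intend a different tagger or a pre-tagged corpus. The helper names `tag_ambiguity`, `most_ambiguous_token`, and `tag_sample` are hypothetical and not part of the original notebook. The toy data at the end is hand-made for illustration only.

```python
from collections import defaultdict


def tag_ambiguity(tagged_sentences):
    """Map each token to the set of distinct POS tags it received."""
    tags_by_token = defaultdict(set)
    for sentence in tagged_sentences:
        for token, tag in sentence:
            tags_by_token[token].add(tag)
    return tags_by_token


def most_ambiguous_token(tagged_sentences):
    """Return (token, sorted tags) for the token with the most distinct tags.

    Assumption: ambiguity = number of distinct tags observed for the token.
    Ties are broken alphabetically so the result is deterministic.
    """
    tags_by_token = tag_ambiguity(tagged_sentences)
    token = min(tags_by_token, key=lambda t: (-len(tags_by_token[t]), t))
    return token, sorted(tags_by_token[token])


def tag_sample(tokenized_tweets):
    """Tag each tokenized tweet with NLTK's default tagger (assumed choice)."""
    import nltk  # imported lazily; the tagger model must be downloaded first
    return [nltk.pos_tag(tokens) for tokens in tokenized_tweets]


# Tiny hand-tagged example (hypothetical data, not from the Twitter sample).
toy = [
    [("i", "PRP"), ("like", "VBP"), ("it", "PRP")],
    [("looks", "VBZ"), ("like", "IN"), ("rain", "NN")],
    [("a", "DT"), ("like", "NN")],
]
print(most_ambiguous_token(toy))
```

On the assignment data, calling `most_ambiguous_token(tag_sample(twittertokens))` would apply the same counting to the 1000 sampled tweets. This requires the tagger model to be downloaded first, for example with `nltk.download('averaged_perceptron_tagger')`; depending on the NLTK version, the resource may instead be named `averaged_perceptron_tagger_eng`. To explain the answer, it helps to report the winning token together with its list of tags and to note whether other tokens tie on the same count.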