Question

#do not change the code in this cell

#make sure you run this cell once if you are working on colab or on a fresh installation of anaconda

import nltk

nltk.download('twitter_samples')

nltk.download('punkt')

# do not change the code in this cell

# make sure you run this cell

from nltk.corpus import twitter_samples

from nltk.tokenize import word_tokenize

import random

import math

def sample_sentences(corpus, sample_size):
    size = len(corpus)
    ids = random.sample(range(size), sample_size)
    sample = [corpus[i] for i in ids]
    return sample

random.seed(37)

tsample = sample_sentences(twitter_samples.strings(), 1000)

twittertokens = [word_tokenize(tweet.lower()) for tweet in tsample]

twittertokens[:5]

iii) Find the token with the highest part-of-speech tag ambiguity in the sample. Explain how you arrived at your answer.
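
One possible way to approach part iii) is sketched below. It is not the hidden expert answer, only a minimal illustration under stated assumptions: "ambiguity" is taken to mean the number of distinct part-of-speech tags a tagger assigns to the same token across the sample, and the helper names `tags_per_token` and `most_ambiguous_token` are hypothetical. The toy data stands in for real tagger output so the block runs without any downloads.

```python
from collections import defaultdict


def tags_per_token(tagged_sentences):
    """Map each token to the set of distinct POS tags it received."""
    tags_by_token = defaultdict(set)
    for sentence in tagged_sentences:
        for token, tag in sentence:
            tags_by_token[token].add(tag)
    return tags_by_token


def most_ambiguous_token(tagged_sentences):
    """Return (token, sorted tags) for the token with the most distinct tags.

    Ties are broken alphabetically so the result is deterministic.
    """
    tags_by_token = tags_per_token(tagged_sentences)
    token = max(sorted(tags_by_token), key=lambda t: len(tags_by_token[t]))
    return token, sorted(tags_by_token[token])


# Hand-tagged toy data (Penn Treebank style tags), for illustration only.
toy = [
    [("i", "PRP"), ("like", "VBP"), ("this", "DT")],
    [("looks", "VBZ"), ("like", "IN"), ("rain", "NN")],
    [("rain", "VB"), ("or", "CC"), ("shine", "VB")],
    [("a", "DT"), ("like", "NN")],
]

print(most_ambiguous_token(toy))

# On the actual sample, the tagged input could be produced with NLTK, e.g.:
#   tagged = nltk.pos_tag_sents(twittertokens)
#   print(most_ambiguous_token(tagged))
# Assumption: the tagger model must be downloaded first, and its resource
# name depends on the installed NLTK version.
```

Because the tagger chooses one tag per occurrence from context, this measures ambiguity as observed in the sample rather than every tag a word could take in principle. The explanation requested by the question would then consist of the tag set collected for the winning token, ideally with one example tweet per tag.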
