Question
#do not change the code in this cell #make sure you run this cell once if working on colab or a fresh installation of anaconda
#do not change the code in this cell
#make sure you run this cell once if working on colab or a fresh installation of anaconda
import nltk
nltk.download('punkt')
# do not change the code in this cell
# make sure you run this cell
from nltk.tokenize import word_tokenize
q1text = "Facebook is planning to hire 10,000 people in the European Union to develop a so-called metaverse. A metaverse is an online world where people can game, work and communicate in a virtual environment, often using VR headsets. Facebook CEO Mark Zuckerberg has been a leading voice on the concept. The announcement comes as Facebook deals with the fallout of a damaging scandal and faces increased calls for regulation to curb its influence. What is the metaverse? `The metaverse has the potential to help unlock access to new creative, social, and economic opportunities. And Europeans will be shaping it right from the start,' Facebook said in a blog post. The new jobs being created over the next five years will include highly specialised engineers. Investing in the EU offered many advantages, including access to a large consumer market, first-class universities and high-quality talent, Facebook said. Facebook has made building the metaverse one of its big priorities. Despite its history of buying up rivals, Facebook claims the metaverse `won't be built overnight by a single company' and has promised to collaborate. It recently invested $50m (36.3m) in funding non-profit groups to help build the metaverse responsibly. But it thinks the true metaverse idea will take another 10 to 15 years. Some critics say this latest announcement is designed to re-establish the company's reputation and divert attention, after a series of damaging scandals in recent months. This included revelations from whistleblower Frances Haugen, who worked as a product manager on the civic integrity team at Facebook. Internal research by Facebook found that Instagram, which it owns, was affecting the mental health of teenagers. But Facebook did not share its findings when they suggested that the platform was a `toxic' place for many youngsters."
q1texttokens = word_tokenize(q1text)
q1texttokens
a) By writing python code and assuming no further pre-processing needs to be applied to the text, find each of the following statistics for `q1texttokens`:
i) the number of tokens in `q1texttokens` [2 marks]
ii) the number of distinct token types in `q1texttokens` [3 marks]
iii) the number of hapax legomena in `q1texttokens` [3 marks]
iv) the number of tokens which are punctuation [3 marks]
v) the number of types which are punctuation [2 marks]
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started