Question

3. Get the text from the column with name "text" in swa.scc.tweets. Store it in a variable, tweets.text. Pre-process your text data. If you belong to an odd numbered group, construct a Document Term Matrix using square root transformation of the term frequencies; if not, construct a Document Term Matrix using TFIDF weights. Ensure empty documents are not included in the final matrix. (2 marks)

4. If you belong to an odd numbered group, create a cosine distance matrix of the documents using the matrix created for question 3. If not, construct a binary distance matrix instead. (1 mark)

5. Find the optimal number of tweet document clusters using the elbow method. Use the distance matrix created for question 4 while reducing the dimensions of the data set. State your decision of the optimal number of clusters. (1 mark)

6. Cluster the documents using k-means clustering. (1 mark)

7. Visualize your clustering in 2-dimensional space. Interpret your plot. (1 mark)

8. Provide a word cloud of the largest cluster. (1 mark)

9. Create a dendrogram of the frequently used words in the largest cluster. Use binary distance to find the distance between words. Use single linkage clustering. Interpret the plot. (2 marks)

Social Network

BizAnalytics wants to create a quote network. A "quote" happens when a tweet contains new content "on top" of a shared tweet (Twitter API: Premium data dictionary | Docs | Twitter Developer Platform). Build a quote network from the tweets you downloaded and plot a graph to present to BizAnalytics. To solve this question, first find whether each tweet is a quote and get the screen names of the quoted tweets. Then get the screen names of the Twitter users of the original tweets that were quoted. Create an edge list of "who quoted whom" and plot the graph.

10. Find the row numbers of the quote tweets from the tweets data you downloaded. Store them in the variable, quote.tweets. (1 mark)
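
The weighting and distance choices in questions 3 and 4 can be illustrated with a small sketch. The variable names in the question suggest the assignment is meant to be done in R; the Python below is only a minimal, standard-library illustration of the arithmetic, not the expected solution. The function names (tokenize, build_dtm, cosine_distance, binary_distance), the sample tweets, and the pre-processing rules are all assumptions made for the example. The TF-IDF formula shown (relative term frequency times log2(N / df)) is one common variant; other definitions exist.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text, strip URLs, and keep alphabetic tokens only."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z]+", text)


def build_dtm(docs, weighting="tfidf"):
    """Build a document-term matrix, skipping documents with no tokens.

    Returns (vocab, rows, kept), where kept lists the indices of the
    original documents that survived pre-processing.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    kept = [i for i, c in enumerate(counts) if c]  # drop empty documents
    counts = [counts[i] for i in kept]
    vocab = sorted(set().union(*counts))
    n_docs = len(counts)
    doc_freq = Counter(term for c in counts for term in c)

    rows = []
    for c in counts:
        if weighting == "sqrt":
            # Square root transformation of the raw term frequencies.
            row = [math.sqrt(c[t]) for t in vocab]
        else:
            # Assumed TF-IDF variant: (tf / doc length) * log2(N / df).
            total = sum(c.values())
            row = [(c[t] / total) * math.log2(n_docs / doc_freq[t]) for t in vocab]
        rows.append(row)
    return vocab, rows, kept


def cosine_distance(a, b):
    """One minus the cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def binary_distance(a, b):
    """Share of terms present in exactly one document, among terms present in either."""
    either = sum(1 for x, y in zip(a, b) if x != 0 or y != 0)
    both = sum(1 for x, y in zip(a, b) if x != 0 and y != 0)
    return 0.0 if either == 0 else 1.0 - both / either


# Hypothetical sample tweets; the second and fourth become empty after cleaning.
docs = [
    "Flying with the airline today, great crew!",
    "https://t.co/abc",
    "Great crew, delayed flight",
    "",
]
vocab, dtm, kept = build_dtm(docs, weighting="sqrt")
print(kept)
print(round(cosine_distance(dtm[0], dtm[1]), 4))
print(round(binary_distance(dtm[0], dtm[1]), 4))
```

Note that TF-IDF assigns zero weight to any term that occurs in every document, so a document made up only of such terms becomes an all-zero row after weighting. Checking for empty rows after the weighting step, and not only after tokenizing, is worth considering.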
