Question
3. Get the text from the column named "text" in swa.scc.tweets. Store it in a variable, tweets.text. Pre-process your text data. If you belong to an odd-numbered group, construct a Document Term Matrix using a square root transformation of the term frequencies; if not, construct a Document Term Matrix using TF-IDF weights. Ensure empty documents are not included in the final matrix. (2 marks)
4. If you belong to an odd-numbered group, create a cosine distance matrix of the documents using the matrix created for question 3. If not, construct a binary distance matrix instead. (1 mark)
5. Find the optimal number of tweet document clusters using the elbow method. Use the distance matrix created for question 4 while reducing the dimensions of the data set. State your decision on the optimal number of clusters. (1 mark)
6. Cluster the documents using k-means clustering. (1 mark)
7. Visualize your clustering in 2-dimensional space. Interpret your plot. (1 mark)
8. Provide a word cloud of the largest cluster. (1 mark)
9. Create a dendrogram of the frequently used words in the largest cluster. Use binary distance to find the distance between words. Use single linkage clustering. Interpret the plot. (2 marks)

Social Network

BizAnalytics wants to create a quote network. A "quote" happens when a tweet contains new content "on top" of a shared tweet (Twitter API: Premium data dictionary | Docs | Twitter Developer Platform). Build a quote network from the tweets you downloaded and plot a graph to present to BizAnalytics. To solve this question, first find whether each tweet is a quote and get the screen names of the quoted tweets. Then get the screen names of the Twitter users of the original tweets that were quoted. Create an edge list of "who quoted whom" and plot the graph.

10. Find the row numbers of the quote tweets in the tweets data you downloaded. Store them in the variable quote.tweets. (1 mark)
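
The weighting and distance choices in questions 3 and 4 can be illustrated with a small sketch. The variable names in the assignment suggest it is meant to be solved in R, so the Python below is not the expected solution; it is only a language-neutral illustration using the standard library. Everything in it is an assumption made for the example: the function names (preprocess, build_dtm, cosine_distance, binary_distance), the tiny stopword list, the sample tweets, and the particular TF-IDF variant (tf * log2(N / df)). The binary distance is taken to be the Jaccard-style measure: among terms present in either document, the share present in only one.

```python
import math
import re
from collections import Counter

# Illustrative stopword list only; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "rt"}

def preprocess(text):
    """Lowercase, strip URLs and @mentions, keep letter-only tokens, drop stopwords."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS]

def build_dtm(docs, weighting="sqrt"):
    """Return (vocab, rows). Documents with no tokens left are excluded."""
    token_lists = [toks for toks in map(preprocess, docs) if toks]
    vocab = sorted({t for toks in token_lists for t in toks})
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in token_lists for t in set(toks))
    rows = []
    for toks in token_lists:
        tf = Counter(toks)
        if weighting == "sqrt":
            rows.append([math.sqrt(tf[t]) for t in vocab])
        else:
            # Assumed TF-IDF variant; other definitions exist.
            rows.append([tf[t] * math.log2(n_docs / df[t]) for t in vocab])
    return vocab, rows

def cosine_distance(u, v):
    """1 minus the cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def binary_distance(u, v):
    """Among terms non-zero in either vector, the share non-zero in only one."""
    either = sum(1 for a, b in zip(u, v) if a or b)
    both = sum(1 for a, b in zip(u, v) if a and b)
    return 1.0 - both / either if either else 0.0

# Made-up sample tweets; the second one becomes empty after pre-processing.
tweets_text = [
    "RT @user Flights are delayed https://t.co/x",
    "!!! @someone",
    "Delayed flights and lost bags",
]
vocab, rows = build_dtm(tweets_text, weighting="sqrt")
print(vocab)
print(cosine_distance(rows[0], rows[1]), binary_distance(rows[0], rows[1]))
```

Note that the empty document is dropped before the vocabulary is built, so the matrix rows no longer line up one-to-one with the original tweets; any later step that maps clusters back to tweets has to keep track of which rows survived.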