Question
First, research/gather the data:
Choose one Stack Exchange site dealing with topics that you find interesting; see https://stackexchange.com/sites?view=list#traffic for a list. The site cannot be too small, but also avoid selecting any of the largest ones (especially StackOverflow, Mathematics) unless you really want to challenge yourself. As a rule of thumb, let's say that the site must have at least questions and answers.
This document was originally developed by Dr Marek Gagolewski. It was subsequently revised by Dr Yang Li Kelvin during the work at the School of Information Technology, Deakin University, for the unit SIT Data Wrangling, Trimester
Download the site's most recent data dump from https://archive.org/details/stackexchange.
Read the description of all the data tables published at https://meta.stackexchange.com/questions
Then, create a single Quarto (.qmd) file that you will render to a PDF report (you will have to learn how to do that yourself; this is part of this HD-level task), in which you perform what follows.
Convert all the data tables (Badges, Comments, PostHistory, PostLinks, Posts, Tags, Users, Votes) from XML to CSV using custom code that you write yourself. Ideally, you should write a Python function that takes a single input file name (.xml) and output file name (.csv) and performs the conversion of a single dataset.
Load the CSV files as pandas data frames.
Create at least five non-trivial data visualisations and/or tables, at least three of which are based on the extraction of information from text (e.g., tags, keywords, locations, etc.). You must demonstrate that you have learned how to write your own regular expressions (regexes).
Draw insightful and interesting conclusions. Do not forget to reflect on the potential data privacy and ethics issues that arise during the data analysis process.
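As a starting point for the conversion step, here is a minimal sketch of such a function, not a reference solution. It assumes that each dump file holds a single root element whose children are `<row ... />` elements carrying all their data in attributes, which is how the Stack Exchange dumps are laid out. The function name `xml_to_csv` and the two-pass design are illustrative choices.

```python
import csv
import xml.etree.ElementTree as ET


def xml_to_csv(xml_path, csv_path):
    """Convert one Stack Exchange dump table to CSV; return the row count.

    Assumption: the data live in the attributes of <row ... /> elements.
    """
    # Pass 1: collect the union of attribute names. Rows may omit
    # attributes, so the first row alone is not a reliable header.
    # A dict keeps the order of first appearance.
    columns = {}
    for _, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "row":
            for name in elem.attrib:
                columns.setdefault(name, None)
        elem.clear()  # drop the element's attributes to limit memory use

    # Pass 2: stream the rows into the CSV file.
    n_rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns), restval="")
        writer.writeheader()
        for _, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == "row":
                writer.writerow(elem.attrib)  # missing attributes become ""
                n_rows += 1
            elem.clear()
    return n_rows
```

Calling `xml_to_csv("Tags.xml", "Tags.csv")` would write one CSV file, which can then be loaded with `pandas.read_csv("Tags.csv")`. Reading the file twice costs time, but neither pass needs to build the whole document tree, which matters for large tables such as PostHistory.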
This HD-level task is purposely underdefined: you will not be told precisely what to do. Your aim is to generate some interesting insights into data featuring lots of textual information.
In the course of the report preparation, you should apply a wide range of data frame wrangling and text processing techniques. In particular, you must demonstrate that you mastered regular expressions.
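To illustrate the kind of regex-based text extraction the task asks for, here is a small sketch under stated assumptions. It assumes the `Tags` field of Posts is a string such as `<python><pandas>` (some dumps appear to use `|python|pandas|` instead, so the pattern accepts both) and that `Body` contains HTML. The helper names are hypothetical, and the HTML stripping is deliberately crude.

```python
import re
from collections import Counter

# A tag is any run of characters other than the delimiters <, > and |.
TAG_RE = re.compile(r"[^<>|]+")
# Crude HTML tag matcher; adequate for a sketch, not a full HTML parser.
HTML_TAG_RE = re.compile(r"<[^>]+>")


def extract_tags(tags_field):
    """Return the list of tag names found in one Tags string."""
    return TAG_RE.findall(tags_field or "")


def strip_html(body):
    """Replace HTML tags with spaces and collapse runs of whitespace."""
    text = HTML_TAG_RE.sub(" ", body or "")
    return re.sub(r"\s+", " ", text).strip()


def count_tags(tag_fields):
    """Count tag occurrences over an iterable of Tags strings."""
    counts = Counter()
    for field in tag_fields:
        counts.update(extract_tags(field))
    return counts
```

For example, `count_tags(["<a><b>", "<a>"]).most_common(1)` gives `[("a", 2)]`. Counts like these can feed a frequency table or a bar chart, and the same functions can be applied to a data frame column with `Series.map`.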
Do not use pie charts (as we discussed during the lecture). Go beyond the basic plots that we have covered in this course. Draw at least one map (e.g., of the world) and a word cloud.
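A map needs some geographic key per user. One possible first step, sketched below, is to pull the last comma-separated part out of the free-text `Location` field of Users with a regex and count the results. This is only a rough heuristic: the last part is not always a country (consider "Austin, TX"), so a real analysis would still need a lookup table or a geocoder. The function names are hypothetical.

```python
import re
from collections import Counter

# Text after the last comma, or the whole string if there is no comma.
LAST_PART_RE = re.compile(r"(?:^|,)\s*([^,]+?)\s*$")


def last_location_part(location):
    """Return the case-folded last comma-separated part, or None."""
    match = LAST_PART_RE.search((location or "").strip())
    return match.group(1).casefold() if match else None


def count_last_parts(locations):
    """Count the extracted parts, skipping empty or unparsable values."""
    parts = (last_location_part(loc) for loc in locations)
    return Counter(p for p in parts if p is not None)
```

For instance, `last_location_part("Melbourne, Victoria, Australia")` returns `"australia"`. The resulting counts could be joined to a country-level geometry table and drawn as a choropleth. Since location strings are personal data, reporting only aggregated counts also fits the privacy reflection the task asks for.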