Question
We will use one full day's worth of tweets as our input (there are a total of 4.4M tweets in this file).
Execute and time the following tasks with tweets and tweets:
a. Write and execute a SQL query to find the average longitude and latitude value for each user ID. This query does not need the User table because User ID is a foreign key in the Tweet table. E.g., something like SELECT UserID, MIN(longitude), MAX(latitude) FROM Tweet, Geo WHERE Tweet.GeoFK = Geo.GeoID GROUP BY UserID;
b. Re-execute the SQL query in part (a) times and times and measure the total runtime (just rerun the same exact query multiple times using a for-loop; it is as simple as it looks). Does the runtime scale linearly, i.e., does it take X and X as much time? What is the average runtime of each individual run?
c. Write the equivalent of the (a) query in Python (without using SQL) by reading it from the file with tweets.
d. Re-execute the query in part (c) times and times and measure the total runtime. Does the runtime scale linearly? What is the average runtime of each individual run?
e. Write the equivalent of the (a) query in Python by using regular expressions instead of json.loads. Do not use json.loads here. Note that you only need to find the user ID and geo location (if any) for each tweet; you don't need to parse the whole thing.
f. Re-execute the query in part (e) times and times and measure the total runtime. Does the runtime scale linearly?
g. Create a visual using matplotlib of (d) showing the distribution of the runtimes.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
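For parts (a) and (b), one possible approach is sketched below using Python's built-in `sqlite3` module. The table layout, the sample rows, and the repetition counts (5 and 20) are assumptions made for illustration: the question names only `UserID`, `GeoFK`, `GeoID`, `longitude`, and `latitude`, and the actual repetition counts are not given. The hint query uses `MIN` and `MAX` only as a shape to follow; the sketch swaps in `AVG` because the task asks for averages.

```python
import sqlite3
import time

def build_sample_db():
    """Create a tiny in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Geo (GeoID INTEGER PRIMARY KEY, longitude REAL, latitude REAL);
        CREATE TABLE Tweet (TweetID INTEGER PRIMARY KEY, UserID INTEGER,
                            GeoFK INTEGER REFERENCES Geo(GeoID));
    """)
    conn.executemany("INSERT INTO Geo VALUES (?, ?, ?)",
                     [(1, 10.0, 20.0), (2, 30.0, 40.0), (3, -5.0, 5.0)])
    # Tweet 103 has no geo record, so the join drops it.
    conn.executemany("INSERT INTO Tweet VALUES (?, ?, ?)",
                     [(100, 1, 1), (101, 1, 2), (102, 2, 3), (103, 3, None)])
    return conn

# Average longitude/latitude per user; the User table is not needed.
AVG_QUERY = """
    SELECT UserID, AVG(longitude), AVG(latitude)
    FROM Tweet JOIN Geo ON Tweet.GeoFK = Geo.GeoID
    GROUP BY UserID
    ORDER BY UserID;
"""

def run_query(conn):
    return conn.execute(AVG_QUERY).fetchall()

def time_repeated(conn, n_runs):
    """Run the same query n_runs times; return (total, average) seconds."""
    start = time.perf_counter()
    for _ in range(n_runs):
        run_query(conn)
    total = time.perf_counter() - start
    return total, total / n_runs

conn = build_sample_db()
rows = run_query(conn)
print(rows)

for n in (5, 20):  # placeholder repetition counts, not taken from the question
    total, avg = time_repeated(conn, n)
    print(f"{n} runs: total {total:.6f} s, average {avg:.6f} s per run")
```

If the runtime scales linearly, the total for 20 runs should be roughly four times the total for 5 runs, and the per-run average should stay about the same. On a table this small the measurements are dominated by noise, so the comparison is only meaningful on the real data.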
Step: 2
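For parts (c), (d), and (g), a minimal sketch follows. It assumes one JSON tweet per line, a `user.id` field, and a `geo.coordinates` pair ordered as `[latitude, longitude]`; these are assumptions about the file format, which the question does not describe. The sample lines, the function names, and the output path `runtimes.png` are hypothetical.

```python
import json
import time
from collections import defaultdict

# Hypothetical sample input; a real run would iterate over the tweet file.
SAMPLE_LINES = [
    '{"id": 100, "user": {"id": 1}, "geo": {"type": "Point", "coordinates": [20.0, 10.0]}}',
    '{"id": 101, "user": {"id": 1}, "geo": {"type": "Point", "coordinates": [40.0, 30.0]}}',
    '{"id": 102, "user": {"id": 2}, "geo": {"type": "Point", "coordinates": [5.0, -5.0]}}',
    '{"id": 103, "user": {"id": 3}, "geo": null}',
    'not valid json',
]

def average_geo_by_user(lines):
    """Return {user_id: (avg_longitude, avg_latitude)} for tweets with geo data."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # longitude sum, latitude sum, count
    for line in lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(tweet, dict):
            continue
        geo, user = tweet.get("geo"), tweet.get("user")
        if not geo or not user:
            continue  # tweet has no location or no user
        lat, lon = geo["coordinates"]  # assumed order: [latitude, longitude]
        acc = sums[user["id"]]
        acc[0] += lon
        acc[1] += lat
        acc[2] += 1
    return {uid: (s[0] / s[2], s[1] / s[2]) for uid, s in sums.items()}

def time_runs(func, n_runs):
    """Call func n_runs times and return the list of individual runtimes."""
    runtimes = []
    for _ in range(n_runs):
        start = time.perf_counter()
        func()
        runtimes.append(time.perf_counter() - start)
    return runtimes

def plot_runtime_histogram(runtimes, path="runtimes.png"):
    """Save a histogram of per-run runtimes (matplotlib imported lazily)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.hist(runtimes, bins=10)
    ax.set_xlabel("Runtime per run (s)")
    ax.set_ylabel("Number of runs")
    ax.set_title("Distribution of runtimes")
    fig.savefig(path)
    plt.close(fig)

averages = average_geo_by_user(SAMPLE_LINES)
print(averages)

runtimes = time_runs(lambda: average_geo_by_user(SAMPLE_LINES), 20)  # 20 is a placeholder
print(f"total {sum(runtimes):.6f} s, average {sum(runtimes) / len(runtimes):.6f} s")
```

Because `average_geo_by_user` accepts any iterable of lines, an open file object can be passed in directly, which avoids loading the whole file into memory. Calling `plot_runtime_histogram(runtimes)` afterwards would write the histogram for part (g).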
Step: 3
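For parts (e) and (f), the same averages could be computed with regular expressions instead of `json.loads`. The sketch below assumes the same line format as a typical tweet JSON record: a `"user"` object whose `"id"` appears before any nested object, and a `"geo"` object with a `"coordinates"` pair ordered as `[latitude, longitude]`. Both patterns and the sample lines are illustrative assumptions; tweets that embed another tweet (for example a retweet) may need a stricter pattern, since `search` returns the first match on the line.

```python
import re
import time
from collections import defaultdict

# First "id" inside the "user" object (assumes no nested braces before it).
USER_ID_RE = re.compile(r'"user"\s*:\s*\{[^{}]*?"id"\s*:\s*(\d+)')
# Two numbers inside "geo": {... "coordinates": [lat, lon]}.
GEO_RE = re.compile(
    r'"geo"\s*:\s*\{[^{}]*?"coordinates"\s*:\s*'
    r'\[\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\]'
)

# Hypothetical sample input; a real run would iterate over the tweet file.
SAMPLE_LINES = [
    '{"id": 100, "user": {"id": 1}, "geo": {"type": "Point", "coordinates": [20.0, 10.0]}}',
    '{"id": 101, "user": {"id": 1}, "geo": {"type": "Point", "coordinates": [40.0, 30.0]}}',
    '{"id": 102, "user": {"id": 2}, "geo": {"type": "Point", "coordinates": [5.0, -5.0]}}',
    '{"id": 103, "user": {"id": 3}, "geo": null}',
]

def average_geo_by_user_regex(lines):
    """Return {user_id: (avg_longitude, avg_latitude)} without parsing JSON."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # longitude sum, latitude sum, count
    for line in lines:
        geo_match = GEO_RE.search(line)
        if geo_match is None:
            continue  # "geo" is null or missing
        user_match = USER_ID_RE.search(line)
        if user_match is None:
            continue
        lat, lon = float(geo_match.group(1)), float(geo_match.group(2))
        acc = sums[int(user_match.group(1))]
        acc[0] += lon
        acc[1] += lat
        acc[2] += 1
    return {uid: (s[0] / s[2], s[1] / s[2]) for uid, s in sums.items()}

regex_averages = average_geo_by_user_regex(SAMPLE_LINES)
print(regex_averages)

for n in (5, 20):  # placeholder repetition counts, not taken from the question
    start = time.perf_counter()
    for _ in range(n):
        average_geo_by_user_regex(SAMPLE_LINES)
    print(f"{n} runs: total {time.perf_counter() - start:.6f} s")
```

Checking the geo pattern first lets the loop skip tweets without a location after a single search, which matters if most tweets in the file carry no geo data.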