Question

We will use one full day's worth of tweets as our input (there are a total of 4.4M tweets in this file). Execute and time the following tasks with 110,000 tweets and 550,000 tweets.

a. Write and execute a SQL query to find the average longitude and latitude value for each user ID. This query does not need the User table because User ID is a foreign key in the Tweet table. E.g., something like SELECT UserID, MIN(longitude), MAX(latitude) FROM Tweet, Geo WHERE Tweet.GeoFK = Geo.GeoID GROUP BY UserID.

b. Re-execute the SQL query in part 2-a 5 times and 20 times and measure the total runtime (just re-run the same exact query multiple times using a for-loop, it is as simple as it looks). Does the runtime scale linearly (i.e., does it take 5X and 20X as much time)? What is the average runtime of each individual run?

c. Write the equivalent of the 2-a query in Python (without using SQL) by reading it from the file with 550,000 tweets.

d. Re-execute the query in part 2-c 5 times and 20 times and measure the total runtime. Does the runtime scale linearly? What is the average runtime of each individual run?

e. Write the equivalent of the 2-a query in Python by using regular expressions instead of json.loads(). Do not use json.loads() here. Note that you only need to find the user ID and geo location (if any) for each tweet; you don't need to parse the whole thing.

f. Re-execute the query in part 2-e 5 times and 20 times and measure the total runtime. Does the runtime scale linearly?

g. Create a visual using matplotlib of 2-d showing the distribution of the runtimes.
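The three approaches the question asks for (SQL, json.loads(), and regular expressions) can be sketched as follows. This is a minimal sketch under stated assumptions, not the course's reference solution. The sample lines, the two-table schema, and all function names are hypothetical. It assumes one tweet per line, that the "geo" field stores coordinates as [latitude, longitude], and that "id" is the first key inside "user". Because the task asks for averages, the SQL uses AVG rather than the MIN/MAX shown in the question's example.

```python
import json
import re
import sqlite3
import time
from collections import defaultdict

# Hypothetical sample input; real tweets carry many more fields.
SAMPLE_LINES = [
    '{"id": 1, "user": {"id": 10}, "geo": {"type": "Point", "coordinates": [41.0, -87.0]}}',
    '{"id": 2, "user": {"id": 10}, "geo": {"type": "Point", "coordinates": [43.0, -89.0]}}',
    '{"id": 3, "user": {"id": 20}, "geo": null}',
    '{"id": 4, "user": {"id": 20}, "geo": {"type": "Point", "coordinates": [30.0, -90.0]}}',
]

# Assumed key order: "id" comes first in "user", "type" before "coordinates" in "geo".
USER_RE = re.compile(r'"user":\s*\{"id":\s*(\d+)')
GEO_RE = re.compile(
    r'"geo":\s*\{"type":\s*"Point",\s*"coordinates":\s*'
    r'\[(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\]'
)

# Same shape as the question's example query, with AVG for the averages.
QUERY = """SELECT UserID, AVG(longitude), AVG(latitude)
           FROM Tweet, Geo WHERE Tweet.GeoFK = Geo.GeoID GROUP BY UserID"""


def _averages(sums):
    # sums maps user ID -> [longitude total, latitude total, count]
    return {uid: (lon / n, lat / n) for uid, (lon, lat, n) in sums.items()}


def build_db(lines):
    """Load tweets into an in-memory SQLite database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Geo (GeoID INTEGER PRIMARY KEY, longitude REAL, latitude REAL)")
    conn.execute("CREATE TABLE Tweet (TweetID INTEGER PRIMARY KEY, UserID INTEGER, GeoFK INTEGER)")
    for line in lines:
        tweet = json.loads(line)
        geo_id = None
        if tweet.get("geo"):
            lat, lon = tweet["geo"]["coordinates"]
            cur = conn.execute("INSERT INTO Geo (longitude, latitude) VALUES (?, ?)", (lon, lat))
            geo_id = cur.lastrowid
        conn.execute("INSERT INTO Tweet VALUES (?, ?, ?)", (tweet["id"], tweet["user"]["id"], geo_id))
    return conn


def avg_by_user_sql(conn):
    """Part a: average (longitude, latitude) per user via SQL."""
    return {uid: (lon, lat) for uid, lon, lat in conn.execute(QUERY)}


def avg_by_user_json(lines):
    """Part c: the same aggregation in plain Python with json.loads()."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for line in lines:
        tweet = json.loads(line)
        geo = tweet.get("geo")
        if geo:
            lat, lon = geo["coordinates"]
            acc = sums[tweet["user"]["id"]]
            acc[0] += lon
            acc[1] += lat
            acc[2] += 1
    return _averages(sums)


def avg_by_user_regex(lines):
    """Part e: the same aggregation with regular expressions only."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for line in lines:
        user, geo = USER_RE.search(line), GEO_RE.search(line)
        if user and geo:
            acc = sums[int(user.group(1))]
            acc[0] += float(geo.group(2))  # second number is longitude
            acc[1] += float(geo.group(1))  # first number is latitude
            acc[2] += 1
    return _averages(sums)


def time_runs(fn, runs):
    """Parts b, d, f: return (total seconds, mean seconds per run)."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    total = time.perf_counter() - start
    return total, total / runs


if __name__ == "__main__":
    conn = build_db(SAMPLE_LINES)
    for runs in (1, 5, 20):
        total, mean = time_runs(lambda: avg_by_user_sql(conn), runs)
        print(f"SQL x{runs}: total={total:.6f}s mean={mean:.6f}s")
```

For the real assignment, the sample list would be replaced by lines read from the 110,000-tweet and 550,000-tweet files. For part g, collecting the time of each individual run in a list (rather than only the total) would give the values to pass to a matplotlib histogram.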


