Question
We will use one full day's worth of tweets as our input (there are a total of 4.4M tweets in this file).
Execute and time the following tasks with tweets and tweets:
a. Use Python to download tweets from the web and save them to a local text file (not into a database yet, just to a text file). This is as simple as it sounds: all you need is a for loop that reads lines from the web and writes them into a file.
NOTE: Do not call read() or readlines(). Either call will attempt to read the entire file, which is too much data. Clicking on the link in the browser would cause the same problem.
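One way to approach part a is sketched below. It is only a sketch under stated assumptions: the function names save_lines and download_tweets are hypothetical, and the URL and tweet limit are whatever the assignment provides. The point it illustrates is that a response opened with urllib.request.urlopen can be iterated line by line, so the whole file is never held in memory.

```python
import time
import urllib.request


def save_lines(source, out_path, limit):
    """Copy up to `limit` lines from an iterable of bytes lines into out_path."""
    written = 0
    with open(out_path, "wb") as out:
        # Iterating yields one line at a time; read()/readlines() are never called.
        for line in source:
            if written >= limit:
                break
            out.write(line)
            written += 1
    return written


def download_tweets(url, out_path, limit):
    """Stream tweets from `url` (supplied by the assignment) and time the copy."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        written = save_lines(response, out_path, limit)
    return written, time.perf_counter() - start
```

Calling download_tweets(url, "tweets.txt", n) returns the number of lines written and the elapsed seconds, which can be recorded for the runtime plot in part e.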
b. Repeat what you did in part a, but instead of saving tweets to the file, populate the table schema that you previously created in SQLite. Be sure to execute commit and verify that the data has been successfully loaded. Report loaded row counts for each of the tables by running a SELECT DISTINCT on the primary key of each table. Additionally, report the runtime of finding the number of rows for each table.
NOTE: If your schema contains a foreign key in the Geo table or relies on TweetID as the primary key for the Geo table, you should change your schema. Geo entries should be identified based on the location they represent. There should not be any blank Geo entries such as ID None, None, None. The easiest way to create an ID is by combining lon and lat into a primary key.
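The sketch below shows one possible shape for part b. Everything about the schema is an assumption, since the original schema is not shown here: the Tweet and Geo tables, their columns, and the helper names geo_key, load_tweets and timed_count are hypothetical. It also assumes each line is a JSON tweet with an id_str, a text, and an optional geo object whose coordinates list is in [latitude, longitude] order; check that against the actual data.

```python
import json
import sqlite3
import time

# Hypothetical two-table schema; replace with the schema created earlier.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Geo (
    geo_id    TEXT PRIMARY KEY,   -- built from "lon_lat", not from the tweet ID
    longitude REAL,
    latitude  REAL
);
CREATE TABLE IF NOT EXISTS Tweet (
    id_str TEXT PRIMARY KEY,
    text   TEXT,
    geo_id TEXT REFERENCES Geo(geo_id)
);
"""


def geo_key(tweet):
    """Return (geo_id, lon, lat), or None when the tweet has no location."""
    geo = tweet.get("geo")
    if not geo or not geo.get("coordinates"):
        return None
    lat, lon = geo["coordinates"][:2]  # assumed [lat, lon] order
    return f"{lon}_{lat}", lon, lat


def load_tweets(conn, lines):
    """Insert tweets one row at a time, skipping blank Geo entries, then commit."""
    conn.executescript(SCHEMA)
    for line in lines:
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip malformed lines
        if not isinstance(tweet, dict) or "id_str" not in tweet:
            continue
        location = geo_key(tweet)
        if location is not None:
            # OR IGNORE keeps one Geo row per distinct location.
            conn.execute("INSERT OR IGNORE INTO Geo VALUES (?, ?, ?)", location)
        conn.execute(
            "INSERT OR IGNORE INTO Tweet VALUES (?, ?, ?)",
            (tweet["id_str"], tweet.get("text"), location[0] if location else None),
        )
    conn.commit()


def timed_count(conn, table, key):
    """Count distinct primary-key values and report how long the query took."""
    start = time.perf_counter()
    (count,) = conn.execute(
        f"SELECT COUNT(DISTINCT {key}) FROM {table}"
    ).fetchone()
    return count, time.perf_counter() - start
```

Because load_tweets accepts any iterable of lines, the same function can be fed a web response or an open local file, and timed_count can be run once per table to report both the row count and the query runtime.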
c. Use your locally saved tweet file to repeat the database population step from part b. That is, load the tweets into the database tables using your saved file of tweets. This is the same code as in b, but reading tweets from your file, not from the web. Time the code used to run this step and report the result.
d. Repeat the same step with batching, i.e. by inserting multiple rows at a time with executemany instead of doing individual inserts. Since many of the tweets are missing a Geo location, it's fine for the batches of Geo inserts to be smaller than the tweet batches.
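A minimal sketch of the batching idea in part d follows. The Tweet table with id_str and text columns and the function name insert_batched are assumptions made for illustration; the batch size is passed in as a parameter, and rows are expected as (id_str, text) tuples.

```python
import sqlite3


def insert_batched(conn, rows, batch_size):
    """Insert (id_str, text) rows in groups of `batch_size` using executemany."""
    # Hypothetical table; in the assignment this already exists.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Tweet (id_str TEXT PRIMARY KEY, text TEXT)"
    )
    sql = "INSERT OR IGNORE INTO Tweet (id_str, text) VALUES (?, ?)"
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany(sql, batch)  # one call for the whole batch
            batch.clear()
    if batch:
        conn.executemany(sql, batch)      # flush the final, smaller batch
    conn.commit()
```

The same pattern can be applied to the Geo rows with a separate list, which will simply fill more slowly because many tweets have no location.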
e. Plot the resulting runtimes (# of tweets versus runtimes) using matplotlib for a, b, c, and d. How do the runtimes compare?
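For part e, a plotting helper might look like the sketch below. The function name plot_runtimes and its arguments are hypothetical: tweet_counts is the list of input sizes that were timed, and runtimes maps each part's label to its measured seconds, one value per input size. No measurements are included here; they come from running parts a through d.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt


def plot_runtimes(tweet_counts, runtimes, out_path):
    """Draw one line per part: number of tweets on x, runtime in seconds on y."""
    fig, ax = plt.subplots()
    for label, seconds in runtimes.items():
        ax.plot(tweet_counts, seconds, marker="o", label=label)
    ax.set_xlabel("# of tweets")
    ax.set_ylabel("Runtime (seconds)")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
```

Putting all four parts on one set of axes makes it easier to compare how each approach scales with the number of tweets.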