Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 30, 2024

We will use one full day worth of tweets as our input ( there are total of 4 . 4 M tweets in this file

We will use one full day worth of tweets as our input

(

there are total of

4.4

M tweets in this file

)

Execute and time the following tasks with

110, 000

tweets and

550, 000

tweets:

.

Use python to download tweets from the web and save to a local text file

(

not into a database yet, just to a text file

) .

This is as simple as it sounds, all you need is a for

-

loop that reads lines from the web and writes them into a file.

NOTE: Do not call read

()

or readlines

() .

That command will attempt to read the entire file which is too much data. Clicking on the link in the browser would cause the same problem.

.

Repeat what you did in part

1 -

,

but instead of saving tweets to the file, populate the

3 -

table schema that you previously created in SQLite. Be sure to execute commit and verify that the data has been successfully loaded. Report loaded row counts for each of the

3

tables by running a SELECT DISCINT of the primary key of each table. Additionally, report the runtime of finding the number of rows for each table.

NOTE: If your schema contains a foreign key in the Geo table or relies on TweetID as the primary key for the Geo table, you should change your schema. Geo entries should be identified based on the location they represent. There should not be any

blank

Geo entries such as

(

,

None, None, None

) .

The easiest way to create an ID is by combining lon

_

lat into a primary key.

.

Use your locally saved tweet file to repeat the database population step from part

-

.

That is

,

load the tweets into the

3 -

table database using your saved file with tweets. This is the same code as in

1 -

,

but reading tweets from your file, not from the web. Time the code used to run this step and report.

.

Repeat the same step with a batching size of

2, 000 (

.

.

by inserting

2, 000

rows at a time with executemany instead of doing individual inserts

) .

Since many of the tweets are missing a Geo location, its fine for the batches of Geo inserts to be smaller than

2, 000 .

.

Plot the resulting runtimes

(

# of tweets versus runtimes

)

using matplotlib for

1 -

, 1 -

, 1 -

,

and

1 -

.

How does the runtime compare?

Step by Step Solution

There are 3 Steps involved in it

Step: 1

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Concepts of Database Management

Authors: Philip J. Pratt, Joseph J. Adamski

7th edition

978-1111825911, 1111825912, 978-1133684374, 1133684378, 978-111182591

More Books

Students also viewed these Databases questions

Question

★★★★★

The City of Oxbow General Fund has the following net resources at yearend: $100,000 unexpended proceeds of a state grant required by law to be used for health education. $15,000 of prepaid...

Answered: 1 week ago

Question

★★★★★

PART C: On January 1, 2018, Pal Corporation acquired 80% of the voting stock of Secam Corporation for $24,000 when Secam had Capital Stock of $10,000 and Retained Earnings of $8,000. On this date,...

Answered: 1 week ago

Question

★★★★★

A directive requesting sales executives to submit their monthly sales reports to their respective managers by the 25th of each month. The directive also requests these sales managers to then compile...

Answered: 1 week ago

Question

★★★★★

Heather Reasonover opted to try Internet service from Clearwire Corp. Clearwire sent her a confirmation e- mail that included a link to its Web site. Clearwire also sent her a modem. In the enclosed...

Answered: 1 week ago

Question

★★★★★

We will use one full day worth of tweets as our input ( there are total of 4 . 4 M tweets in this file ) Execute and time the following tasks with 1 1 0 , 0 0 0 tweets and 5 5 0 , 0 0 0 tweets: a ....

Answered: 1 week ago

Question

★★★★★

Board Games, Inc. makes board games. The following data pertains to the last six months: Direct Labor Hours Manufacturing Overhead Month 1 45,000 $295,000 Month 2 68,000 $315,000 Month 3 57,000...

Answered: 1 week ago

Question

★★★★★

1. How should Bill Ahern resolve each of the accounting conflict between the owners and the players? 2. How much did the Kansas City Zephyrs Baseball Club earn in 1983 and 1984

Answered: 1 week ago

Question

★★★★★

To successfully complete this assignment, please submit a word document answering each of the questions below by using both the AICPA Code and your state code. Will your independence be threatened if...

Answered: 1 week ago

Question

★★★★★

"Doing business globally is increasingly a pre-requisite for success even for the smallest of firms." Where does this put developing countries like the Bahamas?

Answered: 1 week ago

Question

★★★★★

1 identify the pest factor for the country and develop a brief for forecast for each factor? 2 identify the SWOT for the organization Pepsi is a global consumer primarily engaged in product marketing...

Answered: 1 week ago

Question

★★★★★

When the manager of a Domino's restaurant assigns to some employees responsibilities for cooking and to others for taking orders from customers this is called? amatrix structure be sharing...

Answered: 1 week ago

Previous Question Next Question