Question
I NEED HELP WITH THE FOLLOWING CODED, below is the problem and the coded We will use a full day worth of tweets as an
I NEED HELP WITH THE FOLLOWING CODED, below is the problem and the coded
We will use a full day worth of tweets as an input (there are total of 4.4M tweets in this file, but you only need to read 1M) http://rasinsrv07.cstcis.cti.depaul.edu/CSC455/OneDayOfTweets.txt
Repeat what you did in part-b, but instead of saving tweets to the file, populate the 3-table schema that you created in SQLite. Be sure to execute commit and verify that the data has been successfully loaded (report loaded row counts for each of the 3 tables) How long did this step take?
import urllib.request, time, json, sqlite3
conn = sqlite3.connect('Tweets_Database_final.db')
c = conn.cursor()
c.execute('DROP TABLE IF EXISTS Tweets');
c.execute('DROP TABLE IF EXISTS USER');
# Create Table Tweets
c.execute('''CREATE TABLE tweet
created_at DATETIME,
user_id INT,
id_str TEXT,
text TEXT,
source TEXT,
in_reply_to_user_id INT,
in_reply_to_screen_name TEXT,
in_reply_to_status_id INT,
retweet_count INT,
contributors TEXT
CONSTRAINT tweet_FK
FOREIGN KEY (user_id) REFERENCES user(id),
# Create Table User
c.execute('''CREATE TABLE user
id INT,
screen_name TEXT,
description TEXT,
friends_count INT,
contributors TEXT
CONSTRAINT USER_PK
PRIMARY KEY(ID)
# Create Table User
c.execute('''CREATE TABLE Geo
user_id INT,
type TEXT,
longitude INT,
latitude INT,
CONSTRAINT tweet_FK_2
FOREIGN KEY (user_id) REFERENCES tweet(user_id)
Line = webFD.readline()
tweetsdata= (Line.decode('utf32')).split('EndOfTweet')
errors = open("final_errors.txt", "w")
Datatweet = []
Datauser = []
DataGeo = []
for Line in range(500000):
try:
tweetrecord= json.loads(Line)
datatweet.append((tweetrecord["created_at"], tweetrecord["user_id "], tweetrecord["id_str"], tweetrecord["text"],tweetrecord["source"], tweetrecord["in_reply_to_user_id"],tweetrecord["in_reply_to_screen_name"], tweetrecord["in_reply_to_status_id"],tweetrecord['retweet_count'],tweetrecord['contributors']))
if tweetrecord[key] in ['',[],'null']:
datatweet.append(None)
else:
datatweet.append(tweetrecord[key])
c.execute('INSERT INTO tweet (created_at, user_id, id_str, text, source, in_reply_to_user_id, in_reply_to_screen_name, in_reply_to_status_id,retweet_count, contributors) VALUES (?,?,?,?,?,?,?,?,?,?)',datatweet)
datauser.append((tweetrecord["id"], tweetrecord["screen_name "], tweetrecord["desription"], tweetrecord["source"], tweetrecord["friends_count"]))
if tweetrecord[key] in ['',[],'null']:
datauser.append(None)
else:
datauser.append(tweetrecord[key])
c.executemany('INSERT INTO User (id, screen_name, description, source, friends_count) VALUES (?,?,?,?,?)', datauser)
dataGeo.append((tweetrecord["id_str"], tweetrecord["type "], tweetrecord["longitude"], tweetrecord["source"], tweetrecord["longitude"]))
if tweetrecord[key] in ['',[],'null']:
dataGeo.append(None)
else:
dataGeo.append(tweetrecord[key])
count += 1
c.executemany('INSERT INTO Geo (id_str, type, longitude, source, longitude ) VALUES (?,?,?,?,?)', dataGeo)
except ValueError:
print(tweet)
errors.write(tweet)
print("tweets loaded to file: ", count)
end = time.time()
print ("Difference is ", round((end-start),3), 'seconds')
print ("Performance : ", round(100000/(end-start), 3), ' operations per second ')
print("tweetfinal.txt on desktop")
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started