Question
Write a program (in Python) that merges and sorts two Twitter feeds. At a high level, your program will do the following:
Read in two files containing twitter feeds.
Merge the twitter feeds in reverse chronological order (most recent first).
Write the merged feeds to an output file.
Provide some basic summary information about the files.
The names of the files will be passed in to your program via command line arguments. Use the following input files to test your program: tweet1.txt and tweet2.txt
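As a minimal sketch of reading the file names from the command line: the example run later in this question passes three names (two input files and one output file), so the sketch assumes that layout. The helper name parse_arguments is hypothetical and not part of the assignment.

```python
import sys


def parse_arguments(argv):
    """Return (input1, input2, output) from an argv-style list."""
    # Assumption: argv[0] is the script name, followed by exactly three
    # file names, as in the example run
    # "python twitter_sort.py tweet1_demo.txt tweet2_demo.txt sorted_demo.txt".
    if len(argv) != 4:
        raise SystemExit(f"usage: python {argv[0]} INPUT1 INPUT2 OUTPUT")
    return argv[1], argv[2], argv[3]


if __name__ == "__main__" and len(sys.argv) == 4:
    first_name, second_name, output_name = parse_arguments(sys.argv)
    print("Inputs:", first_name, second_name, "Output:", output_name)
```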
The output of your program includes the following:
Console
The name of the file that contained the most tweets, followed by the number of tweets it contained. In the event of a tie, print both filenames along with the number of tweets (note: a file may be empty).
The five earliest tweets along with the tweeter.
sorted_tweets.txt: the lines from the inputted files sorted in reverse chronological order (most recent tweets first and earliest tweets at the end).
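The two console items could be produced roughly as follows. This is a sketch only: the function names are hypothetical, the wording of the tie message is an assumption (the question does not specify it), and five_earliest assumes the merged list is already ordered most recent first.

```python
def most_tweets_message(name1, count1, name2, count2):
    """Describe which file held more tweets; counts may be zero."""
    if count1 > count2:
        return f"{name1} contained the most tweets with {count1}."
    if count2 > count1:
        return f"{name2} contained the most tweets with {count2}."
    # Assumed wording for a tie; the assignment only says to print both names.
    return f"{name1} and {name2} both contained {count1} tweets."


def five_earliest(merged):
    """Return up to five earliest records, earliest first."""
    # The merged list is most recent first, so the earliest records sit at
    # the end; take the last five and reverse them.
    return merged[-5:][::-1]
```

With fewer than five records, the slice simply returns whatever is available, so an empty input yields an empty list.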
Program Details
File Format
Each input file will contain a list of records with one record appearing on each line of the file. The format of a record is as follows:
@TWEETER "TWEET" YEAR MONTH DAY HR:MN:SC
Your job will be to read in each file and for each line in the file, create a record with the above information. In the above format, a tweet is a string that can contain a list of tokens. Also, HR:MN:SC should be treated as a single field of the record, the time.
Note: you should remove the "@" symbol from each tweeter's name.
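To make the record format concrete, here is one way a single line could be split into fields. It deliberately does not use the provided Scanner class, whose interface is not shown here; the regular expression, the helper name parse_record_line, and the choice to store the date fields as integers are all assumptions for illustration.

```python
import re

# One record per line: @TWEETER "TWEET" YEAR MONTH DAY HR:MN:SC
# The greedy (.*) lets the tweet itself contain spaces or quote characters,
# because the date and time fields are anchored to the end of the line.
RECORD_PATTERN = re.compile(
    r'^@(\S+)\s+"(.*)"\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s*$'
)


def parse_record_line(line):
    """Return [tweeter, tweet, year, month, day, time] for one line."""
    match = RECORD_PATTERN.match(line)
    if match is None:
        raise ValueError(f"malformed record: {line!r}")
    tweeter, tweet, year, month, day, time = match.groups()
    # The "@" is excluded by the pattern, so the name is already clean.
    return [tweeter, tweet, int(year), int(month), int(day), time]
```

For example, parse_record_line('@enigma "im so clever, my code is even unreadable to me!" 2013 10 1 09:27:00') yields a list whose first element is 'enigma' and whose last element is '09:27:00'.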
Reading from Files
You may use the provided Scanner class in the scanner.py module to help you parse different fields from the tweets.
Functions to Define
In addition to a main() function, define the following functions in your code:
read_records(): a function that given a filename creates a Scanner object and creates a record for each line in the file and returns a list containing the records
create_record(): a function that takes in a Scanner object and creates a record then returns a list representing the record; note, the "@" symbol should also be removed from the tweeter's name
is_more_recent(): a function that compares two records based on date and returns True if the first record is more recent than the second and False otherwise
merge_and_sort_tweets(): a function that merges two lists of records by placing more recent records before earlier records and returns the merged records as a single list
write_records(): a function that takes in a list of records and writes each record to the output file on its own line.
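The comparison and merge functions might look like the sketch below. It assumes each record is a list of the form [tweeter, tweet, year, month, day, "HR:MN:SC"]; that layout and the helper record_key are assumptions, not part of the assignment. The demo files appear to be in most-recent-first order already, but the sketch sorts each list first so the merge does not rely on that.

```python
def record_key(record):
    """Return a tuple that orders records chronologically."""
    # Assumed layout: [tweeter, tweet, year, month, day, "HR:MN:SC"]
    _, _, year, month, day, time = record
    hour, minute, second = (int(part) for part in time.split(":"))
    return (int(year), int(month), int(day), hour, minute, second)


def is_more_recent(first, second):
    """True if the first record is strictly more recent than the second."""
    return record_key(first) > record_key(second)


def merge_and_sort_tweets(records1, records2):
    """Merge two record lists into one list, most recent first."""
    left = sorted(records1, key=record_key, reverse=True)
    right = sorted(records2, key=record_key, reverse=True)
    merged = []
    i = j = 0
    # Standard two-pointer merge: repeatedly take the more recent head.
    while i < len(left) and j < len(right):
        if is_more_recent(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    # One list is exhausted; the remainder of the other is already ordered.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Comparing tuples of integers avoids relying on the time strings being zero-padded, which a plain string comparison would require.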
Example Run
File 1 (tweet1_demo.txt):
@poptardsarefamous "Sometimes I wonder 2 == b or !(2 == b)" 2013 10 1 13:46:42
@nohw4me "i have no idea what my cs prof is saying" 2013 10 1 12:07:14
@pythondiva "My memory is great <3 64GB android" 2013 10 1 10:36:11
@enigma "im so clever, my code is even unreadable to me!" 2013 10 1 09:27:00
File 2 (tweet2_demo.txt):
@ocd_programmer "140 character limit? so i cant write my variable names" 2013 10 1 13:18:01
@caffeine4life "BBBBZZZZzzzzzZZZZZZZzzzZZzzZzzZzTTTTttt" 2011 10 2 02:53:47
Run the program: python twitter_sort.py tweet1_demo.txt tweet2_demo.txt sorted_demo.txt
Example Console Output
Reading files...
tweet1_demo.txt contained the most tweets with 4.
Merging files...
Writing file...
File written.
Displaying 5 earliest tweeters and tweets.
caffeine4life "BBBBZZZZzzzzzZZZZZZZzzzZZzzZzzZzTTTTttt"
enigma "im so clever, my code is even unreadable to me!"
pythondiva "My memory is great <3 64GB android"
nohw4me "i have no idea what my cs prof is saying"
ocd_programmer "140 character limit? so i cant write my variable names"
Example Output File (sorted_demo.txt)
poptardsarefamous "Sometimes I wonder 2 == b or !(2 == b)" 2013 10 1 13:46:42
ocd_programmer "140 character limit? so i cant write my variable names" 2013 10 1 13:18:01
nohw4me "i have no idea what my cs prof is saying" 2013 10 1 12:07:14
pythondiva "My memory is great <3 64GB android" 2013 10 1 10:36:11
enigma "im so clever, my code is even unreadable to me!" 2013 10 1 09:27:00
caffeine4life "BBBBZZZZzzzzzZZZZZZZzzzZZzzZzzZzTTTTttt" 2011 10 2 02:53:47
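The example output file suggests each record is written back in the input format, minus the "@". A sketch of write_records under that reading follows; it assumes records are lists of the form [tweeter, tweet, year, month, day, time], that the function also receives the output file name, and the helper format_record is hypothetical.

```python
def format_record(record):
    """Render one record as: tweeter "tweet" year month day time"""
    # Assumed layout: [tweeter, tweet, year, month, day, "HR:MN:SC"]
    tweeter, tweet, year, month, day, time = record
    return f'{tweeter} "{tweet}" {year} {month} {day} {time}'


def write_records(records, filename):
    """Write each record to the named file on its own line."""
    with open(filename, "w") as output_file:
        for record in records:
            output_file.write(format_record(record) + "\n")
```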
Bonus (6 pts)
(3 pts) Use dictionaries to keep track of the number of times each hashtag appears in the two input files. Hashtags are common tokens in social media that start with a "#" and are followed by a string of words (such as "#thisisahashtag"). Print the most common hashtag.
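A dictionary-based count could be sketched as below. Treating hashtags case-insensitively and stripping trailing punctuation are assumptions the question does not settle, and both function names are hypothetical.

```python
def count_hashtags(tweets):
    """Map each hashtag to the number of times it appears."""
    counts = {}
    for tweet in tweets:
        for token in tweet.split():
            # Assumption: trailing punctuation is not part of the hashtag.
            token = token.rstrip(".,!?;:")
            if token.startswith("#") and len(token) > 1:
                tag = token.lower()  # Assumption: "#CS" and "#cs" match.
                counts[tag] = counts.get(tag, 0) + 1
    return counts


def most_common_hashtag(counts):
    """Return the hashtag with the highest count, or None if there are none."""
    if not counts:
        return None
    return max(counts, key=counts.get)
```

If several hashtags share the highest count, max returns the first one encountered in the dictionary; the question does not say how ties should be reported.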
(3 pts) Count how many tweets exceed the 140-character limit set by Twitter, and how many are "short" tweets of fewer than 50 characters. Keep track of the character length of every tweet and, at the end, report the average character length of a tweet.
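The length statistics might be computed as in this sketch. It assumes the length is measured on the tweet text without its surrounding quotation marks, and the function name and its parameters are hypothetical.

```python
def tweet_length_stats(tweets, limit=140, short_below=50):
    """Return (count over limit, count of short tweets, average length)."""
    lengths = [len(tweet) for tweet in tweets]
    over_limit = sum(1 for length in lengths if length > limit)
    short = sum(1 for length in lengths if length < short_below)
    # Guard against dividing by zero when both files are empty.
    average = sum(lengths) / len(lengths) if lengths else 0.0
    return over_limit, short, average
```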