Question

Write a program (in Python) that merges and sorts two Twitter feeds. At a high level, your program will do the following:

Read in two files containing twitter feeds.

Merge the twitter feeds in reverse chronological order (most recent first).

Write the merged feeds to an output file.

Provide some basic summary information about the files.

The names of the files will be passed in to your program via command line arguments. Use the following input files to test your program: tweet1.txt and tweet2.txt
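
As a rough sketch of the command-line handling, the helper below pulls the filenames out of an argv-style list. The name `parse_args` is hypothetical, and the choice to accept an optional third filename (as in the example run) and otherwise fall back to `sorted_tweets.txt` is an assumption, since the assignment text mentions both.

```python
import sys

DEFAULT_OUTPUT = "sorted_tweets.txt"  # output name given in the assignment text


def parse_args(argv):
    """Return (first_input, second_input, output) from an argv-style list."""
    # argv[0] is the script name, so two or three filenames follow it.
    if len(argv) == 3:
        return argv[1], argv[2], DEFAULT_OUTPUT
    if len(argv) == 4:
        return argv[1], argv[2], argv[3]
    raise SystemExit("usage: python twitter_sort.py FEED1 FEED2 [OUTPUT]")


# Inside main(), this would typically be called as: parse_args(sys.argv)
```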

The output of your program includes the following:

Console

The name of the file that contained the most tweets followed by the number of tweets tweeted. In the event of a tie, print both filenames along with the number of tweets (Note: a file may be empty).

The five earliest tweets along with the tweeter.

sorted_tweets.txt: the lines from the inputted files sorted in reverse chronological order (most recent tweets first and earliest tweets at the end).
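
The "most tweets" line, including the tie case, could be produced along these lines. The function name `most_tweets_summary` is hypothetical, and the wording of the tie message is an assumption because the assignment only shows the non-tie wording.

```python
def most_tweets_summary(name_a, count_a, name_b, count_b):
    """Build the console line naming the file(s) with the most tweets."""
    if count_a > count_b:
        return f"{name_a} contained the most tweets with {count_a}."
    if count_b > count_a:
        return f"{name_b} contained the most tweets with {count_b}."
    # Tie (this also covers two empty files): report both filenames.
    # The exact wording here is an assumption.
    return f"{name_a} and {name_b} both contained {count_a} tweets."
```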

Program Details

File Format

Each input file will contain a list of records with one record appearing on each line of the file. The format of a record is as follows:

@TWEETER "TWEET" YEAR MONTH DAY HR:MN:SC

Your job will be to read in each file and for each line in the file, create a record with the above information. In the above format, a tweet is a string that can contain a list of tokens. Also, HR:MN:SC should be treated as a single field of the record, the time.

Note: you should remove the "@" symbol from each tweeter's name.
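
To make the record format concrete, here is a minimal sketch that parses one line with a regular expression. It does not use the provided Scanner class, whose interface is not shown here; `parse_line`, `RECORD_PATTERN`, and the six-element list layout are assumptions made for illustration.

```python
import re

# Pattern for: @TWEETER "TWEET" YEAR MONTH DAY HR:MN:SC
# The tweet text is everything between the first quote and the last quote
# that is followed by the date and time fields.
RECORD_PATTERN = re.compile(
    r'^\s*@(\S+)\s+"(.*)"\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d{1,2}:\d{2}:\d{2})\s*$'
)


def parse_line(line):
    """Return [tweeter, tweet, year, month, day, time] for one record line."""
    match = RECORD_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a tweet record: {line!r}")
    tweeter, tweet, year, month, day, time = match.groups()
    # The "@" is outside the capture group, so it is already removed.
    return [tweeter, tweet, int(year), int(month), int(day), time]
```

In this layout the tweet is stored without its surrounding quotes, so they would need to be added back when the record is written out.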

Reading from Files

You may use the provided Scanner class in the scanner.py module to help you parse different fields from the tweets.

Functions to Define

In addition to a main() function, define the following functions in your code:

read_records(): a function that given a filename creates a Scanner object and creates a record for each line in the file and returns a list containing the records

create_record(): a function that takes in a Scanner object and creates a record then returns a list representing the record; note, the "@" symbol should also be removed from the tweeter's name

is_more_recent(): a function that compares two records based on date and returns True if the first record is more recent than the second and False otherwise

merge_and_sort_tweets(): a function that merges two lists of records by placing more recent records before earlier records and returns the merged records as a single list

write_records(): a function that takes in a list of records and writes each record to the output file, one record per line.
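
One possible shape for the comparison and merge functions is sketched below. It assumes each record is a list of the form [tweeter, tweet, year, month, day, time] with integer date fields and the time kept as an "HR:MN:SC" string; that layout and the helper `sort_key` are assumptions, not part of the assignment.

```python
def sort_key(record):
    """Return a tuple that orders records chronologically."""
    tweeter, tweet, year, month, day, time = record
    hour, minute, second = (int(part) for part in time.split(":"))
    return (year, month, day, hour, minute, second)


def is_more_recent(first, second):
    """Return True if the first record is strictly more recent than the second."""
    return sort_key(first) > sort_key(second)


def merge_and_sort_tweets(first_list, second_list):
    """Merge two record lists into one list, most recent record first."""
    # Sort each input first so the merge is valid even if a feed file is
    # not already in reverse chronological order (an assumption here).
    left = sorted(first_list, key=sort_key, reverse=True)
    right = sorted(second_list, key=sort_key, reverse=True)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if is_more_recent(left[i], right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has records left.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

With the merged list in this order, the five earliest tweets are its last five entries, read from the end.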

Example Run

File 1 (tweet1_demo.txt):

@poptardsarefamous "Sometimes I wonder 2 == b or !(2 == b)" 2013 10 1 13:46:42
@nohw4me "i have no idea what my cs prof is saying" 2013 10 1 12:07:14
@pythondiva "My memory is great <3 64GB android" 2013 10 1 10:36:11
@enigma "im so clever, my code is even unreadable to me!" 2013 10 1 09:27:00

File 2 (tweet2_demo.txt):

@ocd_programmer "140 character limit? so i cant write my variable names" 2013 10 1 13:18:01
@caffeine4life "BBBBZZZZzzzzzZZZZZZZzzzZZzzZzzZzTTTTttt" 2011 10 2 02:53:47

Run the program: python twitter_sort.py tweet1_demo.txt tweet2_demo.txt sorted_demo.txt

Example Console Output

Reading files...
tweet1_demo.txt contained the most tweets with 4.
Merging files...
Writing file...
File written.
Displaying 5 earliest tweeters and tweets.
caffeine4life "BBBBZZZZzzzzzZZZZZZZzzzZZzzZzzZzTTTTttt"
enigma "im so clever, my code is even unreadable to me!"
pythondiva "My memory is great <3 64GB android"
nohw4me "i have no idea what my cs prof is saying"
ocd_programmer "140 character limit? so i cant write my variable names"

Example Output File (sorted_demo.txt)

poptardsarefamous "Sometimes I wonder 2 == b or !(2 == b)" 2013 10 1 13:46:42
ocd_programmer "140 character limit? so i cant write my variable names" 2013 10 1 13:18:01
nohw4me "i have no idea what my cs prof is saying" 2013 10 1 12:07:14
pythondiva "My memory is great <3 64GB android" 2013 10 1 10:36:11
enigma "im so clever, my code is even unreadable to me!" 2013 10 1 09:27:00
caffeine4life "BBBBZZZZzzzzzZZZZZZZzzzZZzzZzzZzTTTTttt" 2011 10 2 02:53:47

Bonus (6 pts)

(3 pts) Use dictionaries to keep track of the number of times each hashtag appears in the two input files. Hashtags are common tokens in social media that start with a "#" and are followed by a string of words (such as "#thisisahashtag"). Print the most common hashtag.
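
A dictionary-based count might look like the following sketch. The function names are hypothetical, and two choices are assumptions: hashtags are lowercased so that "#Python" and "#python" count together, and punctuation attached to a tag is left in place.

```python
def count_hashtags(tweets):
    """Count how often each hashtag appears in a list of tweet strings."""
    counts = {}
    for tweet in tweets:
        for token in tweet.split():
            # A lone "#" is not treated as a hashtag.
            if token.startswith("#") and len(token) > 1:
                tag = token.lower()  # case-insensitive counting (assumption)
                counts[tag] = counts.get(tag, 0) + 1
    return counts


def most_common_hashtag(counts):
    """Return the hashtag with the highest count, or None if there are none."""
    if not counts:
        return None
    return max(counts, key=counts.get)
```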

(3 pts) Count how many tweets go over the 140-character limit set by Twitter. Count how many tweets are "short" tweets, meaning fewer than 50 characters long. Keep track of the character length of every tweet and, at the end, report the average character length of a tweet.
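
The length statistics could be gathered roughly as follows. The name `length_stats` is hypothetical; the sketch assumes length is measured on the tweet text alone (without the surrounding quotes) and reports an average of 0.0 when there are no tweets.

```python
def length_stats(tweets):
    """Return (over_limit, short, average_length) for a list of tweet strings."""
    lengths = [len(tweet) for tweet in tweets]
    over_limit = sum(1 for n in lengths if n > 140)  # longer than 140 characters
    short = sum(1 for n in lengths if n < 50)        # fewer than 50 characters
    # Guard against division by zero when both files are empty.
    average = sum(lengths) / len(lengths) if lengths else 0.0
    return over_limit, short, average
```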
