
Question

Python PA1: Tweets

Write a program (`twitter_sort.py`) that merges and sorts two twitter feeds. At a high level, your program is going to perform the following:

Read in two files containing twitter feeds that are sorted in reverse chronological order (most recent first).

Merge the twitter feeds in reverse chronological order (most recent first).

Write the merged feeds to an output file.

Provide some basic summary information about the files.

The names of the files will be passed in to your program via command line arguments. Use the attached input files to test your program: tweet1.txt and tweet2.txt
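A minimal sketch of pulling the filenames out of the command line. It assumes the argument order used in the example run (two input files, then an output file) and treats the output name as optional, falling back to `merged_tweets.txt`. The helper name `parse_args` is hypothetical and not part of the assignment.

```python
import sys

def parse_args(argv):
    """Return (input1, input2, output) from an argv-style list.

    Assumption: the third filename is optional and defaults to
    merged_tweets.txt, the output name mentioned in the assignment.
    """
    if len(argv) not in (3, 4):
        sys.exit("usage: python twitter_sort.py FILE1 FILE2 [OUTPUT]")
    output = argv[3] if len(argv) == 4 else "merged_tweets.txt"
    return argv[1], argv[2], output

# Demonstrated with an explicit list; the real program would pass sys.argv.
demo_argv = ["twitter_sort.py", "tweet1_demo.txt", "tweet2_demo.txt", "sorted_demo.txt"]
print(parse_args(demo_argv))
```

Taking `argv` as a parameter rather than reading `sys.argv` directly keeps the function easy to test.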

The output of your program includes the following:

Console:

The name of the file that contained the most tweets followed by the number of tweets tweeted. In the event of a tie, print both filenames along with the number of tweets (Note: a file may be empty).

The five earliest tweets along with the tweeter.

`merged_tweets.txt`: the lines from the merged input files. The Standard version output has merged tweets sorted in reverse chronological order (newest tweets at the start and oldest tweets at the end).
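The "most tweets" console line, including the tie case, could be built along these lines. The non-tie wording follows the example console output; the tie wording and the function name `most_tweets_message` are assumptions, since the assignment does not show a tie.

```python
def most_tweets_message(name1, count1, name2, count2):
    """Build the console line reporting which file held the most tweets."""
    if count1 == count2:
        # Assumed wording: the assignment only says to print both names.
        return f"{name1} and {name2} both contained {count1} tweets."
    if count1 > count2:
        name, count = name1, count1
    else:
        name, count = name2, count2
    return f"{name} contained the most tweets with {count}."

print(most_tweets_message("tweet1_demo.txt", 4, "tweet2_demo.txt", 2))
```

Because a file may be empty, both counts can be zero, which the tie branch handles without special casing.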

Program Details

File Format

Each input file will contain a list of records with one record appearing on each line of the file. The format of a record is as follows:

`@TWEETER "TWEET" YEAR MONTH DAY HR:MN:SC`

Your job will be to read in each file and for each line in the file, create a record with the above information. In the above format, a tweet is a string that can contain a list of tokens. Also, HR:MN:SC should be treated as a single field of the record, the time.

Parsing Tweets

Use Python's `re` (regular expression) module to parse different fields from the tweets.
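One possible pattern for a single record, shown as a sketch rather than the required solution. The names `RECORD_RE` and `parse_record` are hypothetical. The tweet group is greedy, so a tweet that itself contains a double quote still ends at the last quote before the date fields. Keeping the leading `@` in the tweeter field and converting the date fields to integers are both assumptions.

```python
import re

# One record per line: @TWEETER "TWEET" YEAR MONTH DAY HR:MN:SC
RECORD_RE = re.compile(
    r'^(?P<tweeter>@\S+)\s+'
    r'"(?P<tweet>.*)"\s+'
    r'(?P<year>\d+)\s+(?P<month>\d+)\s+(?P<day>\d+)\s+'
    r'(?P<time>\d{1,2}:\d{2}:\d{2})\s*$'
)

def parse_record(line):
    """Return a dict for one line, or None if the line does not match."""
    match = RECORD_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Assumption: numeric date fields are stored as integers.
    for key in ("year", "month", "day"):
        record[key] = int(record[key])
    return record

line = '@enigma "im so clever, my code is even unreadable to me!" 2013 10 1 09:27:00'
print(parse_record(line))
```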

Functions to Define

Define the following functions in your code:

`main()`: the main function that drives the program. The function interprets command line arguments for filenames, calls functions described below, and outputs results to the console

`read_tweets()`: a function that, given a filename, creates a list of records (say, records_list) containing the tweets read from the file. Each record in the records_list should be a dictionary containing 'tweeter', 'tweet', 'year', 'month', 'day' and 'time' information. Note that these fields map to the information contained in each line of the file.

`merge_tweets()`: a function that merges two lists of records (say, records_list_1 and records_list_2) and returns the merged list of records. For the Minimal version, the merged lists are in no particular order. For the Standard version, the merged tweets are sorted in reverse chronological order (see Note below).

`write_tweets()`: a function that takes in a list of records and writes each record to the output file on its own line.
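A rough sketch of the two file functions described above, assuming every field is kept as a string so that writing reproduces the input format exactly. The `filename` parameter on `write_tweets`, the decision to skip lines that do not match, and the compact regex are assumptions, not requirements from the assignment.

```python
import os
import re
import tempfile

# Compact pattern for: @TWEETER "TWEET" YEAR MONTH DAY HR:MN:SC
RECORD_RE = re.compile(r'^(@\S+)\s+"(.*)"\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s*$')
FIELDS = ("tweeter", "tweet", "year", "month", "day", "time")

def read_tweets(filename):
    """Read one record per line into a list of dictionaries."""
    records_list = []
    with open(filename, encoding="utf-8") as infile:
        for line in infile:
            match = RECORD_RE.match(line.strip())
            if match:  # assumption: blank or malformed lines are skipped
                records_list.append(dict(zip(FIELDS, match.groups())))
    return records_list

def write_tweets(records_list, filename):
    """Write each record on its own line in the input file format."""
    with open(filename, "w", encoding="utf-8") as outfile:
        for record in records_list:
            outfile.write(
                '{tweeter} "{tweet}" {year} {month} {day} {time}\n'.format(**record)
            )

# Round trip through a temporary file to check that the two functions agree.
sample = [{"tweeter": "@enigma",
           "tweet": "im so clever, my code is even unreadable to me!",
           "year": "2013", "month": "10", "day": "1", "time": "09:27:00"}]
path = os.path.join(tempfile.mkdtemp(), "merged_tweets.txt")
write_tweets(sample, path)
print(read_tweets(path) == sample)
```

An empty input file simply yields an empty list, which matches the note that a file may be empty.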

Note: Since we have not covered sorting algorithms yet, the Standard version can assume that the original files contain tweets sorted in reverse chronological order.
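Because both inputs are already in reverse chronological order, the Standard version can be done with a two-way merge instead of a sort. The sketch below is one way to do it, not the official solution. The helpers `sort_key` and `rec` are hypothetical, and the tweet text is left out of the demo records for brevity.

```python
def sort_key(record):
    """Turn a record's date and time into a tuple that compares chronologically."""
    hour, minute, second = (int(part) for part in record["time"].split(":"))
    return (int(record["year"]), int(record["month"]), int(record["day"]),
            hour, minute, second)

def merge_tweets(records_list_1, records_list_2):
    """Merge two newest-first lists into a single newest-first list."""
    merged = []
    i = j = 0
    while i < len(records_list_1) and j < len(records_list_2):
        # Take whichever front record is more recent.
        if sort_key(records_list_1[i]) >= sort_key(records_list_2[j]):
            merged.append(records_list_1[i])
            i += 1
        else:
            merged.append(records_list_2[j])
            j += 1
    # At most one of these slices is non-empty.
    merged.extend(records_list_1[i:])
    merged.extend(records_list_2[j:])
    return merged

def rec(tweeter, year, month, day, time):
    """Hypothetical helper that builds a demo record without tweet text."""
    return {"tweeter": tweeter, "tweet": "", "year": year,
            "month": month, "day": day, "time": time}

feed1 = [rec("@poptardsarefamous", 2013, 10, 1, "13:46:42"),
         rec("@nohw4me", 2013, 10, 1, "12:07:14"),
         rec("@pythondiva", 2013, 10, 1, "10:36:11"),
         rec("@enigma", 2013, 10, 1, "09:27:00")]
feed2 = [rec("@ocd_programmer", 2013, 10, 1, "13:18:01"),
         rec("@caffeine4life", 2011, 10, 2, "02:53:47")]

for record in merge_tweets(feed1, feed2):
    print(record["tweeter"])
```

Comparing tuples of integers avoids the pitfall of comparing date strings, where "9" would sort after "10".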

Example Run

File 1 (tweet1_demo.txt):

@poptardsarefamous "Sometimes I wonder 2 == b or !(2 == b)" 2013 10 1 13:46:42
@nohw4me "i have no idea what my cs prof is saying" 2013 10 1 12:07:14
@pythondiva "My memory is great <3 64GB android" 2013 10 1 10:36:11
@enigma "im so clever, my code is even unreadable to me!" 2013 10 1 09:27:00

File 2 (tweet2_demo.txt):

@ocd_programmer "140 character limit? so i cant write my variable names" 2013 10 1 13:18:01
@caffeine4life "BBBBZZZZzzzzzZZZZZZZzzzZZzzZzzZzTTTTttt" 2011 10 2 02:53:47

Run the program: `python twitter_sort.py tweet1_demo.txt tweet2_demo.txt sorted_demo.txt`

Example Console Output

Reading files...
tweet1_demo.txt contained the most tweets with 4.
Merging files...
Files Merged.
Writing file...
File written.
Displaying 5 newest tweeters and tweets.
@poptardsarefamous "Sometimes I wonder 2 == b or !(2 == b)"
@ocd_programmer "140 character limit? so i cant write my variable names"
@nohw4me "i have no idea what my cs prof is saying"
@pythondiva "My memory is great <3 64GB android"
@enigma "im so clever, my code is even unreadable to me!"

Example Output File (sorted_demo.txt)

@poptardsarefamous "Sometimes I wonder 2 == b or !(2 == b)" 2013 10 1 13:46:42
@ocd_programmer "140 character limit? so i cant write my variable names" 2013 10 1 13:18:01
@nohw4me "i have no idea what my cs prof is saying" 2013 10 1 12:07:14
@pythondiva "My memory is great <3 64GB android" 2013 10 1 10:36:11
@enigma "im so clever, my code is even unreadable to me!" 2013 10 1 09:27:00
@caffeine4life "BBBBZZZZzzzzzZZZZZZZzzzZZzzZzzZzTTTTttt" 2011 10 2 02:53:47

Bonus (5 pts)

Use dictionaries to keep track of the number of times each hashtag appears in the two input files. Hashtags are common tokens in social media that start with a "#" and are followed by a string of words (such as "#thisisahashtag"). Print the most common hashtag.
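A small sketch of the bonus using a plain dictionary as the counter. Treating hashtags as case-insensitive, matching them with `#\w+`, and the function names `count_hashtags` and `most_common_hashtag` are all assumptions. When several hashtags tie, this version returns whichever was seen first.

```python
import re

def count_hashtags(records_list):
    """Map each hashtag found in the tweets to the number of times it appears."""
    counts = {}
    for record in records_list:
        for tag in re.findall(r"#\w+", record["tweet"]):
            tag = tag.lower()  # assumption: "#Python" and "#python" are the same tag
            counts[tag] = counts.get(tag, 0) + 1
    return counts

def most_common_hashtag(counts):
    """Return the hashtag with the highest count, or None if there are none."""
    if not counts:
        return None
    return max(counts, key=counts.get)

# Made-up tweets for illustration; the demo input files contain no hashtags.
demo = [{"tweet": "#python is fun #cs"}, {"tweet": "i love #Python"}]
tag_counts = count_hashtags(demo)
print(most_common_hashtag(tag_counts))
```

Running it over the merged list covers both input files in one pass.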
