
Question


This has to be done in commands. Using only standard Linux commands, generate a histogram of all three-word sequences in the EEG Report database provided


1. Using only standard Linux commands, generate a histogram of all three-word sequences in the EEG Report database provided (see /data/courses/ece_3822/current/eeg_reports). We refer to these sequences as trigrams. Your output should list these sequences in decreasing order of occurrence. Compute the number of occurrences (essentially a histogram), the percentage of time a trigram occurs (the number of occurrences / the total number of trigrams), and a cumulative distribution (which is a useful representation because it shows how many trigrams are needed to cover 80% of the data).

Note that your trigram counter should be case-insensitive and ignore punctuation. For example, suppose you have two text files, file1.txt and file2.txt. These files contain the following text:

file1.txt: See Jane run. See Jane run
file2.txt: See jane run. See John run

The trigrams present in this data are:

see jane run
jane run see
run see john
see john run
see jane run
jane run see
run see jane
see jane run

The output of your command line should be:

Trigram         No.   Percentage   Cumulative
see jane run    3     37.5000%     37.5000%
jane run see    2     25.0000%     62.5000%
run see john    1     12.5000%     75.0000%
see john run    1     12.5000%     87.5000%
run see jane    1     12.5000%     100.0000%

Trigrams should be counted even when text is split across lines. However, you do not need to deal with beginning or end of file boundaries (edge effects).

Since the list of trigrams you compute for the entire database will be very long, abbreviate your list to show: (1) the 10 most frequently occurring trigrams, (2) the trigrams that occur at the 25%, 50% and 75% percentiles, and (3) the 10 least frequently occurring trigrams. The output of your code MUST contain the columns above but does not need to contain the vertical or horizontal lines. In your document, you can insert the data into an MS Word table.

2. Demonstrate that you can run the command you construct for task no. 1 from within a shell script. Create a shell script called compute_trigrams.sh, insert your command into the file, set the permissions and other properties correctly, run it and demonstrate that it gives the proper output. This shell script must take a root directory as input, search all files below that directory and produce the same output as in task no. 1. It should be run as follows:

compute_trigrams.sh /data/courses/ece_3822/current/eeg_reports

This shell script should run on any popular version of Linux and run on a machine other than the AWS server. To test this, we will copy your script to our local Linux cluster and run it there. This is your first exposure to the issue of portability.

Step by Step Solution

There are 3 Steps involved in it

Step: 1
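
One possible way to approach the counting part of task 1, offered as a minimal sketch and not as a definitive answer: walk the directory with find, let awk lowercase each line, strip punctuation and emit every run of three consecutive words, then count the results with sort and uniq. The function name count_trigrams is hypothetical. Two choices here are assumptions that the question leaves open: punctuation is deleted (so "don't" becomes "dont"), and trigrams are not carried across file boundaries.

```shell
#!/bin/sh
# Sketch: count every trigram in all files below a root directory.
# Assumptions (not stated in the question): punctuation is deleted rather
# than replaced by a space, and trigrams never span two files.

count_trigrams() {
    # $1: root directory to search
    find "$1" -type f -exec awk '
        FNR == 1 { w1 = ""; w2 = "" }          # new file: forget previous words
        {
            line = tolower($0)                  # case-insensitive
            gsub(/[^a-z0-9 \t]/, "", line)      # drop punctuation and other symbols
            n = split(line, words)              # split on runs of whitespace
            for (i = 1; i <= n; i++) {
                # w1 and w2 survive from line to line, so a trigram
                # may start on one line and finish on the next
                if (w1 != "") print w1, w2, words[i]
                w1 = w2
                w2 = words[i]
            }
        }
    ' {} + |
    LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2
}

# Run only when a root directory is given on the command line.
if [ "$#" -ge 1 ]; then
    count_trigrams "$1"
fi
```

Saved as compute_trigrams.sh and made executable with chmod +x, this would be run with a root directory as its only argument and prints one "count trigram" line per distinct trigram, in decreasing order of count. The character class is spelled out as [^a-z0-9 \t] instead of [[:punct:]] because some older awk implementations do not support POSIX character classes, which matters for the portability requirement.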

Step: 2
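
The percentage and cumulative columns need the total number of trigrams, which is only known once all counts have been read. A hedged sketch of that step follows, assuming the counts arrive on standard input as lines of the form "count word1 word2 word3", already sorted by decreasing count (the shape that sort | uniq -c | sort -nr produces). The function name format_histogram and the column widths are illustrative choices, not part of the question.

```shell
#!/bin/sh
# Sketch: turn "count w1 w2 w3" lines (sorted by decreasing count) into
# the four-column table: trigram, count, percentage, cumulative percentage.

format_histogram() {
    awk '
        {
            # Buffer every row: the total is unknown until the input ends.
            count[NR] = $1
            trigram[NR] = $2 " " $3 " " $4
            total += $1
        }
        END {
            printf "%-40s %8s %12s %12s\n", "Trigram", "No.", "Percentage", "Cumulative"
            for (i = 1; i <= NR; i++) {
                running += count[i]
                printf "%-40s %8d %11.4f%% %11.4f%%\n", trigram[i], count[i], 100 * count[i] / total, 100 * running / total
            }
        }
    '
}

# Example: the five counts from the two-file example in the question.
printf '%s\n' '3 see jane run' '2 jane run see' '1 run see john' \
    '1 see john run' '1 run see jane' | format_histogram
```

With those five counts the total is 8, so the first data row shows 37.5000% in both percentage columns and the last row reaches a cumulative 100.0000%. Buffering the rows in awk arrays avoids a second pass over the data, at the cost of holding the whole histogram in memory.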

Step: 3
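
The abbreviated listing the question asks for can be produced by filtering the full table. The sketch below assumes the table arrives on standard input with one header line followed by data rows whose last column is the cumulative percentage. It also assumes that "the trigram at the p% percentile" means the first row whose cumulative percentage reaches p; the question does not define this precisely. The function name abbreviate_histogram and the separator lines are hypothetical.

```shell
#!/bin/sh
# Sketch: shorten a full histogram table (header first, cumulative
# percentage in the last column) to the rows the question asks for.
# Assumption: the row "at the p% percentile" is the first row whose
# cumulative percentage is at least p.

abbreviate_histogram() {
    awk '
        NR == 1 { header = $0; next }
        { row[++n] = $0; cum[n] = $NF + 0 }    # "37.5000%" + 0 evaluates to 37.5
        END {
            print header
            print "-- 10 most frequent --"
            for (i = 1; i <= n && i <= 10; i++) print row[i]

            print "-- 25%, 50% and 75% percentiles --"
            mark[1] = 25; mark[2] = 50; mark[3] = 75
            m = 1
            for (i = 1; i <= n; i++)
                while (m <= 3 && cum[i] >= mark[m]) { print row[i]; m++ }

            print "-- 10 least frequent --"
            first = (n > 10) ? n - 9 : 1
            for (i = first; i <= n; i++) print row[i]
        }
    '
}
```

A single row is printed more than once in the percentile section if it crosses several marks at once, which can happen when one trigram dominates a small data set. For task 2, a command built this way would sit in compute_trigrams.sh below a #!/bin/sh line, and the file would be made executable with chmod +x compute_trigrams.sh before running it.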
