
Question

The uniq command-line utility has been standard on Unix-based operating systems for a long time. On GNU/Linux, uniq was written by Richard Stallman (AKA Saint IGNUcius) and David MacKenzie. By default, uniq prints only the unique lines in its input. It also assumes that its input is already sorted, so that identical lines are grouped together.

One of the common ways to run uniq is with the -c option, which adds a count of how many times each line appeared:

    grep -Po '[^\s]+' /srv/datasets/shakespeare-othello.txt | \
      tr '[:upper:]' '[:lower:]' | \
      sed -E 's/(^[^A-Za-z0-9])|([^A-Za-z0-9]+$)//g' | \
      sort | \
      uniq -c

Note that this is one long command, escaped (with backslashes) to be formatted over multiple lines. It consists of multiple piped commands:

    grep  isolates all whitespace-delimited tokens from Shakespeare's Othello, one word per line
    tr    makes all uppercase letters lowercase
    sed   trims any non-alphanumeric characters from the ends of lines
    sort  sorts all lines alphanumerically
    uniq  summarizes the unique lines and how many times each occurs

You will find that the last 10 lines of output from this command are:

      1 yonders
      6 yong
    476 you
      2 you'l
      6 you'le
      4 young
    225 your
      2 you're
      6 yours
      5 youth

Assignment

You shall write a program in Java that replicates the behavior and output of uniq -c. That is, your program shall:

    Expect input from standard input, consisting of any number of lines of text. Any duplicate lines are assumed to be sequential.
    Print each unique line of input, prefixed by the number of occurrences of that line.

For testing purposes, compare your program's output with that of uniq -c. Try the following commands. Substituting your program in place of uniq -c should produce the same output:

    # Nucleic acids in human chromosome 11:
    fold -w 1 /srv/datasets/chromosome11 | sort | uniq -c

    # 1 million digits of pi:
    fold -w 1 /srv/datasets/pi1000000 | sort | uniq -c

    # Taxonomic ranks:
    cut -f 4 /srv/datasets/taxonomy.tab | sort | uniq -c

    # Many years worth of baby names in the US:
    cut -d , -f 2 /srv/datasets/baby_names_national.csv | sort | uniq -c

    # Letter frequency histogram in the KJV
    tr -dc '[:alpha:]' < /srv/datasets/king-james.txt | tr '[:upper:]' '[:lower:]'
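
The two requirements above (read lines from standard input, then print each run of identical lines once with its count) can be sketched in Java roughly as follows. This is a minimal sketch, not a definitive solution: the class name UniqCount and the helper countRuns are hypothetical names chosen for illustration, and the output format (count right-aligned in 7 columns, then one space, then the line) is an assumption about GNU uniq -c that should be checked against the real command's output.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.Iterator;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical class name; a sketch of uniq -c style counting.
public class UniqCount {

    // Collapses each run of adjacent identical lines into "count line".
    // Like uniq, it only compares a line with the one directly before it,
    // so unsorted input yields one entry per run, not per distinct line.
    static void countRuns(Stream<String> lines, PrintStream out) {
        Iterator<String> it = lines.iterator();
        String previous = null; // null means "no line seen yet"
        long count = 0;
        while (it.hasNext()) {
            String line = it.next();
            if (line.equals(previous)) {
                count++;
            } else {
                if (previous != null) {
                    emit(out, count, previous);
                }
                previous = line;
                count = 1;
            }
        }
        // Flush the final run, if there was any input at all.
        if (previous != null) {
            emit(out, count, previous);
        }
        out.flush();
    }

    // Assumed format: count padded to width 7, a space, then the line.
    static void emit(PrintStream out, long count, String line) {
        out.printf(Locale.ROOT, "%7d %s\n", count, line);
    }

    public static void main(String[] args) {
        // Uses the platform default charset for both input and output.
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(System.in));
        countRuns(reader.lines(), System.out);
    }
}
```

After compiling with javac UniqCount.java, the program can stand in for uniq -c in any of the pipelines above, for example: fold -w 1 /srv/datasets/pi1000000 | sort | java UniqCount. Keeping only the previous line and a counter means memory use does not grow with the number of input lines, which matters for the larger datasets.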

