Question
The uniq command-line utility has been standard on Unix-based operating systems for a long time. On GNU/Linux, uniq was written by Richard Stallman (AKA Saint IGNUcius) and David MacKenzie.

By default, uniq prints only the unique lines in its input. It assumes that its input is already sorted, so that duplicate lines are grouped together. One of the common ways to run uniq is with the -c option, which adds a count of how many times each line appeared:

    grep -Po '[^\s]+' /srv/datasets/shakespeare-othello.txt | \
      tr '[:upper:]' '[:lower:]' | \
      sed -E 's/(^[^A-Za-z0-9])|([^A-Za-z0-9]+$)//g' | \
      sort | \
      uniq -c

Note that this is one long command, escaped with backslashes so that it can be formatted over multiple lines. It consists of multiple piped commands:

- grep isolates all whitespace-delimited tokens from Shakespeare's Othello, one word per line
- tr makes all uppercase letters lowercase
- sed trims any non-alphanumeric characters from the ends of lines
- sort sorts all lines alphanumerically
- uniq summarizes the unique lines and how many times each occurs

You will find that the last 10 lines of output from this command are:

          1 yonders
          6 yong
        476 you
          2 you'l
          6 you'le
          4 young
        225 your
          2 you're
          6 yours
          5 youth

Assignment

You shall write a program in Java that replicates the behavior and output of uniq -c. That is, your program shall:

- Expect input from standard input, consisting of any number of lines of text. Any duplicate lines are assumed to be sequential.
- Print each unique line of input, prefixed by the number of occurrences of that line.

For testing purposes, compare your program's output with that of uniq -c. Try the following commands. Substituting your program in place of uniq -c should produce the same output:

    # Nucleic acids in human chromosome 11:
    fold -w 1 /srv/datasets/chromosome11 | sort | uniq -c

    # 1 million digits of pi:
    fold -w 1 /srv/datasets/pi1000000 | sort | uniq -c

    # Taxonomic ranks:
    cut -f 4 /srv/datasets/taxonomy.tab | sort | uniq -c

    # Many years worth of baby names in the US:
    cut -d , -f 2 /srv/datasets/baby_names_national.csv | sort | uniq -c

    # Letter frequency histogram in the KJV
    tr -dc '[:alpha:]' < /srv/datasets/king-james.txt | tr '[:upper:]' '[:lower:]'
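One possible shape for such a program is sketched below. It is a minimal sketch under stated assumptions, not a reference solution. The class name Uniq and the helper countRuns are hypothetical names chosen for illustration. The sketch assumes UTF-8 text, and it assumes that matching GNU uniq -c means printing the count right-aligned in a 7-character field followed by one space; it is worth confirming that against the real uniq -c output with diff.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical class name; any name works as long as it is run with "java <name>".
final class Uniq {

    // Hypothetical helper: collapses each run of identical adjacent lines
    // into a single "count line" record, as uniq -c does for grouped input.
    static void countRuns(Reader source, Writer sink) throws IOException {
        BufferedReader in = new BufferedReader(source);
        PrintWriter out = new PrintWriter(sink);

        String previous = null; // the line whose run is currently being counted
        long count = 0;         // length of the current run
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals(previous)) {
                count++;        // same line as before: extend the run
            } else {
                if (previous != null) {
                    emit(out, count, previous); // a run just ended
                }
                previous = line;
                count = 1;
            }
        }
        if (previous != null) {
            emit(out, count, previous);         // flush the final run
        }
        out.flush();
    }

    // Assumption: count right-aligned to width 7, one space, then the line.
    // Locale.ROOT keeps the digits ASCII regardless of the system locale.
    private static void emit(PrintWriter out, long count, String line) {
        out.printf(Locale.ROOT, "%7d %s\n", count, line);
    }

    public static void main(String[] args) throws IOException {
        countRuns(
            new InputStreamReader(System.in, StandardCharsets.UTF_8),
            new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8)));
    }
}
```

Only the previous line and a counter are kept in memory, so the input size is bounded by time rather than by heap. After compiling with javac, the program can take the place of uniq -c in any of the pipelines above, for example: fold -w 1 /srv/datasets/pi1000000 | sort | java Uniq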