Question
A DNA string is a sequence of the letters a, c, g, and t in any order. For example, aacgtttgtaaccag is a DNA string of length 15. Each sequence of three consecutive letters is called a codon. For example, in the preceding string, the codons are aac, gtt, tgt, aac, and cag. If we ignored the first letter and started listing the codons starting at the second a, the codons would be acg, ttt, gta, and acc, and we would ignore the last ag. In this exercise, for simplicity, we will assume that we always start reading the codons at the first letter of the string. A DNA string can be hundreds of thousands of codons long, even millions of codons long, which means that it is infeasible to count them by hand. It would be useful to have a simple script that could count the number of occurrences of a specific codon in such a string.
For instance, for the example string above, such a script would tell us that aac occurs twice and tgt occurs once. Your job is to write a script named countcodons that expects two arguments on the command line. The first argument is a three-letter codon string such as aaa or cgt. The second argument is the pathname of a file containing a valid DNA string with no newline characters or white space characters of any kind within it. This file contains nothing but a sequence of the letters a, c, g, and t. If your script is given two valid arguments, it will output a single number, which is the number of occurrences of the codon given as argument 1 in the file given as argument 2. If it finds no occurrences, it should output 0.
For example, if the string aacgtttgtaaccagaac is in a file named dnafile, then your script should work like this:

    $ countcodons ttt dnafile
    1
    $ countcodons aac dnafile
    3
    $ countcodons ccc dnafile
    0

Warning: if it is given valid arguments, the script is not to output anything but a number. No fancy messages, no words - just a number! The script should check that it has two arguments and if it does not, it should print a how-to-use-me message and then exit. It is not required to check that the file is in the proper form, or that the string is actually a codon.
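The required behavior can be sketched as follows. This is a minimal sketch, not a definitive solution: it assumes that fold is the command used to split the string into three-letter lines, and it wraps the logic in a shell function so it can be tried interactively. The wording of the usage message is an assumption.

```shell
#!/bin/bash
# Sketch of countcodons written as a shell function. To use it as a
# standalone script, save it as "countcodons" and end the file with:
#     countcodons "$@"

countcodons() {
    # Require exactly two arguments: the codon and the DNA file.
    if [ "$#" -ne 2 ]; then
        echo "Usage: countcodons <codon> <dnafile>" >&2
        return 1
    fi
    codon=$1
    file=$2

    # fold -w 3 breaks the string into lines of three letters, so each
    # line is one codon read from the first letter of the string.
    # grep -c -x -F counts the lines that equal the codon exactly,
    # treating it as a fixed string; it prints 0 when nothing matches.
    count=$(fold -w 3 "$file" | grep -c -x -F -e "$codon")
    echo "$count"
}
```

On the sample string this reads the codons aac, gtt, tgt, aac, cag, aac, so it prints 3 for aac and 0 for ccc. Because ttt spans a codon boundary in that string, this frame-aligned count prints 0 for it rather than the 1 shown in the sample session; reproducing that 1 would require counting substrings at any position, which is a different reading of the task.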
However, for 3 points of extra credit, it should print an error message and exit if the file cannot be opened or if the file contains anything other than the four letters a, c, g, and t. It must do both to receive the credit. Hint: You will not be able to solve this problem using the grep command alone. There are a number of commands that might be useful, such as sort, cut, fold, and uniq. One of these commands is the key that makes this task easy to solve. Find out which one it is and use it.
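One possible way to handle the extra-credit checks is sketched below. The helper name check_dnafile and the error message wording are hypothetical, not part of the assignment. The sketch takes the file description strictly, so even a trailing newline counts as an invalid character; relax that if your files end with a newline.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if the file can be read and holds only
# the letters a, c, g, t; otherwise prints an error and returns 1.

check_dnafile() {
    file=$1

    # The file must exist as a regular file and be readable.
    if [ ! -f "$file" ] || [ ! -r "$file" ]; then
        echo "countcodons: cannot open file: $file" >&2
        return 1
    fi

    # tr -d deletes every allowed letter; wc -c counts what is left.
    # Any leftover byte (including a newline) means the file is invalid.
    leftover=$(( $(tr -d 'acgt' < "$file" | wc -c) ))
    if [ "$leftover" -ne 0 ]; then
        echo "countcodons: $file contains characters other than a, c, g, t" >&2
        return 1
    fi
    return 0
}
```

A script could call this right after its argument check, for example with check_dnafile "$2" || exit 1, before doing any counting.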
Just the code.