Question
The Human Genome Project was an international effort that produced a draft sequence of the human genome in 2001 and completed the sequence of its 3 billion DNA subunits (called bases) in 2003. The results of this project, i.e. the DNA sequences, are publicly disseminated in an effort to lower the barriers to effective biomedical research. The Biotechnology Program at the UH College of Technology needs your assistance in analyzing DNA sequences.

In terms of computer terminology, DNA is a base-4 labeling system. This labeling system uses the letters 'a', 't', 'c', and 'g' for the four DNA bases of adenine (A), thymine (T), cytosine (C), and guanine (G). The sequence is determined using automated technology, and errors in reading a base are typically denoted by the letter N.

Your assignment is to read a file containing a DNA sequence and determine:
(1) the total number of bases in the sequence
(2) the number of errors
(3) the total number of A bases
(4) the total number of G bases
(5) the total number of T bases
(6) the total number of C bases
(7) You also need to generate a graph of the distribution of the 4 bases in the sequence. Since DNA sequence files are large, with thousands of bases, divide your base counts by 100 when generating the graph so that a reasonable number of symbols is plotted. Note that the values will be truncated in the graph (e.g. as seen below, the total count for base A is 953, but the graph shows 9 A's, which is 953/100.0 = 9.53 truncated to 9).

Your program should write a report to an output file with the information listed above and shown below.

Sample input file: DNAsequence1.txt (only a portion of the file is shown)

atgagtattc aacatttccg tgtcgccctt attccctttt ttgcggcatt ttgccttcct
gtttttgctc acccagaaac gctggtgaaa gtaaaagatg ctgaagatca gttgggtgca
cgagtgggtt acatcgaact ggatctcaac agcggtaaga tccttgagag ttttcgcccc
gaagaacgtt ttccaatgat gagcactttt aaagttctgc tatgtggcgc ggtattatcc
cgtgttgacg ccgggcaaga gcaactcggt cgccgcatac actattctca gaatgacttg
gttgagtact caccagtcac agaaaagcat cttacggatg gcatgacagt aagagaatta
gcagtgctg ccataaccat gagtgataac actgcggcca acttacttct gacaacgatc
ggaggaccga aggagctaac cgcttttttg cacaacatgg gggatcatgt aactcgcctt
gatcgttggg aaccggagct gaatgaagcc ataccaaacg acgagcgtga caccacgatg
cctgcagcaa tggcaacaac gttgcgcaaa ctattaactg gcgaactact tactctagct
tcccggcaac aattaataga ctggatggag gcggataaag ttgcaggacc acttctgcgc
tggcccttc cggctggctg gtttattgct gataaatctg gagccggtga gcgtgggtct
cgcggtatca ttgcagcact ggggccagat ggtaagccct cccgtatcgt agttatctac
acgacgggga gtcaggcaac tatggatgaa cgaaatagac agatcgctga gataggtgcc
ataaattctt attttgacac tcaccaaaat agtcacctgg aaaacccgct ttttgtgaca
aagtacagaa ggcttggtca catttaaatc actgagaact agagagaaat actatcgcaa
actgtaatag acattacatc cataaaagtt tccccagtcc ttattgtaat attgcacagt
gcaattgcta catggcaaac tagtgtagca tagaagtcaa agcaaaaaca aaccaaagaa

Sample output file: DNAreport1.txt

REPORT on DNA Sequence
Total number of bases with errors: 5525
Total number of bases: 5522
Number of errors
Total number of A: 953
Total number of G: 1699
Total number of T: 1307
Total number of C: 1563
Graph of Base Distribution

Program Requirements
(1) The input should be read from a file. Note that the bases in the input files are in lowercase letters, a, g, t, and c, and an error is denoted by n.
Your program should check for input file failure and end with an appropriate message if an error occurs in reading the file. Hint: Since you do not know the total number of bases (letters) in each file, you need to use a while loop to read the data from the file. You can do this in two ways:

I. Use the read statement in the while condition. If your ifstream variable is infile, then use
    while (infile >> variableName) { // enter code here }
OR
II. Make use of the eof() function as follows:
    while (!infile.eof()) { if (infile >> variableName) { // enter code here } }

The eof() function tells you when you have finished reading the last line of data in the file. You have to use an additional if statement as shown above because eof() returns true only after you try to read data once you have reached the end. For example, suppose that you have a file that has only 1 line of data. The eof flag will not be set when you read that line, but later, when you try to read again in the next iteration. infile.eof() is false from the first line in the input file through the last line. It is false even after you finish reading the data on the last line. It becomes true only when you try to read data again after you have finished reading the last line.

(2) The output should be written to a file. Format the output to be neatly aligned in columns as shown above. For both the report of the number of bases and the graph, you must use the manipulators setw, setfill, left, right, etc.
(3) Use of a switch statement is recommended (for example, when counting the individual bases of type 'a', 'c', 'g', and 't' and errors).
(4) Remember to close all files when done.
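The requirements above can be sketched in C++ roughly as follows. This is a minimal sketch under stated assumptions, not a verified solution: the names BaseCounts, countBases, writeRow, writeBar, writeReport, and runReport are hypothetical, the column widths and the dot fill are guesses at the sample layout, and "Total number of bases with errors" is assumed to mean the valid bases plus the error reads. It uses loop style I (the read in the while condition) and a switch for the tallies.

```cpp
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>

// Hypothetical container for the tallies; names and fields are illustrative.
struct BaseCounts {
    long a = 0, t = 0, c = 0, g = 0, errors = 0;
    long validBases() const { return a + t + c + g; }
};

// Reads one character at a time. operator>> skips the blanks and newlines
// between groups of bases, and the loop ends as soon as a read fails.
BaseCounts countBases(std::istream& in) {
    BaseCounts counts;
    char base;
    while (in >> base) {
        switch (base) {
            case 'a': ++counts.a; break;
            case 't': ++counts.t; break;
            case 'c': ++counts.c; break;
            case 'g': ++counts.g; break;
            case 'n': ++counts.errors; break;
            default: break;  // assumption: any other character is ignored
        }
    }
    return counts;
}

// One report row: label padded on the right, value right-aligned.
// The widths (36 and 8) and the '.' fill are assumptions about the layout.
void writeRow(std::ostream& out, const std::string& label, long value) {
    out << std::left << std::setfill('.') << std::setw(36) << label
        << std::right << std::setfill(' ') << std::setw(8) << value << '\n';
}

// One graph row: integer division by 100 truncates, so 953 gives 9 symbols.
void writeBar(std::ostream& out, char letter, long count) {
    long length = count / 100;
    out << std::left << std::setfill(' ') << std::setw(3) << letter
        << std::string(length, letter) << '\n';
}

void writeReport(std::ostream& out, const BaseCounts& n) {
    out << "REPORT on DNA Sequence\n";
    writeRow(out, "Total number of bases with errors:", n.validBases() + n.errors);
    writeRow(out, "Total number of bases:", n.validBases());
    writeRow(out, "Number of errors:", n.errors);
    writeRow(out, "Total number of A:", n.a);
    writeRow(out, "Total number of G:", n.g);
    writeRow(out, "Total number of T:", n.t);
    writeRow(out, "Total number of C:", n.c);
    out << "\nGraph of Base Distribution\n";
    writeBar(out, 'A', n.a);
    writeBar(out, 'G', n.g);
    writeBar(out, 'T', n.t);
    writeBar(out, 'C', n.c);
}

// Opens both files, stops with a message on failure, and closes them when done.
// Returns 0 on success and 1 on failure, suitable as a process exit code.
int runReport(const std::string& inputPath, const std::string& outputPath) {
    std::ifstream infile(inputPath);
    if (!infile) {
        std::cerr << "Error: could not open input file " << inputPath << '\n';
        return 1;
    }
    std::ofstream outfile(outputPath);
    if (!outfile) {
        std::cerr << "Error: could not open output file " << outputPath << '\n';
        return 1;
    }
    writeReport(outfile, countBases(infile));
    infile.close();
    outfile.close();
    return 0;
}
```

A main function would only need to return runReport("DNAsequence1.txt", "DNAreport1.txt"). Taking std::istream and std::ostream references in the helper functions, rather than file streams, is a design choice that lets the counting and formatting be checked against in-memory strings before any real sequence file is involved.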