Question

1 Approved Answer

Posted on Sep 25, 2024

Please use C++ to solve this and not any other language. The Human Genome Project is an international effort to discover all of the approximate

*Please use C++ to solve this and not any other language.*

image text in transcribed

The Human Genome Project is an international effort to discover all of the approximate 30,000-100,000 human genes in the human genome i.e. the total DNA). The project successfully completed the sequencing of the human genome in 2001 and the complete sequence of the 3 billion DNA subunits (called bases) was published in 2003. The results of this project, i.e. the DNA sequences are publicly disseminated in an effort to lower the barriers to effective biomedical research. The Biotechnology Program at the UH College of Technology needs your assistance in analyzing DNA sequences. In terms of computer terminology, DNA is a base-4 labeling system. This labeling system uses the letters A, T, C, and G, for the four DNA bases of adenine (A), thymine (T), cytosine (C), and guanine (G). This sequence is determined using automated technology and typically errors in reading a base are typically denoted by the letter N. Your assignment is to read a file containing a DNA sequence and determine (2) hu nobar ovemroers of bases in the sequence 2 Number of errors (3) the total number of A bases (4) the total number of G bases (5) the total number of T bases (6) the total number of C bases (7) You also need to generate a graph of the distribution of the 4 bases in the sequence. Since DNA sequence files are large with thousands of bases, for generating the graph divide your base counts by 100 so facilitate plotting of a reasonable number of bases. Note, the values will be truncated in graph (e.g. as seen below, the total count for base A is 1176, but the graph shows 11 A's (which is 1176/100.0- 1176 truncated to 11) Your program should output a report to an output file with the information listed above and shown below ample input and ut file DNAreport1.txt DNAsequence1.txt (only a portion of the file is shown) atgagtatte aacatttceg tgtegcectt attccctttt ttgcggcatt ttgeett REPORT on DNA Sequence gtttttgcte acccagaaac gctggtgaaa gtaaaagatg ctgaagatca gttgggtgca gagtg99tt acatcgaact ggatctcaac agcggtaaga teettgagag ttttegeccC Total nusber of bases:5524 gaagaacgtt ttccaatgst gagcactttt aaagttetg tatgtggcgc g9tattatcc Number of errors gtgttog ccg99caags gcaactoggt cqccgcatac actattctca gaatgactt Total nutber of A: ttgagtact caccagtcsc agaaaagcat cttacggatg gcatgacagt aagagaatta Total nutber of G: 953 1699 tacttct gacaacgatc Total nunber of T tgcagtgctg ccataaccat gagtgatasc actgcggcca act ggagga0ga aggagctasc cgcttteetg cacaacatgg 9ggatcatgt aactcgcctt gategtto aaccggagct gaatgaagce atacca8a0g acgagcgtga caccacgatg cctgcagcaa tggcaacaac gttgcgcaaa ctattaactg gkgaactact tactctagct tcccggcaac aattaataga ctggatg9a9 gcggataaag ttgcaggacc acttetgcgc tcggccctte cggetg99ctg gtttattgct gataaatctg gagccggtga gcgtgggtct cgcggtatca ttgcagcact ggggccagat ggtaagccct cccgtatcgt agttatctac cgac9999a gtcaggcae tatggatgss cgaaatagae agatcgctga gataggtgcc ataaattett attttgacac tcaccaasst agtcacctoo aaaacccgct ttttgtgaca aagtacagaa ggcttggtca catttaaatc actgagaac actgtaatag acattacate cataaaagtt tccccagtec ttattgtaat attgcacagt gcaattgeta catggcaaac tagtgtagca tagaagtcaa agcaaa8aca aaccaaagaa Total nuber of C: 1566 Graph of Base Distribution t agagagaaat actatcgcas CI CCCccccccccCCCc Program Requirements (1) The input should be read from a file. Note that the bases in the input files are in lowercase letters, a, g, t, and c, and an error is denoted by n. Your program should check for input file failure and end with an appropriate message if an error occurs in reading the file. Make use of the eof() function when reading since you do not know how many bases are there in each file. You can use the eof() function to determine when you have finished reading the last line of data in the file When you use eof), remember that eof returns true only when you try to read data after you have reached the end For example, suppose that you have a file that has only has 1 line of data. The eof flag will not set when you read that line, but later when try to read again in the next iteration. Example usage, if you have a ifstream variable called infile, then you can call the function as follows: infile.eof). infile.eof) is false starting from the first line in the input file until the last line. It is false even after you finish reading data on the last line. It becomes true only when you try to read data again after you have finished reading the last line. Three different input files (DNAsequence1.txt, DNAsequence2.txt, and DNAsequence3.txt) are provided for you to test your code (2) The output should be written to file. Format the output to be neatly aligned in columns as shown above. For both the (3) Use of switch statement is recommended (for example when counting the individual bases of typeae, .g. and t' and (4) Remember to close all files when done report of the number of bases and the graph, you must use the manipulators, setw, setill, left, and right, etc. errors