Question
This lab involves processing data from genome files. Your program should work with the given input file (ecoli.txt). Copy this file into the same directory as your code.
Background
DNA carries genetic information for cellular life forms and some viruses. DNA consists of long chains of chemical compounds called nucleotides. Four nucleotides are present in DNA: Adenine (A), Cytosine (C), Guanine (G) and Thymine (T). Certain regions of DNA are called genes. Most genes encode instructions for building proteins. These proteins are responsible for carrying out most of the life processes of an organism.
Nucleotides in genes are organized into codons. Codons are groups of 3 nucleotides. The sequences of DNA that encode proteins occur between a start codon (which we will assume to be ATG) and a stop codon (one of the following: TAA, TAG, or TGA).
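As a small illustration of the codon idea, the sketch below splits a sequence into groups of 3 and checks the first and last groups against the start and stop codons named above. The class and method names (CodonSketch, toCodons, isStopCodon) are hypothetical and are not part of the lab's required classes; the lab itself does not ask you to split codons.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; the lab does not require this class.
class CodonSketch {
    static final String START = "ATG"; // start codon assumed by the lab

    // Splits a sequence into consecutive groups of 3 nucleotides.
    // Any 1 or 2 leftover nucleotides at the end are dropped.
    static List<String> toCodons(String sequence) {
        List<String> codons = new ArrayList<>();
        for (int i = 0; i + 3 <= sequence.length(); i += 3) {
            codons.add(sequence.substring(i, i + 3));
        }
        return codons;
    }

    // True for the three stop codons listed in the background section.
    static boolean isStopCodon(String codon) {
        return codon.equals("TAA") || codon.equals("TAG") || codon.equals("TGA");
    }

    public static void main(String[] args) {
        List<String> codons = toCodons("ATGCCACTATGGTAG");
        System.out.println(codons);                                     // [ATG, CCA, CTA, TGG, TAG]
        System.out.println(codons.get(0).equals(START));                // true
        System.out.println(isStopCodon(codons.get(codons.size() - 1))); // true
    }
}
```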
Problem statement
You are to read in the input and produce the output shown below (written to an output file that will be attached to the lab).
Here is the output that should be written to the file. NOTICE THE FORMAT and try to replicate it.
Region Name: cure for cancer protein
Nucleotides: ATGCCACTATGGTAG
Nuc. Counts: [4, 3, 4, 4]
Total Mass%: [27.3, 16.8, 30.6, 25.3] of 1978.8
Region Name: captain picard hair growth protein
Nucleotides: ATGCCAACATGGATGCCCGATATGGATTGA
Nuc. Counts: [9, 6, 8, 7]
Total Mass%: [30.7, 16.8, 30.5, 22.1] of 3967.5
1. Region Name: bogus protein
2. Nucleotides: CCATT-AATGATCA-CAGTT
3. Nuc. Counts: [6, 4, 2, 6]
4. Total Mass%: [32.3, 17.7, 12.1, 29.9] of 2508.1
Output meaning:
row 1. Region Name - read in and written to the output file
row 2. Nucleotides - read in from the input and written to the output, changed to all uppercase
row 3. Nuc. Counts - array holding the counts of A, C, G, T (in that order) in the string
row 4. Total Mass% - A, C, G, T and - have different masses. Sum the mass for each type, get the overall total, then find the percentage of that total contributed by each of A, C, G, T
a. Adenine (A): 135.128
b. Cytosine (C): 111.103
c. Guanine (G): 151.128
d. Thymine (T): 125.107
e. Junk (-): 100.000
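Rows 3 and 4 can be checked by hand against the "bogus protein" sample: 6 A, 4 C, 2 G, 6 T and 2 dashes give a total mass of 6(135.128) + 4(111.103) + 2(151.128) + 6(125.107) + 2(100.000) = 2508.078. The sketch below does the same arithmetic in code. It is a stand-alone illustration with hypothetical names (MassSketch, count, totalMass, massPercents, formatRow4); the lab itself asks for this logic to live inside the DNA class.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical stand-alone helper; the lab puts this logic in the DNA class instead.
class MassSketch {
    static final double[] MASS = {135.128, 111.103, 151.128, 125.107}; // A, C, G, T
    static final double MASS_JUNK = 100.000;                           // each '-'

    // Row 3: counts of A, C, G, T in that order. Other characters are ignored here.
    static int[] count(String nucleo) {
        int[] counts = new int[4];
        for (char c : nucleo.toCharArray()) {
            int idx = "ACGT".indexOf(c);
            if (idx >= 0) counts[idx]++;
        }
        return counts;
    }

    // Total mass: count times mass for each nucleotide, plus 100.0 per dash.
    static double totalMass(String nucleo) {
        int[] counts = count(nucleo);
        double total = 0;
        for (int i = 0; i < 4; i++) total += counts[i] * MASS[i];
        for (char c : nucleo.toCharArray()) if (c == '-') total += MASS_JUNK;
        return total;
    }

    // Row 4: each nucleotide's share of the total mass, as a percentage.
    static double[] massPercents(String nucleo) {
        int[] counts = count(nucleo);
        double total = totalMass(nucleo);
        double[] pct = new double[4];
        for (int i = 0; i < 4; i++) pct[i] = counts[i] * MASS[i] / total * 100;
        return pct;
    }

    // Formats row 4 with one decimal place, matching the sample output.
    static String formatRow4(String nucleo) {
        double[] p = massPercents(nucleo);
        return String.format(Locale.US, "[%.1f, %.1f, %.1f, %.1f] of %.1f",
                p[0], p[1], p[2], p[3], totalMass(nucleo));
    }

    public static void main(String[] args) {
        String s = "CCATT-AATGATCA-CAGTT";
        System.out.println("Nuc. Counts: " + Arrays.toString(count(s))); // Nuc. Counts: [6, 4, 2, 6]
        System.out.println("Total Mass%: " + formatRow4(s));             // Total Mass%: [32.3, 17.7, 12.1, 29.9] of 2508.1
    }
}
```

Note that the dashes add to the total mass but have no percentage of their own, which is why the four percentages for this sample do not sum to 100.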
Steps for DNA class
1. Define static variables
final static double massA = 135.128;
final static double massC = 111.103;
final static double massG = 151.128;
final static double massT = 125.107;
final static double massJunk = 100.000;
2. Define instance variables
String title;
String nucleo; // fixed to uppercase
String nucleoFixed; // fixed to uppercase and - removed
// countNucleo[0] = #As, countNucleo[1] = #Cs, etc.
int[] countNucleo = {0, 0, 0, 0};
// nucleoMass[0] = % of total mass the As contribute, etc.
double[] nucleoMass = {0, 0, 0, 0};
double totalMass = 0;
3. Build constructor
Just need a no-argument constructor (many of the instance variables are initialized above)
Only need to set the String variables to null
4. Write setTitle and getTitle methods
public void setTitle(String t)
5. Write setNucleo and getNucleo methods - convert to uppercase in the setter
public void setNucleo(String t)
6. Write setFixedNucleo - look up the String method that replaces (removes) - from a String
public void setFixedNucleo(String t)
7. Write toString so that it prints the above info in the correct format (we will add to it as we go on)
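Steps 1 through 7 can be sketched roughly as follows. This is one possible reading, not the official solution: it leaves out the mass constants and the count and mass arrays (they are not used until step 22), it marks the fields private, and getFixedNucleo is an extra getter added here only so the result can be inspected. The String method that removes the dashes is assumed to be String.replace.

```java
// A sketch of steps 1-7 only; counts and mass percentages come in later steps.
class DNA {
    private String title;
    private String nucleo;      // uppercase, dashes kept
    private String nucleoFixed; // uppercase, dashes removed

    // No-argument constructor: only the String fields need to be set.
    public DNA() {
        title = null;
        nucleo = null;
        nucleoFixed = null;
    }

    public void setTitle(String t) { title = t; }
    public String getTitle() { return title; }

    // Stores the sequence in uppercase.
    public void setNucleo(String t) { nucleo = t.toUpperCase(); }
    public String getNucleo() { return nucleo; }

    // String.replace swaps every "-" for the empty string, which removes it.
    public void setFixedNucleo(String t) { nucleoFixed = t.toUpperCase().replace("-", ""); }

    // Assumption: this getter is not listed in the lab steps.
    public String getFixedNucleo() { return nucleoFixed; }

    // Rows 1 and 2 of the output; rows 3 and 4 are added in step 26.
    @Override
    public String toString() {
        return "Region Name: " + title + "\nNucleotides: " + nucleo;
    }

    public static void main(String[] args) {
        DNA d = new DNA();
        d.setTitle("bogus protein");
        d.setNucleo("ccatt-aatgatca-cagtt");
        d.setFixedNucleo("ccatt-aatgatca-cagtt");
        System.out.println(d);
        System.out.println(d.getFixedNucleo()); // CCATTAATGATCACAGTT
    }
}
```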
Steps for Driver
8. Imports for the driver
import java.io.*;
import java.util.Scanner;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.io.FileOutputStream;
import java.util.ArrayList;
9. Variables to declare in the driver
String title;
String nucleo;
ArrayList<DNA> input = new ArrayList<DNA>();
int count = 0; // index used to reach the current object in the ArrayList
10. Set up the input and output files using the code from Basketball as a guide
11. Set up a while loop to read the input
while(inputStream.hasNextLine()){
12. Read title from input
title= inputStream.nextLine();//reads the entire line into String
13. Read in nucleotide string from input
nucleo=inputStream.nextLine();// reads all the way to the blank
14. add new DNA object to ArrayList
input.add(new DNA());
15. Call setNucleo on the object
input.get(count).setNucleo(nucleo);
16. Call setFixedNucleo on the object
17. Call toString on the object
System.out.println(input.get(count));
18. Increment the counter so you can keep track of which object in the ArrayList you are working with
count++;
19. End the while loop
20. Close inputStream and outputStream
21. Run what you have so far - you should see the output file
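The driver loop in steps 9 through 20 might look roughly like the sketch below. Several things here are assumptions: the input is taken to be pairs of lines (a title line followed by a sequence line), a small Region class stands in for the lab's DNA class so the example compiles on its own, and in-memory text is used instead of ecoli.txt so it runs anywhere. The names DriverSketch, Region and process are hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Scanner;

class DriverSketch {
    // Minimal stand-in for the lab's DNA class (assumption, for self-containment).
    static class Region {
        String title;
        String nucleo;

        @Override
        public String toString() {
            return "Region Name: " + title + "\nNucleotides: " + nucleo;
        }
    }

    // Reads title/sequence line pairs and writes each region to the output.
    static ArrayList<Region> process(Scanner inputStream, PrintWriter outputStream) {
        ArrayList<Region> input = new ArrayList<>();
        while (inputStream.hasNextLine()) {
            String title = inputStream.nextLine();
            if (!inputStream.hasNextLine()) break; // a title with no sequence line: stop
            String nucleo = inputStream.nextLine();
            Region r = new Region();
            r.title = title;
            r.nucleo = nucleo.toUpperCase();
            input.add(r);
            outputStream.println(r); // goes to the output, not the console
        }
        return input;
    }

    public static void main(String[] args) {
        // In the lab, the Scanner would wrap new FileInputStream("ecoli.txt") and the
        // PrintWriter would wrap a FileOutputStream for the output file.
        String sample = "cure for cancer protein\nATGCCACTATGGTAG\n"
                + "bogus protein\nccatt-aatgatca-cagtt\n";
        StringWriter buffer = new StringWriter();
        try (Scanner in = new Scanner(sample); PrintWriter out = new PrintWriter(buffer)) {
            process(in, out);
        }
        System.out.print(buffer);
    }
}
```

Holding a reference to the new object (r above) avoids the separate count variable; the lab's input.get(count) approach reaches the same object by index.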
DNA Class
22. Write a method to count A, C, G, T
public void setCountNucleo()
23. Set up a loop that goes through the nucleoFixed String
for(int i=0; i<nucleoFixed.length(); i++)
24. Check whether each character is equal to A, C, G, or T
if(nucleoFixed.charAt(i)=='A')
countNucleo[0]++;
.. etc
25. Write the method setMassPercents - use the original string (nucleo), since you need the - characters
for(int i=0; i<nucleo.length(); i++){
if(nucleo.charAt(i)=='A'){
nucleoMass[0] = nucleoMass[0] + massA;
totalMass = totalMass + massA;
}
...... go through the other letters
else if(nucleo.charAt(i)=='-')
totalMass = totalMass + massJunk;
}//end for loop
Go through the nucleoMass array and update the values to be percent of total
for(int i=0;i<4;i++)
nucleoMass[i] = nucleoMass[i]/totalMass * 100;
26. Add to the toString to print the 3rd and 4th lines
Driver
27. Inside the while loop in your Driver, add a call to the setCountNucleo method
input.get(count).setCountNucleo();
28. Next add a call to the setMassPercents method
input.get(count).setMassPercents();
29. Print out the object again.
System.out.println(input.get(count));
New Class Called InvalidDNAException
30. Write an exception class called InvalidDNAException (see the example from Tuesday - this is a separate class in the same directory)
public class InvalidDNAException extends Exception
Include a constructor similar to the example from the InvalidAreaCode exception I emailed you
DNA Class
31. Modify the setMassPercents method to have an else at the end that tests whether the char is not A, C, G, T, or -, and if so throws this exception.
try {
for(int i=0;i ...
34. Use this input to test the exception
35. Comment at the bottom of the driver the results from step 31 and 332
No need to send input files.
Example output from initial input file:
Region Name: the operon leader peptide
Nucleotides: ATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGA Nuc.
Counts: [21, 22, 12, 11] Total Mass%: [33.5, 28.9, 21.4, 16.2] of 8471.7 Region Name: aspartokinase I/homoserine dehydrogenase I Nucleotides: ATGCGAGTGTTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGGGTTGCCGATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCATCTGGTAGCGATGATTGAAAAAACCATTAGCGGTCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTCTGACGGGACTCGCCGCCGCCCAGCCGGGATTTCCGCTGGCACAATTGAAAACTTTCGTCGACCAGGAATTTGCCCAAATAAAACATGTCCTGCATGGCATCAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGCTGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCGGCGTGTTAGAAGCGCGTGGTCACAACGTTACCGTTATCGATCCGGTCGAAAAACTGCTGGCAGTGGGTCATTACCTCGAATCTACCGTTGATATTGCTGAATCCACCCGCCGTATTGCGGCAAGCCGCATTCCGGCTGACCACATGGTGCTGATGGCTGGTTTCACTGCCGGTAATGAAAAAGGCGAGCTGGTGGTTCTGGGACGCAACGGTTCCGACTACTCCGCTGCGGTGCTGGCGGCCTGTTTACGCGCCGATTGTTGCGAGATCTGGACGGATGTTGACGGTGTTTATACCTGCGATCCGCGTCAGGTGCCCGATGCGAGGTTGTTGAAGTCGATGTCCTATCAGGAAGCGATGGAGCTTTCTTACTTCGGCGCTAAAGTTCTTCACCCCCGCACCATTACCCCCATCGCCCAGTTCCAGATCCCTTGCCTGATTAAAAATACCGGAAATCCCCAAGCACCAGGTACGCTCATTGGTGCCAGCCGTGATGAAGACGAATTACCGGTCAAGGGCATTTCCAATCTGAATAACATGGCAATGTTCAGCGTTTCCGGCCCGGGGATGAAAGGGATGGTTGGCATGGCGGCGCGCGTCTTTGCAGCGATGTCACGCGCCCGTATTTCCGTGGTGCTGATTACGCAATCATCTTCCGAATACAGTATCAGTTTCTGCGTTCCGCAAAGCGACTGTGTGCGAGCTGAACGGGCAATGCAGGAAGAGTTCTACCTGGAACTGAAAGAAGGCTTACTGGAGCCGTTGGCGGTGACGGAACGGCTGGCCATTATCTCGGTGGTAGGTGATGGTATGCGCACCTTACGTGGGATCTCGGCGAAATTCTTTGCCGCGCTGGCCCGCGCCAATATCAACATTGTCGCCATTGCTCAGGGATCTTCTGAACGCTCAATCTCTGTCGTGGTCAATAACGATGATGCGACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTTTGTGATTGGCGTCGGTGGCGTTGGCGGTGCGCTGCTGGAGCAACTGAAGCGTCAGCAAAGCTGGTTGAAGAATAAACATATCGACTTACGTGTCTGCGGTGTTGCTAACTCGAAGGCACTGCTCACCAATGTACATGGCCTTAATCTGGAAAACTGGCAGGAAGAACTGGCGCAAGCCAAAGAGCCGTTTAATCTCGGGCGCTTAATTCGCCTCGTGAAAGAATATCATCTGCTGAACCCGGTCATTGTTGACTGTACTTCCAGCCAGGCTGTGGCAGATCAATATGCCGACTTCCTGCGCGAAGGTTTCCACGTTGTTACGCCGAACAAAAAGGCCAACACCTCGTCGATGGATTACTACCATCAGTTGCGTTATGCGGCGGAAAAATCGCGGCGTAAATTCCTCTATGACACCAACGTTGGGGCTGGATTACCGGTTATTGAGAACCTGC
AAAATCTGCTCAATGCTGGTGATGAATTGATGAAGTTCTCCGGCATTCTTTCAGGTTCGCTTTCTTATATCTTCGGCAAGTTAGACGAAGGCATGAGTTTCTCCGAGGCGACCACACTGGCGCGGGAAATGGGTTATACCGAACCGGACCCGCGAGATGATCTTTCTGGTATGGATGTGGCGCGTAAGCTATTGATTCTCGCTCGTGAAACGGGACGTGAACTGGAGCTGGCGGATATTGAAATTGAACCTGTGCTGCCCGCAGAGTTTAACGCCGAGGGTGATGTCGCCGCTTTTATGGCGAATCTGTCACAGCTCGACGATCTCTTTGCCGCGCGTGTGGCGAAGGCCCGTGATGAAGGAAAAGTTTTGCGCTATGTTGGCAATATTGATGAAGATGGCGTCTGCCGCGTGAAGATTGCCGAAGTGGATGGTAATGATCCGCTGTTCAAAGTGAAAAATGGCGAAAACGCCCTGGCCTTCTATAGCCACTATTATCAGCCGCTGCCGTTGGTACTGCGCGGATATGGTGCGGGCAATGACGTTACAGCTGCCGGTGTCTTTGCTGATCTGCTACGTACCCTCTCATGGAAGTTAGGAGTCTGA Nuc. Counts: [551, 608, 692, 612] Total Mass%: [23.0, 20.9, 32.4, 23.7] of 323152.2
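Steps 30 and 31 above describe a custom checked exception that is thrown when a sequence contains a character other than A, C, G, T or -. A minimal sketch of that idea follows. The constructor taking a message is an assumption (the InvalidAreaCode example the lab refers to is not shown here), and the check is pulled out into a hypothetical validate method in a ValidationSketch class, whereas the lab places it in the final else of setMassPercents.

```java
// Checked exception, declared as in step 30. The message constructor is an assumption.
class InvalidDNAException extends Exception {
    public InvalidDNAException(String message) {
        super(message);
    }
}

// Hypothetical holder for the check; the lab puts it inside setMassPercents.
class ValidationSketch {
    // Throws if the sequence holds anything other than A, C, G, T or '-'.
    static void validate(String nucleo) throws InvalidDNAException {
        for (int i = 0; i < nucleo.length(); i++) {
            char c = nucleo.charAt(i);
            if ("ACGT-".indexOf(c) < 0) {
                throw new InvalidDNAException("Invalid nucleotide '" + c + "' at index " + i);
            }
        }
    }

    public static void main(String[] args) {
        try {
            validate("ATGXCA");
            System.out.println("sequence is valid");
        } catch (InvalidDNAException e) {
            System.out.println(e.getMessage()); // Invalid nucleotide 'X' at index 3
        }
    }
}
```

Because InvalidDNAException extends Exception rather than RuntimeException, any method that can throw it must declare throws InvalidDNAException or catch it, which is what forces the try/catch mentioned in step 31.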