Question


This lab involves processing data from genome files. Your program should work with the given input file (ecoli.txt). Copy this file into the same directory as your code.

Background

DNA carries genetic information for cellular life forms and some viruses. DNA consists of long chains of chemical compounds called nucleotides. Four nucleotides are present in DNA: adenine (A), cytosine (C), guanine (G), and thymine (T). Certain regions of DNA are called genes. Most genes encode instructions for building proteins. These proteins are responsible for carrying out most of the life processes of an organism.

Nucleotides in genes are organized into codons, which are groups of 3 nucleotides. The sequences of DNA that encode proteins occur between a start codon (which we will assume is ATG) and a stop codon (one of TAA, TAG, or TGA).
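The start/stop codon rule above can be illustrated with a short Java check. This is only a sketch of the background idea, not one of the lab's steps: the class name CodonCheck and the method looksLikeGene are hypothetical, it assumes Java 9 or later for Set.of, and it ignores details such as stop codons appearing in the middle of a sequence.

```java
import java.util.Set;

// Background illustration only (not a lab step): does a sequence look like a
// protein-coding region under the lab's simplifying assumptions?
class CodonCheck {
    // The three stop codons named in the background section
    private static final Set<String> STOP_CODONS = Set.of("TAA", "TAG", "TGA");

    // True if the sequence is a whole number of codons, starts with the start
    // codon ATG, and ends with one of the stop codons.
    static boolean looksLikeGene(String dna) {
        String s = dna.toUpperCase();
        if (s.length() < 6 || s.length() % 3 != 0) {
            return false; // too short, or not made of complete codons
        }
        String lastCodon = s.substring(s.length() - 3);
        return s.startsWith("ATG") && STOP_CODONS.contains(lastCodon);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeGene("ATGCCACTATGGTAG"));      // true
        System.out.println(looksLikeGene("CCATT-AATGATCA-CAGTT")); // false
    }
}
```

Two of the sample regions in the expected output (ending in TAG and TGA) pass this check, while the bogus protein does not start with ATG.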

Problem statement

You are to read in the input and produce the output shown below (written to an output file that will be attached to the lab).

Here is the output that should be written to the file. NOTICE THE FORMAT and try to replicate it:

Region Name: cure for cancer protein
Nucleotides: ATGCCACTATGGTAG
Nuc. Counts: [4, 3, 4, 4]
Total Mass%: [27.3, 16.8, 30.6, 25.3] of 1978.8

Region Name: captain picard hair growth protein
Nucleotides: ATGCCAACATGGATGCCCGATATGGATTGA
Nuc. Counts: [9, 6, 8, 7]
Total Mass%: [30.7, 16.8, 30.5, 22.1] of 3967.5

Region Name: bogus protein
Nucleotides: CCATT-AATGATCA-CAGTT
Nuc. Counts: [6, 4, 2, 6]
Total Mass%: [32.3, 17.7, 12.1, 29.9] of 2508.1

Output meaning:

Row 1. Region Name: read in and written to the output file.

Row 2. Nucleotides: read in from the input and written to the output, converted to all uppercase.

Row 3. Nuc. Counts: an array holding the counts of A, C, G, and T (in that order) in the string.

Row 4. Total Mass%: A, C, G, T, and - have different masses (listed below). Sum the mass for each type, compute the overall total (which includes the - characters), then find the percentage of the total contributed by each of A, C, G, and T.

a. Adenine (A): 135.128

b. Cytosine (C): 111.103

c. Guanine (G): 151.128

d. Thymine (T): 125.107

e. Junk (-): 100.000
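As a worked check of row 4, here is the arithmetic for the first sample record, cure for cancer protein, whose counts are [4, 3, 4, 4]:

```latex
\begin{aligned}
M_{\text{total}} &= 4(135.128) + 3(111.103) + 4(151.128) + 4(125.107) \\
  &= 540.512 + 333.309 + 604.512 + 500.428 = 1978.761 \approx 1978.8 \\
\%\,\mathrm{A} &= \frac{540.512}{1978.761} \times 100 \approx 27.3
\end{aligned}
```

For bogus protein, the two - characters add 2 × 100.000 = 200.000 to its total of 2508.078, which is why its four percentages add up to only about 92.0 rather than 100.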

Steps for DNA class

1. Define static variables:

final static double massA = 135.128;
final static double massC = 111.103;
final static double massG = 151.128;
final static double massT = 125.107;
final static double massJunk = 100.000;

2. Define instance variables:

String title;
String nucleo;                        // fixed to uppercase
String nucleoFixed;                   // fixed to uppercase, with every - removed
int[] countNucleo = {0, 0, 0, 0};     // countNucleo[0] = # of As, countNucleo[1] = # of Cs, etc.
double[] nucleoMass = {0, 0, 0, 0};   // nucleoMass[0] = % of total mass contributed by As, etc.
double totalMass = 0;

3. Build the constructor

Only a no-argument constructor is needed, since most of the instance variables are already initialized in their declarations above. It only needs to set the String variables to null.
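Taken together, steps 1 to 3 already give a DNA class that compiles. The sketch below uses the lab's own field names; the explicit public constructor and the comment wording are choices made for this sketch, and the class is left package-private so the snippet compiles on its own outside a DNA.java file.

```java
// Steps 1-3: static masses, instance variables, and the no-argument constructor.
class DNA {
    // Step 1: mass of each symbol, used for the Total Mass% row
    final static double massA = 135.128;
    final static double massC = 111.103;
    final static double massG = 151.128;
    final static double massT = 125.107;
    final static double massJunk = 100.000;

    // Step 2: instance variables
    String title;
    String nucleo;                       // fixed to uppercase
    String nucleoFixed;                  // fixed to uppercase, with every - removed
    int[] countNucleo = {0, 0, 0, 0};    // counts of A, C, G, T
    double[] nucleoMass = {0, 0, 0, 0};  // % of total mass from A, C, G, T
    double totalMass = 0;

    // Step 3: the arrays and totalMass are initialized above,
    // so only the String variables need to be set here
    public DNA() {
        title = null;
        nucleo = null;
        nucleoFixed = null;
    }
}
```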

4. Write the setTitle and getTitle methods:

public void setTitle(String t)

5. Write the setNucleo and getNucleo methods; convert the string to uppercase in the setter:

public void setNucleo(String t)

6. Write setFixedNucleo. Look up the String method that replaces (removes) every - in a String:

public void setFixedNucleo(String t)

7. Write a toString method that returns the above information in the correct format (more will be added as we go on).
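A minimal sketch of steps 4 to 7, assuming setFixedNucleo should also uppercase its argument (step 6 only says to remove -). It uses String.replace, which replaces every occurrence of its first argument. The getter getFixedNucleo is a hypothetical addition used only to make the result observable, and only the fields these methods touch are included.

```java
// Steps 4-7 on a DNA class trimmed to the String fields these methods use.
class DNA {
    String title;
    String nucleo;       // fixed to uppercase
    String nucleoFixed;  // fixed to uppercase, with every - removed

    public DNA() {
        title = null;
        nucleo = null;
        nucleoFixed = null;
    }

    // Step 4
    public void setTitle(String t) { title = t; }
    public String getTitle() { return title; }

    // Step 5: convert to uppercase in the setter
    public void setNucleo(String t) { nucleo = t.toUpperCase(); }
    public String getNucleo() { return nucleo; }

    // Step 6: String.replace removes every "-" (uppercasing here is an assumption)
    public void setFixedNucleo(String t) { nucleoFixed = t.toUpperCase().replace("-", ""); }

    // Hypothetical getter, not in the lab steps; handy for checking the result
    public String getFixedNucleo() { return nucleoFixed; }

    // Step 7: first two rows of the output format
    @Override
    public String toString() {
        return "Region Name: " + title + "\nNucleotides: " + nucleo;
    }
}
```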

Steps for Driver

8. Imports for the driver:

import java.io.*;

import java.util.Scanner;

import java.io.FileInputStream;

import java.io.FileNotFoundException;

import java.io.PrintWriter;

import java.io.FileOutputStream;

import java.util.ArrayList;

9. Variables to declare in the driver:

String title;
String nucleo;
ArrayList<DNA> input = new ArrayList<DNA>();
int count = 0; // index used to reach the current object in the ArrayList

10. Set up the input and output files using the code from Basketball as a guide

11. Set up a while loop to read the input:

while (inputStream.hasNextLine()) {

12. Read the title from the input:

title = inputStream.nextLine(); // reads the entire line into a String

13. Read in the nucleotide string from the input:

nucleo = inputStream.nextLine(); // reads the whole nucleotide line

14. Add a new DNA object to the ArrayList:

input.add(new DNA());

15. Call setNucleo on the object:

input.get(count).setNucleo(nucleo);

16. Call setFixedNucleo on the object.

17. Call toString on the object (println calls it automatically):

System.out.println(input.get(count));

18. Increment the counter so you can keep track of which object in the ArrayList is current:

count++;

19. End the while loop.

20. Close inputStream and outputStream.

21. Run the program so far; you should see the output file.
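Driver steps 10 to 21 might look like the following sketch. Several things here are assumptions rather than part of the assignment: the output file name output.txt, the setTitle call (the steps never call it, but Region Name cannot be printed otherwise), skipping blank lines between records, writing to outputStream instead of System.out, and a small stand-in DNA class so the example compiles alone. The loop lives in a process method that accepts any Scanner and PrintWriter, which makes it testable without ecoli.txt; try-with-resources takes care of step 20 by closing both streams.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Scanner;

// Minimal stand-in for the DNA class from steps 1-7 so this file compiles alone.
class DNA {
    private String title;
    private String nucleo;
    private String nucleoFixed;

    public void setTitle(String t) { title = t; }
    public void setNucleo(String t) { nucleo = t.toUpperCase(); }
    public void setFixedNucleo(String t) { nucleoFixed = t.toUpperCase().replace("-", ""); }

    @Override
    public String toString() {
        return "Region Name: " + title + "\nNucleotides: " + nucleo;
    }
}

class Driver {
    // Steps 11-19: read title/nucleotide line pairs and write each record out.
    static ArrayList<DNA> process(Scanner inputStream, PrintWriter outputStream) {
        ArrayList<DNA> input = new ArrayList<DNA>();
        int count = 0; // index of the current DNA object
        while (inputStream.hasNextLine()) {
            String title = inputStream.nextLine();
            if (title.trim().isEmpty()) {
                continue; // assumption: blank lines may separate records
            }
            if (!inputStream.hasNextLine()) {
                break; // a trailing title with no nucleotide line is ignored
            }
            String nucleo = inputStream.nextLine();
            input.add(new DNA());
            input.get(count).setTitle(title);
            input.get(count).setNucleo(nucleo);
            input.get(count).setFixedNucleo(nucleo);
            outputStream.println(input.get(count)); // println calls toString()
            outputStream.println();                 // blank line between records
            count++;
        }
        return input;
    }

    public static void main(String[] args) {
        // Step 10: open both files; try-with-resources closes them (step 20)
        try (Scanner inputStream = new Scanner(new FileInputStream("ecoli.txt"));
             PrintWriter outputStream = new PrintWriter(new FileOutputStream("output.txt"))) {
            process(inputStream, outputStream);
        } catch (FileNotFoundException e) {
            System.out.println("Could not open a file: " + e.getMessage());
        }
    }
}
```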

DNA Class

22. Write a method to count A, C, G, and T:

public void setCountNucleo()

23. Set up a loop that goes through the nucleoFixed String:

for (int i = 0; i < nucleoFixed.length(); i++)

24. Check whether each character is equal to A, C, G, or T:

if (nucleoFixed.charAt(i) == 'A')
    countNucleo[0]++; // ... and likewise for C, G, and T
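Steps 22 to 24 put together might read as below. This sketch trims the DNA class to the two fields the method needs, and the constructor that takes the fixed string is hypothetical, added only so the example runs on its own.

```java
// Steps 22-24 on a DNA class trimmed to the fields this method uses.
class DNA {
    String nucleoFixed;                 // uppercase, with every - removed
    int[] countNucleo = {0, 0, 0, 0};   // counts of A, C, G, T

    // Hypothetical constructor so the sketch can run on its own
    DNA(String fixed) { nucleoFixed = fixed; }

    public void setCountNucleo() {
        for (int i = 0; i < nucleoFixed.length(); i++) {
            char ch = nucleoFixed.charAt(i);
            if (ch == 'A') {
                countNucleo[0]++;
            } else if (ch == 'C') {
                countNucleo[1]++;
            } else if (ch == 'G') {
                countNucleo[2]++;
            } else if (ch == 'T') {
                countNucleo[3]++;
            }
        }
    }
}
```

The counts accumulate, so calling setCountNucleo twice on the same object would double them. For printing row 3, java.util.Arrays.toString(countNucleo) produces the [4, 3, 4, 4] layout used in the sample output.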

25. Write the setMassPercents method. Use the original nucleo string (not nucleoFixed), since the - characters count toward the total mass:

for (int i = 0; i < nucleo.length(); i++) {
    if (nucleo.charAt(i) == 'A') {
        nucleoMass[0] = nucleoMass[0] + massA;
        totalMass = totalMass + massA;
    } // ... else if branches for C, G, and T
    else if (nucleo.charAt(i) == '-')
        totalMass = totalMass + massJunk;
} // end for loop

Go through the nucleoMass array and update each value to be a percentage of the total:

for (int i = 0; i < 4; i++)
    nucleoMass[i] = nucleoMass[i] / totalMass * 100;
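A complete version of the step 25 method, again on a trimmed DNA class with a hypothetical constructor, might look like this. It uses the lab's constant names and an else if chain for all five symbols; a character outside A, C, G, T, and - is silently skipped here, which step 31 later changes.

```java
// Step 25 on a DNA class trimmed to the fields this method uses.
class DNA {
    final static double massA = 135.128;
    final static double massC = 111.103;
    final static double massG = 151.128;
    final static double massT = 125.107;
    final static double massJunk = 100.000;

    String nucleo;                       // uppercase, still containing any -
    double[] nucleoMass = {0, 0, 0, 0};  // becomes % of total mass for A, C, G, T
    double totalMass = 0;

    // Hypothetical constructor so the sketch can run on its own
    DNA(String upper) { nucleo = upper; }

    public void setMassPercents() {
        for (int i = 0; i < nucleo.length(); i++) {
            char ch = nucleo.charAt(i);
            if (ch == 'A') {
                nucleoMass[0] += massA;
                totalMass += massA;
            } else if (ch == 'C') {
                nucleoMass[1] += massC;
                totalMass += massC;
            } else if (ch == 'G') {
                nucleoMass[2] += massG;
                totalMass += massG;
            } else if (ch == 'T') {
                nucleoMass[3] += massT;
                totalMass += massT;
            } else if (ch == '-') {
                totalMass += massJunk; // junk adds to the total but has no slot of its own
            }
        }
        // Convert each summed mass into a percentage of the overall total
        for (int i = 0; i < 4; i++) {
            nucleoMass[i] = nucleoMass[i] / totalMass * 100;
        }
    }
}
```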

26. Extend toString to print the 3rd and 4th lines.
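One way to produce the 3rd and 4th lines with one decimal place, as in the sample output, is sketched below. It assumes the rows are separated by newlines, uses Arrays.toString for the counts, and formats with Locale.ROOT so the decimal separator is always a period; the helper name oneDecimal is hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

// Step 26: toString extended with the 3rd and 4th rows (fields trimmed to what toString reads).
class DNA {
    String title;
    String nucleo;
    int[] countNucleo = {0, 0, 0, 0};
    double[] nucleoMass = {0, 0, 0, 0};
    double totalMass = 0;

    // Hypothetical helper: formats values as [x.x, x.x, ...] with one decimal place
    private static String oneDecimal(double[] values) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(String.format(Locale.ROOT, "%.1f", values[i]));
        }
        return sb.append("]").toString();
    }

    @Override
    public String toString() {
        return "Region Name: " + title
                + "\nNucleotides: " + nucleo
                + "\nNuc. Counts: " + Arrays.toString(countNucleo)
                + "\nTotal Mass%: " + oneDecimal(nucleoMass)
                + " of " + String.format(Locale.ROOT, "%.1f", totalMass);
    }
}
```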

Driver

27. Inside the while loop in your driver, add a call to the setCountNucleo method:

input.get(count).setCountNucleo();

28. Next, add a call to the setMassPercents method:

input.get(count).setMassPercents();

29. Print out the object again:

System.out.println(input.get(count));

New Class Called InvalidDNAException

30. Write an exception class called InvalidDNAException (see the example from Tuesday; this is a separate class in the same directory):

public class InvalidDNAException extends Exception

Include a constructor similar to the one in the InvalidAreaCode exception example I emailed you.
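The InvalidAreaCode example referred to above is not included here, so the following is only a guess at its shape: the common pattern of a checked exception with a no-argument constructor and a message constructor. In the assignment it would be declared public in its own InvalidDNAException.java file; here it is package-private so the snippet compiles in any file.

```java
// Step 30: a checked exception for characters other than A, C, G, T, or -.
// In the assignment this would be "public class" in its own InvalidDNAException.java.
class InvalidDNAException extends Exception {
    // No-argument constructor with a default message (wording is a placeholder)
    public InvalidDNAException() {
        super("Invalid DNA character");
    }

    // Constructor that lets the thrower describe the bad character
    public InvalidDNAException(String message) {
        super(message);
    }
}
```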

DNA Class

31. Modify the setMassPercents method to end with an else branch: if the character is not A, C, G, T, or -, it throws this exception.

try { for(int i=0;i
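The original code for this step is cut off after `try { for(int i=0;i`, so where the try block goes cannot be recovered. The sketch below assumes one arrangement: setMassPercents declares `throws InvalidDNAException` and throws from a final else, and the caller (for example the driver loop) catches it. The message text and the test string ATGXTAG are made up for illustration.

```java
// Step 31 sketch: setMassPercents throws InvalidDNAException on an unexpected character.
class InvalidDNAException extends Exception {
    public InvalidDNAException(String message) {
        super(message);
    }
}

class DNA {
    final static double massA = 135.128;
    final static double massC = 111.103;
    final static double massG = 151.128;
    final static double massT = 125.107;
    final static double massJunk = 100.000;

    String nucleo;
    double[] nucleoMass = {0, 0, 0, 0};
    double totalMass = 0;

    // Hypothetical constructor so the sketch can run on its own
    DNA(String upper) { nucleo = upper; }

    public void setMassPercents() throws InvalidDNAException {
        for (int i = 0; i < nucleo.length(); i++) {
            char ch = nucleo.charAt(i);
            if (ch == 'A') {
                nucleoMass[0] += massA;
                totalMass += massA;
            } else if (ch == 'C') {
                nucleoMass[1] += massC;
                totalMass += massC;
            } else if (ch == 'G') {
                nucleoMass[2] += massG;
                totalMass += massG;
            } else if (ch == 'T') {
                nucleoMass[3] += massT;
                totalMass += massT;
            } else if (ch == '-') {
                totalMass += massJunk;
            } else {
                // The new final else from step 31: anything else is invalid
                throw new InvalidDNAException("Invalid character '" + ch + "' at index " + i);
            }
        }
        for (int i = 0; i < 4; i++) {
            nucleoMass[i] = nucleoMass[i] / totalMass * 100;
        }
    }

    public static void main(String[] args) {
        DNA bad = new DNA("ATGXTAG");
        try {
            bad.setMassPercents();
        } catch (InvalidDNAException e) {
            System.out.println(e.getMessage()); // prints: Invalid character 'X' at index 3
        }
    }
}
```

Because setNucleo already uppercases the string, lowercase letters will not trigger the exception, but stray characters such as trailing spaces in the input file will. The object is left partially updated when the exception fires, so the caller should not print its mass percentages in that case.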

34. Use this input to test the exception.

35. At the bottom of the driver, add a comment with the results from steps 31 and 32. There is no need to send the input files.

Example output from initial input file:

Region Name: the operon leader peptide

Nucleotides: ATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGA

Nuc. Counts: [21, 22, 12, 11]

Total Mass%: [33.5, 28.9, 21.4, 16.2] of 8471.7

Region Name: aspartokinase I/homoserine dehydrogenase I

Nucleotides: ATGCGAGTGTTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGGGTTGCCGATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCATCTGGTAGCGATGATTGAAAAAACCATTAGCGGTCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTCTGACGGGACTCGCCGCCGCCCAGCCGGGATTTCCGCTGGCACAATTGAAAACTTTCGTCGACCAGGAATTTGCCCAAATAAAACATGTCCTGCATGGCATCAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGCTGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCGGCGTGTTAGAAGCGCGTGGTCACAACGTTACCGTTATCGATCCGGTCGAAAAACTGCTGGCAGTGGGTCATTACCTCGAATCTACCGTTGATATTGCTGAATCCACCCGCCGTATTGCGGCAAGCCGCATTCCGGCTGACCACATGGTGCTGATGGCTGGTTTCACTGCCGGTAATGAAAAAGGCGAGCTGGTGGTTCTGGGACGCAACGGTTCCGACTACTCCGCTGCGGTGCTGGCGGCCTGTTTACGCGCCGATTGTTGCGAGATCTGGACGGATGTTGACGGTGTTTATACCTGCGATCCGCGTCAGGTGCCCGATGCGAGGTTGTTGAAGTCGATGTCCTATCAGGAAGCGATGGAGCTTTCTTACTTCGGCGCTAAAGTTCTTCACCCCCGCACCATTACCCCCATCGCCCAGTTCCAGATCCCTTGCCTGATTAAAAATACCGGAAATCCCCAAGCACCAGGTACGCTCATTGGTGCCAGCCGTGATGAAGACGAATTACCGGTCAAGGGCATTTCCAATCTGAATAACATGGCAATGTTCAGCGTTTCCGGCCCGGGGATGAAAGGGATGGTTGGCATGGCGGCGCGCGTCTTTGCAGCGATGTCACGCGCCCGTATTTCCGTGGTGCTGATTACGCAATCATCTTCCGAATACAGTATCAGTTTCTGCGTTCCGCAAAGCGACTGTGTGCGAGCTGAACGGGCAATGCAGGAAGAGTTCTACCTGGAACTGAAAGAAGGCTTACTGGAGCCGTTGGCGGTGACGGAACGGCTGGCCATTATCTCGGTGGTAGGTGATGGTATGCGCACCTTACGTGGGATCTCGGCGAAATTCTTTGCCGCGCTGGCCCGCGCCAATATCAACATTGTCGCCATTGCTCAGGGATCTTCTGAACGCTCAATCTCTGTCGTGGTCAATAACGATGATGCGACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTTTGTGATTGGCGTCGGTGGCGTTGGCGGTGCGCTGCTGGAGCAACTGAAGCGTCAGCAAAGCTGGTTGAAGAATAAACATATCGACTTACGTGTCTGCGGTGTTGCTAACTCGAAGGCACTGCTCACCAATGTACATGGCCTTAATCTGGAAAACTGGCAGGAAGAACTGGCGCAAGCCAAAGAGCCGTTTAATCTCGGGCGCTTAATTCGCCTCGTGAAAGAATATCATCTGCTGAACCCGGTCATTGTTGACTGTACTTCCAGCCAGGCTGTGGCAGATCAATATGCCGACTTCCTGCGCGAAGGTTTCCACGTTGTTACGCCGAACAAAAAGGCCAACACCTCGTCGATGGATTACTACCATCAGTTGCGTTATGCGGCGGAAAAATCGCGGCGTAAATTCCTCTATGACACCAACGTTGGGGCTGGATTACCGGTTATTGAGAACCTGCAAAATCTGCTCAATGCTGGTGATGAATTGATGAAGTTCTCCGGCATTCTTTCAGGTTCGCTTTCTTATATCTTCGGCAAGTTAGACGAAGGCATGAGTTTCTCCGAGGCGACCACACTGGCGCGGGAAA
TGGGTTATACCGAACCGGACCCGCGAGATGATCTTTCTGGTATGGATGTGGCGCGTAAGCTATTGATTCTCGCTCGTGAAACGGGACGTGAACTGGAGCTGGCGGATATTGAAATTGAACCTGTGCTGCCCGCAGAGTTTAACGCCGAGGGTGATGTCGCCGCTTTTATGGCGAATCTGTCACAGCTCGACGATCTCTTTGCCGCGCGTGTGGCGAAGGCCCGTGATGAAGGAAAAGTTTTGCGCTATGTTGGCAATATTGATGAAGATGGCGTCTGCCGCGTGAAGATTGCCGAAGTGGATGGTAATGATCCGCTGTTCAAAGTGAAAAATGGCGAAAACGCCCTGGCCTTCTATAGCCACTATTATCAGCCGCTGCCGTTGGTACTGCGCGGATATGGTGCGGGCAATGACGTTACAGCTGCCGGTGTCTTTGCTGATCTGCTACGTACCCTCTCATGGAAGTTAGGAGTCTGA

Nuc. Counts: [551, 608, 692, 612]

Total Mass%: [23.0, 20.9, 32.4, 23.7] of 323152.2

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Harness The Power Of Big Data The IBM Big Data Platform

Authors: Paul Zikopoulos, David Corrigan James Giles Thomas Deutsch Krishnan Parasuraman Dirk DeRoos Paul Zikopoulos

1st Edition

0071808183, 9780071808187

More Books

Students also viewed these Databases questions

Question

What is nonverbal communication?

Answered: 1 week ago