Question

Posted on Sep 05, 2024

Last week for homework you wrote a program that allowed for DNA sequences to be searched for protein coding regions. This week you will enhance your program to make it able to report back not just the protein coding regions, but also the mRNA that DNA codes for and also the amino acid sequence that DNA results in.

In completing this assignment you must process the DNA strand first, getting all of the protein coding regions and then do all of your manipulation. The user should also have a choice as to whether they get the protein coding region printed as DNA, mRNA, or an amino acid sequence.
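One way to process the strand first and defer all output is to have the parsing step return the regions instead of printing them. The sketch below is only an illustration: `PcrCollector` and `collect` are hypothetical names that do not appear in the provided files, and the logic simply mirrors the codon walk in the provided `findPCRs` method, including its choice to close an unfinished region with `TGA`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name and method are not part of the provided files.
class PcrCollector {
    /**
     * Walks the strand codon by codon and returns each protein coding region
     * as a DNA string instead of printing it.
     * Assumes dna.length() is a multiple of 3, which the DNAStrand constructor ensures.
     */
    static List<String> collect(String dna) {
        List<String> regions = new ArrayList<>();
        StringBuilder current = null; // null means "not inside a region"
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            String codon = dna.substring(i, i + 3);
            if (codon.equals("ATG")) {
                // Start codon: open a region if none is open, then record the codon
                if (current == null) {
                    current = new StringBuilder();
                }
                current.append(codon);
            } else if (current != null) {
                current.append(codon);
                // Stop codon: the region is complete, so store it
                if (codon.equals("TAA") || codon.equals("TAG") || codon.equals("TGA")) {
                    regions.add(current.toString());
                    current = null;
                }
            }
        }
        // Mirror the provided code: close an unfinished region with TGA
        if (current != null) {
            regions.add(current.append("TGA").toString());
        }
        return regions;
    }
}
```

Once every region is in the list, the program can loop over it a single time and print each entry in whichever format the user picked.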

Your program should start off by asking for the .fasta file just like last week. It should then ask how the user wants the output:

If the user types 1, they want DNA.
If the user types 2, they want mRNA.
If the user types 3, they want the amino acid sequence.

All other options should be considered invalid, and the user should be asked for a new choice.

After that, your program must parse the .fasta file just like last week, but you MUST store each protein coding region in an ArrayList until the parsing is done, at which point you will output the regions in the format the user has picked. I heavily recommend that you create a ProteinCodingRegion class which takes the string representing the coding region as a constructor argument and has methods to represent the region as mRNA and as an amino acid sequence.

The format of your output should be the same as in Homework 7. The only thing that will change is whether you print DNA, mRNA, or the amino acid sequence.
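The output-format prompt described above could be handled with a small validation loop. This is a minimal sketch under assumptions: `OutputChoice`, `readChoice`, and the prompt wording are hypothetical and not part of the provided files. It reads whole lines with `nextLine()`, the same call the starter uses for the file path, so that non-numeric input is rejected like any other invalid choice.

```java
import java.util.Scanner;

// Hypothetical helper; the class name, method name, and prompt text are illustrative only.
class OutputChoice {
    /** Keeps prompting until the user enters 1, 2, or 3, then returns that number. */
    static int readChoice(Scanner scan) {
        while (true) {
            System.out.println("Output format? 1 = DNA, 2 = mRNA, 3 = amino acid sequence");
            String input = scan.nextLine().trim();
            if (input.equals("1") || input.equals("2") || input.equals("3")) {
                return Integer.parseInt(input);
            }
            // Anything else is invalid, so ask again
            System.out.println("Invalid choice, please try again.");
        }
    }
}
```

In the starter, a call such as `int choice = OutputChoice.readChoice( scan );` would sit right after the file path is read, reusing the same `Scanner` on `System.in`.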

To convert to mRNA, simply take a protein coding region and make the following substitutions:
A becomes U (for Uracil)
T becomes A
C becomes G
G becomes C
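The substitutions above can be sketched as a method on the recommended ProteinCodingRegion class. The class name and the string constructor come from the assignment text; the method names `toDNA` and `toMRNA` are assumptions, and the amino acid method is left out because the assignment text shown here does not give the codon table. Unexpected characters are passed through unchanged, which is also an assumption.

```java
// Sketch of the recommended class; only the DNA and mRNA views are shown.
class ProteinCodingRegion {
    private final String dna; // the coding region as DNA text

    ProteinCodingRegion(String dna) {
        this.dna = dna;
    }

    /** Returns the region unchanged, as DNA. */
    String toDNA() {
        return dna;
    }

    /** Applies the assignment's substitutions: A->U, T->A, C->G, G->C. */
    String toMRNA() {
        StringBuilder mrna = new StringBuilder(dna.length());
        for (char base : dna.toCharArray()) {
            switch (base) {
                case 'A': mrna.append('U'); break;
                case 'T': mrna.append('A'); break;
                case 'C': mrna.append('G'); break;
                case 'G': mrna.append('C'); break;
                default:  mrna.append(base); break; // assumption: keep anything unexpected as-is
            }
        }
        return mrna.toString();
    }
}
```

For example, `new ProteinCodingRegion( "ATGCCGTAA" ).toMRNA()` returns `UACGGCAUU`. The conversion goes one character at a time on purpose: chaining whole-string replacements would undo itself, since replacing every C with G and then every G with C turns the original C's back into C.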

THESE ARE THE TWO FILES PROVIDED

1.) Starter

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Homework8Starter {

    /**
     * Function main begins with program execution
     *
     * @param args Command line arguments (not used here)
     */
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.println( "Please enter the full path to the fasta file." );
        String filepath = scan.nextLine();

        File dnaFile = new File( filepath );
        Scanner fileScan = null;

        //Attempt to open a connection to the file
        try {
            fileScan = new Scanner( dnaFile );
        } catch ( FileNotFoundException fnfe ) {
            System.out.println( "The file could not be found, shutting down" );
            System.exit( 1 );
        }

        //This is just more memory efficient, if you just used a String don't worry that is acceptable too
        StringBuilder dna = new StringBuilder();

        while( fileScan.hasNextLine() ) {
            String line = fileScan.nextLine().trim();

            //If we encounter the >, then we are at a new strand
            if( line.length() > 0 && line.charAt( 0 ) == '>' ) {
                DNAStrand strand = new DNAStrand( dna.toString() );
                strand.findPCRs();
                System.out.println();
                System.out.println( line );
                dna = new StringBuilder();
            }
            //If we get a line with one character ignore it
            else if( line.length() == 1 ) {
                //do nothing
            }
            //Otherwise the line is part of the DNA strand
            else {
                dna.append( line );
            }
        }

        //There will be one last DNA strand that is not printed out, this will print it out
        DNAStrand strand = new DNAStrand( dna.toString() );
        strand.findPCRs();
    }
}

2.) DNAStrand

public class DNAStrand {

    private String dna; //The dna strand represented as text

    /**
     * One argument constructor, initializes the class with the dna string
     *
     * @param dna The dna strand the class is representing
     */
    public DNAStrand( String dna ) {
        this.dna = dna;
        this.scrubDNA();

        //Need to handle situation where a DNA strand does not evenly slice
        if( this.dna.length() % 3 != 0 ) {
            if( this.dna.length() % 3 == 1 ) {
                this.dna += "AA";
            } else if( this.dna.length() % 3 == 2 ) {
                this.dna += "A";
            }
        }
    }

    /**
     * Cleans up the DNA, removing X's and N's
     *
     * (Note: I forgot to mention the Ns in the last spec...if you left them in you will not lose points)
     */
    private void scrubDNA() {
        this.dna = this.dna.replaceAll( "X", "A" );
        this.dna = this.dna.replaceAll( "N", "A" );
    }

    /**
     * Finds the protein coding regions and prints them to the screen
     */
    public void findPCRs() {
        boolean inSeq = false;

        for( int i = 0; i < this.dna.length(); i += 3 ) {
            //The constructor pads the strand to a multiple of 3, so this never runs off the end
            String codon = this.dna.substring( i, i + 3 );

            if( codon.equals( "ATG" ) ) {
                System.out.print( codon );
                inSeq = true;
            } else if( inSeq && ( codon.equals( "TAA" ) || codon.equals( "TAG" ) || codon.equals( "TGA" ) ) ) {
                System.out.println( codon );
                System.out.println();
                inSeq = false;
            } else if( inSeq ) {
                System.out.print( codon );
            }
        }

        /*
           Need to shut a sequence if we finish the parse and we are still in a sequence.
           Most likely biologists would not agree with just sticking a TGA on the end of an
           incomplete coding region, but I am going to do it anyway.
        */
        if( inSeq ) {
            System.out.println( "TGA" );
        }
    }
}