
Question


Using your knowledge of loops and String variables, you will create a program that calculates and creates a report of some basic statistics and genomic quantities on DNA base-pair sequences. The sequences are entered in from the console.

Draw a flow chart of your main program (not just main method). Submit the image file of a scan of this flow chart:

The chart may be handwritten and can be more than one page. Please make it legible.

Your image file can be PDF, JPG, TIF, or PNG.

Please submit only one image file via Canvas. Do not email your instructor your work.

Name your file Flowchart.jpg (or whatever is the extension for the graphic format you're using).

Note, if you are finding your flow chart to be too complicated, it probably means that you aren't using enough methods to help you. Remember you can treat the methods called by your main program as black-boxes, as far as the flow chart is concerned. That is to say, just write your call to the method in a rectangle and that's it; you don't have to chart the contents of the method.

Write the program described below. Submit the single Java file (i.e., ".java" text file) that contains your program:

Please submit only a single Java file via Canvas (i.e., for those who know about such things already, there should not be more than one class). Do not email your instructor your work.

Name your Java file HW_Loops.java. You must name your file this way.

Recall that this means the name of the public class in your code file must be named HW_Loops.

Your program will be in the form of a main method that has lines of code and calls other methods that you have written. There must be a minimum of two other methods beyond the main method. (Of course, you can have too many methods, but this seldom, if ever, has happened in this assignment.) Again, if you're finding you're not writing helper methods, you're doing the assignment incorrectly.

Genomics

DNA is the fundamental encoding of the instructions that govern the operation of living cells and, by extension, biological organisms. You can think of DNA as a storage medium in which the program that executes within all of your cells is written. The "machine code" of DNA, corresponding to the byte-code of Java, consists of only four nucleotides, or bases, that are arranged in a linear sequence along the DNA molecule. These four bases are: guanine (G), adenine (A), thymine (T), and cytosine (C). So, a DNA molecule can be represented as a string made up of those four letters. The science of bioinformatics is largely concerned with computations on such genetic strings, or sequences. There are a variety of computations that one might perform on genetic sequences. We will investigate two types: basic statistics of individual sequences and pairwise alignments used to compare pairs of sequences.

Basic Statistics

Your program will first prompt the user to enter a single DNA sequence, which it should validate for legality (i.e., only the four valid bases). You might do this validation by writing a function that takes a String as a parameter and returns a boolean. Re-prompt the user if the input was invalid. Once you have a valid input, compute the following statistics (each should be implemented as a separate function, called from main()).
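One way the validate-and-re-prompt step might look is sketched below. The class name SequenceValidationSketch and the method names isValidSequence and readSequence are illustrative, not part of the assignment. The sketch also assumes that an empty line counts as invalid and that lowercase input is converted to uppercase before checking; the assignment does not specify either behavior.

```java
import java.util.Scanner;

class SequenceValidationSketch {

    // Returns true only if the sequence is non-empty and every character
    // is one of the four bases A, C, G, or T.
    // Assumption: an empty string is treated as invalid.
    static boolean isValidSequence(String sequence) {
        if (sequence == null || sequence.isEmpty()) {
            return false;
        }
        for (int i = 0; i < sequence.length(); i++) {
            char base = sequence.charAt(i);
            if (base != 'A' && base != 'C' && base != 'G' && base != 'T') {
                return false;
            }
        }
        return true;
    }

    // Prompts repeatedly until the user enters a valid sequence.
    // Assumption: input is trimmed and upper-cased before validation.
    static String readSequence(Scanner console, String prompt) {
        System.out.print(prompt);
        String sequence = console.nextLine().trim().toUpperCase();
        while (!isValidSequence(sequence)) {
            System.out.println("Invalid sequence; use only A, C, G, and T.");
            System.out.print(prompt);
            sequence = console.nextLine().trim().toUpperCase();
        }
        return sequence;
    }

    public static void main(String[] args) {
        Scanner console = new Scanner(System.in);
        String first = readSequence(console, "Enter sequence 1: ");
        System.out.println("Accepted: " + first);
    }
}
```

Keeping the check in its own boolean method means the same helper can be reused for the second sequence, and the main program's flow chart only needs a single box for the call.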

Count the number of occurrences of "C".

Determine the fraction of cytosine and guanine nucleotides. For example, if half of the nucleotides in the sequence are either "C" or "G", the fraction should be 0.5.

A DNA strand is actually made up of pairs of bases: in effect, two strands that are cross-linked together. These two strands are complementary: if you know one, you can always determine the other, or complement, because each nucleotide only pairs up with one other. In particular, "A" and "T" are complements, as are "C" and "G". So, for example, the complement of the sequence "AAGGTCT" would be "TTCCAGA". Compute the complement of the input sequence.
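The three statistics above could each be written as a small loop over the characters of the string. The following is a minimal sketch, not a definitive solution. The class name SequenceStatsSketch and the method names countC, cgFraction, and complement are made up for illustration, and every method assumes its argument has already been validated as a non-empty string of A, C, G, and T.

```java
class SequenceStatsSketch {

    // Counts how many times 'C' appears in the sequence.
    static int countC(String sequence) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            if (sequence.charAt(i) == 'C') {
                count++;
            }
        }
        return count;
    }

    // Fraction of bases that are 'C' or 'G'.
    // Assumption: the sequence is non-empty (an empty one would give NaN).
    static double cgFraction(String sequence) {
        int cg = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char base = sequence.charAt(i);
            if (base == 'C' || base == 'G') {
                cg++;
            }
        }
        // Cast to double so the division is not integer division.
        return (double) cg / sequence.length();
    }

    // Builds the complement: A <-> T and C <-> G.
    // Assumption: the input contains only valid bases; anything else
    // is treated here as 'G' by the final else branch.
    static String complement(String sequence) {
        String result = "";
        for (int i = 0; i < sequence.length(); i++) {
            char base = sequence.charAt(i);
            if (base == 'A') {
                result += 'T';
            } else if (base == 'T') {
                result += 'A';
            } else if (base == 'C') {
                result += 'G';
            } else {
                result += 'C';
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sequence = "AAGGTCT";
        System.out.println("C-count: " + countC(sequence));
        System.out.println("CG-ratio: " + cgFraction(sequence));
        System.out.println("Complement: " + complement(sequence));
    }
}
```

The cast in cgFraction matters: without it, dividing two int values would truncate the result to 0 for any sequence that is not entirely "C" and "G".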

Simple Pairwise Alignments

During reproduction, DNA sequences from both parents are replicated and "mixed" to form the DNA of their offspring. This process is not 100% accurate, and errors, or mutations, creep into the genome. Sometimes, these mutations have no effect, sometimes they are immediately lethal and the offspring isn't viable, and sometimes they result in changes in characteristics that may make the offspring more competitive when it comes time for it to breed (or may make it more competitive if there is an environmental change). This mutation process is one element that underlies evolution. A result of evolution is that, after the fact, you can compare two nucleotide sequences and test the hypothesis that they share an evolutionary history. Such comparison allows us to learn how modifications to DNA result in modifications of biochemical processes and physical characteristics. This is why sequence alignment techniques are important. We determine an alignment by comparing two sequences and seeing how well they match. A very simple method for this comparison is to look at corresponding nucleotides and compute a score for that potential alignment. If there are multiple potential alignments, then the one with the highest score would be considered most likely. For example, let's say that the two input sequences are "AATCTATA" and "AAGATA". There are three possible alignments:

AATCTATA
AAGATA

and:

AATCTATA
 AAGATA

and:

AATCTATA
  AAGATA

In general, mutations can be a substitution of one nucleotide for another (for example, a "G" being replaced by a "T"), an insertion that adds one or more nucleotides, or a deletion that deletes one or more nucleotides. To keep things simple, we will concern ourselves only with the first of these three: point mutations. For simple, gap-free alignments, we compute a score using a simple rule: if the two corresponding characters match, we add a match score of one (1); if they don't match, the match score is zero (0). The total score for an alignment is the sum of the character scores, and the alignment with the highest score is the best match. So, for example, the scores for the three alignments above are 4, 1, and 3, and the best alignment is the first one. You will use this simple alignment method in your program.
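The scoring rule can be sketched as a loop that compares corresponding characters at a given offset. The names AlignmentSketch, alignmentScore, and bestOffset below are hypothetical. The sketch assumes the caller passes the longer string first and that the shorter string always lies entirely within the longer one, as in the three alignments shown. When two offsets tie, it keeps the smaller offset; the assignment does not say how ties should be broken.

```java
class AlignmentSketch {

    // Scores one gap-free alignment: the shorter string is placed so that
    // its first character lines up with position 'offset' of the longer one.
    // Each matching pair adds 1; mismatches add 0.
    static int alignmentScore(String longer, String shorter, int offset) {
        int score = 0;
        for (int i = 0; i < shorter.length(); i++) {
            if (longer.charAt(offset + i) == shorter.charAt(i)) {
                score++;
            }
        }
        return score;
    }

    // Tries every offset at which the shorter string still fits and
    // returns the offset with the highest score.
    // Assumption: ties are resolved in favor of the smallest offset.
    static int bestOffset(String longer, String shorter) {
        int best = 0;
        int bestScore = -1;
        int lastOffset = longer.length() - shorter.length();
        for (int offset = 0; offset <= lastOffset; offset++) {
            int score = alignmentScore(longer, shorter, offset);
            if (score > bestScore) {
                bestScore = score;
                best = offset;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String longer = "AATCTATA";
        String shorter = "AAGATA";
        for (int offset = 0; offset <= longer.length() - shorter.length(); offset++) {
            System.out.println("offset " + offset + ": "
                    + alignmentScore(longer, shorter, offset));
        }
        System.out.println("best offset: " + bestOffset(longer, shorter));
    }
}
```

For the example pair, the loop visits offsets 0, 1, and 2, which correspond to the three alignments listed above with scores 4, 1, and 3.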

Program Description

Your program will prompt, via the console, for the first sequence, check it for validity, and compute its basic statistics. It will then prompt for, and validate, a second sequence and compute that sequence's basic statistics, too. Then, your program will compute the scores for all possible alignments for those two strings (you will want to have a method that takes two strings, plus an offset for shifting the shorter string relative to the longer one, and returns an int score) and determine the best alignment score. Finally, it will print out a report of the results. Thus, for the two input sequences "AATCTATA" and "AAGATA", your program will output the following report:

Sequence 1: AATCTATA
C-count: 1
CG-ratio: 0.125
Complement: TTAGATAT

Sequence 2: AAGATA
C-count: 0
CG-ratio: 0.167
Complement: TTCTAT

Best alignment score: 4
AATCTATA
AAGATA

Note how the report visually displays the match that produces the best alignment score. In this case it happened to be the alignment where the first letters are matched, but if the best alignment score occurred where the bottom sequence was shifted, the report would need to show that.
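Showing a shifted alignment could be handled by printing leading spaces in front of the bottom sequence, one per position of offset. The sketch below illustrates that idea together with three-decimal formatting for the CG-ratio. The names AlignmentReportSketch, formatAlignment, and formatRatio are invented for this example, and the sketch assumes the longer sequence is always printed on top. It passes Locale.US to String.format so the decimal separator is a period regardless of system settings.

```java
import java.util.Locale;

class AlignmentReportSketch {

    // Returns the two-line alignment display: the longer sequence on top,
    // and the shorter one below it, indented by 'offset' spaces.
    // Assumption: the longer sequence is always the top line.
    static String formatAlignment(String longer, String shorter, int offset) {
        String padding = "";
        for (int i = 0; i < offset; i++) {
            padding += " ";
        }
        return longer + "\n" + padding + shorter;
    }

    // Formats a ratio with three digits after the decimal point,
    // e.g. 1.0 / 6 becomes "0.167".
    static String formatRatio(double ratio) {
        return String.format(Locale.US, "%.3f", ratio);
    }

    public static void main(String[] args) {
        // Hypothetical case in which the best alignment is at offset 2.
        System.out.println("CG-ratio: " + formatRatio(1.0 / 6));
        System.out.println(formatAlignment("AATCTATA", "AAGATA", 2));
    }
}
```

With an offset of 0 the padding is empty, so the same method covers both the unshifted case in the sample report and any shifted case.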
