Question
DNA sequence alignment

Recall the Edit Distance (Sequence Alignment) problem: given two strings over the same alphabet, together with mismatch and gap penalties, find an alignment of minimal cost. One of the most common uses of the minimum edit distance algorithm is in computational biology. DNA sequences are composed of four nucleotides, denoted by the letters A, C, T, G. Mutation over the course of evolution changes the sequences by deleting, inserting, or substituting nucleotides. The smaller the edit distance between two sequences, the smaller the evolutionary distance between them. Thus, biological sequences are often aligned to minimize the edit distance between them.

(a) Suppose the costs of mismatches and gaps are not the same for all the letters in the DNA sequences (since some mutations are more common than others). The following table states the cost of each operation for each letter:

      A     C     T     G     -
A     0     .1    .1    .2    .1
C     .2    0     .2    .3    .1
T     .2    .1    0     .1    .2
G     .1    .2    .2    0     .1
-     .2    .3    .1

In the substitution table, the entry in row A, column C, is the penalty for an A-to-C mismatch (but not vice versa), and the entry in row A, column -, is the cost of aligning A atop a gap. Find the minimum edit distance AND the optimal alignment between the following sequences (show the matrix of your calculations):

Sequence 1 (top): G A T T A C A
Sequence 2 (bottom): A T T A A C

(b) Extra credit: 2 points. Suppose there are occasional errors in sequencing, and as a result some of the letters in the DNA sequence are represented by a '?'. The '?' character can be matched to any letter in the alignment but not to a gap.

i. Modify the Sequence Alignment algorithm to account for the '?' character. Give the recurrence relation and the polynomial-time dynamic programming algorithm.

ii. Assume the mismatch and gap costs are as in the previous question. Find the minimum edit distance AND the optimal alignment between the following sequences (show the matrix of your calculations):

Sequence 1: G A T ? T A C A
Sequence 2: A T T A C ?
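The recurrence behind this problem can be sketched in Python as follows. This is a minimal illustration, not a worked answer: the names `align`, `with_wildcard`, and `GAP` are hypothetical, and the cost function is supplied by the caller, so a table such as the one in part (a) can be plugged in as a lookup with the top-sequence symbol as the row and the bottom-sequence symbol as the column. For part (b), the wrapper assumes that a '?' paired with any letter costs 0 and that a '?' paired with a gap is forbidden (infinite cost); the zero cost is an assumption, since the question does not state it.

```python
from math import inf

GAP = "-"  # symbol used for a gap in either row of the alignment


def align(top, bottom, cost):
    """Return (distance, aligned_top, aligned_bottom).

    cost(a, b) is the penalty for placing symbol a (from the top sequence,
    or GAP) atop symbol b (from the bottom sequence, or GAP). It need not
    be symmetric.
    """
    n, m = len(top), len(bottom)
    # D[i][j] = minimum cost of aligning top[:i] with bottom[:j]
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + cost(top[i - 1], GAP)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + cost(GAP, bottom[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + cost(top[i - 1], bottom[j - 1]),  # pair the two symbols
                D[i - 1][j] + cost(top[i - 1], GAP),                # top symbol atop a gap
                D[i][j - 1] + cost(GAP, bottom[j - 1]),             # gap atop bottom symbol
            )

    # Trace back from D[n][m], re-evaluating the same expressions to see
    # which of the three moves produced each cell.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + cost(top[i - 1], bottom[j - 1]):
            a.append(top[i - 1]); b.append(bottom[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + cost(top[i - 1], GAP):
            a.append(top[i - 1]); b.append(GAP); i -= 1
        else:
            a.append(GAP); b.append(bottom[j - 1]); j -= 1
    return D[n][m], "".join(reversed(a)), "".join(reversed(b))


def with_wildcard(cost, wildcard="?"):
    """Wrap a cost function so the wildcard matches any letter but never a gap.

    Assumption: a wildcard paired with a letter costs 0.
    """
    def wrapped(a, b):
        if wildcard in (a, b):
            return inf if GAP in (a, b) else 0.0
        return cost(a, b)
    return wrapped


if __name__ == "__main__":
    # Toy unit-cost function, used only to exercise the code.
    unit = lambda a, b: 0 if a == b else 1
    print(align("kitten", "sitting", unit))
    print(align("A?T", "ACT", with_wildcard(unit)))
```

The table fills in O(nm) time and space for sequences of lengths n and m. Because the wildcard rule lives entirely in the cost function, the recurrence itself is unchanged for part (b).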