Question
Python 3 Only Problem 2. (Recovering the Alignment) Now that we know how to compute the edit distance between two strings, we next want to
Python 3 Only
Problem 2. (Recovering the Alignment) Now that we know how to compute the edit distance between two strings, we next want to recover the optimal alignment itself. The key idea is to retrace the steps of the dynamic programming algorithm backwards, re-discovering the path of choices (highlighted in red in the table above) from opt[0][0] to opt[M][N]. To determine the choice that led to opt[i][j], we consider the three possibilities: 1. The optimal alignment matches x[i] up with a gap. In this case, we must have opt[i][j] = opt[i + 1][j] + 2. 2. The optimal alignment matches y[j] up with a gap. In this case, we must have opt[i][j] = opt[i][j + 1] + 2. 3. The optimal alignment matches x[i] up with y[j]. In this case, we must have opt[i][j] = opt[i + 1][j + 1] if x[i] equals y[j], or opt[i][j] = opt[i + 1][j + 1] + 1 otherwise.
Write a program alignment.py that reads from standard input, the output produced by edit_distance.py, ie, input strings x and y, and the opt matrix. The program should then recover an optimal alignment using the procedure described above, and write to standard output the edit distance between x and y and the alignment itself, using the following format:
The rst line should contain the edit distance, preceded by the text Edit distance = .
Each subsequent line should contain a character from the rst string, followed by the paired character from the second string, followed by the associated penalty. Use the character - to indicate a gap in either string. $ python3 edit_distance.py < data/example10.txt | python3 alignment.py
Edit distance = 7
A T 1
A A 0
C - 2
A A 0
G G 0
T G 1
T T 0
A - 2
C C 0
C A 1 Data The data directory contains short test data les and actual genomic data les. Be sure to test your programs thoroughly using the short test les and the longer actual data les. Here are the optimal edit distances of several of the supplied les: ecoli2500.txt 118
ecoli5000.txt 160
fli8.txt 6 fli9.txt 4
fli10.txt 2
ftsa1272.txt 758
gene57.txt 8
stx1230.txt 521
stx19.txt 10
stx26.txt 17
stx27.txt 19
and here are the test cases with unique optimal alignments:
$ python3 edit_distance.py < data/endgaps7.txt | python3 alignment.py
Edit distance = 4
a - 2
t t 0
a a 0
t t 0
t t 0
a a 0
t t 0
- a 2
$ python3 edit_distance.py < data/fli10.txt | python3 alignment.py
Edit distance = 2
T T 0
G G 0
G G 0
C T 1
G G 0
G G 0
A T 1
A A 0
C C 0
T T 0
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started