Question: CSC 110 - Homework 9 - Bioinformatics We have seen the code in class that finds all alignments between a pair of sequences by adding
CSC 110 - Homework 9 - Bioinformatics
We have seen the code in class that finds all alignments between a pair of sequences by adding gaps to the shorter sequence. You can find a version of that code on the Sample Code page.
Of course, in a real bioinformatics application, gaps could occur in both sequences. For this assignment, you will modify the all_alignments program so that it allows the user to specify how many gaps to consider in the longer sequence. The program will find the optimal alignments between two sequences when a specified number of gaps is allowed in the longer sequence. For example, suppose we have the following two sequences:
ACT
GACG
If we allow for one gap in the longer sequence, then there are 5 possible sequences that result from adding that gap:
-GACG G-ACG GA-CG GAC-G GACG-
Then for each of these sequences, we would align it with the shorter sequence with 2 gaps, so that both sequences are the same length. There are 10 ways to add 2 gaps to ACT:
--ACT -A-CT -AC-T -ACT- A--CT
A-C-T A-CT- AC--T AC-T- ACT--
So the total number of alignments to consider in this example would be 5*10=50.
For this assignment, you will write a function:
scoreGapsInBoth(longSeq, shortSeq, longGaps):
totalAlignments, highScore, highList
that computes the score for all alignments when given the two sequences, longSeq and shortSeq, and the number of gaps to allow in the long sequence (longGaps). From these scores, you will find the highest, and keep a list of all alignments with that score. The pseudocode for the function is as follows:
Find all sequences that result from adding the specified number of gaps to the longSeq
Find all sequences that result from adding enough gaps the the shorter sequence so that it is the same length as the longer sequence
For each of these long sequences with gaps
For each of the short sequences with gaps
Compute the score between the two sequences
Keep track of the highest score and the list of sequences with that score
The function will return the total number of alignments, the high score, and the list of alignments with the high score.
You will also need to write a function
scoreAlignment(seq1, seq2):
return score
That computes the score between the given sequences where we assume:
Gap = 0, Match = 1, Mismatch = -1
Skeleton code for this assignment can be found here:


HINTS:
-
You will not need to make any changes to the functions in the given program. Your scoreGapsInBoth function will call the insertAllGaps function to compute the lists of sequences.
-
When you are storing the alignments with the optimal score, you will use a list of alignments. You can store each alignment as a concatenation of the two sequences in the alignment. For instance, if we have:
seq1 = AC-T- and seq2 = -GTCA
then we can store the alignment as:
alignment = seq1 + + seq2
This will store the alignment as a single string, but will print with a line break so the two sequences are aligned under each other.
-
The main function is written for you, so all you have to write is the two functions described above. Write the scoreAlignment function first and test that it works correctly before writing the scoreGapsInBoth function.
Your program should NOT call the main function. Gradescope is expecting that the program will not execute until it calls the main function itself.
The output of your program should look like this:


***PLEASE MAKE SURE OUTPUT LOOKS IDENTICAL TO THE EXAMPLES, THANK YOU!!!***
+ Program to find all possible alignments of a pair of strings when adding + gaps to the shorter string + Given: Two strings of nucleotides + Find: All possible alignments when adding gaps to the shorter string + Punction to create a list of all ways one gap can be inserted into + a string. The input is a string, the output is a list of strings + with a gap inserted into all positions of the input string def insertoneGap (strng): alignments - 1 for i in range (len (strng)): newstrng - steng[0:1] + '-' + stengti:len (strng)] alignments - alignments + [newStrng) alignments - alignment + [strng + return alignments + Function to take a set union of a pair of lists + This is used to eliminate any duplicates in the list when they are combined def Union (listi, list2): for a in list2: if a not in listi: listl - listi + [al return listi Punction to create all possible alignments of a string with a certain number of gaps inserted def insertallGaps (strng, gaps): +List of alignments starts with the initial string alignments - (strng) Loop to insert one gap at a time for i in range (gaps): Initialize list of new alignments with i gaps in the string newAlignments - U For every string in the list of alignments for st in alignments: Insert one gap in each alignment in the list al - insertoneGap (st) + Add the new alignment to the list of new alignments being created newalignments - Union (newalignments, al) The alignments list now becomes the new alignments list to add another gap to each of the alignments in the new list now alignments - newAlignments return alignments Function to compute the score for a pair of alignments + Gap score - 0, Match score - 1, Mismatch score - -1 def scoreAlignment (seq., seq2): + Pill in the code to compute the score for a pair of sequences + using the scoring in the comment above return total score + Function to compute all alignments with gaps in both sequences def scoreGaps InBoth (longSeq, shortseg, longGaps): + Fill in the code to compute the total number of alignments when we + allow longgaps number of gaps in the long sequence, longgaps is then + and the optimal score with a list of all alignments with that score return totalAlignments, highscore, highList Function to print all of the alignments along with the score for each alignment def printResults (totalAlignments, highScore, highList): print("There are ", totalAlignments," alignments.") print("There are ", len (highList)," optimal alignments with a score of ", highScore, "are: ") for i in range (len (highlist)): print (highList[1]) print() return Main function def main(): Get the two strings to align stri - input ("Enter string 1: ") str2 - input ("Enter string 2: ") longgaps - int (input("How many total gaps should we allow in the longer sequence? ")) + Compute alignments adding gaps to the shorter string if len (stri) > len (str2): longSeq - stri shortSeq - str2 else: longSeq - str2 shortSeq - strl totalAlignments, highscore, highlist - scoreGaps InBoth (longSeq, shortSeq, longgaps) print Results (totalAlignments, highScore, highlist) >>> main() Enter string 1: ACT Enter string 2: GACG How many total gaps should we allow in the longer sequence? 1 There are 50 alignments. There are 2 optimal alignments with a score of 2 : GAC-G -ACT- GACG- -AC-T >>> main() Enter string 1: TACCTAGTCGGT Enter string 2: TACCTAGTAGGTC How many total gaps should we allow in the longer sequence? 2 There are 47775 alignments. There are 30 optimal alignments with a score of 11 : -TACCTAGT-AGGTC -TACCTAGTC-GGT- -TACCTAGTA-GGTC -TACCTAGT-CGGT- T-ACCTAGT-AGGTC T-ACCTAGTC-GGT- T-ACCTAGTA-GGTC T-ACCTAGT-CGGT- TA-CCTAGT-AGGTC TA-CCTAGTC-GGT- TA-CCTAGTA-GGTC TA-CCTAGT-CGGT- TAC-CTAGT-AGGTC TAC-CTAGTC-GGT- TAC-CTAGTA-GGTC TAC-CTAGT-CGGT- TACC-TAGT-AGGTC TACC-TAGTC-GGT- TACC-TAGTA-GGTC TACC-TAGT-CGGT
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
