Question: CSC 110 - Homework 9 - Bioinformatics We have seen the code in class that finds all alignments between a pair of sequences by adding

CSC 110 - Homework 9 - Bioinformatics

We have seen the code in class that finds all alignments between a pair of sequences by adding gaps to the shorter sequence. You can find a version of that code on the Sample Code page.

Of course, in a real bioinformatics application, gaps could occur in both sequences. For this assignment, you will modify the all_alignments program so that it allows the user to specify how many gaps to consider in the longer sequence. The program will find the optimal alignments between two sequences when a specified number of gaps is allowed in the longer sequence. For example, suppose we have the following two sequences:

ACT

GACG

If we allow for one gap in the longer sequence, then there are 5 possible sequences that result from adding that gap:

-GACG G-ACG GA-CG GAC-G GACG-

Then for each of these sequences, we would align it with the shorter sequence with 2 gaps, so that both sequences are the same length. There are 10 ways to add 2 gaps to ACT:

--ACT -A-CT -AC-T -ACT- A--CT

A-C-T A-CT- AC--T AC-T- ACT--

So the total number of alignments to consider in this example would be 5*10=50.

For this assignment, you will write a function:

scoreGapsInBoth(longSeq, shortSeq, longGaps):

totalAlignments, highScore, highList

that computes the score for all alignments when given the two sequences, longSeq and shortSeq, and the number of gaps to allow in the long sequence (longGaps). From these scores, you will find the highest, and keep a list of all alignments with that score. The pseudocode for the function is as follows:

Find all sequences that result from adding the specified number of gaps to the longSeq

Find all sequences that result from adding enough gaps the the shorter sequence so that it is the same length as the longer sequence

For each of these long sequences with gaps

For each of the short sequences with gaps

Compute the score between the two sequences

Keep track of the highest score and the list of sequences with that score

The function will return the total number of alignments, the high score, and the list of alignments with the high score.

You will also need to write a function

scoreAlignment(seq1, seq2):

return score

That computes the score between the given sequences where we assume:

Gap = 0, Match = 1, Mismatch = -1

Skeleton code for this assignment can be found here:

CSC 110 - Homework 9 - Bioinformatics We have seen the code

in class that finds all alignments between a pair of sequences by

HINTS:

  1. You will not need to make any changes to the functions in the given program. Your scoreGapsInBoth function will call the insertAllGaps function to compute the lists of sequences.

  2. When you are storing the alignments with the optimal score, you will use a list of alignments. You can store each alignment as a concatenation of the two sequences in the alignment. For instance, if we have:

seq1 = AC-T- and seq2 = -GTCA

then we can store the alignment as:

alignment = seq1 + + seq2

This will store the alignment as a single string, but will print with a line break so the two sequences are aligned under each other.

  1. The main function is written for you, so all you have to write is the two functions described above. Write the scoreAlignment function first and test that it works correctly before writing the scoreGapsInBoth function.

Your program should NOT call the main function. Gradescope is expecting that the program will not execute until it calls the main function itself.

The output of your program should look like this:

adding gaps to the shorter sequence. You can find a version of

that code on the Sample Code page. Of course, in a real

***PLEASE MAKE SURE OUTPUT LOOKS IDENTICAL TO THE EXAMPLES, THANK YOU!!!***

+ Program to find all possible alignments of a pair of strings when adding + gaps to the shorter string + Given: Two strings of nucleotides + Find: All possible alignments when adding gaps to the shorter string + Punction to create a list of all ways one gap can be inserted into + a string. The input is a string, the output is a list of strings + with a gap inserted into all positions of the input string def insertoneGap (strng): alignments - 1 for i in range (len (strng)): newstrng - steng[0:1] + '-' + stengti:len (strng)] alignments - alignments + [newStrng) alignments - alignment + [strng + return alignments + Function to take a set union of a pair of lists + This is used to eliminate any duplicates in the list when they are combined def Union (listi, list2): for a in list2: if a not in listi: listl - listi + [al return listi Punction to create all possible alignments of a string with a certain number of gaps inserted def insertallGaps (strng, gaps): +List of alignments starts with the initial string alignments - (strng) Loop to insert one gap at a time for i in range (gaps): Initialize list of new alignments with i gaps in the string newAlignments - U For every string in the list of alignments for st in alignments: Insert one gap in each alignment in the list al - insertoneGap (st) + Add the new alignment to the list of new alignments being created newalignments - Union (newalignments, al) The alignments list now becomes the new alignments list to add another gap to each of the alignments in the new list now alignments - newAlignments return alignments Function to compute the score for a pair of alignments + Gap score - 0, Match score - 1, Mismatch score - -1 def scoreAlignment (seq., seq2): + Pill in the code to compute the score for a pair of sequences + using the scoring in the comment above return total score + Function to compute all alignments with gaps in both sequences def scoreGaps InBoth (longSeq, shortseg, longGaps): + Fill in the code to compute the total number of alignments when we + allow longgaps number of gaps in the long sequence, longgaps is then + and the optimal score with a list of all alignments with that score return totalAlignments, highscore, highList Function to print all of the alignments along with the score for each alignment def printResults (totalAlignments, highScore, highList): print("There are ", totalAlignments," alignments.") print("There are ", len (highList)," optimal alignments with a score of ", highScore, "are: ") for i in range (len (highlist)): print (highList[1]) print() return Main function def main(): Get the two strings to align stri - input ("Enter string 1: ") str2 - input ("Enter string 2: ") longgaps - int (input("How many total gaps should we allow in the longer sequence? ")) + Compute alignments adding gaps to the shorter string if len (stri) > len (str2): longSeq - stri shortSeq - str2 else: longSeq - str2 shortSeq - strl totalAlignments, highscore, highlist - scoreGaps InBoth (longSeq, shortSeq, longgaps) print Results (totalAlignments, highScore, highlist) >>> main() Enter string 1: ACT Enter string 2: GACG How many total gaps should we allow in the longer sequence? 1 There are 50 alignments. There are 2 optimal alignments with a score of 2 : GAC-G -ACT- GACG- -AC-T >>> main() Enter string 1: TACCTAGTCGGT Enter string 2: TACCTAGTAGGTC How many total gaps should we allow in the longer sequence? 2 There are 47775 alignments. There are 30 optimal alignments with a score of 11 : -TACCTAGT-AGGTC -TACCTAGTC-GGT- -TACCTAGTA-GGTC -TACCTAGT-CGGT- T-ACCTAGT-AGGTC T-ACCTAGTC-GGT- T-ACCTAGTA-GGTC T-ACCTAGT-CGGT- TA-CCTAGT-AGGTC TA-CCTAGTC-GGT- TA-CCTAGTA-GGTC TA-CCTAGT-CGGT- TAC-CTAGT-AGGTC TAC-CTAGTC-GGT- TAC-CTAGTA-GGTC TAC-CTAGT-CGGT- TACC-TAGT-AGGTC TACC-TAGTC-GGT- TACC-TAGTA-GGTC TACC-TAGT-CGGT

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!