Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

HI I need the problem to be a solution of the second proble. but as you will see they are related. I will provide my

image text in transcribed

image text in transcribed

image text in transcribed

HI I need the problem to be a solution of the second proble. but as you will see they are related. I will provide my failed solution and frome there you can understand where I fail.

the following is my failed attempt

image text in transcribed

The purpose of this project is to write a program to compute the optimal sequence alignment of two DNA strings. This program will introduce you to the emerging field of computational biology in which computers are used to do research orn biological systems. Further, you will be introduced to a powerful algorithmic design paradigm known as dynamic programming Biology Review A genetic sequence is a string formed from a four-letter alphabet Adenine (A). Thymine (T), Guanine (G), Cytosine (C) of biological macromolecules referred to together as the DNA bases. A gene is a genetic sequence that contains the information needed to construct a protein. All of your genes taken together are referred to as the human genome, a blueprint for the parts needed to construct the proteins that form your cel a copy of the genome. This copying process, as well as natural wear and tear, introduces a small number of changes into the sequences of many genes. Among the of a substring of bases; such changes are generally referred to as point mutations. As a result of these point mutations, the saime gene ls. Each new cell produced by your body receives most common cha nges are the substitution of one base for another and the deletion uenced from closely related organisms will have slight differences. The Problem Through your research you have found the following sequence of a gene in a previously unstudied organism. A AC A G T TAC C What is the function of the protein that this gene encodes? You could begin a series of uninformed experiments in the lab to determine what role this gene plays. However, there is a good chance that it is a variant of a known gene in a previously studied organism. Since biologists and computer scientists have laboriously determined (and published) the genetic sequence of many organisms (including humans), you would like to leverage this information to your advantage. We'll compare the above genetic sequence with one which has already been sequenced and whose function is well understood TA AGG T CA If the two quantify "similar enough." genetic sequences are similar enough, we might expect them to have similar functions. We would like a way to Edit Distance In this assignment we will measure the similarity of two genetic sequences by their edit distance, a concept first introduced in the context of coding theory, but which is now widely used in spell checking, speech recognition, plagiarism detection, file revisioning, and computational linguistics. We align the two sequences, but we are permitted to insert gaps in either sequence (eg, to make them have the same length). We pay a penalty for each gap that we insert and also for each pair of characters that mismatch in the final alignment. Intuitively, these penalties model the relative likeliness of point mutations arising from deletion/insertion and substitution. We produce a numerical score according to the following table, which is widely used in biological applications operation nsert a gap align two characters that mismatch1 align two characters that match cost 0 Here are two possible alignments of the strings AACAGTTACC' and y - 'TAAGCTCA' x y cost x y cost A T1 A A 0 C A 1 AGI G G 0 T T0 A T1 A A 0 G G 0 T G1 T T 0 A A 0 C 2 C C 0 C A 1 8 7 Problem 1. (Calculating Edit Distance Using Dynamic Progrmming) A direct implementation of the above recursive scheine will work, but it spectacularly inefficient. I both input strings have N characters, then the nurnber of recursive calls will exceod 2N. To overcome this performance bug, we use dynamic programming. Dynamic programming is a powerful algorithmic paradigm, fist ntroduced by Bellman in the context of operations research, and then applied to the alignment of biological sequences by Needleman and Wunsch. Dynamic programming now plays the leading role in many computational problems, including control theory, financial engineering, and bioinformatics, including BLAST (the sequence alignment program almost universally used by molecular biologists in their experimental work). The key idea of dynamic programming is to break up a large computational problem into smaller subproblems, store the answers to those smller subproblems, and, eventually, use the stored answers to solve the original problem. This avoids recomputing the same quantity over and over again. Instead of using recursion, use a nested loop that calculates opt[] j] in the right order so that opt1]j 1], opt[i11, and opti]1 are all computed before we try to compute opt[] Write a program edit distance.py that reads strings x and y from standard input and computes the edit-distance matrix opt as described above. The program should output x, y, the dimensions (number of rows and columns) of opt, and opt itself, using the following format The first and second ls should contain the strings x and y The third line should contain the dimensions of the opt matrix, separated by a space The following lines should contain the rows of the opt matrix, each ending in a newline character. Use stdio.writef) with the format string ,%3d, to write out the elements of the matrix python3 edit distance py

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Fundamentals Of Database Systems

Authors: Ramez Elmasri, Sham Navathe

4th Edition

0321122267, 978-0321122261

More Books

Students also viewed these Databases questions