The DNA molecule is made up of two linked strands which together form a helix. The links between the strands are provided by bonds between pairs of chemicals called bases. There are four such bases, named cytosine, guanine, adenine and thymine, which we will refer to by their initial letters C, G, A and T, respectively. Each base bonds to a base in the opposite strand, where C and G bond with each other, and A and T also bond with each other. No other bonds between bases occur. See https://www.yourgenome.org/facts/what-is-dna for more detail, if interested.
- a. Given a single strand of DNA represented by a string of characters, each corresponding to one of the four bases, e.g. 'CGGTACAATCGATTTAGAG', write an initial insight for how you would calculate the percentage of each of the bases (C, G, A and T) in a DNA strand. For example, 3 of the 19 bases in the strand above are cytosine, so it contains 3/19 ≈ 15.79% cytosine. These percentages can be of interest to biologists investigating the DNA of organisms.
- (4 marks)
- b. Write a Python program to implement your insight from part (a), by completing the function percentBases in file TMA03_Q1.py. You can assume that only characters corresponding to the standard DNA bases (C, G, A and T) will occur in the input strings. Round the results to two decimal places, following the example in the code file.
- (4 marks)
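A single pass with one counter per base is enough for parts (a) and (b). The sketch below is a hypothetical stand-alone version: the signature and return format of percentBases in TMA03_Q1.py are not shown here, so it assumes the function takes the strand as a non-empty string and returns a dictionary mapping each base to its percentage, rounded with Python's built-in round.

```python
def percent_bases(strand):
    """Return the percentage of each base in a DNA strand, rounded to 2 d.p.

    Hypothetical sketch: assumes `strand` is a non-empty string containing
    only the characters 'C', 'G', 'A' and 'T'.
    """
    counts = {'C': 0, 'G': 0, 'A': 0, 'T': 0}  # one counter per base
    for base in strand:                        # single pass over the strand
        counts[base] = counts[base] + 1
    n = len(strand)
    # Convert each count to a percentage of the strand length.
    return {base: round(100 * count / n, 2) for base, count in counts.items()}


print(percent_bases('CGGTACAATCGATTTAGAG'))
# {'C': 15.79, 'G': 26.32, 'A': 31.58, 'T': 26.32}
```

Note that the rounded values sum to 100.01 here; rounding each percentage independently can make the total drift slightly from 100.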
- c. What is the complexity of your function percentBases from part (b), in terms of T(n) and Big-O notation, where n is the number of bases in the strand? Take the assignment statement as the basic unit of computation. Ignore the rounding operations for the purposes of this analysis.
- (4 marks)
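As a worked count, consider a hypothetical single-pass implementation that initialises four counters (4 assignments), updates exactly one counter per base (n assignments) and then assigns the four percentages (4 assignments), counting only explicit assignment statements and not the loop variable. Its cost would be:

```latex
T(n) = 4 + n + 4 = n + 8, \qquad T(n) \in O(n)
```

If the loop variable's assignment on each iteration is also counted, T(n) = 2n + 8 instead; either way the dominant term is linear in n, so the function is O(n). The exact constants depend on how percentBases is actually written.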
The Genetic Code
The pattern of bases in DNA molecules forms a code (the Genetic Code) for making the proteins that are the basis of life on Earth. A simplified version of how this works is that groups of three bases (each group is known as a codon) in the genes that form part of the DNA sequence each specify the production of one amino acid, and all the amino acids specified by a gene link together to form a protein molecule. There are 20 amino acids that normally occur in living organisms, and so the Genetic Code is said to have redundancy: there are many more possible codon patterns (4 × 4 × 4 = 64) than there are amino acids, so typically several different codons will correspond to the same amino acid. Amino acids have names like Leucine, Tyrosine, etc., and we will use the standard abbreviations for these, such as 'Leu', 'Tyr', etc.
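The redundancy follows directly from counting: each of the three positions in a codon can hold any of the four bases. This short sketch enumerates every possible codon with itertools.product to show there are 64 patterns for only 20 amino acids.

```python
from itertools import product

BASES = 'CGAT'

# Every codon is an ordered triple of bases, so there are 4 ** 3 of them.
codons = [''.join(triple) for triple in product(BASES, repeat=3)]

print(len(codons))   # 64
print(codons[:4])    # ['CCC', 'CCG', 'CCA', 'CCT']
```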
The start of a gene in a DNA strand is indicated by a special start codon, 'ATG', and the end of the gene is marked by a subsequent stop codon. There are several possible stop codon patterns, but our examples will only use 'TAG'. Start codons generate an amino acid for the protein as well as marking the start of the gene. Stop codons do not add an amino acid to the protein and act only as a marker.
Actual human genes typically have 27 000 bases in each strand, and in some cases many more, but we will use examples with far fewer bases to demonstrate the principles. A DNA strand will typically contain many genes. A strand may have codons before the first start codon, in between its genes, and after the last stop codon, which do not give rise to any amino acids; these are called non-coding sequences.
- d. Given a single strand of DNA, represented by a string of characters, plus the locations of a start codon and the first subsequent stop codon, write an initial insight to translate the DNA codons of that gene into the sequence of amino acids in the corresponding protein. For example, for the DNA strand 'GGGATGCTTTAG', with a start codon beginning at location 3 and a stop codon beginning at location 9 (counting locations from 0), the algorithm would produce a sequence with the two amino acids corresponding to codons 'ATG' and 'CTT'. Assume there is a table where the algorithm can look up the amino acid for a given codon. Remember that stop codons are just a marker and do not generate an amino acid.
- (4 marks)
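The translation described in part (d) amounts to stepping through the gene three bases at a time, from the start codon up to but not including the stop codon. The sketch below is a hypothetical illustration: CODON_TABLE is a small made-up stand-in for the full Genetic Code (it is not the provided aminoAcid function), containing only a few real codon assignments, and locations are 0-based as in the example 'GGGATGCTTTAG'.

```python
# Hypothetical partial codon table standing in for the full Genetic Code;
# only a handful of real codon assignments are included.
CODON_TABLE = {
    'ATG': 'Met', 'CTT': 'Leu', 'TTT': 'Phe', 'GGG': 'Gly',
    'TGG': 'Trp', 'AAA': 'Lys', 'TAC': 'Tyr', 'GCT': 'Ala',
}


def translate_gene(strand, start, stop):
    """Translate the codons from `start` up to (not including) `stop`.

    `start` is the 0-based index of the start codon 'ATG' and `stop` the
    0-based index of the stop codon. The stop codon is only a marker, so it
    is not translated. Assumes (stop - start) is a multiple of 3.
    """
    protein = []
    for i in range(start, stop, 3):          # step one codon (3 bases) at a time
        protein.append(CODON_TABLE[strand[i:i + 3]])
    return protein


print(translate_gene('GGGATGCTTTAG', 3, 9))  # ['Met', 'Leu']
```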
- e. Complete the Python function translateGene in file TMA03_Q1.py to implement your insight from part (d). We have provided you with an easy way to use the Genetic Code in the form of a Python function called aminoAcid, which gives the abbreviated name of the amino acid corresponding to any given codon.
- To help in testing this function, we have provided you with a complete function called findCodon which can search a DNA strand for the location of a codon. See how it's used in the provided test file.
- (3 marks)
- f. Using your function from part (e), complete the function translateStrand so that it can process a DNA strand containing any number of genes (marked by start and stop codons) to produce the sequence of amino acids corresponding to each gene.
- In implementing this function, you may also wish to make use of the function findCodon mentioned in part (e).
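Processing a whole strand means repeatedly finding the next start codon, finding the stop codon that ends that gene, translating the codons in between, and resuming the search after the stop codon. The sketch below is hypothetical and self-contained: it does not use the provided findCodon (whose exact behaviour is not described here) but uses str.find for start codons and a made-up helper, find_in_frame_stop, which assumes the stop codon must lie in the same reading frame as the start codon. It also assumes genes do not overlap, ignores a start codon with no matching stop, and uses a small partial codon table in place of aminoAcid.

```python
# Hypothetical partial codon table (real assignments, far from complete).
CODON_TABLE = {
    'ATG': 'Met', 'CTT': 'Leu', 'TTT': 'Phe', 'GGG': 'Gly',
    'TGG': 'Trp', 'AAA': 'Lys', 'TAC': 'Tyr', 'GCT': 'Ala',
}
START, STOP = 'ATG', 'TAG'


def find_in_frame_stop(strand, start):
    """Index of the first in-frame stop codon after `start`, or -1 if none."""
    for i in range(start + 3, len(strand) - 2, 3):
        if strand[i:i + 3] == STOP:
            return i
    return -1


def translate_strand(strand):
    """Return one list of amino acids per gene found in `strand`."""
    proteins = []
    pos = 0
    while True:
        start = strand.find(START, pos)      # next start codon, in any frame
        if start == -1:
            break                            # no more genes
        stop = find_in_frame_stop(strand, start)
        if stop == -1:
            break                            # start codon with no stop: ignored
        proteins.append([CODON_TABLE[strand[i:i + 3]]
                         for i in range(start, stop, 3)])
        pos = stop + 3                       # resume after this gene's stop codon
    return proteins


print(translate_strand('GGGATGCTTTAGCCATGTTTGGGTAGAA'))
# [['Met', 'Leu'], ['Met', 'Phe', 'Gly']]
```

The bases before the first start codon ('GGG'), between the genes ('CC') and after the last stop codon ('AA') are skipped, matching the description of non-coding sequences.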
Step by Step Solution
a. To calculate the percentage of each base (C, G, A and T) in a DNA strand, you can follow these steps:
1. Initialize counters for each base to 0.
2. Iterate through the DNA strand character by character.
3. For ...