Question

1 Approved Answer

Posted on Sep 27, 2024

In Python, please!

Task 3: A Bit of Bioinformatics, Part II

Three adjacent nucleotides in a gene are known as a codon, and each codon in a gene codes for a single amino acid (i.e., a codon is translated into a single amino acid via a cell's machinery). The mapping of codons to amino acids is known as the genetic code. A protein is composed of a chain of amino acids, so the series of codons that make up a gene determines the amino acids that make up a single protein. Surprisingly, genes account for only a small fraction of the DNA in a genome, perhaps 10%, thereby making it necessary to actually find the genes in a genome. In bacteria, genes can be identified by their start and stop codons, and in most bacteria the start codon codes for the amino acid methionine with a codon of ATG.

For this task you're going to write code to find all possible start codons in the genome you used in the previous task. You can use either the find() or index() method to find the location of a substring within a string. Both of these methods return the index of the first character of the substring within the string. To find the next occurrence of the substring, you must add 1 to the index of the previous occurrence. Consider the following example:

    >>> s = 'Massachusetts'
    >>> s.find('s')        # The default start location is 0.
    2
    >>> s.find('s')        # If we don't change the start location,
    2                      # the first occurrence is reported.
    >>> s.find('s', 3)     # We add 1 to the previous index returned.
    3
    >>> s.find('s', 4)     # We do this again and again...
    8
    >>> s.find('s', 9)
    12
    >>> s.find('s', 13)    # ...until there are no occurrences left
    -1                     # to find. find() returns -1.

The substring you want to find contains the three nucleotides for the start codon, and for the genome of interest here, the start codon codes for methionine and is ATG. The string gstring should be a string consisting purely of nucleotides, obtained by first removing the header information in the first line and then all the newline characters. You'll need to do the following to find all the occurrences of the substring:

- Assign the substring ('ATG') to an lvalue of scodon.
- Initialize a start location start as -1.
- Use a for-loop with the following in the header: range(gstring.count(scodon)).
- Use find() or index() with two arguments, the second argument being start incremented by 1. Assign the value returned to start.
- Print each start location.

To check your results, use a slice of gstring to verify that the nucleotides you've found are actually ATG. You only need to do this a few times. The listing below shows some of the correct indices.

    86 107 137 258 294 365 455 476 483
    533 560 567 594 636 644 728 791 830
    ... [rows of output deleted]
    XS7795 8079 8112 8309 8434 8545 8638 8648 8685
    8738 8759 8766 8831 8872 8895 8906 8919 9065
    9092 9451 9469 9478 9529 9550 9563 9583 9591

Two points should be made. First, not all of these indices represent actual start codons, because the three nucleotides may not belong to a single codon. For example, suppose we have two codons, ACA and TGT, adjacent to each other: ACATGT. find() will give the index of the third nucleotide of the first codon, because that nucleotide followed by the first and second nucleotides of the second codon spells ATG. Second, the indices given are not the actual positions of the nucleotides within the genome; they are off by 1 because string indexing begins at 0, whereas genome positions start at 1. Thus, the positions would be 87, 108, 138, and so on.
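A minimal sketch of the steps described in the task might look like the following. The genome file itself is not shown in the question, so `raw_text` is a made-up stand-in (one header line plus a few short lines of nucleotides), and the `starts` list is an extra added here only so the results can be spot-checked; `gstring`, `scodon`, and `start` are the names the task asks for.

```python
# Hypothetical stand-in for the genome file from the previous task:
# a header line followed by lines of nucleotides (made up for illustration).
raw_text = """>demo header line
CCATGACA
TGTTTATG
GATGA
"""

# Build gstring: drop the header (the first line), then remove all newlines.
header_end = raw_text.find("\n")
gstring = raw_text[header_end + 1:].replace("\n", "")

scodon = "ATG"   # the start codon we are searching for
start = -1       # start + 1 is 0, so the first search begins at index 0

starts = []      # not required by the task; kept for the spot-check below
for _ in range(gstring.count(scodon)):
    # Search from one past the previous hit so the same ATG is not found again.
    start = gstring.find(scodon, start + 1)
    starts.append(start)
    print(start)

# Spot-check a few hits with a slice: each should read 'ATG'.
for i in starts[:3]:
    print(i, gstring[i:i + 3])
```

With the real genome, only the construction of `gstring` changes (read the file from the previous task instead of using `raw_text`); the loop stays the same. Using `index()` in place of `find()` should behave identically here, since the loop runs exactly `count(scodon)` times and never searches past the last occurrence.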
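The two caveats about the reported indices (an ATG can straddle two codons, and string indices are 0-based while genome positions are 1-based) can be illustrated with the ACATGT example from the task. This is only a sketch: the `in_frame` check assumes the reading frame begins at index 0, which is an assumption made for this toy string and is not something the task states about the real genome.

```python
gstring = "ACATGT"   # the two adjacent codons ACA and TGT from the task text
scodon = "ATG"

# Split into codons, assuming the reading frame begins at index 0.
codons = [gstring[i:i + 3] for i in range(0, len(gstring), 3)]

index = gstring.find(scodon)   # 0-based string index of the match
position = index + 1           # 1-based position within the genome
in_frame = index % 3 == 0      # True only if the match lines up with a codon

print(codons)                  # the actual codons contain no ATG
print(index, position, in_frame)

# Converting the first few listed indices to genome positions.
listed_indices = [86, 107, 137]
positions = [i + 1 for i in listed_indices]
print(positions)
```

Here `find()` reports a match even though neither codon is ATG, which is why the printed indices are only candidate start codons.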