Question
[PYTHON]
[PYTHON PLEASE]
The genetic code of all living organisms is represented by a long sequence of simple molecules called nucleotides, or bases, which make up DNA. There are four nucleotides: A, C, G, and T. The genetic code of a human is a string of 3.2 billion letters drawn from A, C, G, and T. In this problem we search for a substring of length k that occurs most frequently in the human genome. DNA is composed of a string of 'a's, 'c's, 'g's, and 't's, e.g.:

atcaatgatcaacgtaagcttctaagcatg
atcaaggtgctcacacagtttatccacaac
ctgagtagatgacatcaagataggtcgttg
tatctccttcctctcytactctcatgacca
cggagagatgatcaagagaggatgatttct
tggccatatcgcaatgaatacttgtgactt
gtgcttccaattgacatcttcagcgccata
ttgcgctggccaaggtgacggagcoggatt
acgaaagcatgatcatgactgttattctat
ttatcttgttttgactgagacttgttagga
tagacggtttttcatcactgactagccaaa
gccttactctgcctgacatcgaccgtaaat
tgataotgaatttocatgcttccgcgacga
tttacctcttgatcatcgatccgattgoog
atcttcaattgttaattctcttgcctcgac
tcatagccatgatgdgctcttgat catgtt
tcettaaccctctottttttacggaagaat
gatcaagctgctgctcttgatcategtttc

1. Write a function that takes a long string of letters and creates a list of all possible k-letter sequences. For example, the string 'gcacttgcatgcac' has the following 3-letter sequences: [gca, cac, act, ctt, ttg, tgc, gca, cat, atg, tgc, gca, cac]. The function should find which of the above sequences occurs most often. In this example, 'gca' appears three times and every other sequence occurs at most twice. The function returns the highest-occurring substring and its count.

2. Write a function that calls the above function. We will search for a sub-sequence of k letters that occurs most frequently, where k is a variable between min_length and max_length. For example, if min_length = 4 and max_length = 8, the program first finds which 4-letter sequence occurs most and with what frequency. It does the same thing for 5-, 6-, 7-, and 8-letter sequences. Then it will print, for example, 'cttt:52'. It means that the highest frequency of occurrence occurred in 5-letter sequences and the highest frequencies in other cases were lower than 52.

Step by Step Solution
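One possible way to approach both parts is sketched below. This is a minimal sketch, not a reference solution: the function names most_frequent_kmer and most_frequent_in_range are hypothetical, and the tie-breaking rule (the sequence seen first wins, and the shortest k wins when two lengths share the same top count) is an assumption, since the question does not say how ties should be resolved.

```python
from collections import Counter


def most_frequent_kmer(sequence, k):
    """Part 1: return (substring, count) for the most frequent k-letter sequence."""
    if k <= 0 or k > len(sequence):
        raise ValueError("k must be between 1 and len(sequence)")
    # List of all overlapping k-letter windows, read left to right.
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    # most_common(1) gives the top entry; on a tie, the k-mer seen first wins
    # (assumed tie-breaking rule, the question does not specify one).
    kmer, count = Counter(kmers).most_common(1)[0]
    return kmer, count


def most_frequent_in_range(sequence, min_length, max_length):
    """Part 2: try every k in [min_length, max_length] and report the overall best."""
    best_kmer, best_count = None, 0
    for k in range(min_length, max_length + 1):
        kmer, count = most_frequent_kmer(sequence, k)
        # Strict '>' keeps the shorter sequence when counts tie (assumption).
        if count > best_count:
            best_kmer, best_count = kmer, count
    print(f"{best_kmer}:{best_count}")
    return best_kmer, best_count


if __name__ == "__main__":
    # Example string taken from the question.
    print(most_frequent_kmer("gcacttgcatgcac", 3))
    most_frequent_in_range("gcacttgcatgcac", 3, 4)
```

Building the full list of windows mirrors the wording of part 1, but for a genome-sized string it may be preferable to pass a generator expression to Counter so that the windows are not all held in memory at once.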