Question
Write a function that finds and returns the most common sub-sequence in a larger sequence of DNA. DNA is composed of a string of 'a', 'g', 'c', and 't's, e.g.:

atcaatgatcaacgtaagcttctaagcatgatcaaggtgctcacacagtttatccacaac
ctgagtggatgacatcaagataggtcgttgtatctccttcctctcgtactctcatgacca
cggaaagatgatcaagagaggatgatttcttggccatatcgcaatgaatacttgtgactt
gtgcttccaattgacatcttcagcgccatattgcgctggccaaggtgacggagcgggatt
acgaaagcatgatcatggctgttgttctgtttatcttgttttgactgagacttgttagga
tagacggtttttcatcactgactagccaaagccttactctgcctgacatcgaccgtaaat
tgataatgaatttacatgcttccgcgacgatttacctcttgatcatcgatccgattgaag
atcttcaattgttaattctcttgcctcgactcatagccatgatgagctcttgatcatgtt
tccttaaccctctattttttacggaagaatgatcaagctgctgctcttgatcatcgtttc

This is a tool used, for example, when analyzing DNA for possible replication site origins.

Your function, mostCommonSubstring(dna, mink, maxk), takes a string as an argument (the DNA sequence) and also takes the shortest, mink, and longest, maxk, acceptable result lengths. Your task is to look at all substrings of length mink, mink+1, ..., maxk-1, maxk and return the one that occurs most frequently throughout the entire sequence. If there is a tie, the function returns the longer string. If the tie is between substrings of the same length, the choice is arbitrary, and you can return any of the tied equal-length substrings.

For example, mostCommonSubstring('gactctcagc', 2, 6) returns 'ctc', since it occurs twice and is longer than 'ct' and 'tc', which each also occur twice, and all other substrings of length 2, 3, 4, 5, or 6 occur only once. Note that the occurrences can overlap. (Practice on this example before you try the whole huge file.)

We'll implement this by writing another function, mostCommonK(dna, k), which looks for just the most common substring of length k and returns it and its frequency. mostCommonSubstring then just calls mostCommonK repeatedly with each of mink, mink+1, etc. as arguments.
The algorithm within mostCommonK is to take the first k letters of dna and see
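The description of the algorithm inside mostCommonK is cut off in the text above, so the following is only a minimal Python sketch of the two functions under stated assumptions, not necessarily the approach the assignment intends. It assumes mink is at least 1, counts every length-k window in a dictionary (so overlapping occurrences are counted), and returns an empty string when no substring of the requested lengths exists; that last behaviour is a choice made here, not something the question specifies.

```python
def mostCommonK(dna, k):
    """Return (substring, count) for the most frequent substring of length k.

    Assumption: a dictionary of window counts is used here; the original
    description of this function's algorithm is truncated.
    """
    counts = {}
    # Slide a window of width k over dna; overlapping occurrences all count.
    for i in range(len(dna) - k + 1):
        piece = dna[i:i + k]
        counts[piece] = counts.get(piece, 0) + 1
    if not counts:
        # k is longer than dna (or dna is empty): nothing to report.
        return "", 0
    best = max(counts, key=counts.get)
    return best, counts[best]


def mostCommonSubstring(dna, mink, maxk):
    """Return the most frequent substring with length in [mink, maxk].

    Ties go to the longer substring; ties at equal length are arbitrary.
    """
    best, best_count = "", 0
    for k in range(mink, maxk + 1):
        piece, count = mostCommonK(dna, k)
        # Using >= lets a later (longer) substring win a tie on frequency.
        if count > 0 and count >= best_count:
            best, best_count = piece, count
    return best
```

On the small example from the question, mostCommonK('gactctcagc', 3) gives ('ctc', 2) and mostCommonSubstring('gactctcagc', 2, 6) gives 'ctc'. Because lengths are visited in increasing order, the `>=` comparison is what implements the "longer string wins a tie" rule.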