
Question


Write a function that finds and returns the most common sub-sequence in a larger sequence of DNA. DNA is composed of a string of 'a', 'g', 'c', and 't' characters, e.g.

atcaatgatcaacgtaagcttctaagcatgatcaaggtgctcacacagtttatccacaac
ctgagtggatgacatcaagataggtcgttgtatctccttcctctcgtactctcatgacca
cggaaagatgatcaagagaggatgatttcttggccatatcgcaatgaatacttgtgactt
gtgcttccaattgacatcttcagcgccatattgcgctggccaaggtgacggagcgggatt
acgaaagcatgatcatggctgttgttctgtttatcttgttttgactgagacttgttagga
tagacggtttttcatcactgactagccaaagccttactctgcctgacatcgaccgtaaat
tgataatgaatttacatgcttccgcgacgatttacctcttgatcatcgatccgattgaag
atcttcaattgttaattctcttgcctcgactcatagccatgatgagctcttgatcatgtt
tccttaaccctctattttttacggaagaatgatcaagctgctgctcttgatcatcgtttc

This is a tool used, for example, when analyzing DNA for possible replication site origins.

Your function, mostCommonSubstring(dna, mink, maxk), takes a string as an argument (the DNA sequence) and also takes the shortest, mink, and longest, maxk, acceptable result length. Your task is to look at all substrings of length mink, mink+1, ..., maxk-1, maxk and return the one that occurs most frequently throughout the entire sequence. If there is a tie, return the longer string. If the tie is between substrings of the same length, the choice is arbitrary, and you can return any of the tied equal-length substrings.

For example, mostCommonSubstring('gactctcagc', 2, 6) returns 'ctc', since it occurs twice and is longer than 'ct' and 'tc', which each also occur twice, and all other substrings of length 2, 3, 4, 5, or 6 occur only once. Note that the occurrences can overlap. (Practice on this example before you try the whole huge file.)

We'll implement this by writing another function, mostCommonK(dna, k), which looks for just the most common substring of length k and returns it and its frequency. Then mostCommonSubstring just calls mostCommonK repeatedly with each of mink, mink+1, etc. as arguments.

The algorithm within mostCommonK is to take the first k letters of dna and see
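
The two functions described in the question can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the intended solution: the question does not name a language, and its own description of the algorithm inside mostCommonK is cut off, so the sketch counts every length-k window with a dictionary instead. Returning an empty string when no substring of the requested lengths exists is also an assumption.

```python
def mostCommonK(dna, k):
    """Return (substring, count) for the most frequent substring of length k.

    Assumption: counts are kept in a dictionary keyed by substring; the
    question's own algorithm for this function is truncated in the source.
    """
    counts = {}
    # Slide a window of width k across dna; overlapping occurrences count.
    for i in range(len(dna) - k + 1):
        piece = dna[i:i + k]
        counts[piece] = counts.get(piece, 0) + 1
    if not counts:
        # k is longer than dna (or k < 1 leaves nothing sensible to count).
        return "", 0
    # Ties between equal-length substrings are arbitrary per the question;
    # max() keeps the first one it encounters.
    best = max(counts, key=counts.get)
    return best, counts[best]


def mostCommonSubstring(dna, mink, maxk):
    """Return the most frequent substring with length in [mink, maxk]."""
    best, best_count = "", 0
    for k in range(mink, maxk + 1):
        candidate, count = mostCommonK(dna, k)
        # ">=" lets a longer substring win a tie, since k only increases.
        if count > 0 and count >= best_count:
            best, best_count = candidate, count
    return best


print(mostCommonSubstring("gactctcagc", 2, 6))  # ctc
```

Because k only grows inside the loop, comparing with ">=" is enough to make the longer substring win a tie, which matches the 'ctc' example in the question.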
