
Question


[PYTHON]
[PYTHON PLEASE]
The genetic code of all living organisms is represented by a long sequence of simple molecules called nucleotides, or bases, which make up DNA. There are four nucleotides: A, C, G, and T. The genetic code of a human is a string of 3.2 billion letters, each of which is A, C, G, or T. In this problem we search for the substring of length k that occurs most frequently in the genome. DNA is written as a string of 'a', 'c', 'g', and 't' characters, e.g.:

atcaatgatcaacgtaagcttctaagcatg
atcaaggtgctcacacagtttatccacaac
ctgagtagatgacatcaagataggtcgttg
tatctccttcctctcytactctcatgacca
cggagagatgatcaagagaggatgatttct
tggccatatcgcaatgaatacttgtgactt
gtgcttccaattgacatcttcagcgccata
ttgcgctggccaaggtgacggagcoggatt
acgaaagcatgatcatgactgttattctat
ttatcttgttttgactgagacttgttagga
tagacggtttttcatcactgactagccaaa
gccttactctgcctgacatcgaccgtaaat
tgataotgaatttocatgcttccgcgacga
tttacctcttgatcatcgatccgattgoog
atcttcaattgttaattctcttgcctcgac
tcatagccatgatgdgctcttgatcatgtt
tcettaaccctctottttttacggaagaat
gatcaagctgctgctcttgatcategtttc

1. Write a function that takes a long string of letters and creates a list of all possible k-letter sequences. For example, the string 'gcacttgcatgcac' has the following 3-letter sequences: (gca, cac, act, ctt, ttg, tgc, gca, cat, atg, tgc, gca, cac). The function should find which of these sequences occurs most often. In this example, 'gca' appears three times and every other sequence appears at most twice. The function returns the most frequent substring and its count.

2. Write a function that calls the function above. We will search for the sub-sequence of k letters that occurs most frequently, where k is a variable between min_length and max_length. For example, if min_length=4 and max_length=8, the program first finds which 4-letter sequence occurs most often and with what frequency. It does the same for 5-, 6-, 7-, and 8-letter sequences. Then it prints the overall winner, for example 'cttt:52'.

This means that the highest frequency of occurrence, 52, was found for that sequence, and the highest frequencies for the other lengths were lower than 52.
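One possible way to approach both parts is sketched below. This is only a minimal sketch under stated assumptions, not a verified solution: the function names most_frequent_kmer and most_frequent_in_range are made up for illustration, ties are broken in favor of the sequence seen first, and the input is assumed to be a plain Python string with no spaces or line breaks.

```python
from collections import Counter


def most_frequent_kmer(dna, k):
    """Part 1: return (substring, count) for the most frequent k-letter sequence."""
    if k <= 0 or k > len(dna):
        raise ValueError("k must be between 1 and len(dna)")
    # List every window of k consecutive letters, overlaps included.
    kmers = [dna[i:i + k] for i in range(len(dna) - k + 1)]
    # most_common(1) gives [(sequence, count)]; on ties the first-seen one wins.
    kmer, count = Counter(kmers).most_common(1)[0]
    return kmer, count


def most_frequent_in_range(dna, min_length, max_length):
    """Part 2: try every k in [min_length, max_length] and print the overall winner."""
    best_kmer, best_count = None, 0
    for k in range(min_length, max_length + 1):
        kmer, count = most_frequent_kmer(dna, k)
        # Strict '>' keeps the shorter sequence when two lengths tie.
        if count > best_count:
            best_kmer, best_count = kmer, count
    print(f"{best_kmer}:{best_count}")
    return best_kmer, best_count


if __name__ == "__main__":
    # Example string from the question statement.
    print(most_frequent_kmer("gcacttgcatgcac", 3))
    most_frequent_in_range("gcacttgcatgcac", 3, 5)
```

Building the full list of windows follows the wording of part 1, but for a genome-sized string it may be better to pass a generator to Counter so the list is never held in memory. Note also that the top count cannot increase as k grows, because every occurrence of a longer sequence contains an occurrence of its shorter prefix.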

