
Question


Please use Matlab or Python!

In this question you will use nucleotide sequences obtained from individual insects to determine nucleotide frequency for thirteen different insect species. In each nucleotide sequence there are four different types of DNA nucleotides: adenine (A), thymine (T), guanine (G), and cytosine (C) (- indicates missing values). Consider these as the characters in a nucleotide string. Extend this character set with doublet (4^2 = 16) and triplet (4^3 = 64) combinations of these four nucleotides to come up with a token list. The size of this token set is supposed to be 84 (4 + 16 + 64 = 84).
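One way to build the token list described above is to enumerate every string of length 1, 2, and 3 over the four nucleotides. The sketch below is a minimal illustration in Python; the names `NUCLEOTIDES`, `build_tokens`, and `TOKENS` are hypothetical, and the ordering (singles, then doublets, then triplets) is an arbitrary choice rather than something the question prescribes.

```python
from itertools import product

# The four DNA nucleotides; '-' (missing value) is deliberately excluded.
NUCLEOTIDES = "ATGC"

def build_tokens(alphabet=NUCLEOTIDES, max_k=3):
    """Return all k-mers over `alphabet` for k = 1..max_k, shortest first."""
    tokens = []
    for k in range(1, max_k + 1):
        # product(...) yields tuples of characters; join them into strings.
        tokens.extend("".join(p) for p in product(alphabet, repeat=k))
    return tokens

TOKENS = build_tokens()
print(len(TOKENS))  # 4 + 16 + 64 = 84
```

Keeping the tokens in a fixed list means every species' probability vector can later share the same index order.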

  1. Write a script that will search for each of the 84 tokens in a given nucleotide series. When searching for doublets and triplets the search should use a stride of 1. Example: If the nucleotide series is GGCAC then the search should find GG, GC, CA, AC doublet tokens and GGC, GCA, and CAC triplet tokens.

  2. Write a script that will estimate the multinomial probability vector for each of the 13 species.

  3. Plot these 13 probability vectors and see if you can extract a subset of tokens that can be potentially useful for identifying species.
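Tasks 1 and 2 above could be sketched as follows. This is an illustration under stated assumptions, not the site's solution: windows containing `-` or any other unexpected symbol are skipped, the counts for all 84 tokens are pooled across a species' sequences and normalized by one shared total (normalizing within each token length would be an equally defensible reading), and `alpha` is an optional additive-smoothing constant that the question does not mention. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Fixed token order: 4 singles, 16 doublets, 64 triplets.
TOKENS = ["".join(p) for k in (1, 2, 3) for p in product("ATGC", repeat=k)]
TOKEN_SET = set(TOKENS)

def count_tokens(seq):
    """Count every 1-, 2-, and 3-mer in `seq` with a sliding window of stride 1."""
    seq = seq.upper()
    counts = Counter()
    for k in (1, 2, 3):
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            # Assumption: windows containing '-' (missing) are ignored.
            if window in TOKEN_SET:
                counts[window] += 1
    return counts

def estimate_probabilities(sequences, alpha=0.0):
    """Pool counts over `sequences` and return relative frequencies in TOKENS order.

    With alpha = 0 this is the maximum-likelihood estimate n_i / sum(n);
    alpha > 0 adds the same pseudo-count to every token.
    """
    total = Counter()
    for seq in sequences:
        total.update(count_tokens(seq))
    denom = sum(total[t] for t in TOKENS) + alpha * len(TOKENS)
    if denom == 0:
        raise ValueError("no valid tokens found in the given sequences")
    return [(total[t] + alpha) / denom for t in TOKENS]

# The example from the task statement.
counts = count_tokens("GGCAC")
print(sorted(t for t in counts if len(t) == 2))
print(sorted(t for t in counts if len(t) == 3))
probs = estimate_probabilities(["GGCAC"])
```

For one species, pass all of its sequences to `estimate_probabilities`; repeating this for the 13 species gives 13 vectors of length 84 that can be plotted against the shared `TOKENS` order.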

Given nucleotide sequence:{'TGATCTGGAATAGTCGGAACTTCTCTAAGAATTTTAATTCGTGCTGAACTTAGCCACCCTGGTATATTTATTGGGAATGACCAAATTTATAATGTAATTGTAACAGCTCATGCATTTATTATAATTTTCTTTATAGTAATGCCAATTATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATATTAGGAGCCCCTGATATAGCTTTCCCTCGAATGAATAATATAAGTTTTTGAATACTACCTCCTTCATTGACTCTTCTATTATCAAGCTCAATAGTAGAAAATGGGGCAGGAACTGGGTGAACAGTTTATCCTCCTCTCTCTTCAGGAACAGCTCATGCTGGAGCTTCTGTTGATTTAGCTATTTTTTCTCTTCATTTAGCTGGAATTTCCTCAATTTTAGGGGCAGTAAATTTTATTACAACTGTGATTAATATGCGATCGTCAGGGATTACTTTAGATCGACTACCCTTATTTGTTTGATCTGTAGTTATTACAGCTATCTTATTACTTCTTTCTCTTCCTGTTTTAGCTGGAGCTATTACTATATTATTAACAGACCGAAACTTAAATACATCTTTCTTT'}

Given 13 species: {'Aedes aegypti'} {'Aedes albopictus'} {'Culex quinquefasciatus'} {'Drosophila affinis'} {'Euplectrus geometricida'} {'Hylemyza partita'} {'Megachile sp'} {'Ophion cf. luteus_cluster1'} {'Simulium metallicum s.l'} {'Simulium nemorale'} {'Tetrastichus atratulus'} {'Tetrastichus halidayi'} {'Tribolium castaneum'}
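For task 3, one simple heuristic for shortlisting tokens, alongside visual inspection of the plot, is to rank tokens by how much their estimated probability varies across species. The sketch below assumes the probability vectors are already available as a dictionary keyed by species name; `rank_tokens_by_spread` is a hypothetical helper, variance is only one possible spread measure, and the toy numbers are made up purely for illustration and are not derived from the insect data.

```python
from statistics import pvariance

def rank_tokens_by_spread(prob_by_species, tokens, top_n=10):
    """Return the `top_n` tokens whose probability varies most across species."""
    spread = {}
    for j, token in enumerate(tokens):
        # Collect this token's probability from every species' vector.
        column = [vec[j] for vec in prob_by_species.values()]
        spread[token] = pvariance(column)
    return sorted(spread, key=spread.get, reverse=True)[:top_n]

# Made-up toy values (NOT real data): three species, four tokens.
toy_tokens = ["A", "T", "GC", "CAC"]
toy_probs = {
    "species_1": [0.4, 0.3, 0.2, 0.1],
    "species_2": [0.4, 0.1, 0.2, 0.3],
    "species_3": [0.4, 0.2, 0.3, 0.1],
}
top = rank_tokens_by_spread(toy_probs, toy_tokens, top_n=2)
print(top)
```

A token with nearly identical probability in every species (like `A` in the toy data) carries little information for telling species apart, while a high-variance token is a candidate worth checking in the plot.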
