Question

The DNA sequence of a human genome can be encoded as a string of A's, T's, G's, and C's three billion characters long. The letters represent the four DNA base nucleotides used to encode the genomes of all life on Earth. (A = Adenine, T = Thymine, G = Guanine, C = Cytosine).

i. Assuming an optimal fixed-length encoding for the nucleotide bases (A, T, G, C), how many megabytes are required to store one human genome? (1 byte = 8 bits, 1 megabyte = 10^6 bytes.)
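For part i, a quick arithmetic check: four symbols need ceil(log2 4) = 2 bits each under a fixed-length code, so 3 × 10^9 bases take 6 × 10^9 bits, which is 7.5 × 10^8 bytes, or 750 MB. The sketch below only reproduces that calculation; it assumes exactly 3 × 10^9 bases as stated in the question, and the constant names are illustrative.

```python
import math

GENOME_LENGTH = 3_000_000_000  # three billion nucleotides, as stated in the question
ALPHABET = ["A", "T", "G", "C"]

# An optimal fixed-length code needs ceil(log2(alphabet size)) bits per symbol.
bits_per_base = math.ceil(math.log2(len(ALPHABET)))

total_bits = GENOME_LENGTH * bits_per_base
total_bytes = total_bits // 8      # 1 byte = 8 bits
megabytes = total_bytes / 10**6    # 1 MB = 10^6 bytes

print(bits_per_base, total_bytes, megabytes)  # 2 750000000 750.0
```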

ii. In the human genome, the base nucleotides are not equally probable. G's and C's together make up only about 40% of the genome, while A's and T's make up about 60%. (The proportions of A's and T's are equal, as are the proportions of G's and C's.) According to Shannon information theory, what is the entropy of human DNA (i.e., the average information content per nucleotide)?
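For part ii, the stated proportions give A = T = 0.30 and G = C = 0.20. Shannon's formula H = -Σ p_i log2 p_i then yields H = -2(0.3 log2 0.3) - 2(0.2 log2 0.2) ≈ 1.042 + 0.929 ≈ 1.971 bits per nucleotide, which matches the 1.97 quoted in part iii. A minimal sketch of the computation (the helper name shannon_entropy is hypothetical):

```python
import math

# Base probabilities implied by the question:
# A + T = 60% split equally, G + C = 40% split equally.
probs = {"A": 0.30, "T": 0.30, "G": 0.20, "C": 0.20}

def shannon_entropy(p):
    """Average information content in bits per symbol: H = -sum(p_i * log2(p_i))."""
    return -sum(pi * math.log2(pi) for pi in p.values() if pi > 0)

H = shannon_entropy(probs)
print(round(H, 3))  # 1.971
```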

iii. You have discovered that the entropy is less than what is required of a fixed-length encoding (1.97 < 2). Perhaps we can do better with a variable length encoding! Define a valid variable-length encoding that allocates fewer bits to the more frequent nucleotides. Draw the corresponding Huffman Tree. Is this encoding actually more efficient for representing an entire human genome? Explain your answer by determining the expected number of bits per nucleotide resulting from your candidate encoding.
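For part iii, one way to check candidate encodings is to build the Huffman code programmatically and compare expected codeword lengths. The sketch below assumes the probabilities from part ii. The function names are hypothetical, and the specific 0/1 labels depend on how ties are broken, although the codeword lengths do not. With these probabilities Huffman first merges G and C (0.2 + 0.2 = 0.4), then A and T (0.3 + 0.3 = 0.6), then the two subtrees. Every nucleotide therefore gets a 2-bit codeword, and the expected length is 2.0 bits, no better than the fixed-length code. By contrast, a hand-picked prefix code such as A = 0, T = 10, G = 110, C = 111 averages 0.3(1) + 0.3(2) + 0.2(3) + 0.2(3) = 2.1 bits, which is worse.

```python
import heapq
import itertools

probs = {"A": 0.30, "T": 0.30, "G": 0.20, "C": 0.20}

def huffman_code(p):
    """Build a binary Huffman code; returns {symbol: bitstring}."""
    counter = itertools.count()  # unique tie-breaker so heapq never compares dicts
    heap = [(pi, next(counter), {sym: ""}) for sym, pi in p.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)   # least probable subtree gets prefix "0"
        p1, _, right = heapq.heappop(heap)  # next least probable gets prefix "1"
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

def expected_length(p, code):
    """Expected number of bits per symbol: sum(p_i * len(codeword_i))."""
    return sum(p[s] * len(code[s]) for s in p)

# Resulting tree (0/1 labels depend on tie-breaking):
#            1.0
#          /     \
#        0.4     0.6
#       /   \   /   \
#      G     C A     T
huff = huffman_code(probs)
print(huff, round(expected_length(probs, huff), 3))

# A hand-picked prefix code that gives A the shortest codeword.
skewed = {"A": "0", "T": "10", "G": "110", "C": "111"}
print(round(expected_length(probs, skewed), 3))
```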
