Question

Assignment 3 - Summer 2018

CS 4329 Introduction to bioinformatics. (100 points)

Due Date: 11:59 PM of 06/27/2018

All code must be written in C++ (C++ was also used in CS1, CS2, and CS3). You should submit:

The source code.

A document showing the output.

Put all the individual files in one single folder and compress the folder. Upload the compressed folder. Files should include your first and last names.

1. In this question, you will investigate the nucleotides at the splice sites (the boundaries between exons and introns) within protein-coding genes in the human genome. You are given a FASTA file called gene_fasta_chr12.fa, which contains the sequences of 2,412 randomly selected protein-coding genes from human chromosome 12. Each sequence includes both the exon and intron portions of the gene. Nucleotides in exons are uppercase and those in introns are lowercase. Implement programs to compute the following [100 points]:

Average number of exons in a gene

Average number of introns in a gene

Length of the longest and shortest intron

Length of the longest and shortest exon

Look at the positions immediately after each exon (donor site or the first two bases of each intron) in all the genes and count the frequency of all possible 2-mers at those locations. (GT is expected to have the highest frequency).

Look at the positions immediately before internal exons (splice acceptor sites or the last two bases of each intron) in all the genes and count the frequency of all possible 2-mers at those locations. (AG is expected to have the highest frequency).
