Question
Assignment 3 - Summer 2018
CS 4329 Introduction to bioinformatics. (100 points)
Due Date: 11:59 PM of 06/27/2018
All code must be written in C++ (C++ was also used in CS1, CS2, and CS3). You should submit:
The source code.
A document showing the output.
Put all the individual files in one single folder and compress the folder. Upload the compressed folder. Files should include your first and last names.
1. In this question, you will investigate the nucleotides at the splice sites (the boundaries between exons and introns) within protein-coding genes in the human genome. You are given a FASTA file called gene_fasta_chr12.fa, which contains the sequences of 2,412 randomly selected protein-coding genes from human chromosome 12. Each sequence includes both the exon and intron portions of the gene. The nucleotides in exons are uppercase and those in introns are lowercase. Implement programs to compute the following: [100 points]
Average number of exons in a gene
Average number of introns in a gene
Length of the longest and shortest intron
Length of the longest and shortest exon
Look at the positions immediately after each exon (donor site or the first two bases of each intron) in all the genes and count the frequency of all possible 2-mers at those locations. (GT is expected to have the highest frequency).
Look at the positions immediately before internal exons (splice acceptor sites or the last two bases of each intron) in all the genes and count the frequency of all possible 2-mers at those locations. (AG is expected to have the highest frequency).
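One possible way to approach the six items above is sketched below in C++. It is only a sketch under stated assumptions, not a reference solution: the names Segment, Stats, readFasta, splitByCase, twoMer and addGene are hypothetical and do not come from the assignment. It assumes each FASTA record is one gene, that every maximal uppercase run is an exon and every maximal lowercase run is an intron (a lowercase run at either end of a gene is still counted as an intron), and that 2-mers are reported in uppercase.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <limits>
#include <map>
#include <sstream>  // std::istringstream, handy for trying readFasta on an in-memory string
#include <string>
#include <vector>

// Hypothetical helper types; none of these names come from the assignment.
struct Segment {
    bool isExon;         // true for an uppercase run, false for a lowercase run
    std::size_t start;   // 0-based offset of the run within the gene
    std::size_t length;  // number of bases in the run
};

struct Stats {
    std::size_t genes = 0, exons = 0, introns = 0;
    std::size_t minExon = std::numeric_limits<std::size_t>::max(), maxExon = 0;
    std::size_t minIntron = std::numeric_limits<std::size_t>::max(), maxIntron = 0;
    std::map<std::string, long> donor, acceptor;  // uppercase 2-mer -> count
};

static bool isUpper(char c) { return std::isupper(static_cast<unsigned char>(c)) != 0; }

// Read every record of a FASTA stream, joining wrapped sequence lines.
std::vector<std::string> readFasta(std::istream& in) {
    std::vector<std::string> genes;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        if (line.empty()) continue;
        if (line[0] == '>') genes.emplace_back();        // a header line starts a new gene
        else if (!genes.empty()) genes.back() += line;   // a sequence line extends it
    }
    return genes;
}

// Split a gene into maximal runs of same-case characters.
std::vector<Segment> splitByCase(const std::string& seq) {
    std::vector<Segment> segs;
    for (std::size_t i = 0; i < seq.size();) {
        const bool exon = isUpper(seq[i]);
        std::size_t j = i;
        while (j < seq.size() && isUpper(seq[j]) == exon) ++j;
        segs.push_back({exon, i, j - i});
        i = j;
    }
    return segs;
}

// Uppercased 2-mer starting at pos.
static std::string twoMer(const std::string& seq, std::size_t pos) {
    assert(pos + 2 <= seq.size());
    std::string k = seq.substr(pos, 2);
    for (char& c : k) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return k;
}

// Fold one gene into the running totals.
void addGene(const std::string& seq, Stats& st) {
    const std::vector<Segment> segs = splitByCase(seq);
    ++st.genes;
    for (std::size_t k = 0; k < segs.size(); ++k) {
        const Segment& s = segs[k];
        if (s.isExon) {
            ++st.exons;
            st.minExon = std::min(st.minExon, s.length);
            st.maxExon = std::max(st.maxExon, s.length);
            continue;
        }
        ++st.introns;
        st.minIntron = std::min(st.minIntron, s.length);
        st.maxIntron = std::max(st.maxIntron, s.length);
        if (s.length < 2) continue;  // too short to hold a 2-mer
        // Runs alternate, so an intron with k > 0 directly follows an exon (donor site)
        // and an intron that is not the last run directly precedes an exon (acceptor site).
        if (k > 0) ++st.donor[twoMer(seq, s.start)];
        if (k + 1 < segs.size()) ++st.acceptor[twoMer(seq, s.start + s.length - 2)];
    }
}
```

To use it, open gene_fasta_chr12.fa with std::ifstream, pass the stream to readFasta, and call addGene once per returned sequence. The averages are then static_cast<double>(st.exons) / st.genes and static_cast<double>(st.introns) / st.genes, and the two maps can be printed as the donor and acceptor frequency tables. If the file contains no exons or no introns at all, the corresponding minimum keeps its initial sentinel value, so check st.exons and st.introns before reporting it.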